JSON in VoltDB
Join the DZone community and get the full member experience.
Join For FreeCurator's Note: The content of this article was originally written by John Piekos on the VoltDB blog.
VoltDB 3.0 introduces the use of JSON-encoded columns to allow more flexibility in how you structure and interact with your data. New SQL functions and index capabilities let you work more naturally with JSON data while maintaining the efficiency and transactional consistency of a relational database.
How? A VoltDB JSON Example
Let’s assume that you want to implement a single sign-on (SSO) application using VoltDB. You wish to store the login session for a set of different online sites under a common username. Each login session could hold different user state, simple data values or possibly more complex structures. Additionally, future sessions could hold just about anything. Because of the variability of the data, a good strategy would be to JSON-encode it. The VoltDB table schema for this application might look like the following:
CREATE TABLE user_session_table ( username varchar(200) UNIQUE NOT NULL, password varchar(100) NOT NULL, global_session_id varchar(200) UNIQUE NOT NULL, last_accessed TIMESTAMP, json_data varchar(2048) ); PARTITION TABLE user_session_table ON COLUMN username;
Common across all sessions would be the username, password, perhaps a global session ID, and a last accessed timestamp. Because we wish to support millions of simultaneous logins, we’ll partition this data across username.
This schema comes from our json-sessions sample, available in both the VoltDB community and enterprise kits, or on our GitHub here.
Ultimately, the sample inserts the JSON-encoded session into the database using a simple standard SQL statement:
INSERT INTO user_session_table (username, password, global_session_id, last_accessed, json_data) VALUES (?, ?, ?, ?, ?);
The json-sessions sample models each type of session being tracked as a plain old Java object (POJO). To simplify encoding these session types into JSON, the sample uses an open source package from Google called GSON. GSON can convert POJOs to/from JSON, greatly simplifying the JSON processing in this sample.
Note that VoltDB does not, at present time, validate that data inserted into a varchar column is properly encoded JSON. Validation of encoding occurs during query time, as described in the next section.
Querying JSON Data in VoltDB
VoltDB added a new column function, field(), to aid in interacting with JSON encoded data. Using the user_session_table schema above, let’s assume the table is populated with rows similar to the following:
SELECT username, json_data FROM user_session_table ORDER BY username LIMIT 10
USERNAME JSON_DATA
———– —————————————————————————————————-
user-1 {“read_only_user”:true,”site”:”VoltDB Management”,”props”:{“last-login”:”1356537244991″}}
user-10 {“role”:”reader”,”site”:”VoltDB Blog”,”props”:{“last-login”:”1356537252380″}}
user-1000 {“role”:”reader”,”site”:”VoltDB Blog”,”props”:{“last-login”:”1356537251017″}}
user-10000 {“read_only_user”:false,”site”:”VoltDB Management”,”props”:{“last-login”:”1356537246249″}}
user-10002 {“role”:”reader”,”site”:”VoltDB Blog”,”props”:{“last-login”:”1356537250566″}}
user-10003 {“read_only_user”:false,”site”:”VoltDB Management”,”props”:{“last-login”:”1356537252187″}}
user-10004 {“moderator”:false,”download_count”:0,”site”:”VoltDB Forum”,”props”:{“last-login”:”1356537244170″}}
user-10005 {“moderator”:false,”download_count”:0,”site”:”VoltDB Forum”,”props”:{“last-login”:”1356537250381″}}
user-10006 {“moderator”:false,”download_count”:0,”site”:”VoltDB Forum”,”props”:{“last-login”:”1356537245804″}}
user-10009 {“moderator”:false,”download_count”:0,”site”:”VoltDB Forum”,”props”:{“last-login”:”1356537249792″}}
By using the field() function we can execute a query to return only those rows where the login session is for the “VoltDB Forum” and the session is a moderator. The query and results would look as follows:
SELECT username, json_data FROM user_session_table WHERE field(json_data, 'site')='VoltDB Forum' AND field(json_data, 'moderator')='true' ORDER BY username
USERNAME JSON_DATA
———– ——————————————————————————————————————————
user-63102 {“moderator”:true,”download_count”:0,”site”:”VoltDB Forum”,”props”:{“last-login”:”1356976559063″}}
user-89383
{“moderator”:true,”download_count”:0,”site”:”VoltDB
Forum”,”props”:{“last-login”:”1356976555020″,”client_language”:”Java”}}
user-91984
{“moderator”:true,”download_count”:1,”site”:”VoltDB
Forum”,”props”:{“last-login”:”1356976557512″,”download_version”:”v3.0″}}
Note that the field() function assumes that the varchar field value is valid encoded JSON. If the value is not valid JSON, the query will fail with an appropriate error message.
Let’s say you wanted to refine the result even further and find those Forum sessions that had downloaded any 2.x version. You can use nested field() function invocations to drill deeper into the JSON structure. For example, the following query fetches the properties for the VoltDB Forum session and then further extracts the download_version field, ultimately pattern matching on the value using the SQL LIKE clause:
SELECT username, json_data FROM user_session_table WHERE field(field(json_data, 'props'), 'download_version') LIKE 'v2%' ORDER BY username LIMIT 10
USERNAME JSON_DATA
———– ——————————————————-
user-10014
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537251345″,”download_version”:”v2.7″}}
user-10030
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537250413″,”download_version”:”v2.7″}}
user-10052
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537250274″,”download_version”:”v2.7″}}
user-10087
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537247453″,”download_version”:”v2.7″}}
user-10103
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537247822″,”download_version”:”v2.7″,”client_language”:”Java”}}
user-10170
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537250308″,”download_version”:”v2.7″}}
user-1018
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537252219″,”download_version”:”v2.7″}}
user-10223
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537248629″,”download_version”:”v2.7″}}
user-10226
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537249328″,”download_version”:”v2.7″,”client_language”:”Java”}}
user-10227
{“moderator”:false,”download_count”:1,”site”:”VoltDB
Forum”, ”props”:{“last-login”:”1356537252425″,”download_version”:”v2.7″}}
Indexing on JSON field values
The queries executed in the previous section all required a full table scan to compute the results. With large data sets these queries could be costly in terms of compute cycles and time. To speed up query execution for these types of queries, should they be executed frequently, you should define an index on the commonly accessed fields. Again the field() function comes into play. VoltDB supports defining function-based indexes.
To significantly improve the query execution time of the queries in the prior section, the following two indexes should be created:
CREATE INDEX session_site_moderator ON user_session_table (field(json_data, 'site'), field(json_data, 'moderator'), username); CREATE INDEX session_props ON user_session_table (field(field(json_data, 'props'), 'download_version'), field(field(json_data, 'props'), 'client_language'), username);
These are fully functional SQL indexes. Whenever you create or update a record in the user_session_table table, VoltDB will run the field() function to extract the specified field from the JSON value and store the result inside the index. When you query by that same field in the future, VoltDB will use the index and avoid the table scan. Additionally, using the index usually avoids JSON string processing.
Summary: Benefits and Considerations to using JSON in VoltDB
One of the major benefits of encoding data in a JSON field is that you don’t have to pre-define what structure, or shape, that data will have. Further, the shape of the data can vary from one row to the next.
In the json-sessions example, the schema for the JSON column was defined by the Java objects themselves, on the fly, rather than having to define the structure using SQL ahead of time. If a new Java object, such as a new session type, is needed, you simply create the Java object, serialize it to JSON, and store it in VoltDB. This avoids the need to propagate a new catalog with DDL changes to the database.
On the flip-side, using a variable schema in the JSON column means that your application must be intelligent enough to understand the various structures that appear in that JSON column. Further, you must be sensitive to when indexes are needed, should query patterns change in your application along with the shape of the data. This may, for example, require you to add (or modify) indexes based on the existence of new fields that are now frequently queried.
Another point of note is the size limit for JSON values. In VoltDB, varchar columns, in which JSON values are stored, are limited to 1MB, exactly 1024^2 bytes (1048576 bytes). In this way, JSON support lets you augment the existing relational model within VoltDB. It is not intended or appropriate as a replacement for pure blob-oriented document stores.
More JSON to come
This is our first release of JSON support in VoltDB. Over the coming year we’ll be rolling out additional JSON capabilities. We welcome your feedback. Please download VoltDB 3.0 and join the VoltDB community to stay in touch with these and other exciting new features.
Published at DZone with permission of Mike Stonebraker, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments