DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Working With Multi-Level JSON in CockroachDB
  • CockroachDB TIL: Volume 11
  • Optimizing Pgbench for CockroachDB Part 2
  • DuckDB for Python Developers

Trending

  • Dear Micromanager: Your Distrust Has a Job; It’s Just Not the One You’re Doing
  • Why Your Test Automation Is Always Behind the Code And the Architecture That Fixes It
  • Beyond Manual Annotation: Engineering Self-Correcting Pseudo-Labeling Pipelines
  • Ujorm3: A New Lightweight ORM for JavaBeans and Records
  1. DZone
  2. Data Engineering
  3. Databases
  4. Performance Characteristics Working With JSONB in CockroachDB

Performance Characteristics Working With JSONB in CockroachDB

In this article, look at performance characteristics working with JSONB in CockroachDB.

By 
Artem Ervits user avatar
Artem Ervits
DZone Core CORE ·
Mar. 09, 21 · Tutorial
Likes (5)
Comment
Save
Tweet
Share
5.3K Views

Join the DZone community and get the full member experience.

Join For Free

As I was working on my previous article on JSONB support in CockroachDB, I was interested in identifying performance characteristics for loading JSONB data. I will look at performance based on a single node instance, using the same schema and JSON as my last post. Let's dig right in.

Writes on a Table Without an Inverted Index or Computed Column

If we just take a basic table with a generated id and a JSONB column called payload. Running a 1000000 row load will finish in less than a minute.

SQL
 




x


 
1
CREATE TABLE sample (
2
    id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
3
    payload JSONB,
4
);



Java
 




xxxxxxxxxx
1


 
1
started at 14:30:20
2
finished at 14:31:17



These are the start and end times for my python script that creates threads and using upsert, inserts 50 records at a time using multi-row upsert. I will use 57 seconds as a baseline.

Writes on a Table With an Inverted Index

This time, we're going to create a table schema along with an inverted index.

SQL
 




xxxxxxxxxx
1


 
1
CREATE TABLE sample (
2
    id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
3
    payload JSONB,
4
    INVERTED INDEX jsonb_inv_idx (payload)
5
);



Java
 




xxxxxxxxxx
1


 
1
started at 14:35:49
2
finished at 14:46:02



This time, it took almost 10 minutes to load the exact same data. We can conclude that inverted indexes are really slow on writes. We will evaluate feasibility of creating an index post-load later in this article.

Writes on a Table With Computed Columns atop JSONB

SQL
 




xxxxxxxxxx
1
14


 
1
CREATE TABLE sample (
2
    id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
3
    payload JSONB,
4
    level1 STRING AS (payload->>'level1') STORED,
5
    level2 STRING AS (payload ->'n2'->>'level2') STORED,
6
    level3 STRING AS (payload ->'n2'->'n3'->>'level3') STORED,
7
    level4 STRING AS (payload ->'n2'->'n3'->'n4'->>'level4') STORED,
8
    level5 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->>'level5') STORED,
9
    level6 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->>'level6') STORED,
10
    level7 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->>'level7') STORED,
11
    level8 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->'n8'->>'level8') STORED,
12
    level9 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->'n8'->'n9'->>'level9') STORED,
13
    level10 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->'n8'->'n9'->'n10'->>'level10') STORED
14
);



Java
 




xxxxxxxxxx
1


 
1
started at 14:47:45
2
finished at 14:49:27



Using computed columns is significantly faster than using inverted index from the get go but still not as fast as barebones load. Given the flexibility, it may be a good middle ground working with JSONB.

Creating an Index Post-Load

Given the bulk of the data is already loaded and takes a little under one minute to load, compared to loading the table with the associated index from the beginning, we can reduce the time it takes to load and index the data using inverted index by first creating a table without an index and then creating an inverted index on it.

SQL
 




xxxxxxxxxx
1


 
1
CREATE TABLE sample (
2
    id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
3
    payload JSONB);



Run the load and then issue the following command to create an index

SQL
 




xxxxxxxxxx
1


 
1
CREATE INDEX jsonb_inv_idx ON sample USING GIN (payloa
2
d);



It took 40 seconds to populate the index

SQL
 




xxxxxxxxxx
1


 
1
root@localhost:26257/defaultdb> CREATE INDEX jsonb_inv_idx ON sample USING GIN (payloa
2
d)
3
                             -> ;
4
CREATE INDEX
5

          
6
Time: 40.003267s



This is in line with the time it took us to load a table with computed column.

Hope this helps in your work with CockroachDB and JSONB.

Database sql Inverted index CockroachDB

Published at DZone with permission of Artem Ervits. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Working With Multi-Level JSON in CockroachDB
  • CockroachDB TIL: Volume 11
  • Optimizing Pgbench for CockroachDB Part 2
  • DuckDB for Python Developers

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook