Performance Characteristics Working With JSONB in CockroachDB
In this article, look at performance characteristics working with JSONB in CockroachDB.
Join the DZone community and get the full member experience.
Join For FreeAs I was working on my previous article on JSONB support in CockroachDB, I was interested in identifying performance characteristics for loading JSONB data. I will look at performance based on a single node instance, using the same schema and JSON as my last post. Let's dig right in.
Writes on a Table Without an Inverted Index or Computed Column
If we just take a basic table with a generated id
and a JSONB column called payload
. Running a 1000000 row load will finish in less than a minute.
CREATE TABLE sample (
id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
payload JSONB,
);
xxxxxxxxxx
started at 14:30:20
finished at 14:31:17
These are the start and end times for my python script that creates threads and using upsert
, inserts 50 records at a time using multi-row upsert. I will use 57 seconds as a baseline.
Writes on a Table With an Inverted Index
This time, we're going to create a table schema along with an inverted index.
xxxxxxxxxx
CREATE TABLE sample (
id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
payload JSONB,
INVERTED INDEX jsonb_inv_idx (payload)
);
xxxxxxxxxx
started at 14:35:49
finished at 14:46:02
This time, it took almost 10 minutes to load the exact same data. We can conclude that inverted indexes are really slow on writes. We will evaluate feasibility of creating an index post-load later in this article.
Writes on a Table With Computed Columns atop JSONB
xxxxxxxxxx
CREATE TABLE sample (
id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
payload JSONB,
level1 STRING AS (payload->>'level1') STORED,
level2 STRING AS (payload ->'n2'->>'level2') STORED,
level3 STRING AS (payload ->'n2'->'n3'->>'level3') STORED,
level4 STRING AS (payload ->'n2'->'n3'->'n4'->>'level4') STORED,
level5 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->>'level5') STORED,
level6 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->>'level6') STORED,
level7 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->>'level7') STORED,
level8 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->'n8'->>'level8') STORED,
level9 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->'n8'->'n9'->>'level9') STORED,
level10 STRING AS (payload ->'n2'->'n3'->'n4'->'n5'->'n6'->'n7'->'n8'->'n9'->'n10'->>'level10') STORED
);
xxxxxxxxxx
started at 14:47:45
finished at 14:49:27
Using computed columns is significantly faster than using inverted index from the get go but still not as fast as barebones load. Given the flexibility, it may be a good middle ground working with JSONB.
Creating an Index Post-Load
Given the bulk of the data is already loaded and takes a little under one minute to load, compared to loading the table with the associated index from the beginning, we can reduce the time it takes to load and index the data using inverted index by first creating a table without an index and then creating an inverted index on it.
xxxxxxxxxx
CREATE TABLE sample (
id UUID PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
payload JSONB);
Run the load and then issue the following command to create an index
xxxxxxxxxx
CREATE INDEX jsonb_inv_idx ON sample USING GIN (payloa
d);
It took 40 seconds to populate the index
xxxxxxxxxx
root@localhost:26257/defaultdb> CREATE INDEX jsonb_inv_idx ON sample USING GIN (payloa
d)
-> ;
CREATE INDEX
Time: 40.003267s
This is in line with the time it took us to load a table with computed column.
Hope this helps in your work with CockroachDB and JSONB.
Published at DZone with permission of Artem Ervits. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments