DZone

YCSB-JSON: Implementation for Couchbase and MongoDB

Learn how to implement YCSB-JSON performance benchmarking for Couchbase and MongoDB databases.

by Alex Gyryk · Nov. 14, 18 · Tutorial

YCSB is a great benchmarking tool, built to be easily extended by any driver that supports and implements basic operations like insert, read, update, delete, and scan. The plain synthetic data that YCSB generates fits this paradigm perfectly.

But when it comes to JSON databases, queries become far more sophisticated: querying arrays and nested objects, running joins and aggregations. On one hand, the YCSB-JSON extension should be able to use every JSON operation a database supports. On the other hand, its implementation in YCSB should be generic enough to be easily extended by any other DB driver, no matter what level of JSON querying that driver supports.

YCSB-JSON is designed to better emulate realistic end-user scenarios. It should work on any JSON data, whether real datasets, pseudo-realistic data, or fully synthetic data. One of the requirements for the tool is that there must be no hardcoded values in query predicates; a user can only control the data cardinality during the dataset generation process.

Fig 1. YCSB-JSON implementation at a glance.

Data Model

The data model we chose for this benchmark is well described in this article. The dataset is generated with the fakeit tool and loaded into a database (Couchbase, MongoDB) by external scripts. While the model is defined and fixed, the values are randomly generated: the data is random, but it is not synthetic.

Data Management

For each operation in the workload, the query is fixed, but the values bound to each parameterized predicate are non-deterministic. So the following data management flow was chosen:

  1. Generate documents with fakeit.
  2. Load generated data to a database with an external script.
  3. Run the YCSB load phase. During this phase, YCSB will read a random subset of the generated documents and store their values in its internal cache.
  4. During the run phase, YCSB will use the values from its cache while binding and executing queries against the database.
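The load/run split in steps 3 and 4 can be sketched as a small value cache. This is a minimal sketch, not the YCSB-JSON code: the class name PredicateValueCache and its methods are hypothetical, and documents are assumed to arrive as nested Java maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical value cache illustrating the load and run phases (not the YCSB-JSON class).
 * Load phase: every scalar value of a sampled document is recorded under its dotted path.
 * Run phase: a random cached value is bound to a predicate, so nothing is hardcoded.
 * Arrays are cached as whole values in this sketch.
 */
class PredicateValueCache {
  private final Map<String, List<Object>> valuesByField = new HashMap<>();
  private final Random random;

  PredicateValueCache(long seed) {
    this.random = new Random(seed);
  }

  /** Load phase: walk a (possibly nested) document and cache each scalar value. */
  @SuppressWarnings("unchecked")
  void recordDocument(Map<String, Object> doc, String prefix) {
    for (Map.Entry<String, Object> entry : doc.entrySet()) {
      String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
      if (entry.getValue() instanceof Map) {
        // Nested object: recurse so "address" -> "zip" is cached as "address.zip".
        recordDocument((Map<String, Object>) entry.getValue(), path);
      } else {
        valuesByField.computeIfAbsent(path, k -> new ArrayList<>()).add(entry.getValue());
      }
    }
  }

  /** Run phase: pick one cached value for the field, uniformly at random. */
  Object randomValue(String fieldPath) {
    List<Object> values = valuesByField.get(fieldPath);
    if (values == null || values.isEmpty()) {
      throw new IllegalStateException("no cached values for " + fieldPath);
    }
    return values.get(random.nextInt(values.size()));
  }
}
```

Keying the cache by dotted path keeps it independent of any particular database's query syntax; each driver decides later how to express the path.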

Predicates Generator

YCSB uses generators when operating with data. YCSB-JSON introduces its own generator mapped to a particular data model; the mapping and the model exist only within the generator namespace. The generator's output is a set of generic predicates (field-value pairs) for a particular query. This makes it possible to modify the model and extend the tool with other queries without modifying the rest of the YCSB core code.

  • Predicates generator: Generator.java

Example #1: Pagination Query

One of the YCSB-JSON operations, the pagination query, can be represented by the following statement:

SELECT * FROM <bucket> WHERE address.zip = <value> OFFSET <num> LIMIT <num>

The query predicate is a field within an object. With Couchbase N1QL, the field can simply be accessed as “address.zip”. Another database might not be as flexible, so the YCSB-JSON generator creates two predicates: the parent predicate (address) and the child, or nested, predicate (zip).

The child predicate has a value randomly picked from a list of sample values for this particular field.

The generator function for this query produces a SoeQueryPredicate object whose name is “address” and whose nested predicate is another SoeQueryPredicate object with name “zip” and value <value>.
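A minimal sketch of that parent/child predicate pair follows. The SoeQueryPredicate here is a simplified, hypothetical stand-in (only a name, a value, and a nested predicate; the real class may differ), and PagePredicateSketch.buildPagePredicate is an illustrative helper, not the function from Generator.java.

```java
import java.util.List;
import java.util.Random;

/**
 * Simplified stand-in for SoeQueryPredicate: a field name, an optional bound
 * value, and an optional nested (child) predicate. Real fields may differ.
 */
class SoeQueryPredicate {
  private final String name;
  private final Object value;
  private final SoeQueryPredicate nested;

  SoeQueryPredicate(String name, Object value, SoeQueryPredicate nested) {
    this.name = name;
    this.value = value;
    this.nested = nested;
  }

  String getName() { return name; }
  Object getValue() { return value; }
  SoeQueryPredicate getNested() { return nested; }
}

class PagePredicateSketch {
  /** Builds the parent "address" predicate whose child "zip" gets a random sample value. */
  static SoeQueryPredicate buildPagePredicate(List<String> zipSamples, Random random) {
    String zip = zipSamples.get(random.nextInt(zipSamples.size()));
    SoeQueryPredicate zipPredicate = new SoeQueryPredicate("zip", zip, null);
    // The parent carries no value of its own; it only scopes the nested field.
    return new SoeQueryPredicate("address", null, zipPredicate);
  }
}
```

A driver that understands dotted paths can join the two names; one that does not can walk the nesting explicitly.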

Example #2: Report Query

Predicates for more complex queries are generated in the same way. The only difference is that when a query introduces multiple predicates, a sequence (array) of predicates is generated instead of a single predicate. Here is the Report query:

SELECT o2.month, c2.address.zip, SUM(o2.sale_price) FROM <bucket> c2
INNER JOIN orders o2 ON KEYS c2.order_list
WHERE c2.address.zip = <value> AND o2.month = <value>
GROUP BY o2.month, c2.address.zip ORDER BY SUM(o2.sale_price)

For this query, the generator function produces a sequence of predicates: a “month” predicate, an “address” predicate with a nested “zip” predicate, a “sale_price” predicate, and so on.
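The same idea extended to a predicate sequence might look like the sketch below. The holder class and ReportPredicateSketch are hypothetical; the sample-value map keyed by dotted paths and the trailing order_list join predicate are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Minimal predicate holder (hypothetical stand-in for SoeQueryPredicate). */
class SoeQueryPredicate {
  final String name;
  final Object value;
  final SoeQueryPredicate nested;

  SoeQueryPredicate(String name, Object value, SoeQueryPredicate nested) {
    this.name = name;
    this.value = value;
    this.nested = nested;
  }
}

class ReportPredicateSketch {
  private static String pick(List<String> samples, Random random) {
    return samples.get(random.nextInt(samples.size()));
  }

  /** Builds the ordered predicate sequence for the Report query. */
  static List<SoeQueryPredicate> buildReportPredicates(Map<String, List<String>> samples,
                                                       Random random) {
    List<SoeQueryPredicate> sequence = new ArrayList<>();
    // o2.month = <value>, also used in SELECT and GROUP BY
    sequence.add(new SoeQueryPredicate("month", pick(samples.get("month"), random), null));
    // c2.address.zip = <value>: parent "address" with nested "zip"
    SoeQueryPredicate zip =
        new SoeQueryPredicate("zip", pick(samples.get("address.zip"), random), null);
    sequence.add(new SoeQueryPredicate("address", null, zip));
    // SUM(o2.sale_price): aggregated field, so no bound value
    sequence.add(new SoeQueryPredicate("sale_price", null, null));
    // ON KEYS c2.order_list: join field (assumed), no bound value
    sequence.add(new SoeQueryPredicate("order_list", null, null));
    return sequence;
  }
}
```

Because the order of the sequence is fixed, a driver can read each predicate by position without knowing how its values were chosen.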

Generators for the other queries can be found in the YCSB-JSON source code.

New Operations

The YCSB code needs to be updated with new operations.

  • Signatures in DB class

  • Implementations in DBWrapper

  • Extending YCSB CoreWorkload with new operations: SoeWorkload.java
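One way to picture these three changes is a base class whose new JSON operation has a default body, plus a wrapper that times the call. Everything here is a hypothetical sketch: Status is a local stand-in for YCSB's status class, and JsonDb, JsonDbWrapper, PageQueryGenerator, and the method name soePage are illustrative assumptions, not the actual signatures.

```java
import java.util.List;
import java.util.Map;

/** Local stand-in for YCSB's status type; only the codes used here. */
enum Status { OK, ERROR, NOT_IMPLEMENTED }

/** Hypothetical generator view exposing the prebuilt, already-bound page predicate. */
interface PageQueryGenerator {
  String parentField();
  String nestedField();
  Object value();
  int offset();
  int limit();
}

/**
 * Hypothetical DB base class. The new JSON operation has a default body,
 * so drivers that do not support it still compile and report NOT_IMPLEMENTED.
 */
abstract class JsonDb {
  Status soePage(String table, List<Map<String, Object>> result, PageQueryGenerator gen) {
    return Status.NOT_IMPLEMENTED;
  }
}

/** Wrapper in the spirit of DBWrapper: delegates to the driver and records latency. */
class JsonDbWrapper extends JsonDb {
  private final JsonDb db;
  private long lastLatencyNanos;

  JsonDbWrapper(JsonDb db) {
    this.db = db;
  }

  long lastLatencyNanos() {
    return lastLatencyNanos;
  }

  @Override
  Status soePage(String table, List<Map<String, Object>> result, PageQueryGenerator gen) {
    long start = System.nanoTime();
    Status status = db.soePage(table, result, gen);
    lastLatencyNanos = System.nanoTime() - start;
    return status;
  }
}
```

A default NOT_IMPLEMENTED body is one way to keep the extension generic: a driver opts in by overriding only the JSON operations its database supports.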

Implementation of YCSB-JSON Operations for Couchbase and MongoDB

The DB driver function of a YCSB-JSON operation takes an additional parameter: a generator object. The workload class passes it in with the predicate sequence for that operation already built.

Because the predicate structure and sequence are well defined by the generator, a DB driver can access names and values directly and construct the query using its native query language or other access methods. The Page and Report queries below serve as examples.

Page query, Couchbase (the driver generates a N1QL query statement):
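A hedged sketch of building that statement from the generator's predicate names. CouchbasePageQuery is a hypothetical helper; the zip value is left as the positional parameter $1 so it is bound at execution time, while the integer offset and limit are inlined.

```java
/**
 * Hypothetical helper: builds the pagination N1QL statement from predicate names.
 * The predicate value is bound as positional parameter $1, not concatenated.
 */
class CouchbasePageQuery {
  static String build(String bucket, String parentField, String nestedField,
                      int offset, int limit) {
    // N1QL reaches nested fields with a dotted path, so the parent and child
    // predicates from the generator collapse back into e.g. "address.zip".
    return "SELECT * FROM `" + bucket + "` WHERE "
        + parentField + "." + nestedField + " = $1"
        + " LIMIT " + limit + " OFFSET " + offset;
  }
}
```

With the Couchbase Java SDK 2.x, a statement like this could be run as N1qlQuery.parameterized(statement, JsonArray.from(zip)); Couchbase2Client.java may build it differently.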

Page query, MongoDB:
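For MongoDB, a comparable sketch keeps the filter as a plain map mirroring the BSON document, so it runs without the driver on the classpath. MongoPageQuery and toShell are hypothetical names; toShell quotes every value as a string and is only meant for logging.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the MongoDB page query; the Map mirrors the BSON filter. */
class MongoPageQuery {
  final Map<String, Object> filter = new LinkedHashMap<>();
  final int skip;
  final int limit;

  MongoPageQuery(String parentField, String nestedField, Object value, int skip, int limit) {
    // MongoDB also uses dot notation for embedded documents: {"address.zip": value}
    filter.put(parentField + "." + nestedField, value);
    this.skip = skip;
    this.limit = limit;
  }

  /** Shell-style rendering for logging; every value is quoted as a string. */
  String toShell(String collection) {
    StringBuilder sb = new StringBuilder("db.").append(collection).append(".find({");
    boolean first = true;
    for (Map.Entry<String, Object> e : filter.entrySet()) {
      if (!first) {
        sb.append(", ");
      }
      sb.append('"').append(e.getKey()).append("\": \"").append(e.getValue()).append('"');
      first = false;
    }
    return sb.append("}).skip(").append(skip).append(").limit(").append(limit).append(")")
        .toString();
  }
}
```

With the MongoDB Java driver, the same query would be roughly collection.find(Filters.eq("address.zip", zip)).skip(offset).limit(limit).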

Report query, Couchbase:
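A sketch of the Report statement assembled from predicate names, in the same hypothetical style as the page query helper. CouchbaseReportQuery is an illustrative name; zip and month are bound as $1 and $2, and the orders keyspace name is passed in rather than assumed.

```java
/** Hypothetical helper: builds the Report N1QL statement from predicate names. */
class CouchbaseReportQuery {
  static String build(String bucket, String ordersKeyspace, String monthField,
                      String zipPath, String priceField, String joinField) {
    // zipPath is the parent and nested predicate names joined, e.g. "address.zip".
    String sum = "SUM(o2." + priceField + ")";
    return "SELECT o2." + monthField + ", c2." + zipPath + ", " + sum
        + " FROM `" + bucket + "` c2"
        // ON KEYS joins each customer to the order documents whose keys it lists.
        + " INNER JOIN `" + ordersKeyspace + "` o2 ON KEYS c2." + joinField
        + " WHERE c2." + zipPath + " = $1 AND o2." + monthField + " = $2"
        + " GROUP BY o2." + monthField + ", c2." + zipPath
        + " ORDER BY " + sum;
  }
}
```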

Report query, MongoDB:
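MongoDB has no ON KEYS join, so one plausible translation is an aggregation pipeline. This sketch assumes orders live in a separate collection keyed by _id and that order_list holds those _id values; MongoReportPipeline is a hypothetical name, and each stage is a plain map mirroring the BSON a driver would send.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical translation of the Report query into aggregation stages. */
class MongoReportPipeline {
  /** Builds an insertion-ordered map from alternating key/value arguments. */
  private static Map<String, Object> doc(Object... keyValues) {
    Map<String, Object> m = new LinkedHashMap<>();
    for (int i = 0; i < keyValues.length; i += 2) {
      m.put((String) keyValues[i], keyValues[i + 1]);
    }
    return m;
  }

  static List<Map<String, Object>> build(String ordersCollection, String zip, String month) {
    List<Map<String, Object>> pipeline = new ArrayList<>();
    // WHERE c2.address.zip = ?: filter customers first so the join touches fewer documents
    pipeline.add(doc("$match", doc("address.zip", zip)));
    // INNER JOIN orders ON KEYS c2.order_list: match order _id against the order_list array
    pipeline.add(doc("$lookup", doc("from", ordersCollection, "localField", "order_list",
        "foreignField", "_id", "as", "o2")));
    // One document per joined order; customers without orders are dropped, as in an inner join
    pipeline.add(doc("$unwind", "$o2"));
    // AND o2.month = ?
    pipeline.add(doc("$match", doc("o2.month", month)));
    // GROUP BY o2.month, c2.address.zip with SUM(o2.sale_price)
    pipeline.add(doc("$group", doc(
        "_id", doc("month", "$o2.month", "zip", "$address.zip"),
        "total", doc("$sum", "$o2.sale_price"))));
    // ORDER BY SUM(o2.sale_price), ascending
    pipeline.add(doc("$sort", doc("total", 1)));
    return pipeline;
  }
}
```

With the Java driver, each stage map could be wrapped in an org.bson.Document and the list passed to collection.aggregate(...); the actual MongoDbClient.java may translate the query differently.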

All Couchbase implementations: Couchbase2Client.java

All MongoDB implementations: MongoDbClient.java

References

  • Article part 1

  • YCSB-JSON Implementation

  • FakeIt

Next Steps

Implement a fakeit-like generator in YCSB to simplify the generation of data and query predicates.
