Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Couchbase N1QL: To Query or To Analyze?

DZone's Guide to

Couchbase N1QL: To Query or To Analyze?

Which service should run your query? Well, it depends on your workload.

· Database Zone ·
Free Resource

Download the Altoros NoSQL Performance Benchmark 2018. Compare top NoSQL solutions – Couchbase Server v5.5, MongoDB v3.6, and DataStax Enterprise v6 (Cassandra).

Note: This recommendation note was written in collaboration with Till Westmann and Prof. Mike Carey, who lead the Couchbase analytics R&D.


Transaction-Analytics spectrum.  Source: BI Research

Transactions-Analytics Spectrum. Source: BI Research

Couchbase’s Query and Analytics Services both support N1QL (SQL for JSON). A common question is: “Which service should run my query?” The quick answer is, “It depends on your workload.” This article explains this answer.

A transaction is when you buy coffee and cake at the grocery store. The analysis is when the grocery store looks at all the sales data to see what day of the week, what month of the year coffee and/or cake sells more — so they can plan the inventory, sales, and prices.

Every business does these three things in a cycle or a spiral [The Goal].

  1. Run the business process to deliver products or services to the customers.
  2. Analyze the business to determine what to change and what to change to.
  3. Make the change happen.

In modern-day business, each step requires applications to perform the steps.

The Query Service is used by the applications needed to run the business; it is designed for a large number of concurrent queries, each doing a small amount of work. In the RDBMS world, this workload is called the OLTP workload.

Applications or tools used for analysis have different workload characteristics. These typically use the Analytics Service; it is designed for a smaller number of concurrent queries analyzing a larger number of documents. In the RDBMS world, this workload is called the OLAP workload.

Query Service

Analytics Service

Tutorials

https://query-tutorial.couchbase.com/tutorial/#1 https://sqlplusplus-tutorial.couchbase.com/tutorial/#1

High Level Comparison

Used for data manipulation within application logic Used for reports, analysis (historical, interactive), dashboards
Short queries

— Relatively simple SQL

— Typically involving small amounts of data

Longer operations

— Complex SQL with analytics

— Typically involving larger amounts of data

Run SELECT, INSERT, UPDATE, DELETE, MERGE for operational applications Run SELECT for analysis
Random Updates

— Update few documents per query

No Updates

— Changes ingested from Data Service

Millisecond to a second latency,

high throughput (10-1000 qps), performance goal is queries/second

Second to minutes response time,

<1-10 queries per second,

performance goal is seconds/query

Large number of Indexes Fewer indexes
Developer written queries; generated queries are well known Ad-hoc queries; complex reports, dashboards, BI workload
Queries execute on a single query node using the distributed index and data infrastructure Queries execute on all analytics nodes using its distributed compute, index, and data infrastructure

Technical Comparison: Architecture

Queries run in SMP mode, throughput is scalable by adding new query nodes Queries run in MPP mode, can handle larger data or reduce query execution times by adding new analytics nodes

Technical Comparison: Optimizer

Nested loop join by default

Hash join via a query hint

Parallel hash join by default,

(Index-)Nested loop join or broadcast join via a query hint

Technical Comparison: Indexes

Global secondary indexes Local secondary indexes (co-located with data partitions)
Uses Memory Optimized Indexes; standard secondary indexes (Plasma) Log-structured Merge-Tree (LSM) based secondary indexes
Support for both covered and non-covered index scans Non-covered index scans
Can use full-text index for queries (6.5)

Technical Comparison: Execution

Most queries are prepared once and executed many times Ad-hoc and exploratory queries
SDKs use the prepare-execute model based on an ad-hoc flag SDKs offer ad-hoc and parameterized queries
Most operations are done in memory; only when an index scan returns large data is it written to the disk backfill Bounded-memory operations on large data (larger than cluster memory) with graceful spilling as needed
Single node query parallelism Multi-node partitioned-parallel join, sort, aggregate, and grouped aggregate operators
Expects a single (possibly RAIDed) storage device Un-RAIDed use of multiple storage devices
Performance isolation via covering indexes that support scan, grouping, and aggregation operations on Index Service nodes Performance isolation for all queries via shadowing of data on Analytics Service nodes

Technical Documentation

https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/index.html https://docs.couchbase.com/server/6.0/analytics/introduction.html

Technical Information

https://www.couchbase.com/products/n1ql https://www.couchbase.com/sqlplusplus

Technical Books

https://blog.couchbase.com/a-guide-to-n1ql-features-in-couchbase-5-5-special-edition/ https://resources.couchbase.com/sql_tutorial

https://www.amazon.com/SQL-Users-Tutorial-Don-Chamberlin/dp/0692184503/


  1. [The GOAL]: The Goal: The process of ongoing improvement. https://www.amazon.com/Goal-Process-Ongoing-Improvement/dp/0884271951/ref=sr_1_1?keywords=the+goal&qid=1547969233&sr=8-1
  2. Systems of Engagement for Customer Jobs: https://marketing.cioreview.com/cxoinsight/systems-of-engagement-for-customer-jobs-nid-24677-cid-51.html

Download the whitepaper, Moving From Relational to NoSQL: How to Get Started. We’ll take you step by step through your first NoSQL project.

Topics:
couchbase ,appdev ,n1ql ,performance ,json ,nosql ,oltp workload

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}