DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
Securing Your Software Supply Chain with JFrog and Azure
Register Today
  1. DZone
  2. Data Engineering
  3. Databases
  4. Analyze This: MongoDB and Couchbase Analytics
Content provided by Couchbase logo

Analyze This: MongoDB and Couchbase Analytics

Comparing analytical query processing in MongoDB and Couchbase.

Keshav Murthy user avatar by
Keshav Murthy
CORE ·
Jun. 25, 20 · Opinion
Like (4)
Save
Tweet
Share
7.18K Views

The purpose of computing is insight, not numbers.  — Richard Hamming

The spiral of running the business, analyzing what to change & what to change to, and then changing the business is an eternal one. Do the right analysis, your spiral will get larger.  Else, you’ll spiral down.

Couchbase, like the other pioneers of NoSQL systems, was created to address extreme scale, performance, and availability requirements of the web 2.0 world. From the simple key-value, Couchbase has evolved to handle query, search and analytics — at scale. Each of them is purpose-built engines integrated via Couchbase’s multi-dimensional architecture.  The query and analytics service both talk N1QL. Why build two distinct engines that talk the same language?  Because…

One Size Fits All: An Idea Whose Time Has Come and Gone.  — Michael Stonebraker

Query engine was built for operational workload and the Analytics engine for the analysis workload.  We’ve compared the two engines and given the guidance.  MongoDB has followed a similar path from being a clustered database handling simple workload to complex workload for analytics and queries on data lakes. 

Last year, MongoDB announced analytic nodes in their clusters for analytic processing.  In this blog, we compare and contrast the two engines for the analytics use case.

Couchbase: High-Level Architecture


Inside Couchbase Analytics: High-level Architecture

MongoDB Analytics Nodes:

Let’s compare and contrast the analytics support in MongoDB Analytic nodes and Couchbase Analytics.


MongoDB Analytic nodes Couchbase Analytics
Docs https://docs.atlas.mongodb.com/reference/replica-set-tags/ https://docs.couchbase.com/server/6.5/analytics/introduction.html
Architecture Use a set of Secondary replica nodes with a complete copy of the operational data. The query language is the same (MQL); query processing is the same as the operational workload. Distinct Analytics nodes which have a user-defined subset of the operational data. The query language is the same (N1QL); query processing is designed for larger datasets (see below).
Architecture Details Atlas Mapped Analytics Nodes Couchbase Analytics: NoETL for Scalable NoSQL Data Analysis
Data Model BSON JSON
Query Language MQL – MongoDB Query Language N1QL – Non 1st Normal-form Query Language; SQL for JSON
Query page MongoDB Query Analytics Query
Query processing Same as operational query processing, using mongos and mongod for distributed query processing.  Analytics engine designed for massive parallel processing (MPP) of the data. Each N1QL 
Query optimizer Shape-Based Optimizer; Requires plan management. Rule-Based Optimizer. No plan management required. 
Explain Text and graphical. Text and graphical.
Indexing Need to create the index in the operational and have it copied over. Analytics only Indexing
Parallel processing Each Mongod node runs the basic operations and mongos combines it (e.g. final group and aggregation).  To handle complex analytics queries efficiently, and to deliver


the desired scale-up and speed-up properties, the Analytics Service

employs the same kinds of state-of-the-art, shared-nothing MPP

(massively parallel processing) based query processing strategies [From the VLDB paper]

Indexing Local indexing  Local indexing
Joins – Language $lookupoperator supports simple equality joins between two collections; Only simple scalar fields are allowed. Arrays need to be unwound before joins.


  1. One of the two collections CANNOT be sharded.  That means, can’t join on large collections.
  2. Need a separate pipeline stage for simple non-equality joins. That means, the queries are inefficient and takes lot of resources; 
  3. This is roughly equivalent to LEFT OUTER JOINs in SQL.  Users will have to do additional pipeline processing to get the INNER JOIN and other joins.
INNER JOIN, LEFT OUTER JOIN, NEST and UNNEST operations.


  1. Standard SQL syntax
  2. Supports sharded dataset by default.
  3. Supports equality and arbitrarily complex join expressions.
Query processing:data size Intermediate stages of aggregate() pipeline cannot be more than 100 MiB in size. Query writers/users should use a special flag to allow this. No Limitations; When the intermediate data (e.g. hash table, sort data) gets bigger, it’s spilled over to the disk.
Query processing: Join type (roughly) LEFT OUTER JOIN INNER JOIN


LEFT OUTER JOIN

Search Supports search within the query. Uses Atlas search on the cloud and basic B-tree based search on-prem.  Analytics service doesn’t have a built-in search.  We need to use query service with FTS for combining search within a query.
Queries supported find() and aggregate() SELECT statement (from SQL and SQL++)
JOIN types (Language) $lookup — this is roughly LEFT OUTER JOIN (doesn't support joining on array keys, etc)  INNER JOIN


LEFT OUTER JOIN 

JOIN types (Implementation)
  1. Join only between one sharded and another non-sharded collection.
  2. Nested loop (NL) only. (NL is bad for performance while handling large data).
  3. Intermediate results are limited to 100MB memory.  User will have to know the size and use options to allow spill over.
  1. Joins sharded datasets; All of the datasets are sharded by default.
  2. Supports nested loop, broadcast and parallel hash join
  3. Supports both Nested Loop join and Hash join. 
  4. Use Hash join by default — well suited for large scale data processing. 
  5. No size limitations on the intermediate data.
Aggregation Supports the common grouping and aggregation via aggregate() method. Supports the common grouping and aggregation via GROUP BY and respective aggregations. See below for Windowed aggregates. 
Windowed aggregate Functions: Probably, the Coolest SQL Feature. Unavailable.  Users need to work around by writing code. See the report for details: SQL to NoSQL – 7 Metrics to Compare Query Language Fully supported.


RANK()

PERCENT_RANK()

DENSERANK()

ROW_NUMBER()

CUME_DIST()

FIRST_VALUE()

LAST_VALUE()

NTH_VALUE()
LAG()

LEAD()

NTILE()

RATIO_TO_REPORT()

Analyzing data from Multi-clusters All the data analyzed is from a single MongoDB cluster. 6.5: All the data analyzed is from a single Couchbase cluster.


6.6: Can ingest and analyze the data from multiple Couchbase clusters.

External data Supports query processing on S3 data. Supports BSON, CSV, TSV, Avro, and Parquet formats. 6.6: Supports external JSON, CSV and TSV data in S3
External data sources Supports additional datasources via JDBC driver. Integrated with the aggregation pipeline via, you’ve to wait for it, $sql operator. None except the ones mentioned above. 
Subqueries Subqueries via the aggregation pipeline. Standard SQL subqueries.
Query plan $explain EXPLAIN
DataViz Built-in MongoDB charts No built-in DataViz
Business intelligence  Knowi


Tableau and other ODBC, JDBC compliant BI engines.

Knowi


Tableau and other ODBC, JDBC compliant BI engines.

References:

  1. Comparing Two SQL-Based Approaches for Querying JSON: SQL++ and SQL:2016
  2. SQL to NoSQL – 7 Metrics to Compare Query Language
  3. Couchbase Analytics: NoETL for Scalable NoSQL Data Analysis

Comments

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com

Let's be friends: