{{announcement.body}}
{{announcement.title}}

Couchbase N1QL: To Query or To Analyze? Part 2

DZone 's Guide to

Couchbase N1QL: To Query or To Analyze? Part 2

In this article, learn how to write the query in both services and look at the performance differences.

· Database Zone ·
Free Resource

When you need to query documents using SQL, there are two options available in Couchbase. The Query service and the Analytics service. Our blog, N1QL: To Query or To Analyze? provides a detailed overview of both services. I highly recommend reading it before this one. This article aims to expand on the previous blog by adding some concrete, hands-on examples. For each example we’ll cover how to write the query in both services and we’ll look at the performance differences. The goal is that readers will walk away with more knowledge to help quickly identify patterns and use cases that best fit each service.

Summary

Before jumping into examples. Let’s refresh ourselves on the high-level key characteristics of the two services.

Query Service
Analytics Service
Used for data manipulation within application logic. Used for reports, analysis (historical, interactive) and dashboards.
Most efficient for short, operational queries that retrieve or manipulate smaller amounts of data. Most efficient for longer, complex, ad-hoc queries that typically retrieve and process large amounts of data.
Supports SELECT, INSERT, UPDATE, DELETE, MERGE operations. Only supports SELECT operations.

See https://dzone.com/articles/n1ql-to-query-or-to-analyze for a complete table.

Setup

For this tutorial, we will use Couchbase 6.5 and the sample data provided in the Couchbase Admin UI. My environment is a 3 node Couchbase 6.5 cluster with 1536 MB allocated to the Analytics service. All other settings are the defaults. If you don’t have access to a cluster, you can quickly run Couchbase 6.5 in a Docker container by running the following command:

 docker run -d --name db -p 8091-8096:8091-8096 -p 11210-11211:11210-11211 couchbase:enterprise-6.5.0

If you’re going the Docker route, go to http://localhost:8091 in your browser after the container starts and setup your Couchbase instance using the default options at each step. It does not matter what name you give your Couchbase instance.

Performance Disclaimer

We will look at the response times for the examples that follow. It is important to note that your performance may vary greatly depending on how your Couchbase environment is setup. However, you should still be able to observe similar differences between the Query and Analytics services regardless of your environment.

Install Travel Sample Bucket

From the Couchbase Admin Dashboard, navigate to Settings -> Sample Buckets. Install the Travel Sample bucket. Detailed documentation on how to do this can be found over at our documentation site.

Query Service Setup

Installing the Sample Bucket also creates the necessary indexes. This means no additional setup is needed for the Query Service.

Analytics Service Setup

For the Analytics Service, we need to populate Datasets for each “type” of document in our bucket. Navigate to the Analytics Workbench and execute the following queries to create the Datasets:

SQL


This will create Datasets for Routes, Landmarks, Hotels, Airlines, and Airports using the travel-sample bucket. Finally executing the CONNECT statement will begin populating each of the Datasets.

In the examples that follow, we’ll be using simple N1QL queries. For a detailed breakdown on the N1QL language differences between Query and Analytics see the N1QL for Analytics vs. N1QL for Query reference page in our docs.

Use Case: Get All Routes For LAX to SFO

Now we are ready to write our first query. For this use case we want to find all routes available for a given source and destination airport.

Which service is best?

This is definitely an operational query that will return a limited amount of data. It is a simple query that does not perform any aggregations or complex functions over our data. There’s only a simple filter on source and destination. Therefore the Query Service is the obvious choice.

SQL

Performance:  4 milliseconds

This is a simple query and it also returns only 7 documents. This is a typical operational query that an application might send to Couchbase. Performance is reliably fast and consistent.

Analytics Service Equivalent

Let’s craft the same query for the Analytics Service for demonstration’s sake. The Analytics Service is overkill for a simple query like this. Therefore, if we were building an application for this use case, we wouldn’t choose the Analytics service. We’d expect it to underperform the Query Service.

SQL
 




xxxxxxxxxx
1


 
1
select * from routes
2
where sourceairport = "LAX"
3
and destinationairport = "SFO"



Performance: ~36 milliseconds

As you can see from this example. The Query Service performs best for this use case as expected. Under heavy load we’d expect the Query Service to perform even better than the 30+ millisecond difference than this simple test shows.

Use Case: Get the Cities With the Most Hotels

Which service is best?

For this use case we want to figure out the number of hotels available in each city and sort the results by the cities with the most hotels first. This will require us to scan through all of our hotels and collect counts by Country and City and then sort it. Following the logic we laid out in the beginning, the Analytics Service should perform better for this use case. Let’s test this theory.

SQL
 




xxxxxxxxxx
1
13
9


 
1
select 
2
  country, 
3
  city, 
4
  count(id) 
5
from hotels 
6
group by country, city
7
order by count(id) desc



Performance: ~36 milliseconds

Interestingly, the performance of this query is almost the same as our previous Analytics example (36ms) even though the previous query was much simpler and smaller computationally. This tells us the baseline performance in my 3 node environment may be around 36 milliseconds for Analytics queries. Even though this query is more complex than our first example, it is still relatively simple for the Analytics service.

Query Service Equivalent

Let’s craft the same query for the Query Service. In theory, this is a heavier query than our previous Query Service example. It’s also processing and returning much more data than the first example. We’d expect that the Query Service would not perform as well as the Analytics Service.

SQL
 




xxxxxxxxxx
1
15
9


 
1
select 
2
  country, 
3
  city, 
4
  count(id)
5
from `travel-sample`
6
where type = "hotel"
7
group by country, city
8
order by count(id) desc



Performance: ~90 milliseconds

Here we have a truly large divergence in performance. As expected, the Analytics service is able to process the query 2x faster on average than the Query Service.

Use Case: Get The Airlines With the Most Routes

Which service is best?

This query is asking a similar question to our previous example. But the twist here is that we will need a join since the Airline data resides in a separate type of document in our bucket from the routes data. We expect the Analytics Service to perform better with this query because it is doing an aggregation and a JOIN.

SQL
 




xxxxxxxxxx
1
19


 
1
select a.id,
2
  a.callsign,
3
  a.name,
4
  a.country,
5
  count(r.id) as route_count
6
from airlines a
7
join routes r on CONCAT("airline_", to_string(a.id)) = r.airlineid
8
group by a.id, a.callsign, a.name, a.country
9
order by route_count desc
10
limit 100


Performance: 82 milliseconds

Here we can see that this query is actually starting to push the Analytics service a little bit. Our first Analytics query took 36 milliseconds on average and this one is pushing it up to 82 milliseconds. The major difference with this query is the addition of a JOIN.

Query Service Equivalent

Remember in the beginning we created separate Analytics “Datasets” for each type of document in the Travel Sample data. Each Dataset functions like its own table. So joining them in a query is simple if you’ve ever written SQL joins before. The Query Service doesn’t have any concept of “Datasets” in the same way as the Analytics service does. Therefore we have to write the query a little differently to account for all of the data residing in the same bucket. We need to join documents of type = “airline” to documents of type = “route”. We need a subquery to do this.

SQL
 




xxxxxxxxxx
1
21


 
1
select a.id,
2
  a.callsign,
3
  a.name,
4
  a.country,
5
  count(r.id) as route_count
6
from `travel-sample` a
7
join (select id, airlineid from `travel-sample` where type = "route") r on CONCAT("airline_", to_string(a.id)) = r.airlineid
8
where a.type = "airline"
9
group by a.id, a.callsign, a.name, a.country
10
order by route_count desc
11
limit 100



Performance: 2 seconds

The performance of the Analytics Service is drastically better for this use case due to the JOIN. Another added benefit of the Analytics Service for this case is that writing the JOIN is simpler since we didn’t need to create a subquery to join against.

Use Case: Get the Percentile Rank of Airlines With the Most Routes

Which service is best?

This example builds on the previous one by adding the percentile rank of route counts. We expect that the performance will be better with Analytics since we are adding even more complexity and computation to the query. Let’s see how big of an impact adding a Window function to our query has.

SQL
 




xxxxxxxxxx
1
27


 
1
select
2
  a.id,
3
  a.callsign,
4
  a.name,
5
  a.country,
6
  count(r.id) as route_count,
7
  PERCENT_RANK() OVER (
8
   ORDER BY count(r.id)
9
   ) AS `rank`
10
from airlines a
11
join routes r on CONCAT("airline_", to_string(a.id)) = r.airlineid
12
group by a.id, a.callsign, a.name, a.country
13
order by route_count desc
14
limit 100



Performance: 85 milliseconds

Adding a Window function was no sweat for the Analytics service as we can barely see a performance difference.

Query Service Equivalent

SQL
 




xxxxxxxxxx
1
23


 
1
select
2
  a.id,
3
  a.callsign,
4
  a.name,
5
  a.country,
6
  count(r.id) as route_count,
7
  PERCENT_RANK() OVER ( ORDER BY count(r.id) ) AS `rank`
8
from `travel-sample` a
9
join (select id, airlineid from `travel-sample` where type = "route") r on CONCAT("airline_", to_string(a.id)) = r.airlineid
10
where a.type = 'airline'
11
group by a.id, a.callsign, a.name, a.country order by route_count desc
12
limit 100



Performance: 2 seconds

When we add the Window function to our Query service query, we are also unable to detect a performance hit. The major takeaway that we can infer from these results is that the JOIN is the biggest performance factor and the aggregation (COUNT in this case) is the second biggest.

Conclusion

We hope that this article has helped you achieve a better understanding of the two SQL options in Couchbase and when to apply them. Be sure to check out the following resources on Query and Analytics.

Topics:
couchbase ,couchbase 6.5 ,couchbase analytics ,couchbase n1ql ,database ,n1ql ,n1ql query ,nosql database ,tutorial

Published at DZone with permission of Evan Pease . See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}