Paging Data Queries With N1QL

DZone 's Guide to

Paging Data Queries With N1QL

Beer makes everything easier — including learning how to page data queries with N1QL. In this article, we'll use a hypothetical beer application to learn just that.

· Database Zone ·
Free Resource

For this tutorial, we'll be using the beer-sample dataset that comes with Couchbase so it will be easy to follow along. If you don't yet have Couchbase installed, you can get going very quickly by having Docker and following the easy instructions on the Couchbase Docker image page to set up a quick instance of Couchbase for testing. Simply running the command with Couchbase installed gets us started:

docker run -d -name db -p 8091-8094:8091-8094 -p 11210:11210 Couchbase

After following a few setup prompts described on that page, we're ready to roll with some data.

Use Cases: What Is Paging and Why Do It?

Let's consider a very common need; in our hypothetical front-end beer application, we want to show our users a table of beers and let them rate each one. The problem is that the database has over 5,000 beers in it. It'd be a waste to send the whole dataset to the browser every time, and would really slow down the loading of your web application's page. 

Much better would be to show the user only the first 10 beers. Then, by either clicking a Next button to going to the next page, or even better by using an infinite scroll plugin (such as ngInfiniteScroll for Angular or react-infinite for React), the user can be shown progressively more of the records from the database.

Show Me the Code!

SELECT name, category, abv from `beer-sample`
WHERE brewery_id is not missing

This returns the data you'd expect:

    "abv": 0, 
    "category": "North American Lager", 
    "name": "#17 Cream Ale" 
    "abv": 0, 
    "name": "#40 Golden Lager" 
    "abv": 0, 
    "category": "Other Style", 
    "name": "#42 Cream Ale" 
  /* (...) */

To get the second page of results, you only need to add the LIMIT to the OFFSET. For example, OFFSET 10 LIMIT 10 would be the second page. To generalize this, how many records should you skip over with OFFSET in order to get to the nth page? It will always be n * page size.

Using this simple math, we can generalize this to any scenario. What if we wanted to show 20 results at a time and the user wanted to jump straight to the 11th page? No problem, it's just  LIMIT 20 OFFSET 20 * 11. Application endpoints, then, whether they're searching the database or just listing beers by name, can use the exact same query logic each time. No matter how complex the query is, just adjust LIMIT andOFFSET and we're all on the same page.

How Does This Work?

The magic here is the combination of three N1QL clauses: LIMIT, OFFSET, and ORDER BY. Let's take a quick look at each because they're all needed to get the job done.

  • LIMIT does just what it says. It limits how many records Couchbase will return, and effectively enforces our page size (we only want 10 beers at a time, never 5,000).
  • OFFSET tells Couchbase how many records to jump over before it begins returning documents. If you're familiar with SQL, it behaves just like SKIP in SQL. The OFFSET clause is how we move forward through the entire data set one chunk at a time.
  • ORDER BY guarantees a certain order returned by the database. This is important because Couchbase doesn't guarantee any particular order of the results unless you ask for it. In our previous query, the #17 Cream Ale appeared as the first result when ordering by name. If we didn't use ORDER BY, we wouldn't be able to predict where in the set the #17 Cream Ale would appear, and it might be possible it could appear on more than one page depending on how Couchbase executed the query!

Put simply, imagine Couchbase building a list of all of the beers ordered by name (ORDER BY). Pagination is just taking batches of 10 of them (LIMIT 10). Each page then starts by skipping through the entire ordered list to the starting point of the next page (OFFSET).

Explain Yourself, Couchbase

One final example! As with any N1QL query, you can gain a lot of insight into how it works by just putting EXPLAIN" in front of it. This tells Couchbase to tell you what it's planning to do.

EXPLAIN SELECT name, category, abv from `beer-sample`
WHERE brewery_id is not missing

(Try this query for yourself!) The output is rather long, so we won't paste it here, but let's summarize what Couchbase is doing with this query.

  1. It first does a primary scan of the beer-sample bucket and fetches documents.
  2. In parallel, it filters for:
    1. Records that have a brewery_id (this tell us the record is a beer, not a brewery document).
    2. It projects out the name, category, and ABV properties (what we asked for).
  3. It then orders the results by name
  4. The offset is applied next, effectively skipping the first X records and not returning them at all.
  5. Finally, it applies the LIMIT by reading the next n records and ignoring all those after it.

When Are We Done?

If we're moving through our data one page at a time, we also need to know when to stop. There are two ways of approaching this. On one hand, if we keep advancing through the pages indefinitely, we'll eventually run out of data. On the other hand, you could just pre-compute how many pages are present and then iterate through them until you hit that number.

What happens if we ask for the seventh page of 1,000 results, knowing full well that there isn't any such document set?

SELECT name, category, abv from `beer-sample`
WHERE brewery_id is not missing
OFFSET 7 * 1000
LIMIT 1000;

Well, never fear — if you go past the end, you'll just get an empty array.

  "results": [],
  "metrics": {
    "elapsedTime": "465.446327ms",
    "executionTime": "465.384101ms",
    "resultCount": 0,
    "resultSize": 0,
    "sortCount": 5891

For the record, the empty array is also what you'll get if you ask for nonsense, such as:

SELECT name, category, abv from `beer-sample` 
WHERE brewery_id is not missing
ORDER BY name 
LIMIT -10;

Wait, what?!?! This query is asking to start five records before the beginning of the output set and return no more than -10 records. Five records before the beginning is nothing. And no more than -10 records is nothing. So you get... nothing!

This behavior is very convenient; it's clear when you're out of records, and it also plays nicely with a lot of other assumptions at different layers of software. If you've ever used REST APIs that supported paging, this is generally how they work. For example, you can easily imagine a REST API endpoint that looks like this: http://cool-app.com/beers?limit=10&page=6000. If you wrote such an endpoint in your backend, you can easily see how that would generate a corresponding N1QL query, and even if the user gave you a funky page number, your endpoint would do the right thing.

What If I Want to Know Ahead of Time How Many Pages There Will Be?

This is straightforward, as well. If we know that each page has n records in it, and we know the total number of records, then we can divide the total by n, take the ceiling of that number, and that will tell us how many potential pages are in Couchbase.

SELECT CEIL(count(*) / 10) from `beer-sample`
WHERE brewery_id is not missing

This yields on my local database 590. Because there are exactly 5,891 beer documents, that means there will be 589 sets of 10, and the final page will only have one document on it. Because 5,891 / 10 = 589.1, we use the N1QL CEIL function to get an integer number of pages, since whatever front-end application isn't going to display 0.1 of a page.

Keep in mind that we still need the WHERE clause here to cut down the documents to only beer documents, excluding brewery and other documents. If the WHERE was missing, you'd get the wrong number for sure.

data queries, database, n1ql, paging, tutorial

Published at DZone with permission of Laura Czajkowski , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}