Why Most Programmers Get Pagination Wrong
Why Most Programmers Get Pagination Wrong
Lukas Eder tells us what is wrong with pagination on most sites, who actually gets it right, and how they do it.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
Pagination is one of those things that almost everyone gets wrong for two reasons:
- User experience
- Database performance
What’s Wrong With Pagination?
Most applications blindly produce pagination like this:
This is how Gmail implements pagination. With my current settings, it displays 100 emails at a time and also shows how many emails there are in total, namely 1094. Those aren’t the total number of emails I’ve ever had, they’re the total number of emails in my "blog-idea" label (I’m using Gmail as a TODO list, and yes, this blog won’t run out of articles any time soon).
What’s wrong with this practice?
Bad User Experience
As a user, in most cases, I don’t really care about the total number of objects that are in my result set. At least, I don’t care about the exact number. Does it make any difference if I have 1094 or 1093 emails? What about if I had 1067? Or 1000? 1000 would be precise enough for what I’m concerned.
Who got it right?
...And all the other websites that do timelines. Yes, I want to display only 10 rows at a time (or perhaps 100, but almost never all). So, pagination is important. But, I don’t care about the fact that I’ve clicked 317 times on that "next page" button. If I ever browse that many pages (and I hardly ever do), then the only thing that matters is the next 10 rows. Just like when you play Civilization. You don’t care that you’re in turn 317. You just want to play one more turn:
Moreover, I never ever ever want to jump to page 317 right from the beginning. There’s absolutely no use case out there, where I search for something, and then I say, hey, I believe my search result will be item #3175 in the current sort order. Exactly. Instead, I will do any of these:
- Refine the search
- Sort the result
In both cases, I will get a result where the record that I’m looking for is much more likely to appear on page #1 or perhaps #2. Again, when was the last time you Googled for SQL and then went to page #18375 to find that particular blog post that you were looking for? No. You searched for “Java 8 SQL” to find jOOQ, the best way to write SQL in Java 8. For instance.
How to Implement a Timeline With SQL
If your data source is a SQL database, you might have implemented pagination by using
LIMIT.. OFFSET, or
OFFSET.. FETCH or some
ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS). OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either when they click on "next."
Instead, you should be using what we refer to as "keyset pagination" (as opposed to "offset pagination"). We’ve described this in past articles here:
The SQL syntax is a bit cumbersome as the pagination criteria becomes an ordinary predicate, but if you’re using jOOQ, you can use the simple synthetic SQL
SEEK clause as such:
DSL.using(configuration) .select(PLAYERS.PLAYER_ID, PLAYERS.FIRST_NAME, PLAYERS.LAST_NAME, PLAYERS.SCORE) .from(PLAYERS) .where(PLAYERS.GAME_ID.eq(42)) .orderBy(PLAYERS.SCORE.desc(), PLAYERS.PLAYER_ID.asc()) .seek(949, 15) .limit(10) .fetch();
The above will fetch the next 10 players after the player with SCORE 949 and ID 15. The pagination really depends on the
ORDER BY clause, which is why you have to provide as many values in the pagination as you provided columns in the
Now, that we’ve fixed the user experience let’s also look at …
How OFFSET Pagination Is Bad for Performance
The previously linked articles about keyset pagination also mention the poor performance of OFFSET pagination. Which is kind of obvious as
OFFSET has to skip a given number of rows after applying all predicates and sorting, etc. So the database has to do all the work and then throw away 3170 records (if you’re jumping to page 317 with 10 rows per page). What a waste.
The following diagram shows very nicely how
OFFSET gets slower and slower for large offsets:
Reproduced from use-the-index-luke.com with permission by Markus Winand
That’s the obvious problem, but there’s another one. People always count the total number of rows to calculate the total number of possible pages. Why? To display nonsense like the following:
Page number: 1 2 3 ... 315 316 317 318 319 ... 50193 50194
Wow. OK, so we’re on page number 317, which we don’t really care about in the first place, but we could just as well jump to page number 50194. This means that the database needed to run the query across all the rows just to be sure we get exactly 50194 pages in total.
Google something like page number pagination and observe the number of tutorials that show how you can implement the above nonsense. On Google Image search, you’ll find:
At the same time, the Google search itself reveals:
As you can see, Google estimates that there are probably at least 10 pages for your search and you can go "next." Yes, you can skip some pages, but you cannot skip to a page number 50194, because, again:
- No one wants that
- It’s costly to predict, even for Google
In fact, Google search implements keyset pagination as well, just like Twitter, Facebook, and Reddit. And, they don’t display the total number of pages because counting that total can be very costly, depending on your database.
In particular, databases that do not support window functions will require you to run two separate queries:
- The actual query with a
- An additional query replacing the
SELECTcolumn list with a simple
Needless to say that this is not the best approach. If your database supports window functions (read about that miraculous SQL feature here on the jOOQ blog), you could produce the total row count in one go as such:
SELECT rental_date, inventory_id, COUNT(*) OVER() FROM rental WHERE customer_id = 1 ORDER BY rental_date LIMIT 10
COUNT(*) OVER() window function is like an ordinary aggregate function, except that it doesn’t group your results. It just counts all the rows of your result and produces that count in each row, prior to limiting the result to 10.
When run against the Sakila database, the above produces:
rental_date inventory_id count 2005-05-25 11:30:37 3021 32 2005-05-28 10:35:23 4020 32 2005-06-15 00:54:12 2785 32 2005-06-15 18:02:53 1021 32 2005-06-15 21:08:46 1407 32 2005-06-16 15:18:57 726 32 2005-06-18 08:41:48 197 32 2005-06-18 13:33:59 3497 32 2005-06-21 06:24:45 4566 32 2005-07-08 03:17:05 1443 32
So, we’re displaying the first page with 10 rows and we need to provide navigational links for a total of 4 pages because we have a total of 32 rows.
What happens when we benchmark this query on PostgreSQL? The first run doesn’t calculate this
COUNT(*) OVER() value, whereas the second one does:
DO $$ DECLARE v_ts TIMESTAMP; v_repeat CONSTANT INT := 10000; rec RECORD; BEGIN v_ts := clock_timestamp(); FOR i IN 1..v_repeat LOOP FOR rec IN ( SELECT rental_date, inventory_id FROM rental WHERE customer_id = 1 ORDER BY rental_date LIMIT 10 ) LOOP NULL; END LOOP; END LOOP; RAISE INFO 'Statement 1: %', (clock_timestamp() - v_ts); v_ts := clock_timestamp(); FOR i IN 1..v_repeat LOOP FOR rec IN ( SELECT rental_date, inventory_id, COUNT(*) OVER() FROM rental WHERE customer_id = 1 ORDER BY rental_date LIMIT 10 ) LOOP NULL; END LOOP; END LOOP; RAISE INFO 'Statement 2: %', (clock_timestamp() - v_ts); END$$;
The result clearly indicates that in PostgreSQL, there’s a significant overhead in calculating this value:
INFO: Statement 1: 00:00:01.041823 INFO: Statement 2: 00:00:03.57145
Oracle optimizes things a bit better when you’re using
ROWNUM to paginate:
SET SERVEROUTPUT ON DECLARE v_ts TIMESTAMP; v_repeat CONSTANT NUMBER := 5000; BEGIN v_ts := SYSTIMESTAMP; FOR i IN 1..v_repeat LOOP FOR rec IN ( SELECT rental_date, inventory_id FROM ( SELECT rental.*, ROWNUM rn FROM rental WHERE customer_id = 1 ORDER BY rental_date ) rental WHERE rn < 5 ORDER BY rn ) LOOP NULL; END LOOP; END LOOP; dbms_output.put_line('Statement 1: ' || (SYSTIMESTAMP - v_ts)); v_ts := SYSTIMESTAMP; FOR i IN 1..v_repeat LOOP FOR rec IN ( SELECT rental_date, inventory_id, COUNT(*) OVER() FROM ( SELECT rental.*, ROWNUM rn FROM rental WHERE customer_id = 1 ORDER BY rental_date ) rental WHERE rn < 5 ORDER BY rn ) LOOP NULL; END LOOP; END LOOP; dbms_output.put_line('Statement 2: ' || (SYSTIMESTAMP - v_ts)); END; /
Statement 1: X Statement 2: X +/- 1%
So, the COUNT(*) seems to be calculated "for free." Bonus question: Why is that?
Due to Oracle license restrictions, we cannot publish benchmark results here, comparing Oracle with PostgreSQL, sorry, but you can run the above code yourself against the Sakila database:
TL;DR: OFFSET pagination bad. Keyset pagination good.
If you want to paginate in your application, please make sure whether you really, really, really need:
- Exact page number
- High page numbers
- The last page number
- The total number of rows
Because if you don’t (and in 98% of all UIs, you really don’t), then you can drastically speed up your queries while providing your users a much better experience. If that’s not a win-win situation worth thinking about…?
And don’t forget, jOOQ ships with native keyset pagination support!
Published at DZone with permission of Lukas Eder , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.