Limit and Offset vs Cursor Pagination
I have some chicken here and some lasagna here1
Context
If you work on enterprise software, and its content is meant to be consumed by humans, you’ll likely need to display some data: Tabular data, images, blog posts, videos, metadata, etc.
Generally speaking, enterprise data sets are too large for any human to maintain a usable short term mental model. Giving them too much at once will invariably overwhelm your users. Inevitably, you’ll have to break down this data into more easily-consumed sections. In the beeswax2, this is called pagination. A simple breakdown of your options follows.
Pagination Options
Limit and Offset Pagination
Limit/Offset pagination is commonly accomplished by including limit and offset variables in the request URL as query parameters.
limitis the number of items to fetch: integeroffsetis the item number to start fetching at: integer
It’s a popular mechanism because these are the keywords used in SQL syntax for achieving pagination at the database level. A request for a collection of books, for example, in the form
GET /books?limit=20&offset=10
Can be simply translated into the SQL
SELECT
*
FROM
books
LIMIT 20
OFFSET 10;
So this approach has two big benefits:
- Simplicity to translate directly to a data storage query
- Statelessness: The server does not have to keep track of pagination
However, it comes with a couple of downsides:
-
Poor performance with very large offsets
Depending on your data storage technology, performance can be poor with very large offset values. 3 4
#### Books Scenario
To illustrate, let’s imagine a books data set with a magnitude in the order of tens of millions. We request ten million books, with an offset of one million.
limit: 10,000,000,offset: 1,000,000The database has to scan the first 1,000,000 rows, throw them away and return us the next 10,000,000. The waste (and performance degradation) is exacerbated as we increase our offset.
-
Broken consistency when new items are added or deleted
If rows are inserted or deleted between page requests, the page window becomes inconsistent, potentially skipping or returning duplicate results.
#### Posts scenario Consider a microblogging feed, wherein a user browses through a posts dataset that is continuously being appended to. This means your dataset is “live”, and as a user you expect new content to appear in the appropriate order as you scroll or page through the result set.
However, in an offset context, this is likely to yield unexpected results.
You have a table with 10 rows, each containing a sequence of integer values 1-10:
Value 1 2 3 4 5 6 7 8 9 10 You request a first page of 5 rows:
limit=5,offset=0. Your result is:Value 1 2 3 4 5 In the meantime, 2 more rows were added to the top of the data set.
Value -1 0 1 2 3 4 5 6 7 8 9 10 You request the next page of 5 rows:
limit=5,offset=5Your result is | Value | |——–| | 4 | | 5 | | 6 | | 7 | | 8 |Where
4,5are values you’ve already seen from the first page.
In summary, offset pagination is a good and simple solution for datasets with small upper bounds that don’t change frequently. However, it can perform poorly with very large datasets, and/or yield inconsistent results with “live”, frequently changing datasets.
Cursor/Keyset-based Pagination
A cursor is a unique pointer to the “last seen” item in the current result set. The client requests a collection of resources with a page size, and the server returns a page with the number of items requested, and a cursor pointing to the last item in the current page.
An initial request to the server:
GET /books?limit=200
Could be translated as SQL to our data source as
SELECT * FROM books
ORDER BY id DESC
LIMIT :limit
-- where limit is the number from the request plus one, to get the next cursor
and the server’s response could look like:
{
"books": [...],
"next_cursor": "foobarbaz"
}
Where next_cursor is a unique pointer for the the last book in the page. Subsequent requests would include this cursor value along with the limit to advance through the result set:
GET /books?limit=200&cursor=foobarbaz
The server would then fetch the next ${limit} records, starting from the record uniquely identified by the cursor foobarbaz.
SELECT * FROM books
WHERE id <= :cursor
ORDER BY id DESC
LIMIT :limit
-- where limit is the number from the request plus one, to get the next cursor
When there are no more pages to fetch, the server returns an empty value for next_cursor.
Like every other technical solution, this approach also comes with a few benefits and drawbacks.
Benefits:
- The scaling problem of the Offset approach is solved. We don’t have to calculate the full result set, nor index pages that will not be returned. This yields much better performance for both small and large result sets.
- The pagination window remains consistent through inserts and deletes, since we’re always fetching the next count rows after a specific reference point.
Keeping in mind that the result sets must be ordered consistently between pages.
Downsides:
- A more complex implementation
We must keep track of sorting concerns as well as uniqueness, particularly if the cursor represents a composite value
- The cursor must be based on one or more unique, sequential columns in the backend store.
The result set must contain a field or combination of fields that can uniquely identify a record
- We can’t keep track of total number of pages or results in the set.
As a consequence, the user loses the ability to jump to a specific page.
- Relies on explicit ordering of the result set by one of the cursor columns.
It’s impossible to paginate without maintaining context of the sorting field and direction.
In summary, cursor-based pagination handily solves the drawbacks from limit/offset pagination, but comes with its own set of issues, which have to be weighed when deciding on the implementation. If large result sets are expected, and records are expected to be added/removed throughout the user’s pagination session, the cursor approach is more desirable.
However, it may be unnecessarily complex to implement if there is user tolerance for an inconsistent result set, and/or sorting is infeasible, and/or jumping to a specific page is a requirement.
-
An obscure reference to an absurd, long-lost (to me) Puerto Rican ad for the food court at the quintessential mall, “Plaza Las Américas”, a microcosm of capitalistic, consumerist pathos. It starts with a nasally-voiced American flight attendant giving you the food options of “chicken or lasagna”, and ends with a display of the abundant choices available at Plaza’s food court. ↩
-
I just thought it would be funny to replace the word, “business” with “beeswax.” Sorry if it’s not funny to you. ↩