
Implementing Pagination in Python Web Applications: Strategies for Efficient Data Handling
Pagination is essential for building responsive, scalable Python web applications that handle large datasets. This post breaks down pagination strategies—offset/limit, keyset (cursor), and hybrid approaches—with practical Flask and FastAPI examples, performance tips, and advanced techniques like background prefetching using Python's threading module. Follow along to implement efficient, production-ready pagination and learn how related topics like automated file batching and Python 3.11 improvements can enhance your systems.
Introduction
When users request lists of data — product catalogs, message feeds, or file directories — returning everything at once is rarely practical. Pagination splits results into manageable pages, improving latency, reducing memory pressure, and creating a better user experience.
This tutorial covers:
- Core pagination patterns (offset/limit vs keyset/cursor)
- Practical, working Python examples with Flask, SQLAlchemy, and FastAPI
- Performance, error handling, and best practices
- Advanced topics: prefetching with Python's threading module, paginating file lists (automation), and how Python 3.11 features can help
Prerequisites
Before you begin, you should be comfortable with:
- Python 3.x (examples assume 3.8+; 3.11-specific notes are marked)
- Basic web frameworks: Flask or FastAPI
- An SQL database and SQLAlchemy (ORM) or raw SQL
- Basic threading or async concepts (optional for advanced tips)
- Installed packages for the examples: Flask, FastAPI, SQLAlchemy, and a DB driver (e.g., psycopg2 for Postgres)
Core Concepts
Why paginate?
- Reduce memory and bandwidth usage
- Improve response times
- Avoid client overload and browser slowdowns
- Prevent expensive database scans
Common patterns
- Offset/Limit (classic)
- Keyset / Cursor pagination
- Hybrid / Seek approach
Visual analogy (text diagram)
Think of your dataset as a bookshelf:
- Offset pagination = "skip N books from the start and read the next M" — fine for early shelves, slow when skipping many books.
- Keyset pagination = "remember the last book's ID and continue from there" — efficient and reliable as you move forward.
Example 1 — Offset/Limit Pagination with Flask + SQLAlchemy
We'll implement a minimal Flask endpoint that returns paginated user records.
    # app_offset.py
    from flask import Flask, request, jsonify
    from sqlalchemy import create_engine, Column, Integer, String, select, func
    from sqlalchemy.orm import declarative_base, Session

    app = Flask(__name__)
    engine = create_engine("sqlite+pysqlite:///example.db", echo=False, future=True)
    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)

    Base.metadata.create_all(engine)

    @app.route("/users")
    def users():
        # Read query params with defaults
        try:
            page = max(1, int(request.args.get("page", 1)))
            per_page = min(100, max(1, int(request.args.get("per_page", 20))))
        except ValueError:
            return jsonify({"error": "page and per_page must be integers"}), 400
        offset = (page - 1) * per_page
        with Session(engine) as session:
            stmt = select(User).order_by(User.id).limit(per_page).offset(offset)
            results = session.execute(stmt).scalars().all()
            total = session.execute(select(func.count()).select_from(User)).scalar()  # full COUNT(*) on every request; see notes
        return jsonify({
            "page": page,
            "per_page": per_page,
            "count": len(results),
            "users": [{"id": u.id, "name": u.name} for u in results],
            "total_estimate": total
        })
Line-by-line explanation:
- Imports and setup: standard Flask imports plus SQLAlchemy configured for SQLite.
- Class User: defines a simple table with id and name columns.
- In the endpoint: read page and per_page from the query parameters; validate and clamp the values to avoid abuse.
- Compute offset as (page - 1) * per_page.
- Build the SQL statement: an ordered select with limit and offset.
- Execute and return JSON with the users list and page metadata.
Inputs:
- GET /users?page=2&per_page=20
Outputs:
- JSON list of up to per_page users, plus page metadata.
Notes:
- Running a full COUNT() on every request is inefficient for large tables. Use a database-specific COUNT(), an approximate count, or a cached total.
- Offset pagination becomes slow when the offset grows large.
- Concurrent writes may cause duplicated or missing items across pages, since rows can shift between requests for a given page.
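The parse-and-clamp logic in the endpoint is easy to check in isolation. Here is a small sketch of the same arithmetic as a standalone function (page_params is a hypothetical helper, not part of the app above):

```python
def page_params(page_raw, per_page_raw, max_per_page=100):
    """Parse and clamp pagination query parameters, then derive the offset.

    Mirrors the validation in the /users endpoint: page is at least 1,
    per_page is clamped to [1, max_per_page].
    """
    page = max(1, int(page_raw))
    per_page = min(max_per_page, max(1, int(per_page_raw)))
    offset = (page - 1) * per_page
    return page, per_page, offset

print(page_params("2", "20"))   # (2, 20, 20)
print(page_params("0", "500"))  # clamped to (1, 100, 0)
```

Values outside the allowed range are silently clamped rather than rejected, matching the endpoint's behavior; only non-integer input raises.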
Example 2 — Keyset (Cursor) Pagination with Flask + SQLAlchemy
Keyset pagination uses a cursor (often the last-seen ID). This is the preferred approach for large, frequently updated datasets.
    # app_keyset.py
    from flask import Flask, request, jsonify
    from sqlalchemy import create_engine, Column, Integer, String, select
    from sqlalchemy.orm import declarative_base, Session

    app = Flask(__name__)
    engine = create_engine("sqlite+pysqlite:///example.db", echo=False, future=True)
    Base = declarative_base()

    class Message(Base):
        __tablename__ = "messages"
        id = Column(Integer, primary_key=True)
        body = Column(String, nullable=False)

    Base.metadata.create_all(engine)

    @app.route("/messages")
    def messages():
        try:
            per_page = min(100, max(1, int(request.args.get("per_page", 20))))
        except ValueError:
            return jsonify({"error": "per_page must be integer"}), 400
        # 'after' is the cursor (last-seen id)
        after = request.args.get("after")
        with Session(engine) as session:
            stmt = select(Message).order_by(Message.id).limit(per_page)
            if after is not None:
                # Protect against invalid cursor values
                try:
                    after_id = int(after)
                    stmt = stmt.where(Message.id > after_id)
                except ValueError:
                    return jsonify({"error": "invalid cursor"}), 400
            results = session.execute(stmt).scalars().all()
        next_cursor = results[-1].id if results else None
        return jsonify({
            "per_page": per_page,
            "messages": [{"id": m.id, "body": m.body} for m in results],
            "next_cursor": next_cursor
        })
Explanation:
- Use after as the cursor parameter — the last-seen ID.
- The query filters WHERE id > after_id and sorts by id.
- next_cursor returns the id of the last message in the current page, for the client to request the next page.
Inputs:
- GET /messages?per_page=10
- GET /messages?per_page=10&after=50
Performance:
- Constant-time queries with a proper index on id.
- No expensive OFFSET scanning.
Consistency:
- Deletions: if an item is deleted after the client fetched a cursor, the client simply sees the subsequent items without issue.
- Insertions at the head: newly inserted rows with smaller IDs won't be returned when paginating forward — often desirable.
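The forward-walking behavior above can be sketched without a database. The fetch_messages and fetch_all functions below are hypothetical stand-ins that simulate a keyset endpoint over an in-memory list, showing how a client follows next_cursor until the pages run out:

```python
# In-memory stand-in for the /messages endpoint (illustrative only).
DATA = [{"id": i, "body": f"msg {i}"} for i in range(1, 26)]

def fetch_messages(per_page=10, after=None):
    """Simulate one keyset-paginated request: WHERE id > after LIMIT per_page."""
    rows = [r for r in DATA if after is None or r["id"] > after]
    page = rows[:per_page]
    return {"messages": page, "next_cursor": page[-1]["id"] if page else None}

def fetch_all(per_page=10):
    """Client-side loop: keep requesting with the returned cursor until done."""
    cursor, out = None, []
    while True:
        resp = fetch_messages(per_page, cursor)
        out.extend(resp["messages"])
        cursor = resp["next_cursor"]
        # A short (or empty) page means we've reached the end.
        if len(resp["messages"]) < per_page:
            break
    return out

print(len(fetch_all()))  # 25
```

The same loop shape works against the real endpoint: replace fetch_messages with an HTTP GET that passes the cursor as the after query parameter.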
Example 3 — Cursor Pagination with FastAPI and Signed Cursors (stateless, secure)
For APIs, it's common to avoid exposing raw DB IDs. We can encode cursors (e.g., base64 or HMAC-signed) and use FastAPI with pydantic models.
    # app_fastapi_cursor.py
    import base64
    import json
    import hmac
    import hashlib
    from fastapi import FastAPI, HTTPException, Query
    from pydantic import BaseModel
    from typing import Optional
    from sqlalchemy import create_engine, select, Column, Integer, String
    from sqlalchemy.orm import declarative_base, Session

    SECRET = b"replace-with-secure-secret"
    app = FastAPI()
    engine = create_engine("sqlite+pysqlite:///example.db", echo=False, future=True)
    Base = declarative_base()

    class Item(Base):
        __tablename__ = "items"
        id = Column(Integer, primary_key=True)
        name = Column(String)

    Base.metadata.create_all(engine)

    class ItemOut(BaseModel):
        id: int
        name: str

    def sign_cursor(data: dict) -> str:
        payload = json.dumps(data, separators=(',', ':')).encode()
        # Hex-encode the signature so it can never contain the b"." separator
        sig = hmac.new(SECRET, payload, digestmod=hashlib.sha256).hexdigest().encode()
        return base64.urlsafe_b64encode(payload + b"." + sig).decode()

    def verify_cursor(cursor: str) -> dict:
        try:
            raw = base64.urlsafe_b64decode(cursor.encode())
            payload, sig = raw.rsplit(b".", 1)
            expected = hmac.new(SECRET, payload, digestmod=hashlib.sha256).hexdigest().encode()
            if not hmac.compare_digest(sig, expected):
                raise ValueError("bad signature")
            return json.loads(payload.decode())
        except Exception:
            raise HTTPException(status_code=400, detail="Invalid cursor")

    @app.get("/items", response_model=dict)
    def list_items(per_page: int = Query(20, ge=1, le=100), cursor: Optional[str] = None):
        after_id = None
        if cursor:
            payload = verify_cursor(cursor)
            after_id = payload.get("after_id")
        with Session(engine) as session:
            stmt = select(Item).order_by(Item.id).limit(per_page)
            if after_id is not None:
                stmt = stmt.where(Item.id > after_id)
            rows = session.execute(stmt).scalars().all()
        next_cursor = None
        if rows:
            next_cursor = sign_cursor({"after_id": rows[-1].id})
        return {"items": [ItemOut(id=r.id, name=r.name).dict() for r in rows], "next_cursor": next_cursor}
Why sign cursors?
- Prevents tampering and leaking internal state. You can include other metadata (timestamp, sorting keys).
- Rotating secrets invalidates cursors.
- Cursor payload should be minimal to reduce URL length.
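As a self-contained illustration, here is a roundtrip of an HMAC-then-base64 cursor scheme like the one above. The signature is hex-encoded so the b"." separator can never appear inside it, and the SECRET value is a placeholder:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # placeholder; load a real secret from configuration

def sign_cursor(data: dict) -> str:
    payload = json.dumps(data, separators=(",", ":")).encode()
    # hexdigest() yields only [0-9a-f], so b"." below is unambiguous
    sig = hmac.new(SECRET, payload, digestmod=hashlib.sha256).hexdigest().encode()
    return base64.urlsafe_b64encode(payload + b"." + sig).decode()

def verify_cursor(cursor: str) -> dict:
    raw = base64.urlsafe_b64decode(cursor.encode())
    payload, sig = raw.rsplit(b".", 1)
    expected = hmac.new(SECRET, payload, digestmod=hashlib.sha256).hexdigest().encode()
    # Constant-time comparison to avoid timing side channels
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(payload.decode())

token = sign_cursor({"after_id": 42})
print(verify_cursor(token))  # {'after_id': 42}
```

A tampered token fails verification: changing any byte of the payload or signature makes compare_digest return False, so the cursor is rejected before it reaches the database.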
Pagination for File Organization and Local Directories
What about paginating file lists? When building scripts that organize or display files, you often need to present files in pages.
Here's a simple script that paginates a directory listing and optionally groups files into folders (relates to "Building a Python Script for Automating File Organization: Practical Techniques"):
    # file_pager.py
    import os
    from math import ceil

    def list_files_paginated(directory, per_page=50, page=1):
        entries = [e for e in os.listdir(directory) if os.path.isfile(os.path.join(directory, e))]
        entries.sort()  # deterministic order
        total = len(entries)
        pages = max(1, ceil(total / per_page))
        page = min(max(1, page), pages)
        start = (page - 1) * per_page
        end = start + per_page
        return {
            "page": page,
            "per_page": per_page,
            "total": total,
            "pages": pages,
            "files": entries[start:end]
        }
This pattern is useful when combining pagination with file automation tasks (e.g., batch-moving files into folders of N items).
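As a sketch of that batch-moving idea, the hypothetical batch_into_folders function below slices a sorted file list into groups of N and moves each group into its own numbered subfolder, demonstrated here on a temporary directory:

```python
import os
import shutil
import tempfile
from math import ceil

def batch_into_folders(directory, per_folder=50):
    """Move files in `directory` into subfolders of at most `per_folder` items."""
    files = sorted(
        e for e in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, e))
    )
    for batch_no in range(ceil(len(files) / per_folder)):
        dest = os.path.join(directory, f"batch_{batch_no + 1:03d}")
        os.makedirs(dest, exist_ok=True)
        # Same slicing as the pager: one "page" of files per folder
        for name in files[batch_no * per_folder:(batch_no + 1) * per_folder]:
            shutil.move(os.path.join(directory, name), os.path.join(dest, name))

with tempfile.TemporaryDirectory() as tmp:
    for i in range(7):
        open(os.path.join(tmp, f"f{i}.txt"), "w").close()
    batch_into_folders(tmp, per_folder=3)
    print(sorted(os.listdir(tmp)))  # ['batch_001', 'batch_002', 'batch_003']
```

Sorting first keeps the batches deterministic, just like the pager; seven files with per_folder=3 yield two full folders and one folder with the single remainder.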
Performance Considerations and Best Practices
- Indexes: Always index columns used in WHERE/ORDER BY for keyset pagination (e.g., id, created_at).
- Limit per_page: Cap per_page (e.g., at 100) to prevent abuse.
- Use efficient counts: Avoid a full COUNT() on huge tables on every request. Consider approximate counts, cached totals, or indexed counters.
- Avoid large offsets: Use keyset pagination for deep pages.
- Stateless tokens: Use signed cursors so paginated APIs don't require server-side session state.
- Use response headers: Provide Link headers or metadata (total, prev/next links) to help clients.
Handling Concurrency and Prefetching with Threading
You can improve perceived UX by prefetching the next page in the background. This is a good use case for "Using Python's Threading Module for Parallel Execution: Tips and Real-World Examples". Example approach:
- While serving page N synchronously, spawn a background thread to fetch page N+1 and cache it.
- Use thread-safe caches (e.g., a TTL cache with threading.Lock) or external caches (Redis).
    import threading

    cache = {}
    cache_lock = threading.Lock()

    def fetch_page_db(page, per_page):
        # expensive DB call simulated
        # return list_of_rows
        pass

    def prefetch(page, per_page):
        data = fetch_page_db(page, per_page)
        with cache_lock:
            cache[(page, per_page)] = data

    def get_page(page, per_page):
        key = (page, per_page)
        with cache_lock:
            if key in cache:
                return cache.pop(key)  # use cached prefetched page
        # else fetch synchronously and prefetch next
        data = fetch_page_db(page, per_page)
        thread = threading.Thread(target=prefetch, args=(page + 1, per_page), daemon=True)
        thread.start()
        return data
Notes:
- Keep background tasks simple and idempotent.
- Use daemon threads or proper thread management to avoid leaks.
- For heavy concurrency, prefer multiprocessing or async IO, or delegate to a task queue (Celery, RQ).
Advanced Tips
- Composite Cursors: For sorting by non-unique columns (e.g., created_at desc), use composite keys: WHERE (created_at, id) < (:last_created_at, :last_id).
- Time-based pagination: Use created_at timestamps for feeds. Beware of clock skew and duplicates.
- Cursor encoding: Include ordering and constraints so the cursor fully describes the client's position.
- Caching: Use Redis to store pages or cursor -> result mappings for hot endpoints.
- Stream responses for very large per-page results: yield results in chunks.
- Asynchronous endpoints: Use async DB drivers and frameworks (asyncpg, SQLModel, FastAPI async) for heavy loads.
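The composite-cursor tip above can be written in SQLAlchemy with plain boolean expressions, which also works on databases that lack row-value comparison syntax. The Post table and next_page helper below are illustrative:

```python
import datetime

from sqlalchemy import (Column, DateTime, Integer, and_, create_engine, or_,
                        select)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Post(Base):
    __tablename__ = "posts"
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime, nullable=False)

engine = create_engine("sqlite+pysqlite:///:memory:", future=True)
Base.metadata.create_all(engine)

def next_page(session, last_created_at, last_id, per_page=20):
    # Composite keyset filter: rows strictly "after" the cursor in
    # (created_at DESC, id DESC) order, expanded as OR/AND instead of
    # the row-value form WHERE (created_at, id) < (:c, :i).
    stmt = (
        select(Post)
        .where(or_(Post.created_at < last_created_at,
                   and_(Post.created_at == last_created_at, Post.id < last_id)))
        .order_by(Post.created_at.desc(), Post.id.desc())
        .limit(per_page)
    )
    return session.execute(stmt).scalars().all()

with Session(engine) as session:
    ts = datetime.datetime(2024, 1, 1)
    session.add_all([Post(id=i, created_at=ts) for i in (1, 2, 3)])
    session.commit()
    print([p.id for p in next_page(session, ts, 3, per_page=2)])  # [2, 1]
```

The tie-breaking id term is what makes the cursor unambiguous when many rows share the same created_at; a composite index on (created_at, id) keeps the filter cheap.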
Common Pitfalls
- Returning inconsistent totals: If you compute total counts independently of the page data, they can diverge due to concurrent writes.
- Using OFFSET for deep paging: Very slow and costly.
- Exposing raw DB fields in public cursors: May leak internals or allow tampering.
- Unbounded per_page: Clients requesting huge pages can cause DoS.
How Python 3.11 Helps
"Exploring the New Features in Python 3.11: What Developers Need to Know" — there are a few ways 3.11 improves pagination-related systems:
- Speed: General CPython optimizations make request handling and DB client libraries slightly faster.
- tomllib (std library): Easier config parsing for pagination settings from TOML files.
- Exception Groups and except*: Better handling of aggregated exceptions in concurrent prefetching or parallel tasks.
- Typing improvements: Improved typing support helps with clearer pydantic models and response types.
Error Handling and Security
- Validate inputs strictly (page, per_page, cursor).
- Rate limit endpoints — pagination can be abused for scraping.
- Use prepared statements or parameterized queries to avoid injection.
- Sign or encrypt cursors if you don't want clients to guess internal IDs.
Example: End-to-End with Client Hints and Link Header
Here's a conceptual pattern for REST APIs:
- GET /resources?per_page=20 → returns items, next_cursor
- Add HTTP Link header: <https://api.example.com/resources?cursor=...>; rel="next"
- Client uses next_cursor or the Link header for subsequent calls.
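A minimal sketch of composing that Link header value, where the base URL and cursor are illustrative:

```python
from urllib.parse import urlencode

def next_link(base_url: str, cursor: str, per_page: int) -> str:
    """Build an RFC 8288 Link header value pointing at the next page."""
    # urlencode handles percent-escaping of cursor values safely
    query = urlencode({"per_page": per_page, "cursor": cursor})
    return f'<{base_url}?{query}>; rel="next"'

print(next_link("https://api.example.com/resources", "abc123", 20))
# <https://api.example.com/resources?per_page=20&cursor=abc123>; rel="next"
```

Servers typically set this as the response's Link header alongside the JSON body, so clients can follow pagination without parsing the payload.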
Conclusion
Pagination is one of those fundamental features that, when implemented well, dramatically improves your application’s scalability and user experience. Choose:
- Offset/Limit for simple, small datasets.
- Keyset/Cursor for large, dynamic datasets.
- Signed/stateless cursors for secure APIs.
- Prefetching with the threading module for better UX, while respecting safety and performance considerations.
Further Reading
- SQL: Using LIMIT/OFFSET and WHERE clauses — official DB docs (Postgres, MySQL)
- SQLAlchemy ORM: https://docs.sqlalchemy.org/
- Flask: https://flask.palletsprojects.com/
- FastAPI: https://fastapi.tiangolo.com/
- Python threading: https://docs.python.org/3/library/threading.html
- Python 3.11 Release Notes: https://docs.python.org/3.11/whatsnew/3.11.html
- tomllib (Python 3.11): https://docs.python.org/3/library/tomllib.html