Implementing Pagination in Python Web Applications: Strategies for Efficient Data Handling

September 15, 2025 · 12 min read

Pagination is essential for building responsive, scalable Python web applications that handle large datasets. This post breaks down pagination strategies—offset/limit, keyset (cursor), and hybrid approaches—with practical Flask and FastAPI examples, performance tips, and advanced techniques like background prefetching using Python's threading module. Follow along to implement efficient, production-ready pagination and learn how related topics like automated file batching and Python 3.11 improvements can enhance your systems.

Introduction

When users request lists of data — product catalogs, message feeds, or file directories — returning everything at once is rarely practical. Pagination splits results into manageable pages, improving latency, reducing memory pressure, and creating a better user experience.

This tutorial covers:

  • Core pagination patterns (offset/limit vs keyset/cursor)
  • Practical, working Python examples with Flask, SQLAlchemy, and FastAPI
  • Performance, error handling, and best practices
  • Advanced topics: prefetching with Python's threading module, paginating file lists (automation), and how Python 3.11 features can help
Whether you're building a REST API or a web frontend, you'll learn when to use which approach and how to implement robust, efficient pagination.

Prerequisites

Before you begin, you should be comfortable with:

  • Python 3.x (examples assume 3.8+; 3.11-specific notes are marked)
  • Basic web frameworks: Flask or FastAPI
  • An SQL database and SQLAlchemy (ORM) or raw SQL
  • Basic threading or async concepts (optional for advanced tips)
You'll need to install:
  • Flask, FastAPI, SQLAlchemy, and a DB driver (e.g., psycopg2 for Postgres). The examples below use SQLite, which ships with Python, so this is enough:
pip install flask fastapi uvicorn sqlalchemy

Core Concepts

Why paginate?

  • Reduce memory and bandwidth usage
  • Improve response times
  • Avoid client overload and browser slowdowns
  • Prevent expensive database scans

Common patterns

  1. Offset/Limit (classic)
     - SQL: SELECT ... ORDER BY id LIMIT 20 OFFSET 40
     - Pros: Simple, widely supported
     - Cons: Slow for large offsets (the database still scans the skipped rows), inconsistent with concurrent writes
  2. Keyset / Cursor pagination
     - SQL: SELECT ... WHERE (id > last_seen_id) ORDER BY id LIMIT 20
     - Pros: Fast and consistent for large datasets
     - Cons: Slightly more complex, requires a sortable unique key
  3. Hybrid / Seek approach
     - Use keyset for deep paging, offset for early pages, or composite cursors for complex sorting

Visual analogy (text diagram)

Think of your dataset as a bookshelf:
  • Offset pagination = "skip N books from the start and read the next M" — fine for early shelves, slow when skipping many books.
  • Keyset pagination = "remember the last book's ID and continue from there" — efficient and reliable as you move forward.

Example 1 — Offset/Limit Pagination with Flask + SQLAlchemy

We'll implement a minimal Flask endpoint that returns paginated user records.

# app_offset.py
from flask import Flask, request, jsonify
from sqlalchemy import create_engine, Column, Integer, String, select, func
from sqlalchemy.orm import declarative_base, Session

app = Flask(__name__)
engine = create_engine("sqlite+pysqlite:///example.db", echo=False, future=True)
Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

Base.metadata.create_all(engine)

@app.route("/users")
def users():
    # Read query params with defaults; clamp to sane ranges
    try:
        page = max(1, int(request.args.get("page", 1)))
        per_page = min(100, max(1, int(request.args.get("per_page", 20))))
    except ValueError:
        return jsonify({"error": "page and per_page must be integers"}), 400

    offset = (page - 1) * per_page
    with Session(engine) as session:
        stmt = select(User).order_by(User.id).limit(per_page).offset(offset)
        results = session.execute(stmt).scalars().all()
        # A full COUNT() on every request is costly on large tables; see notes
        total = session.execute(select(func.count()).select_from(User)).scalar_one()

    return jsonify({
        "page": page,
        "per_page": per_page,
        "count": len(results),
        "users": [{"id": u.id, "name": u.name} for u in results],
        "total": total,
    })

Line-by-line explanation:

  • Imports and setup: standard Flask imports plus SQLAlchemy configured for a local SQLite file.
  • Class User: defines a simple table with id and name.
  • In the endpoint:
    - Read page and per_page from query parameters; validate and clamp values to avoid abuse.
    - Compute the offset.
    - Build the SQL statement: an ordered select with limit and offset.
    - Execute and return JSON with the user list and page metadata.

Inputs:

  • GET /users?page=2&per_page=20
Outputs:
  • JSON list of up to per_page users and page metadata.
Edge cases and notes:
  • Even an exact COUNT() via select(func.count()) runs on every request and gets expensive on large tables. Use a cached total or a database-specific approximate count instead.
  • Offset pagination becomes slow when offset grows large.
  • Concurrent writes may cause duplicated/missing items across pages.
Call to action: Try running this example locally, insert sample data (a seeding sketch follows), and observe response times as page grows.
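
If you want sample data quickly, here's a minimal seeding sketch; it assumes the User model and engine from app_offset.py above, and the 10,000-row count is arbitrary:

# seed_users.py -- hypothetical seeding helper for the example above
from sqlalchemy.orm import Session

from app_offset import Base, User, engine

Base.metadata.create_all(engine)
with Session(engine) as session:
    # Insert 10,000 sample users in a single transaction
    session.add_all(User(name=f"user-{i}") for i in range(1, 10_001))
    session.commit()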

Example 2 — Keyset (Cursor) Pagination with Flask + SQLAlchemy

Keyset pagination uses a cursor (often the last-seen ID). This is the preferred approach for large, frequently updated datasets.

# app_keyset.py
from flask import Flask, request, jsonify
from sqlalchemy import create_engine, Column, Integer, String, select
from sqlalchemy.orm import declarative_base, Session

app = Flask(__name__)
engine = create_engine("sqlite+pysqlite:///example.db", echo=False, future=True)
Base = declarative_base()

class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    body = Column(String, nullable=False)

Base.metadata.create_all(engine)

@app.route("/messages")
def messages():
    try:
        per_page = min(100, max(1, int(request.args.get("per_page", 20))))
    except ValueError:
        return jsonify({"error": "per_page must be integer"}), 400

    # 'after' is the cursor (last-seen id)
    after = request.args.get("after")
    with Session(engine) as session:
        stmt = select(Message).order_by(Message.id).limit(per_page)
        if after is not None:
            # Protect against invalid cursor values
            try:
                after_id = int(after)
            except ValueError:
                return jsonify({"error": "invalid cursor"}), 400
            stmt = stmt.where(Message.id > after_id)

        results = session.execute(stmt).scalars().all()

    next_cursor = results[-1].id if results else None
    return jsonify({
        "per_page": per_page,
        "messages": [{"id": m.id, "body": m.body} for m in results],
        "next_cursor": next_cursor,
    })

Explanation:

  • Use after as a cursor parameter — the last seen ID.
  • Query filters WHERE id > after_id and sorts by id.
  • next_cursor returns the id of the last message in the current page, which the client sends back as after to request the next page (a client-side loop is sketched below).
Inputs:
  • GET /messages?per_page=10
  • GET /messages?per_page=10&after=50
Advantages:
  • Constant-time queries with proper index on id.
  • No expensive OFFSET scanning.
Edge cases:
  • Deletions: if the row a cursor points at is deleted, the WHERE id > clause simply continues from the next existing row, so clients are unaffected.
  • Insertions at the head: newly inserted rows with smaller IDs won't be returned when paginating forward — often desirable.
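
To consume this endpoint, a client just echoes next_cursor back as after until it is exhausted. A minimal client-side loop (a sketch assuming the requests library and the server above running on localhost:5000):

# cursor_client.py -- hypothetical client for the /messages endpoint above
import requests

URL = "http://localhost:5000/messages"

after = None
while True:
    params = {"per_page": 50}
    if after is not None:
        params["after"] = after
    data = requests.get(URL, params=params, timeout=10).json()
    for message in data["messages"]:
        print(message["id"], message["body"])
    after = data["next_cursor"]
    if after is None:
        break  # no more pages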

Example 3 — Cursor Pagination with FastAPI and Signed Cursors (stateless, secure)

For APIs, it's common to avoid exposing raw DB IDs. We can encode cursors (e.g., base64 or HMAC-signed) and use FastAPI with pydantic models.

# app_fastapi_cursor.py
import base64
import json
import hmac
import hashlib
from fastapi import FastAPI, HTTPException, Query
from pydantic import BaseModel
from typing import Optional
from sqlalchemy import create_engine, select, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

SECRET = b"replace-with-secure-secret"

app = FastAPI()
engine = create_engine("sqlite+pysqlite:///example.db", echo=False, future=True)
Base = declarative_base()

class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)

Base.metadata.create_all(engine)

class ItemOut(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True  # required for from_orm() (Pydantic v1)

def sign_cursor(data: dict) -> str:
    payload = json.dumps(data, separators=(",", ":")).encode()
    sig = hmac.new(SECRET, payload, digestmod=hashlib.sha256).digest()
    # Encode payload and signature separately: the urlsafe alphabet contains
    # no ".", so the separator is unambiguous when verifying
    return (base64.urlsafe_b64encode(payload) + b"." + base64.urlsafe_b64encode(sig)).decode()

def verify_cursor(cursor: str) -> dict:
    try:
        payload_b64, sig_b64 = cursor.encode().rsplit(b".", 1)
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
        expected = hmac.new(SECRET, payload, digestmod=hashlib.sha256).digest()
        if not hmac.compare_digest(sig, expected):
            raise ValueError("bad signature")
        return json.loads(payload.decode())
    except Exception:
        raise HTTPException(status_code=400, detail="Invalid cursor")

@app.get("/items", response_model=dict)
def list_items(per_page: int = Query(20, ge=1, le=100), cursor: Optional[str] = None):
    after_id = None
    if cursor:
        payload = verify_cursor(cursor)
        after_id = payload.get("after_id")
    with Session(engine) as session:
        stmt = select(Item).order_by(Item.id).limit(per_page)
        if after_id is not None:
            stmt = stmt.where(Item.id > after_id)
        rows = session.execute(stmt).scalars().all()
    next_cursor = sign_cursor({"after_id": rows[-1].id}) if rows else None
    return {"items": [ItemOut.from_orm(r).dict() for r in rows], "next_cursor": next_cursor}

Why sign cursors?

  • Prevents tampering and leaking internal state. You can include other metadata (timestamp, sorting keys).
Edge cases:
  • Rotating secrets invalidates cursors.
  • Cursor payload should be minimal to reduce URL length.

Pagination for File Organization and Local Directories

What about paginating file lists? When building scripts that organize or display files, you often need to present files in pages.

Here's a simple script that paginates a directory listing and optionally groups files into folders (relates to "Building a Python Script for Automating File Organization: Practical Techniques"):

# file_pager.py
import os
from math import ceil

def list_files_paginated(directory, per_page=50, page=1):
    # Only regular files, in deterministic (sorted) order
    entries = [e for e in os.listdir(directory)
               if os.path.isfile(os.path.join(directory, e))]
    entries.sort()
    total = len(entries)
    pages = max(1, ceil(total / per_page))
    page = min(max(1, page), pages)  # clamp the requested page into range
    start = (page - 1) * per_page
    end = start + per_page
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "pages": pages,
        "files": entries[start:end],
    }

This pattern is useful when combining pagination with file automation tasks, e.g., batch-moving files into folders of N items, as sketched below.
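
Here's a hedged sketch of that batching idea, built on list_files_paginated (the batch_NNN folder layout is an assumption):

# batch_move.py -- hypothetical batching on top of file_pager.py
import os
import shutil

from file_pager import list_files_paginated

def move_in_batches(directory, per_page=50):
    # Always take page 1: each batch of moves shrinks the listing, so the
    # next batch of files shifts to the front on the following iteration.
    batch = 1
    while True:
        result = list_files_paginated(directory, per_page=per_page, page=1)
        if result["total"] == 0:
            break
        batch_dir = os.path.join(directory, f"batch_{batch:03d}")
        os.makedirs(batch_dir, exist_ok=True)
        for name in result["files"]:
            shutil.move(os.path.join(directory, name), batch_dir)
        batch += 1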

Performance Considerations and Best Practices

  • Indexes: Always index columns used in WHERE/ORDER BY for keyset pagination (e.g., id, created_at).
  • Limit per_page: Cap per_page (e.g., 100) to prevent abuse.
  • Use efficient counts: Avoid full COUNT() on huge tables on every request. Consider approximate counts, cached totals (see the sketch after this list), or indexed counters.
  • Avoid large offsets: Use keyset pagination for deep pages.
  • Stateless tokens: Use signed cursors so paginated APIs don't require server-side session state.
  • Use response headers: Provide Link headers or metadata (total, prev/next links) to help clients.
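
As one example of a cached total, here's a small sketch with a TTL guard; it assumes the User model and engine from app_offset.py, and the 30-second TTL is arbitrary:

# cached_count.py -- sketch of a TTL-cached COUNT() for hot endpoints
import threading
import time

from sqlalchemy import func, select
from sqlalchemy.orm import Session

from app_offset import User, engine  # assumed model/engine from Example 1

_count_cache = {"value": None, "expires": 0.0}
_count_lock = threading.Lock()

def cached_user_count(ttl=30.0):
    now = time.monotonic()
    with _count_lock:
        if _count_cache["value"] is not None and now < _count_cache["expires"]:
            return _count_cache["value"]
    # Recompute outside the lock so other requests aren't blocked on the DB
    with Session(engine) as session:
        total = session.execute(select(func.count()).select_from(User)).scalar_one()
    with _count_lock:
        _count_cache["value"] = total
        _count_cache["expires"] = now + ttl
    return total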

Handling Concurrency and Prefetching with Threading

You can improve perceived UX by prefetching the next page in the background. This is a good use case for "Using Python's Threading Module for Parallel Execution: Tips and Real-World Examples". Example approach:

  • While serving page N synchronously, spawn a background thread to fetch page N+1 and cache it.
  • Use thread-safe caches (e.g., a TTL cache with threading.Lock) or external caches (Redis).
Quick snippet demonstrating prefetching (synchronous app):
import threading

cache = {}
cache_lock = threading.Lock()

def fetch_page_db(page, per_page):
    # Simulated expensive DB call
    # return list_of_rows
    pass

def prefetch(page, per_page):
    data = fetch_page_db(page, per_page)
    with cache_lock:
        cache[(page, per_page)] = data

def get_page(page, per_page):
    key = (page, per_page)
    with cache_lock:
        if key in cache:
            # Serve the prefetched page (pop so stale data isn't reused)
            return cache.pop(key)
    # Cache miss: fetch synchronously, then prefetch the next page in the background
    data = fetch_page_db(page, per_page)
    thread = threading.Thread(target=prefetch, args=(page + 1, per_page), daemon=True)
    thread.start()
    return data

Notes:

  • Keep background tasks simple and idempotent.
  • Use daemon threads or proper thread management to avoid leaks.
  • For heavy concurrency, prefer multiprocessing or async IO, or delegate to a task queue (Celery, RQ).

Advanced Tips

  • Composite Cursors: For sorting by non-unique columns (e.g., created_at desc), use composite keys: WHERE (created_at, id) < (:last_created_at, :last_id); a sketch follows this list.
  • Time-based pagination: Use created_at timestamps for feeds. Beware of clock skew and duplicates.
  • Cursor encoding: Include ordering and constraints so the cursor fully describes the client's position.
  • Caching: Use Redis to store pages or cursor -> result mappings for hot endpoints.
  • Stream responses for very large per-page results: yield results in chunks.
  • Asynchronous endpoints: Use async DB drivers and frameworks (asyncpg, SQLModel, FastAPI async) for heavy loads.
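
To make the composite-cursor idea concrete, here's a hedged SQLAlchemy sketch; Post is a hypothetical feed model, and the portable OR form is used in place of a row-value comparison:

# composite_cursor.py -- sketch of a composite (created_at, id) cursor
from sqlalchemy import Column, DateTime, Integer, String, and_, or_, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Post(Base):  # hypothetical feed table
    __tablename__ = "posts"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    created_at = Column(DateTime, index=True)

def next_page_stmt(last_created_at, last_id, per_page=20):
    # Rows strictly "before" the cursor position, newest first.
    # On PostgreSQL you could instead write:
    #   tuple_(Post.created_at, Post.id) < (last_created_at, last_id)
    return (
        select(Post)
        .where(or_(
            Post.created_at < last_created_at,
            and_(Post.created_at == last_created_at, Post.id < last_id),
        ))
        .order_by(Post.created_at.desc(), Post.id.desc())
        .limit(per_page)
    )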

Common Pitfalls

  • Returning inconsistent totals: If you compute total counts independently of the page data, they can diverge due to concurrent writes.
  • Using OFFSET for deep paging: Very slow and costly.
  • Exposing raw DB fields in public cursors: May leak internals or allow tampering.
  • Unbounded per_page: Clients requesting huge pages can cause DoS.

How Python 3.11 Helps

"Exploring the New Features in Python 3.11: What Developers Need to Know" — there are a few ways 3.11 improves pagination-related systems:

  • Speed: General CPython optimizations make request handling and DB client libraries slightly faster.
  • tomllib (std library): Easier config parsing for pagination settings from TOML files.
  • Exception Groups and except*: Better handling of aggregated exceptions in concurrent prefetching or parallel tasks.
  • Typing improvements: Improved typing support helps with clearer pydantic models and response types.
Consider upgrading to 3.11 for performance-sensitive endpoints and cleaner concurrent error handling when prefetching pages with threads or async tasks.
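
As a small illustration, here's a sketch that reads pagination settings with tomllib and uses except* to handle grouped failures from concurrent prefetch tasks (the pagination.toml file, its keys, and the prefetch coroutine are all assumptions):

# py311_features.py -- requires Python 3.11+
import asyncio
import tomllib

# tomllib: parse pagination settings from a TOML config file
with open("pagination.toml", "rb") as f:  # hypothetical config file
    config = tomllib.load(f)
DEFAULT_PER_PAGE = config.get("pagination", {}).get("per_page", 20)

async def prefetch(page: int):
    ...  # stand-in for a real async DB fetch

async def prefetch_next_pages(current_page: int):
    try:
        # TaskGroup raises an ExceptionGroup if any task fails
        async with asyncio.TaskGroup() as tg:
            for page in (current_page + 1, current_page + 2):
                tg.create_task(prefetch(page))
    except* ConnectionError as eg:
        # except* handles just the connection failures in the group
        for exc in eg.exceptions:
            print("prefetch failed:", exc)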

Error Handling and Security

  • Validate inputs strictly (page, per_page, cursor).
  • Rate limit endpoints — pagination can be abused for scraping.
  • Use prepared statements or parameterized queries to avoid injection.
  • Sign or encrypt cursors if you don't want clients to guess internal IDs.

Example: End-to-End with Client Hints and Link Header

Here's a conceptual pattern for REST APIs: return the page body as JSON and advertise the next page in a Link response header (RFC 8288) so generic clients can follow pagination without parsing the body. Below is a minimal sketch, assuming the Flask app from app_keyset.py and a hypothetical fetch_messages helper:
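
# link_header.py -- sketch of Link-header pagination on the keyset endpoint
from flask import jsonify, request, url_for

from app_keyset import app  # assumed Flask app from Example 2

def fetch_messages(after, per_page):  # hypothetical DB helper
    ...

@app.route("/messages-linked")
def messages_linked():
    per_page = min(100, max(1, request.args.get("per_page", 20, type=int)))
    after = request.args.get("after", type=int)
    rows = fetch_messages(after=after, per_page=per_page)
    next_cursor = rows[-1]["id"] if rows else None

    response = jsonify({"messages": rows, "next_cursor": next_cursor})
    if next_cursor is not None:
        next_url = url_for("messages_linked", per_page=per_page,
                           after=next_cursor, _external=True)
        response.headers["Link"] = f'<{next_url}>; rel="next"'
    return response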

Conclusion

Pagination is one of those fundamental features that, when implemented well, dramatically improves your application’s scalability and user experience. Choose:

  • Offset/Limit for simple, small datasets.
  • Keyset/Cursor for large, dynamic datasets.
  • Signed/stateless cursors for secure APIs.
  • Prefetching with the threading module for better UX, while respecting safety and performance considerations.
Try the examples in this post, adapt them to your database and framework, and monitor query performance as you scale.

If you found this helpful, try implementing cursor-based pagination in a small sample app and measure latency differences. Share your results or questions — I'd love to see what you build.
