
Using Python's functools to Optimize Your Code: Memoization Techniques Explained
Discover how Python's functools module can dramatically speed up your code with memoization. This post walks you step-by-step through built-in tools like lru_cache, creating custom memo decorators, and practical patterns that integrate dataclasses, collections, and even a simple Flask example to illustrate real-world uses.
Introduction
Have you ever written a function that repeatedly performs the same expensive calculation? Imagine recalculating Fibonacci numbers, parsing large files, or performing heavy CPU-bound operations inside a web request. Memoization is a technique that caches function results so repeated calls with the same inputs return instantly.
In Python, the functools module provides powerful tools to implement memoization quickly and safely. In this post you'll learn:
- Core memoization concepts and prerequisites
- Built-in memoization via functools.lru_cache and functools.cache
- How to build custom memo decorators using functools.wraps
- Using dataclasses and collections to manage keys and caches
- Integrating memoization into a simple Flask web app
- Best practices, performance considerations, and common pitfalls
Prerequisites
Before diving in, you should be comfortable with:
- Python 3.x function definitions and decorators
- Basic data structures (dict, list, tuple)
- The standard library (functools, collections, dataclasses)
- Basic web development concepts for the Flask example (optional)
Core Concepts
- Memoization: Storing the results of function calls keyed by the call arguments to avoid repeated computation.
- Cache hit/miss: A hit occurs when cached data is returned; a miss when computation is required.
- Hashable keys: Python dicts require hashable keys. Non-hashable arguments (like lists) must be handled or transformed.
- Cache eviction: Without limits, caches can grow unbounded. Eviction policies (LRU — least recently used) limit size.
- functools.lru_cache(maxsize=128, typed=False) — least-recently-used cache with optional max size.
- functools.cache() — unlimited cache (Python 3.9+).
- functools.wraps — preserve function metadata when writing decorators.
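As a quick illustration, functools.cache behaves like lru_cache(maxsize=None) and can be applied directly, even to a recursive function:
from functools import cache
@cache  # unbounded cache, equivalent to lru_cache(maxsize=None)
def factorial(n):
    return n * factorial(n - 1) if n else 1
factorial(10)  # computes and caches factorial(10) down to factorial(0)
factorial(5)   # served entirely from the cache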
Simple Example: Fibonacci with and without memoization
Let's compare a naive recursive Fibonacci to one using lru_cache.
# naive_fib.py
def fib_naive(n):
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)
Explanation line-by-line:
- def fib_naive(n): — Define a simple recursive Fibonacci function.
- if n < 2: return n — Base cases: fib(0)=0, fib(1)=1.
- return fib_naive(n - 1) + fib_naive(n - 2) — Recursive case; exponential time complexity.
Now use lru_cache:
# cached_fib.py
from functools import lru_cache
@lru_cache(maxsize=128)
def fib_cached(n):
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)
Explanation line-by-line:
- from functools import lru_cache — Import lru_cache from functools.
- @lru_cache(maxsize=128) — Decorate the function to cache up to 128 results. When the cache exceeds 128 different argument combinations, the least-recently-used entries are evicted.
- def fib_cached(n): — Define the cached fibonacci.
- Base and recursive cases identical; repeated calls will be served from cache.
Edge cases:
- lru_cache uses function arguments as cache keys; arguments must be hashable (immutable types like ints, tuples, and strings are fine).
- If you call with different argument types (e.g., 1 vs 1.0) and typed=True, they are treated as distinct keys.
You can inspect and clear the cache at runtime:
print(fib_cached.cache_info())  # hits, misses, maxsize, currsize
fib_cached.cache_clear()        # clear the cache
Writing a Custom Memoization Decorator
Built-ins are great, but sometimes you want control (e.g., for unhashable args or custom eviction). Here’s a simple memo decorator that uses a dict.
# simple_memo.py
from functools import wraps
def simple_memo(func):
    cache = {}
    @wraps(func)
    def wrapper(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if key in cache:
            return cache[key]
        result = func(*args, **kwargs)
        cache[key] = result
        return result
    wrapper._cache = cache  # expose for introspection/testing
    return wrapper
Explanation line-by-line:
- from functools import wraps — Use wraps to preserve metadata like __name__ and __doc__.
- def simple_memo(func): — Define the decorator factory.
- cache = {} — A local dictionary holds cached results.
- @wraps(func) — Decorate wrapper with wraps(func).
- def wrapper(*args, **kwargs): — Accept arbitrary positional and keyword args.
- key = (args, tuple(sorted(kwargs.items()))) — Build a hashable key. sorted ensures deterministic order of kwargs.
- if key in cache: return cache[key] — Return cached result if present.
- result = func(*args, **kwargs) — Call the original function.
- cache[key] = result — Store in cache.
- wrapper._cache = cache — Expose cache for debugging/tests.
- return wrapper — Return the decorated function.
- This simple approach still requires that all arguments are hashable. If any argument is unhashable (e.g., a list or dict), building the key will raise a TypeError.
- Sorting kwargs is O(k log k) which matters if there are many kwargs.
Handling Unhashable Arguments
If you need to support lists or dicts as arguments, you must transform them into hashable representations. A common approach is to convert lists to tuples and dicts to frozensets (or sorted tuples). Be careful: converting mutable objects at the time of the call won't protect against later mutation that changes semantics.
Example generic key-maker:
# make_key.py
def make_hashable(obj):
    if isinstance(obj, (tuple, list)):
        return tuple(make_hashable(e) for e in obj)
    if isinstance(obj, dict):
        return tuple(sorted((k, make_hashable(v)) for k, v in obj.items()))
    if isinstance(obj, set):
        return tuple(sorted(make_hashable(e) for e in obj))
    # primitives and already hashable objects
    return obj
Explanation:
- Convert nested lists/tuples to tuples recursively.
- Convert dict to sorted tuple of (key, value) pairs so that ordering is deterministic.
- Convert set to sorted tuple.
- Return primitive or already hashable object as-is.
Use make_hashable inside your decorator to create safe keys, as in the sketch below.
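For example, here is a minimal sketch of a memo decorator that builds keys with make_hashable (the name flexible_memo and the example function are illustrative):
from functools import wraps
def flexible_memo(func):
    cache = {}
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Normalize every argument into a hashable representation
        key = (make_hashable(args), make_hashable(kwargs))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return wrapper
@flexible_memo
def total(values):
    return sum(values)
total([1, 2, 3])  # computed; lru_cache would raise TypeError for a list argument
total([1, 2, 3])  # cache hit, despite the list argument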
Custom LRU Using collections.OrderedDict
Before lru_cache was widely used, a typical pattern was to implement LRU with collections.OrderedDict.
# custom_lru.py
from collections import OrderedDict
from functools import wraps
def lru_memo(maxsize=128):
    def decorator(func):
        cache = OrderedDict()
        @wraps(func)
        def wrapper(*args):
            key = args
            if key in cache:
                cache.move_to_end(key)
                return cache[key]
            result = func(*args)
            cache[key] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)  # pop oldest
            return result
        wrapper._cache = cache
        return wrapper
    return decorator
Line-by-line notes:
- OrderedDict preserves insertion order. move_to_end marks an item as most recently used.
- If the cache exceeds maxsize, pop the oldest item (last=False).
- This is a simple LRU for positional args only. You can extend key creation to include kwargs.
- Reach for this pattern when lru_cache's internals aren't flexible enough.
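A quick usage sketch (slow_square is an illustrative function):
@lru_memo(maxsize=2)
def slow_square(n):
    print(f"computing {n}")
    return n * n
slow_square(2)  # computing 2
slow_square(3)  # computing 3
slow_square(2)  # cache hit, nothing printed
slow_square(4)  # computing 4; evicts 3, the least recently used key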
Using dataclasses for Cache Keys and Lightweight Models
dataclasses simplify class definitions and can be used to represent complex function parameters or tasks. When marked frozen=True, dataclass instances are hashable and safe to use as cache keys.
Example: suppose a heavy computation keyed by multiple parameters:
# dataclass_key.py
from dataclasses import dataclass
@dataclass(frozen=True)
class ComputeParams:
    width: int
    height: int
    mode: str
Explanation:
- @dataclass(frozen=True) — Create an immutable dataclass, making instances hashable.
- Fields width, height, mode — represent computation parameters.
from functools import lru_cache
@lru_cache(maxsize=256)
def compute(params: ComputeParams):
    # expensive operation using params.width, params.height, params.mode
    return params.width * params.height  # placeholder
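Usage then looks like this (values are illustrative):
params = ComputeParams(width=1920, height=1080, mode="fast")
compute(params)                               # computed and cached
compute(ComputeParams(1920, 1080, "fast"))    # equal fields, same key: cache hit
print(compute.cache_info())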
Benefits:
- Cleaner API than passing many separate parameters.
- Freezing the object makes it safe to use as a cache key.
- Works well with type hints and IDE tooling.
Note: fields must themselves be hashable; if a field holds a list or dict, frozen=True alone won't make instances usable as cache keys.
Using collections for Advanced Data Structures
The collections module complements memoization patterns:
- defaultdict: Accumulate lists of results grouped by keys.
- deque: Manage a rolling window or time-based caching.
- namedtuple / dataclass: Lightweight records for keys/values.
- Counter: Track cache hit frequencies (for analysis).
For example, use defaultdict to map parameter categories to cached results:
from collections import defaultdict
category_cache = defaultdict(dict)
def cache_by_category(category, params):
    if params in category_cache[category]:
        return category_cache[category][params]
    result = expensive_computation(params)
    category_cache[category][params] = result
    return result
Explanation:
- defaultdict(dict) creates a dict-of-dicts to group caches by category.
- This pattern helps segment the cache and apply eviction policies per category.
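Since Counter was listed above, here is a minimal sketch of tracking hit/miss counts around a plain dict cache (the helper names are illustrative):
from collections import Counter
stats = Counter()
_cache = {}
def tracked_lookup(key, compute):
    # compute is a zero-argument callable that produces the value on a miss
    if key in _cache:
        stats["hits"] += 1
        return _cache[key]
    stats["misses"] += 1
    _cache[key] = compute()
    return _cache[key]
tracked_lookup("a", lambda: 1)  # miss
tracked_lookup("a", lambda: 1)  # hit
print(stats)                    # one hit and one miss recorded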
Practical Example: Memoization in a Flask App
Imagine a simple Flask app that calculates expensive analytics. Caching results reduces latency and server load.
# app.py
from flask import Flask, jsonify, request
from functools import lru_cache
from dataclasses import dataclass
app = Flask(__name__)
@dataclass(frozen=True)
class Query:
    x: int
    y: int

@lru_cache(maxsize=1024)
def expensive_analytics(q: Query):
    # Simulate expensive computation
    import time
    time.sleep(2)
    return {"sum": q.x + q.y, "product": q.x * q.y}

@app.route("/compute")
def compute_route():
    try:
        x = int(request.args.get("x", 0))
        y = int(request.args.get("y", 0))
    except ValueError:
        return jsonify({"error": "x and y must be integers"}), 400
    q = Query(x, y)
    result = expensive_analytics(q)
    return jsonify(result)
Explanation:
- dataclass Query provides a clean, frozen, hashable key type.
- lru_cache caches results per unique Query.
- The expensive calculation is simulated with time.sleep(2).
- On repeated requests with same x and y, response is fast (cache hit), reducing load.
- For multi-process servers (uWSGI, Gunicorn), lru_cache is process-local. Consider a distributed cache (Redis) or Flask-Caching for shared caches; see the sketch below.
- Validate inputs and handle errors (as demonstrated).
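For reference, a minimal sketch of the shared-cache approach using the Flask-Caching extension with a Redis backend; the configuration values and endpoint are illustrative, not a drop-in replacement for the app above:
from flask import Flask, jsonify, request
from flask_caching import Cache
app = Flask(__name__)
cache = Cache(app, config={
    "CACHE_TYPE": "RedisCache",                    # shared across worker processes
    "CACHE_REDIS_URL": "redis://localhost:6379/0", # illustrative Redis URL
    "CACHE_DEFAULT_TIMEOUT": 300,                  # entries expire after 5 minutes
})
@cache.memoize()  # keys on the function's arguments, similar to lru_cache
def expensive_analytics(x: int, y: int):
    return {"sum": x + y, "product": x * y}
@app.route("/compute")
def compute_route():
    x = int(request.args.get("x", 0))
    y = int(request.args.get("y", 0))
    return jsonify(expensive_analytics(x, y))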
Measuring Performance
You should measure before and after. Example using time.perf_counter:
import time
start = time.perf_counter()
print(fib_cached(35))
print("Elapsed:", time.perf_counter() - start)
Tips:
- Use perf_counter for wall-clock measurements.
- For microbenchmarks, use the timeit module to avoid noise (see the sketch below).
- Watch out for JIT/warm-up effects in long-running processes.
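For example, a small timeit comparison of the two Fibonacci implementations from earlier (iteration counts are illustrative):
import timeit
naive = timeit.timeit(lambda: fib_naive(25), number=10)
cached = timeit.timeit(lambda: fib_cached(25), number=10)
print(f"naive:  {naive:.4f}s")
print(f"cached: {cached:.4f}s")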
Best Practices
- Prefer built-in lru_cache/cache for most memoization needs — they are well-tested and efficient.
- Use maxsize to avoid unbounded memory growth. Tune it based on workload and memory constraints.
- Use typed=True when results differ by argument type (1 vs 1.0).
- Make sure arguments are hashable; use dataclasses or helper functions to ensure that.
- Avoid memoizing functions with important side effects (I/O, randomness, database writes).
- Clear or invalidate caches when underlying data changes (.cache_clear()).
- For web apps, consider process-local vs distributed caching (use Redis/Flask-Caching for multi-process deployments).
- Expose cache metrics (hits/misses using cache_info()) to monitor effectiveness.
Common Pitfalls
- Unhashable arguments: lists, dictionaries will raise TypeError unless converted.
- Memory leak: unlimited caches (functools.cache) can exhaust memory.
- Stale data: caching results while underlying data changes — implement invalidation.
- Using memoization on functions that return different values for the same args (e.g., datetime.now()).
- Thread-safety: CPython dict updates are atomic for single operations, but complex cache operations may still have race conditions. lru_cache is thread-safe for lookups and updates in CPython, but careful design is required for more complex use.
Advanced Tips
- Use functools.partial to create fixed-parameter functions that can be memoized separately (see the sketch after this list).
- Combine memoization with async functions using caches compatible with async patterns. Note: functools.lru_cache does not support async functions directly — wrap synchronous parts or use an external async-aware cache library.
- For numeric-heavy workloads, consider memoization with NumPy arrays by converting array inputs to a hashable key (e.g., bytes or a checksum). Be mindful of cost-to-hash vs compute cost.
- Use cachetools (third-party) for advanced cache policies (LFU, TTL, time-aware caches).
- Profile the application to find where memoization will have the biggest impact (hot paths).
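Here is a minimal sketch of the functools.partial tip, using a hypothetical render function with a fixed mode argument:
from functools import lru_cache, partial
def render(width, height, mode):
    # imagine an expensive rendering step here
    return f"{mode}:{width}x{height}"
# Fix the mode up front, then memoize the resulting single-purpose callable.
render_fast = lru_cache(maxsize=64)(partial(render, mode="fast"))
render_fast(800, 600)  # computed
render_fast(800, 600)  # cache hit
print(render_fast.cache_info())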
Full Example: Custom Memo with Dataclasses and Collections
Putting it all together — a function that caches results using a dataclass key and an LRU built with deque for eviction:
# full_example.py
from dataclasses import dataclass
from collections import deque
from functools import wraps
@dataclass(frozen=True)
class Params:
    a: int
    b: int

def memo_with_window(maxsize=1000):
    def decorator(func):
        cache = {}
        order = deque()  # hold keys in recency order: right = most recent
        @wraps(func)
        def wrapper(params: Params):
            key = params
            if key in cache:
                # move key to the right (most recent)
                try:
                    order.remove(key)
                except ValueError:
                    pass
                order.append(key)
                return cache[key]
            result = func(params)
            cache[key] = result
            order.append(key)
            if len(order) > maxsize:
                oldest = order.popleft()
                del cache[oldest]
            return result
        wrapper._cache = cache
        return wrapper
    return decorator

@memo_with_window(maxsize=3)
def compute(params: Params):
    return params.a + params.b
Explanation:
- dataclass Params is frozen and hashable.
- deque stores recency order and enables O(1) popleft for eviction.
- On cache hit, we remove and re-append the key to mark it most recent (O(n) remove in deque), so this is a simple illustrative approach — for large caches prefer OrderedDict for O(1) move operations.
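A quick sanity check of the eviction window from the example above (values are illustrative):
compute(Params(1, 1))  # cached
compute(Params(2, 2))  # cached
compute(Params(3, 3))  # cached; window now full at maxsize=3
compute(Params(4, 4))  # evicts Params(1, 1), the oldest entry
print(Params(1, 1) in compute._cache)  # False
print(len(compute._cache))             # 3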
When Not to Memoize
Ask yourself:
- Is the function cheap already?
- Does the function rely on current external state?
- Will the cache grow uncontrollably?
- Are the arguments frequently unique?
If you answer "yes" to any of these, memoization may add overhead and complexity without a real payoff.
Conclusion
Memoization is one of the simplest and most effective optimizations you can apply. Python's functools.lru_cache covers the majority of use cases with a small API and solid performance. When you need more control, combine functools, dataclasses, and collections for robust, readable caching solutions. For web applications, remember that local caches are per-process and consider using an external cache for distributed environments.
Call to action: Try applying @lru_cache to a slow function in your codebase. Measure before and after. If you use Flask, try caching a route's expensive computation with dataclasses as keys and see your response times drop.
Further Reading and References
- functools — Official Python documentation: https://docs.python.org/3/library/functools.html
- collections — Official Python documentation: https://docs.python.org/3/library/collections.html
- dataclasses — Official Python documentation: https://docs.python.org/3/library/dataclasses.html
- Flask — Official docs and quickstart: https://flask.palletsprojects.com/
- Flask-Caching: https://flask-caching.readthedocs.io/
- cachetools (third-party caching utilities): https://cachetools.readthedocs.io/
- Exploring Python's Data Classes: Simplifying Class Creation and Maintenance — dataclasses make hashing and immutable keys simple.
- Effective Use of Python's collections Module for Advanced Data Structures — OrderedDict, defaultdict, deque are invaluable in cache design.
- Building a Simple Web Application with Flask: Step-by-Step for Beginners — integrate memoization for fast endpoints.