
Implementing a Custom Python Iterator: Patterns, Best Practices, and Real-World Use Cases
Learn how to design and implement custom Python iterators that are robust, memory-efficient, and fit real-world tasks like streaming files, batching database results, and async I/O. This guide walks you step-by-step through iterator protocols, class-based and generator-based approaches, context-manager patterns for clean resource management, and how to combine iterators with asyncio and solid error handling.
Iterators are at the heart of Pythonic, memory-efficient code. Whether you're processing huge logs, paginating API responses, or streaming data over the network, custom iterators let you express lazy, composable pipelines with clarity and performance.
In this post you will learn:
- The iterator protocol and when to implement it directly.
- Class-based and generator-based iterator patterns.
- Real-world examples: file chunk streaming, batched database access, and async iterators for I/O-bound work.
- How to use context managers for safe resource cleanup.
- Best practices for error handling, performance, and avoiding common pitfalls.
---
Prerequisites
This post assumes you know:
- Basic Python syntax (classes, functions).
- How `for` loops and `next()` work conceptually.
- Some exposure to `asyncio` is helpful for the async section (Python 3.7+ recommended).
---
Core Concepts: Iterable vs Iterator
Quick refresher:
- Iterable: An object that can return an iterator (it implements `__iter__()`, which returns an iterator). Examples: list, dict, the result of calling a generator function.
- Iterator: An object representing a stream of values, implementing `__iter__()` (which returns self) and `__next__()`, which either returns the next item or raises `StopIteration`.
Important: `for` loops and many other constructs accept any iterable. A common pattern is to make your container iterable by having `__iter__()` return a new iterator instance so multiple iterations work independently.
Official reference: Python docs for the iterator protocol (https://docs.python.org/3/library/stdtypes.html#iterator-types).
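To make the protocol concrete, here is a quick sketch that drives a list's iterator by hand with the built-in `iter()` and `next()`:

numbers = [10, 20, 30]
it = iter(numbers)   # calls numbers.__iter__() and returns a list_iterator

print(next(it))      # 10  (calls it.__next__())
print(next(it))      # 20
print(next(it))      # 30
# One more next(it) would raise StopIteration; a for loop catches that for you.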
---
Basic class-based iterator
Let's implement a simple counting iterator: it yields numbers from `start` to `stop - 1`.
class Count:
    """A simple iterator that yields integers from start (inclusive) to stop (exclusive)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.current = start

    def __iter__(self):
        # Returning self makes this object both iterable and an iterator.
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        val = self.current
        self.current += 1
        return val
Line-by-line explanation:
- `class Count:` — define the iterator class.
- `__init__` — store `start` and `stop`, and initialize `current` (the iteration state).
- `__iter__` — returns `self`, so the object is also the iterator.
- `__next__` — if `current >= stop`, raise `StopIteration` to signal the end. Otherwise return the current value and advance the state.
c = Count(0, 3)
print(list(c)) # -> [0, 1, 2]
print(list(c)) # -> [] (iterator is exhausted)
- Because `Count` returns itself from `__iter__`, it is a single-use iterator: once exhausted, it stays exhausted. For multi-pass iteration you should make `__iter__` return a fresh iterator (see below).
Iterable that returns a fresh iterator (multi-pass)
If you want a collection that's iterable multiple times, separate the container and iterator:
class RangeLike:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call creates a new iterator with its own state
        return Count(self.start, self.stop)
Now:
r = RangeLike(0, 3)
print(list(r)) # [0, 1, 2]
print(list(r)) # [0, 1, 2] (works again)
Best practice: If your object represents a collection, prefer returning a new iterator from `__iter__`. If it is a streaming object (like a generator object), returning `self` is acceptable.
---
Generator-based iterators: simpler and idiomatic
Generators are often the cleanest way to implement iterators.
Here is the equivalent of `Count` written as a generator:
def count_gen(start, stop):
    current = start
    while current < stop:
        yield current
        current += 1
- `yield` turns the function into a generator, so calling it creates an iterator automatically.
- Generators maintain their own state and raise `StopIteration` automatically when the function returns.
- The body mirrors the class version: store `current`, loop while the condition holds, `yield` the current value, then advance. Very concise.
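A quick check that the generator behaves like the class-based version:

print(list(count_gen(0, 3)))  # [0, 1, 2]

gen = count_gen(0, 3)
print(list(gen))  # [0, 1, 2]
print(list(gen))  # []  (a generator object is single-use, just like Count)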
Generators are great for most iterator needs — prefer them unless you must control the iterator object, implement complex methods, or interact with `__iter__`/`__next__` semantics explicitly.
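One pattern worth knowing: you can get multi-pass behavior and generator conciseness together by implementing `__iter__` as a generator method on a container class. A minimal sketch (the `RangeLike2` name is just for illustration):

class RangeLike2:
    """Multi-pass iterable whose __iter__ is a generator method."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call to __iter__ creates a brand-new generator object,
        # so the container can be iterated any number of times.
        current = self.start
        while current < self.stop:
            yield current
            current += 1


r2 = RangeLike2(0, 3)
print(list(r2))  # [0, 1, 2]
print(list(r2))  # [0, 1, 2] (multi-pass, like RangeLike above)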
---
Use case: Streaming a large file in chunks (with safe cleanup)
Imagine processing a huge binary file in fixed-size chunks to avoid memory spikes. We'll implement an iterator that is also a context manager so file resources are cleaned up automatically.
from typing import IO, Iterator, Optional

class FileChunkReader:
    """
    Iterate over a file in fixed-size chunks. Can be used as:

        with FileChunkReader('data.bin', 4096) as reader:
            for chunk in reader:
                process(chunk)
    """

    def __init__(self, path: str, chunk_size: int = 4096):
        self.path = path
        self.chunk_size = chunk_size
        self._file: Optional[IO[bytes]] = None

    def __enter__(self):
        self._file = open(self.path, 'rb')
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._file:
            self._file.close()
            self._file = None

    def __iter__(self) -> Iterator[bytes]:
        if self._file is None:
            # Allow iteration without a context manager by opening lazily
            self._file = open(self.path, 'rb')
        return self

    def __next__(self) -> bytes:
        assert self._file is not None, "File must be open to iterate"
        chunk = self._file.read(self.chunk_size)
        if not chunk:
            # Close the resource proactively at EOF
            self._file.close()
            self._file = None
            raise StopIteration
        return chunk
Explanation and important behaviors:
- The class is constructed with the path and chunk size.
- It implements `__enter__`/`__exit__` so you can write `with FileChunkReader(...) as r:` and be sure the file is closed.
- `__iter__` opens the file lazily if it isn't already open, enabling both `with` usage and a direct `for` loop.
- `__next__` reads a chunk and raises `StopIteration` on EOF; it also closes the file when done.
# Preferred: with context manager for deterministic cleanup
with FileChunkReader('large.bin', 1024) as reader:
    for idx, chunk in enumerate(reader):
        print(idx, len(chunk))
It also works without `with`, but be careful to avoid leaks:
reader = FileChunkReader('large.bin', 1024)
for chunk in reader:
    pass  # file will be closed at EOF, but exceptions may leak the file handle
Edge cases:
- If an exception occurs inside the `for` loop and you didn't use `with`, the file may remain open unless you handle it yourself. That's why pairing iterators that manage resources with context managers is a best practice (a defensive pattern is sketched below).
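If you do iterate without `with`, one defensive sketch is to close the reader yourself in a `finally` block. Here `handle` is a hypothetical processing function, and calling `__exit__` directly stands in for a dedicated `close()` method you might prefer to add:

reader = FileChunkReader('large.bin', 1024)
try:
    for chunk in reader:
        handle(chunk)  # hypothetical processing that may raise
finally:
    # Ensure the underlying file is closed even if handle() raises mid-loop.
    reader.__exit__(None, None, None)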
---
Use case: Batched database cursor iterator with retries and error handling
A typical pattern is to fetch rows in batches from a DB or API. Here is a simplified example simulating DB pagination with error handling and retry logic.
import time
from typing import Callable, Iterable

class BatchIterator:
    """
    Iterate through a producer function that returns lists of items (pages).
    The producer is called repeatedly until it returns an empty list.
    """

    def __init__(self, producer: Callable[[int], Iterable], start_page: int = 0, max_retries: int = 3):
        self.producer = producer
        self.page = start_page
        self.buffer = []
        self.max_retries = max_retries

    def __iter__(self):
        return self

    def __next__(self):
        while not self.buffer:
            retries = 0
            while retries < self.max_retries:
                try:
                    page_items = list(self.producer(self.page))
                    break
                except Exception as exc:
                    retries += 1
                    # Log or handle transient errors, then retry
                    print(f"Producer error on page {self.page}: {exc} - retry {retries}")
                    time.sleep(0.1 * retries)
            else:
                # All retries failed — raise to the caller
                raise RuntimeError(f"Failed to fetch page {self.page} after {self.max_retries} retries")
            if not page_items:
                raise StopIteration
            self.buffer.extend(page_items)
            self.page += 1
        return self.buffer.pop(0)
Explanation:
- `producer(page)` simulates a DB/API call returning zero or more items for a page.
- `BatchIterator` keeps a `buffer` of items from the latest page and yields them one by one.
- It includes retry logic with an increasing backoff delay and raises a clear `RuntimeError` when all retries fail (an example of robust error handling).
- The caller can handle the `RuntimeError` or let it bubble up — it's explicit and manageable. A usage sketch follows below.
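To make the behavior concrete, here is a usage sketch with a fake in-memory producer; `fake_fetch_page` and its simulated transient failures are invented for illustration:

import random

DATA = list(range(10))  # pretend these rows live in a database

def fake_fetch_page(page: int, page_size: int = 4):
    """Return one 'page' of rows; occasionally fail to exercise the retry path."""
    if random.random() < 0.2:
        raise ConnectionError("simulated transient failure")
    start = page * page_size
    return DATA[start:start + page_size]  # empty list once we run past the data

for row in BatchIterator(fake_fetch_page, max_retries=3):
    print("row:", row)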
---
Asynchronous iterators: leveraging asyncio for efficient I/O-bound operations
For I/O-bound tasks (network, disk, etc.) you should consider async iterators to avoid blocking the event loop. Python supports asynchronous iteration via `__aiter__` and `__anext__`, or with async generators (using `async def` + `yield`).
Simple async generator that simulates streaming data from a network connection:
import asyncio
from typing import AsyncIterator

async def async_stream_simulator(n: int) -> AsyncIterator[int]:
    """Simulate an async stream yielding numbers with an I/O delay."""
    for i in range(n):
        # Simulate an I/O delay (e.g., a network read)
        await asyncio.sleep(0.1)
        yield i
Usage:
async def main():
    async for item in async_stream_simulator(5):
        print("Got:", item)
To run it: `asyncio.run(main())`
Key points:
- `async def` + `yield` creates an async generator, which is both an async iterator and an async iterable.
- Use `async for` to consume it; `await` is used internally to yield control while waiting for I/O.
- This pattern is an idiomatic way to leverage Python's asyncio for efficient I/O-bound operations: you can process many concurrent streams without blocking, as the sketch below shows.
class AsyncCounter:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.i >= self.n:
            raise StopAsyncIteration
        await asyncio.sleep(0.05)
        val = self.i
        self.i += 1
        return val
Consume it with `async for`.
Error handling with async iterators:
- Use try/except blocks in the async consumer.
- If you manage resources (e.g., a network connection), combine with asynchronous context managers via `async with` (PEP 492) and implement `__aenter__`/`__aexit__`, or use `asynccontextmanager` (a sketch follows below).
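As a rough sketch of that combination, here is an `asynccontextmanager` wrapping a simulated connection whose items we then stream; `fake_connection` and its open/close delays are invented for the example:

from contextlib import asynccontextmanager

@asynccontextmanager
async def fake_connection(host: str):
    # Pretend to open a network connection asynchronously.
    await asyncio.sleep(0.05)
    print(f"connected to {host}")
    try:
        yield async_stream_simulator(3)  # hand the caller an async iterator
    finally:
        # Cleanup runs even if the consumer raises mid-iteration.
        await asyncio.sleep(0.05)
        print("connection closed")

async def read_all():
    async with fake_connection("example.org") as stream:
        try:
            async for item in stream:
                print("received:", item)
        except Exception as exc:
            print("stream failed:", exc)

asyncio.run(read_all())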
Best Practices
- Prefer generators for concise iterators. Use class-based iterators when you need explicit state control or additional methods.
- Use context managers (`with`, `__enter__`/`__exit__`, or `contextlib.contextmanager`) to manage resources and ensure deterministic cleanup.
- Handle `StopIteration` only if you are implementing or manually controlling iteration with `next()`. Let `for` loops handle it for you (see the sketch after this list).
- For external I/O, prefer `asyncio` and asynchronous iterators/generators to maximize throughput of I/O-bound apps.
- Be explicit in your exception handling: catch expected transient errors, log details, and use retries/backoff where appropriate.
- Document whether your iterable is single-use or multi-pass in the class docstring.
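For example, when driving an iterator manually, passing a default to `next()` lets you avoid catching `StopIteration` yourself:

it = iter([1, 2])
print(next(it))          # 1
print(next(it))          # 2
print(next(it, None))    # None instead of raising StopIteration

# Equivalent explicit handling, only needed when you control iteration by hand:
it2 = iter([1, 2])
while True:
    try:
        item = next(it2)
    except StopIteration:
        break
    print(item)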
Common Pitfalls
- Returning `self` from `__iter__` makes a single-use iterator. That may surprise consumers who expect multi-pass iterables.
- Forgetting to raise `StopIteration` properly in `__next__`.
- Resource leaks if you open resources in `__iter__`/`__next__` and don't close them (use context managers).
- Trying to reuse generator objects after exhaustion — generators are single-use.
- Not considering thread safety: iterators are usually not thread-safe. For multithreaded consumers, use locks or queue patterns (a locking wrapper is sketched below).
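As a rough illustration of the locking approach, here is a wrapper that serializes `__next__` calls across threads; the `ThreadSafeIterator` name is just for this sketch:

import threading

class ThreadSafeIterator:
    """Wrap any iterator so that multiple threads can pull items safely."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread advances the underlying iterator at a time.
        with self._lock:
            return next(self._iterator)

safe = ThreadSafeIterator(iter(range(100)))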
Advanced Tips
- Combine iterators with `itertools` (`islice`, `chain`, `tee`, `groupby`) to build powerful lazy pipelines.
- For large or slow producers, consider prefetching or buffering strategies — but be careful: prefetching increases memory usage and complexity.
- For CPU-bound processing, combine iterators with `concurrent.futures.ProcessPoolExecutor` for parallelism (iterators are a good way to feed work lazily).
- For async producers, consider `asyncio.Queue` with producer/consumer coroutines when ordering or backpressure matters (a sketch appears after the `islice` example below).

Example with `itertools.islice`, taking values lazily from an infinite generator:
from itertools import islice

def infinite_counter():
    i = 0
    while True:
        yield i
        i += 1

# Take the first 10 values lazily
first_ten = list(islice(infinite_counter(), 10))
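And here is the promised `asyncio.Queue` sketch: a producer coroutine feeds items into a bounded queue (which gives natural backpressure), a consumer drains it, and `None` is used as an end-of-stream sentinel; all names here are illustrative:

import asyncio

async def producer(queue: asyncio.Queue, n: int):
    for i in range(n):
        await asyncio.sleep(0.01)      # simulate slow production
        await queue.put(i)             # blocks if the queue is full (backpressure)
    await queue.put(None)              # sentinel: no more items

async def consumer(queue: asyncio.Queue):
    while True:
        item = await queue.get()
        if item is None:
            break
        print("consumed:", item)

async def pipeline():
    queue: asyncio.Queue = asyncio.Queue(maxsize=5)
    await asyncio.gather(producer(queue, 20), consumer(queue))

asyncio.run(pipeline())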
---
Final example: a robust file scanner that yields lines, with retries and async option
Below is a final consolidated pattern showing:
- An iterator class for line streaming (sync).
- An async generator alternative for non-blocking reads (simulated with sleep).
- A context manager for safe resource cleanup and robust error handling.
import asyncio

class LineReader:
    def __init__(self, path):
        self.path = path
        self._f = None

    def __enter__(self):
        self._f = open(self.path, 'r', encoding='utf-8')
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._f:
            self._f.close()
            self._f = None

    def __iter__(self):
        if self._f is None:
            self._f = open(self.path, 'r', encoding='utf-8')
        return self

    def __next__(self):
        assert self._f is not None
        line = self._f.readline()
        if line == '':
            # EOF: close the file and stop iteration
            self._f.close()
            self._f = None
            raise StopIteration
        return line.rstrip('\n')
Async alternative (simulating async I/O):
async def async_line_stream(n: int):
    for i in range(n):
        await asyncio.sleep(0.01)  # simulate a read
        yield f"line {i}"
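For completeness, a short consumption sketch of both variants; the 'example.txt' path is just a placeholder:

# Synchronous: the context manager guarantees the file is closed.
with LineReader('example.txt') as lines:
    for line in lines:
        print(line)

# Asynchronous: consume the simulated stream with async for.
async def show_lines():
    async for line in async_line_stream(3):
        print(line)

asyncio.run(show_lines())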
Call-to-action: Try the above classes and generators on your own data. Replace the simulated `async_line_stream` with a real async file reader such as `aiofiles` for production use.
---
Conclusion
Custom iterators are powerful tools:
- Use them to express lazy, memory-efficient pipelines.
- Prefer generators for most cases; implement class-based iterators when you need more control.
- Combine iterators with context managers to manage resources cleanly.
- For I/O-bound workloads, leverage asyncio and async iterators/generators.
- Apply robust error handling—retry transient errors, and fail with clear exceptions on unrecoverable conditions.
---
Further Reading
- Official Python docs — Iterator Types: https://docs.python.org/3/library/stdtypes.html#iterator-types
- PEP 525 — Asynchronous Generators: https://peps.python.org/pep-0525/
- contextlib — Utilities for context managers: https://docs.python.org/3/library/contextlib.html
- asyncio — Asynchronous I/O framework: https://docs.python.org/3/library/asyncio.html
- itertools — Fast, memory-efficient tools: https://docs.python.org/3/library/itertools.html