Implementing a Custom Python Iterator: Patterns, Best Practices, and Real-World Use Cases

August 16, 2025

Learn how to design and implement custom Python iterators that are robust, memory-efficient, and fit real-world tasks like streaming files, batching database results, and async I/O. This guide walks you step-by-step through iterator protocols, class-based and generator-based approaches, context-manager patterns for clean resource management, and how to combine iterators with asyncio and solid error handling.

Iterators are at the heart of Pythonic, memory-efficient code. Whether you're processing huge logs, paginating API responses, or streaming data over the network, custom iterators let you express lazy, composable pipelines with clarity and performance.

In this post you will learn:

  • The iterator protocol and when to implement it directly.
  • Class-based and generator-based iterator patterns.
  • Real-world examples: file chunk streaming, batched database access, and async iterators for I/O-bound work.
  • How to use context managers for safe resource cleanup.
  • Best practices for error handling, performance, and avoiding common pitfalls.
Let's break it down.

---

Prerequisites

This post assumes you know:

  • Basic Python syntax (classes, functions).
  • How for loops and next() work conceptually.
  • Some exposure to asyncio is helpful for the async section (Python 3.7+ recommended).
If you're comfortable with those, you're ready.

---

Core Concepts: Iterable vs Iterator

Quick refresher:

  • Iterable: An object that can return an iterator (implements __iter__() that returns an iterator). Examples: list, dict, generator function result.
  • Iterator: An object representing a stream of values, implementing __iter__() (returns self) and __next__() which either returns the next item or raises StopIteration.
Analogy: Iterable is a book on the shelf. Iterator is a bookmark you use to walk through pages; when you reach the end, you stop.

Important: for and many other constructs use iterables. A common pattern is to make your container iterable by having __iter__() return a new iterator instance so multiple iterations work independently.
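Here is a quick illustration of the protocol with built-ins (a minimal sketch; the names are just for demonstration):

nums = [10, 20, 30]   # a list is iterable, but not an iterator
it = iter(nums)       # iter() asks the iterable for a fresh iterator
print(next(it))       # 10
print(next(it))       # 20
print(next(it))       # 30
# next(it) here would raise StopIteration; a for loop catches it for you
for n in nums:        # each for loop calls iter(nums) again, so this still works
    print(n)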

Official reference: the Python docs on iterator types (https://docs.python.org/3/library/stdtypes.html#iterator-types).

---

Basic class-based iterator

Let's implement a simple counting iterator: it yields numbers from start to stop - 1.

class Count:
    """A simple iterator that yields integers from start (inclusive) to stop (exclusive)."""
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.current = start

    def __iter__(self):
        # Returning self makes this object both iterable and an iterator.
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        val = self.current
        self.current += 1
        return val

Line-by-line explanation:

  • class Count: — define the iterator class.
  • __init__ — store start, stop, and initialize current (iteration state).
  • __iter__ — returns self, so the object is also the iterator.
  • __next__ — if current >= stop, raise StopIteration to signal the end. Otherwise return current value and advance state.
Usage and edge cases:

c = Count(0, 3)
print(list(c))  # -> [0, 1, 2]
print(list(c))  # -> []  (iterator is exhausted)
  • Because Count returns itself from __iter__, it is a single-use iterator: once exhausted, it stays exhausted. For multi-pass iteration you should make __iter__ return a fresh iterator (see below).
---

Iterable that returns a fresh iterator (multi-pass)

If you want a collection that's iterable multiple times, separate the container and iterator:

class RangeLike:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call creates a new iterator with its own state
        return Count(self.start, self.stop)

Now:

r = RangeLike(0, 3)
print(list(r))  # [0, 1, 2]
print(list(r))  # [0, 1, 2]  (works again)

Best practice: If your object represents a collection, prefer returning a new iterator in __iter__. If it's a streaming object (like a generator object), returning self is acceptable.

---

Generator-based iterators: simpler and idiomatic

Generators are often the cleanest way to implement iterators.

Equivalent of Count with a generator:

def count_gen(start, stop):
    current = start
    while current < stop:
        yield current
        current += 1
  • yield creates an iterator automatically.
  • Generators also maintain state and raise StopIteration automatically when the function returns.
Line-by-line: initialize current, loop while condition holds, yield current, then advance. Very concise.

Generators are great for most iterator needs — prefer them unless you must control the iterator object, implement complex methods, or interact with __iter__/__next__ semantics explicitly.
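For example, the generator version drops straight into any place an iterator is expected (a small usage sketch):

print(list(count_gen(0, 3)))   # [0, 1, 2]

gen = count_gen(0, 3)
print(next(gen))               # 0
print(next(gen))               # 1
# Like the class-based Count, a generator object is single-use:
print(list(gen))               # [2]  (only the remaining values)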

---

Use case: Streaming a large file in chunks (with safe cleanup)

Imagine processing a huge binary file in fixed-size chunks to avoid memory spikes. We'll implement an iterator that is also a context manager so file resources are cleaned up automatically.

from typing import IO, Iterator, Optional

class FileChunkReader:
    """
    Iterate over a file in fixed-size chunks.

    Can be used as:
        with FileChunkReader('data.bin', 4096) as reader:
            for chunk in reader:
                process(chunk)
    """

    def __init__(self, path: str, chunk_size: int = 4096):
        self.path = path
        self.chunk_size = chunk_size
        self._file: Optional[IO[bytes]] = None

    def __enter__(self):
        self._file = open(self.path, 'rb')
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._file:
            self._file.close()
            self._file = None

    def __iter__(self) -> Iterator[bytes]:
        if self._file is None:
            # Allow iteration without a context manager by opening lazily
            self._file = open(self.path, 'rb')
        return self

    def __next__(self) -> bytes:
        assert self._file is not None, "File must be open to iterate"
        chunk = self._file.read(self.chunk_size)
        if not chunk:
            # Close the resource proactively at EOF
            self._file.close()
            self._file = None
            raise StopIteration
        return chunk

Explanation and important behaviors:

  • The class is constructed with the path and chunk size.
  • It implements __enter__/__exit__ so you can do with FileChunkReader(...) as r: and be sure the file is closed.
  • __iter__ opens lazily if not already open, enabling both with usage and a direct for loop.
  • __next__ reads a chunk and raises StopIteration on EOF; it also closes the file when done.
Example usage:

# Preferred: with context manager for deterministic cleanup
with FileChunkReader('large.bin', 1024) as reader:
    for idx, chunk in enumerate(reader):
        print(idx, len(chunk))

Also works without with, but be careful to avoid leaks:

reader = FileChunkReader('large.bin', 1024)
for chunk in reader:
    pass  # file will be closed at EOF, but exceptions may leak the file handle

Edge cases:

  • If an exception occurs inside the for loop and you didn't use with, the file may remain open unless you handle it. That's why pairing iterators that manage resources with context managers is a best practice.
Reference: Using context managers for clean resource management is the idiomatic way to handle files, sockets, DB connections, etc.
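If you prefer generators, the same stream-plus-cleanup idea can be written with contextlib.contextmanager. This is a minimal sketch (the helper name chunked_file is hypothetical), not a drop-in replacement for FileChunkReader:

from contextlib import contextmanager

@contextmanager
def chunked_file(path, chunk_size=4096):
    """Yield a generator of chunks; the file is closed even if the caller raises."""
    f = open(path, 'rb')
    try:
        def chunks():
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                yield chunk
        yield chunks()
    finally:
        f.close()

# Usage:
# with chunked_file('large.bin', 1024) as chunks:
#     for chunk in chunks:
#         process(chunk)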

---

Use case: Batched database cursor iterator with retries and error handling

A typical pattern is to fetch rows in batches from a DB or API. Here is a simplified example simulating DB pagination with error handling and retry logic.

import time
from typing import Iterator, Callable, Iterable

class BatchIterator:
    """
    Iterate through a producer function that returns lists of items (pages).
    The producer is called repeatedly until it returns an empty list.
    """

    def __init__(self, producer: Callable[[int], Iterable], start_page: int = 0, max_retries: int = 3):
        self.producer = producer
        self.page = start_page
        self.buffer = []
        self.max_retries = max_retries

    def __iter__(self):
        return self

    def __next__(self):
        while not self.buffer:
            retries = 0
            while retries < self.max_retries:
                try:
                    page_items = list(self.producer(self.page))
                    break
                except Exception as exc:
                    retries += 1
                    # Log or handle transient errors, then retry
                    print(f"Producer error on page {self.page}: {exc} - retry {retries}")
                    time.sleep(0.1 * retries)
            else:
                # All retries failed; raise to the caller
                raise RuntimeError(f"Failed to fetch page {self.page} after {self.max_retries} retries")
            if not page_items:
                raise StopIteration
            self.buffer.extend(page_items)
            self.page += 1
        return self.buffer.pop(0)

Explanation:

  • producer(page) simulates a DB/API call returning zero or more items for a page.
  • BatchIterator keeps a buffer of items from the latest page and yields them one by one.
  • Includes retry logic with an increasing backoff delay and a clear RuntimeError when retries fail (an example of robust error handling).
  • Caller can handle RuntimeError or let it bubble up — it's explicit and manageable.
This pattern combines iterators with Python error-handling best practices: catch known transient exceptions, retry with backoff, and raise clear exceptions when the failure is unrecoverable.
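Here is a small usage sketch with a hypothetical in-memory producer (fake_pages stands in for a real DB or API call):

DATA = list(range(10))  # pretend these are database rows

def fake_pages(page: int, page_size: int = 4):
    """Return the items for one page; an empty list signals the end."""
    start = page * page_size
    return DATA[start:start + page_size]

for row in BatchIterator(fake_pages):
    print(row)  # prints 0..9, fetched lazily in pages of 4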

---

Asynchronous iterators: leveraging asyncio for efficient I/O-bound operations

For I/O-bound tasks (network, disk, etc.) you should consider async iterators to avoid blocking the event loop. Python supports asynchronous iteration via __aiter__ and __anext__, or with async generators (using async def + yield).

Simple async generator that simulates streaming data from a network connection:

import asyncio
from typing import AsyncIterator

async def async_stream_simulator(n: int) -> AsyncIterator[int]:
    """Simulate an async stream yielding numbers with an I/O delay."""
    for i in range(n):
        # Simulate an I/O delay (e.g., a network read)
        await asyncio.sleep(0.1)
        yield i

Usage:

async def main():
    async for item in async_stream_simulator(5):
        print("Got:", item)

asyncio.run(main())

Key points:

  • async def + yield creates an async generator, which is both an async iterator and an async iterable.
  • Use async for to consume it; await is used internally to yield control while waiting for I/O.
  • This pattern is an idiomatic way to leverage Python's asyncio for efficient I/O-bound operations: you can process many concurrent streams without blocking.
If you need to build a custom async iterator class:

class AsyncCounter:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.i >= self.n:
            raise StopAsyncIteration
        await asyncio.sleep(0.05)
        val = self.i
        self.i += 1
        return val

Consume with async for.
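For example (a small consumption sketch):

async def consume():
    async for value in AsyncCounter(3):
        print("Counted:", value)

asyncio.run(consume())  # prints 0, 1, 2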

Error handling with async iterators:

  • Use try/except blocks in the async consumer.
  • If you manage resources (e.g., network connection), combine with asynchronous context managers async with (PEP 492) and implement __aenter__/__aexit__ or use asynccontextmanager.
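For instance, here is a minimal sketch of an async-iterable resource that also supports async with; the "connection" is simulated, and in real code you would open and close an actual network resource:

import asyncio

class AsyncLineStream:
    """Async-iterate over lines from a (simulated) connection, cleaned up via async with."""
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.connected = False

    async def __aenter__(self):
        await asyncio.sleep(0)  # pretend to open a connection
        self.connected = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.connected = False  # pretend to close the connection

    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self.connected or self.i >= self.n:
            raise StopAsyncIteration
        await asyncio.sleep(0.01)  # simulated read
        self.i += 1
        return f"line {self.i}"

async def demo():
    async with AsyncLineStream(3) as stream:
        async for line in stream:
            print(line)

asyncio.run(demo())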
---

Best Practices

  • Prefer generators for concise iterators. Use class-based iterators when you need explicit state control or additional methods.
  • Use context managers (with/__enter__/__exit__, or contextlib.contextmanager) to manage resources and ensure deterministic cleanup.
  • Handle StopIteration only if you are implementing or manually controlling iteration with next(); a short sketch follows this list. Let for loops handle it for you.
  • For external I/O, prefer asyncio and asynchronous iterators/generators to maximize throughput of I/O-bound apps.
  • Be explicit in your exception handling: catch expected transient errors, log details, and use retries/backoff where appropriate.
  • Document whether your iterable is single-use or multi-pass in the class docstring.
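As mentioned in the next() point above, when you drive iteration manually you can either catch StopIteration or pass a default (a short sketch):

it = iter([1, 2])
print(next(it))        # 1
print(next(it))        # 2
print(next(it, None))  # None - the default avoids catching StopIteration yourself

try:
    next(it)
except StopIteration:
    print("exhausted")  # explicit handling, only needed outside a for loop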
---

Common Pitfalls

  • Returning self from __iter__ means single-use iterator. That may surprise consumers who expect multi-pass iterables.
  • Forgetting to raise StopIteration properly in __next__.
  • Resource leaks if you open resources in __iter__/__next__ and don't close them (use context managers).
  • Trying to reuse generator objects after exhaustion; generators are single-use (see the short demo after this list).
  • Not considering thread-safety: iterators are usually not thread-safe. For multithreaded consumers, use locks or queue patterns.
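A quick demonstration of the exhaustion pitfall (minimal sketch):

def three():
    yield from (1, 2, 3)

g = three()
print(list(g))  # [1, 2, 3]
print(list(g))  # []  - the generator object cannot be rewound; call three() again instead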
---

Advanced Tips

  • Combine iterators with itertools (islice, chain, tee, groupby) to build powerful lazy pipelines.
  • For large or slow producers, consider prefetching or buffering strategies — but be careful: prefetching increases memory usage and complexity.
  • For CPU-bound processing, combine iterators with concurrent.futures.ProcessPoolExecutor for parallelism (iterators are good to feed work lazily).
  • For async producers, consider asyncio.Queue with producer/consumer coroutines when ordering or backpressure matters (a small sketch follows the islice example below).
Example: composing iterators with itertools.islice:
from itertools import islice

def infinite_counter():
    i = 0
    while True:
        yield i
        i += 1

# Take the first 10 values lazily
first_ten = list(islice(infinite_counter(), 10))
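And as mentioned in the asyncio.Queue tip above, here is a minimal producer/consumer sketch that uses a bounded queue for backpressure (the items are simulated values):

import asyncio

async def producer(queue: asyncio.Queue):
    for i in range(5):
        await asyncio.sleep(0.01)  # simulate producing an item
        await queue.put(i)         # blocks when the queue is full (backpressure)
    await queue.put(None)          # sentinel to signal completion

async def consumer(queue: asyncio.Queue):
    while True:
        item = await queue.get()
        if item is None:
            break
        print("Consumed:", item)

async def run():
    queue = asyncio.Queue(maxsize=2)  # bounded queue enforces backpressure
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(run())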

---

Final example: a robust file scanner that yields lines, with retries and async option

Below is a final consolidated pattern showing:

  • An iterator class for line streaming (sync).
  • An async generator alternative for non-blocking reads (simulated with sleep).
  • A context manager for safe cleanup and robust error handling.
import asyncio

class LineReader:
    def __init__(self, path):
        self.path = path
        self._f = None

    def __enter__(self):
        self._f = open(self.path, 'r', encoding='utf-8')
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._f:
            self._f.close()
            self._f = None

    def __iter__(self):
        if self._f is None:
            self._f = open(self.path, 'r', encoding='utf-8')
        return self

    def __next__(self):
        assert self._f is not None
        line = self._f.readline()
        if line == '':
            self._f.close()
            self._f = None
            raise StopIteration
        return line.rstrip('\n')

Async alternative (simulate async I/O)

async def async_line_stream(n: int):
    for i in range(n):
        await asyncio.sleep(0.01)  # simulate a read
        yield f"line {i}"

Call-to-action: Try the above classes and generators on your own data. Replace the simulated async_line_stream with a real async file reader like aiofiles for production use.
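For reference, a hedged sketch of what that might look like with aiofiles (this assumes the third-party aiofiles package is installed; check its documentation for the exact API in your version):

import asyncio
import aiofiles  # third-party: pip install aiofiles

async def read_lines(path: str):
    # aiofiles file objects support async iteration over lines
    async with aiofiles.open(path, mode='r', encoding='utf-8') as f:
        async for line in f:
            print(line.rstrip('\n'))

# asyncio.run(read_lines('example.txt'))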

---

Conclusion

Custom iterators are powerful tools:

  • Use them to express lazy, memory-efficient pipelines.
  • Prefer generators for most cases; implement class-based iterators when you need more control.
  • Combine iterators with context managers to manage resources cleanly.
  • For I/O-bound workloads, leverage asyncio and async iterators/generators.
  • Apply robust error handling—retry transient errors, and fail with clear exceptions on unrecoverable conditions.
Iterators unlock elegant designs for streaming, batching, and composing complex flows. Try converting one of your eager data-processing routines into an iterator-based pipeline and notice the clarity and memory gains.

---

Further Reading

If you found this helpful, try implementing one of your project's streaming needs using the patterns here — and share your code or questions in the comments!

Related Posts

Mastering Retry Logic in Python: Best Practices for Robust API Calls

Ever wondered why your Python scripts fail miserably during flaky network conditions? In this comprehensive guide, you'll learn how to implement resilient retry logic for API calls, ensuring your applications stay robust and reliable. Packed with practical code examples, best practices, and tips on integrating with virtual environments and advanced formatting, this post will elevate your Python skills to handle real-world challenges effortlessly.

Mastering List Comprehensions: Tips and Tricks for Cleaner Python Code

Unlock the full power of Python's list comprehensions to write clearer, faster, and more expressive code. This guide walks intermediate developers through essentials, advanced patterns, performance trade-offs, and practical integrations with caching and decorators to make your code both concise and robust.

Harnessing Python's Context Managers for Resource Management: Patterns and Best Practices

Discover how Python's context managers simplify safe, readable resource management from simple file handling to complex async workflows. This post breaks down core concepts, practical patterns (including generator-based context managers), type hints integration, CLI use cases, and advanced tools like ExitStack — with clear code examples and actionable best practices.