
Implementing a Custom Python Iterator: Patterns, Best Practices, and Real-World Use Cases
Learn how to design and implement custom Python iterators that are robust, memory-efficient, and fit real-world tasks like streaming files, batching database results, and async I/O. This guide walks you step-by-step through iterator protocols, class-based and generator-based approaches, context-manager patterns for clean resource management, and how to combine iterators with asyncio and solid error handling.
Iterators are at the heart of Pythonic, memory-efficient code. Whether you're processing huge logs, paginating API responses, or streaming data over the network, custom iterators let you express lazy, composable pipelines with clarity and performance.
In this post you will learn:
- The iterator protocol and when to implement it directly.
- Class-based and generator-based iterator patterns.
- Real-world examples: file chunk streaming, batched database access, and async iterators for I/O-bound work.
- How to use context managers for safe resource cleanup.
- Best practices for error handling, performance, and avoiding common pitfalls.
---
Prerequisites
This post assumes you know:
- Basic Python syntax (classes, functions).
- How `for` loops and `next()` work conceptually.
- Some exposure to `asyncio` is helpful for the async section (Python 3.7+ recommended).
---
Core Concepts: Iterable vs Iterator
Quick refresher:
- Iterable: An object that can return an iterator (it implements `__iter__()`, which returns an iterator). Examples: list, dict, the result of calling a generator function.
- Iterator: An object representing a stream of values, implementing `__iter__()` (which returns self) and `__next__()`, which either returns the next item or raises `StopIteration`.
Important: `for` loops and many other constructs accept any iterable. A common pattern is to make your container iterable by having `__iter__()` return a new iterator instance so multiple iterations work independently.
Official reference: Python docs for the iterator protocol (https://docs.python.org/3/library/stdtypes.html#iterator-types).
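To make the protocol concrete, here is a quick sketch that drives a list's iterator by hand with the built-in `iter()` and `next()`:

numbers = [10, 20, 30]
it = iter(numbers)   # calls numbers.__iter__() and returns a list_iterator

print(next(it))      # 10  (calls it.__next__())
print(next(it))      # 20
print(next(it))      # 30
# One more next(it) would raise StopIteration; a for loop catches that for you.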
---
Basic class-based iterator
Let's implement a simple counting iterator: it yields numbers from `start` to `stop - 1`.
class Count:
    """A simple iterator that yields integers from start (inclusive) to stop (exclusive)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.current = start

    def __iter__(self):
        # Returning self makes this object both iterable and an iterator.
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        val = self.current
        self.current += 1
        return val
Line-by-line explanation:
- `class Count:` — define the iterator class.
- `__init__` — store `start` and `stop`, and initialize `current` (the iteration state).
- `__iter__` — returns `self`, so the object is also the iterator.
- `__next__` — if `current >= stop`, raise `StopIteration` to signal the end. Otherwise return the current value and advance the state.
c = Count(0, 3)
print(list(c)) # -> [0, 1, 2]
print(list(c)) # -> [] (iterator is exhausted)
- Because `Count` returns itself from `__iter__`, it is a single-use iterator: once exhausted, it stays exhausted. For multi-pass iteration you should make `__iter__` return a fresh iterator (see below).
Iterable that returns a fresh iterator (multi-pass)
If you want a collection that's iterable multiple times, separate the container and iterator:
class RangeLike:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call creates a new iterator with its own state
        return Count(self.start, self.stop)
Now:
r = RangeLike(0, 3)
print(list(r)) # [0, 1, 2]
print(list(r)) # [0, 1, 2] (works again)
Best practice: If your object represents a collection, prefer returning a new iterator from `__iter__`. If it is a streaming object (like a generator object), returning `self` is acceptable.
---
Generator-based iterators: simpler and idiomatic
Generators are often the cleanest way to implement iterators.
Here is the equivalent of `Count` written as a generator:
def count_gen(start, stop):
    current = start
    while current < stop:
        yield current
        current += 1
- `yield` turns the function into a generator, so calling it creates an iterator automatically.
- Generators maintain their own state and raise `StopIteration` automatically when the function returns.
- The body mirrors the class version: store `current`, loop while the condition holds, `yield` the current value, then advance. Very concise.
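A quick check that the generator behaves like the class-based version:

print(list(count_gen(0, 3)))  # [0, 1, 2]

gen = count_gen(0, 3)
print(list(gen))  # [0, 1, 2]
print(list(gen))  # []  (a generator object is single-use, just like Count)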
Generators are great for most iterator needs — prefer them unless you must control the iterator object, implement complex methods, or interact with `__iter__`/`__next__` semantics explicitly.
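One pattern worth knowing: you can get multi-pass behavior and generator conciseness together by implementing `__iter__` as a generator method on a container class. A minimal sketch (the `RangeLike2` name is just for illustration):

class RangeLike2:
    """Multi-pass iterable whose __iter__ is a generator method."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call to __iter__ creates a brand-new generator object,
        # so the container can be iterated any number of times.
        current = self.start
        while current < self.stop:
            yield current
            current += 1


r2 = RangeLike2(0, 3)
print(list(r2))  # [0, 1, 2]
print(list(r2))  # [0, 1, 2] (multi-pass, like RangeLike above)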
---
Use case: Streaming a large file in chunks (with safe cleanup)
Imagine processing a huge binary file in fixed-size chunks to avoid memory spikes. We'll implement an iterator that is also a context manager so file resources are cleaned up automatically.
from typing import IO, Iterator, Optional

class FileChunkReader:
    """
    Iterate over a file in fixed-size chunks. Can be used as:

        with FileChunkReader('data.bin', 4096) as reader:
            for chunk in reader:
                process(chunk)
    """

    def __init__(self, path: str, chunk_size: int = 4096):
        self.path = path
        self.chunk_size = chunk_size
        self._file: Optional[IO[bytes]] = None

    def __enter__(self):
        self._file = open(self.path, 'rb')
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._file:
            self._file.close()
            self._file = None

    def __iter__(self) -> Iterator[bytes]:
        if self._file is None:
            # Allow iteration without a context manager by opening lazily
            self._file = open(self.path, 'rb')
        return self

    def __next__(self) -> bytes:
        assert self._file is not None, "File must be open to iterate"
        chunk = self._file.read(self.chunk_size)
        if not chunk:
            # Close the resource proactively at EOF
            self._file.close()
            self._file = None
            raise StopIteration
        return chunk
Explanation and important behaviors:
- The class is constructed with the path and chunk size.
- It implements `__enter__`/`__exit__` so you can write `with FileChunkReader(...) as r:` and be sure the file is closed.
- `__iter__` opens the file lazily if it isn't already open, enabling both `with` usage and a direct `for` loop.
- `__next__` reads a chunk and raises `StopIteration` on EOF; it also closes the file when done.
# Preferred: with context manager for deterministic cleanup
with FileChunkReader('large.bin', 1024) as reader:
    for idx, chunk in enumerate(reader):
        print(idx, len(chunk))
It also works without `with`, but be careful to avoid leaks:
reader = FileChunkReader('large.bin', 1024)
for chunk in reader:
    pass  # file will be closed at EOF, but exceptions may leak the file handle
Edge cases:
- If an exception occurs inside the `for` loop and you didn't use `with`, the file may remain open unless you handle it yourself. That's why pairing iterators that manage resources with context managers is a best practice (a defensive pattern is sketched below).
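If you do iterate without `with`, one defensive sketch is to close the reader yourself in a `finally` block. Here `handle` is a hypothetical processing function, and calling `__exit__` directly stands in for a dedicated `close()` method you might prefer to add:

reader = FileChunkReader('large.bin', 1024)
try:
    for chunk in reader:
        handle(chunk)  # hypothetical processing that may raise
finally:
    # Ensure the underlying file is closed even if handle() raises mid-loop.
    reader.__exit__(None, None, None)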
---
Use case: Batched database cursor iterator with retries and error handling
A typical pattern is to fetch rows in batches from a DB or API. Here is a simplified example simulating DB pagination with error handling and retry logic.
import time
from typing import Callable, Iterable

class BatchIterator:
    """
    Iterate through a producer function that returns lists of items (pages).
    The producer is called repeatedly until it returns an empty list.
    """

    def __init__(self, producer: Callable[[int], Iterable], start_page: int = 0, max_retries: int = 3):
        self.producer = producer
        self.page = start_page
        self.buffer = []
        self.max_retries = max_retries

    def __iter__(self):
        return self

    def __next__(self):
        while not self.buffer:
            retries = 0
            while retries < self.max_retries:
                try:
                    page_items = list(self.producer(self.page))
                    break
                except Exception as exc:
                    retries += 1
                    # Log or handle transient errors, then retry
                    print(f"Producer error on page {self.page}: {exc} - retry {retries}")
                    time.sleep(0.1 * retries)
            else:
                # All retries failed — raise to the caller
                raise RuntimeError(f"Failed to fetch page {self.page} after {self.max_retries} retries")
            if not page_items:
                raise StopIteration
            self.buffer.extend(page_items)
            self.page += 1
        return self.buffer.pop(0)
Explanation:
- `producer(page)` simulates a DB/API call returning zero or more items for a page.
- `BatchIterator` keeps a `buffer` of items from the latest page and yields them one by one.
- It includes retry logic with an increasing backoff delay and raises a clear `RuntimeError` when all retries fail (an example of robust error handling).
- The caller can handle the `RuntimeError` or let it bubble up — it's explicit and manageable. A usage sketch follows below.
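To make the behavior concrete, here is a usage sketch with a fake in-memory producer; `fake_fetch_page` and its simulated transient failures are invented for illustration:

import random

DATA = list(range(10))  # pretend these rows live in a database

def fake_fetch_page(page: int, page_size: int = 4):
    """Return one 'page' of rows; occasionally fail to exercise the retry path."""
    if random.random() < 0.2:
        raise ConnectionError("simulated transient failure")
    start = page * page_size
    return DATA[start:start + page_size]  # empty list once we run past the data

for row in BatchIterator(fake_fetch_page, max_retries=3):
    print("row:", row)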
---
Asynchronous iterators: leveraging asyncio for efficient I/O-bound operations
For I/O-bound tasks (network, disk, etc.) you should consider async iterators to avoid blocking the event loop. Python supports asynchronous iteration via `__aiter__` and `__anext__`, or with async generators (using `async def` + `yield`).
Simple async generator that simulates streaming data from a network connection:
import asyncio
from typing import AsyncIterator

async def async_stream_simulator(n: int) -> AsyncIterator[int]:
    """Simulate an async stream yielding numbers with an I/O delay."""
    for i in range(n):
        # Simulate an I/O delay (e.g., a network read)
        await asyncio.sleep(0.1)
        yield i
Usage:
async def main():
    async for item in async_stream_simulator(5):
        print("Got:", item)
To run it: `asyncio.run(main())`
Key points:
- `async def` + `yield` creates an async generator, which is both an async iterator and an async iterable.
- Use `async for` to consume it; `await` is used internally to yield control while waiting for I/O.
- This pattern is an idiomatic way to leverage Python's asyncio for efficient I/O-bound operations: you can process many concurrent streams without blocking, as the sketch below shows.
class AsyncCounter:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.i >= self.n:
            raise StopAsyncIteration
        await asyncio.sleep(0.05)
        val = self.i
        self.i += 1
        return val
Consume it with `async for`.
Error handling with async iterators:
- Use try/except blocks in the async consumer.
- If you manage resources (e.g., a network connection), combine with asynchronous context managers via `async with` (PEP 492) and implement `__aenter__`/`__aexit__`, or use `asynccontextmanager` (a sketch follows below).
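As a rough sketch of that combination, here is an `asynccontextmanager` wrapping a simulated connection whose items we then stream; `fake_connection` and its open/close delays are invented for the example:

from contextlib import asynccontextmanager

@asynccontextmanager
async def fake_connection(host: str):
    # Pretend to open a network connection asynchronously.
    await asyncio.sleep(0.05)
    print(f"connected to {host}")
    try:
        yield async_stream_simulator(3)  # hand the caller an async iterator
    finally:
        # Cleanup runs even if the consumer raises mid-iteration.
        await asyncio.sleep(0.05)
        print("connection closed")

async def read_all():
    async with fake_connection("example.org") as stream:
        try:
            async for item in stream:
                print("received:", item)
        except Exception as exc:
            print("stream failed:", exc)

asyncio.run(read_all())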
Best Practices
- Prefer generators for concise iterators. Use class-based iterators when you need explicit state control or additional methods.
- Use context managers (`with`, `__enter__`/`__exit__`, or `contextlib.contextmanager`) to manage resources and ensure deterministic cleanup.
- Handle `StopIteration` only if you are implementing or manually controlling iteration with `next()`. Let `for` loops handle it for you (see the sketch after this list).
- For external I/O, prefer `asyncio` and asynchronous iterators/generators to maximize throughput of I/O-bound apps.
- Be explicit in your exception handling: catch expected transient errors, log details, and use retries/backoff where appropriate.
- Document whether your iterable is single-use or multi-pass in the class docstring.
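For example, when driving an iterator manually, passing a default to `next()` lets you avoid catching `StopIteration` yourself:

it = iter([1, 2])
print(next(it))          # 1
print(next(it))          # 2
print(next(it, None))    # None instead of raising StopIteration

# Equivalent explicit handling, only needed when you control iteration by hand:
it2 = iter([1, 2])
while True:
    try:
        item = next(it2)
    except StopIteration:
        break
    print(item)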
Common Pitfalls
- Returning `self` from `__iter__` makes a single-use iterator. That may surprise consumers who expect multi-pass iterables.
- Forgetting to raise `StopIteration` properly in `__next__`.
- Resource leaks if you open resources in `__iter__`/`__next__` and don't close them (use context managers).
- Trying to reuse generator objects after exhaustion — generators are single-use.
- Not considering thread safety: iterators are usually not thread-safe. For multithreaded consumers, use locks or queue patterns (a locking wrapper is sketched below).
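As a rough illustration of the locking approach, here is a wrapper that serializes `__next__` calls across threads; the `ThreadSafeIterator` name is just for this sketch:

import threading

class ThreadSafeIterator:
    """Wrap any iterator so that multiple threads can pull items safely."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread advances the underlying iterator at a time.
        with self._lock:
            return next(self._iterator)

safe = ThreadSafeIterator(iter(range(100)))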
Advanced Tips
- Combine iterators with `itertools` (`islice`, `chain`, `tee`, `groupby`) to build powerful lazy pipelines.
- For large or slow producers, consider prefetching or buffering strategies — but be careful: prefetching increases memory usage and complexity.
- For CPU-bound processing, combine iterators with `concurrent.futures.ProcessPoolExecutor` for parallelism (iterators are a good way to feed work lazily).
- For async producers, consider `asyncio.Queue` with producer/consumer coroutines when ordering or backpressure matters (a sketch appears after the `islice` example below).

Example with `itertools.islice`, taking values lazily from an infinite generator:
from itertools import islice

def infinite_counter():
    i = 0
    while True:
        yield i
        i += 1

# Take the first 10 values lazily
first_ten = list(islice(infinite_counter(), 10))
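And here is the promised `asyncio.Queue` sketch: a producer coroutine feeds items into a bounded queue (which gives natural backpressure), a consumer drains it, and `None` is used as an end-of-stream sentinel; all names here are illustrative:

import asyncio

async def producer(queue: asyncio.Queue, n: int):
    for i in range(n):
        await asyncio.sleep(0.01)      # simulate slow production
        await queue.put(i)             # blocks if the queue is full (backpressure)
    await queue.put(None)              # sentinel: no more items

async def consumer(queue: asyncio.Queue):
    while True:
        item = await queue.get()
        if item is None:
            break
        print("consumed:", item)

async def pipeline():
    queue: asyncio.Queue = asyncio.Queue(maxsize=5)
    await asyncio.gather(producer(queue, 20), consumer(queue))

asyncio.run(pipeline())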
---
Final example: a robust file scanner that yields lines, with retries and async option
Below is a final consolidated pattern showing:
- An iterator class for line streaming (sync).
- An async generator alternative for non-blocking reads (simulated with sleep).
- A context manager for safe resource cleanup and robust error handling.
import asyncio

class LineReader:
    def __init__(self, path):
        self.path = path
        self._f = None

    def __enter__(self):
        self._f = open(self.path, 'r', encoding='utf-8')
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._f:
            self._f.close()
            self._f = None

    def __iter__(self):
        if self._f is None:
            self._f = open(self.path, 'r', encoding='utf-8')
        return self

    def __next__(self):
        assert self._f is not None
        line = self._f.readline()
        if line == '':
            # EOF: close the file and stop iteration
            self._f.close()
            self._f = None
            raise StopIteration
        return line.rstrip('\n')
Async alternative (simulating async I/O):
async def async_line_stream(n: int):
    for i in range(n):
        await asyncio.sleep(0.01)  # simulate a read
        yield f"line {i}"
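For completeness, a short consumption sketch of both variants; the 'example.txt' path is just a placeholder:

# Synchronous: the context manager guarantees the file is closed.
with LineReader('example.txt') as lines:
    for line in lines:
        print(line)

# Asynchronous: consume the simulated stream with async for.
async def show_lines():
    async for line in async_line_stream(3):
        print(line)

asyncio.run(show_lines())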
Call-to-action: Try the above classes and generators on your own data. Replace the simulated `async_line_stream` with a real async file reader such as `aiofiles` for production use.
---
Conclusion
Custom iterators are powerful tools:
- Use them to express lazy, memory-efficient pipelines.
- Prefer generators for most cases; implement class-based iterators when you need more control.
- Combine iterators with context managers to manage resources cleanly.
- For I/O-bound workloads, leverage asyncio and async iterators/generators.
- Apply robust error handling—retry transient errors, and fail with clear exceptions on unrecoverable conditions.
---
Further Reading
- Official Python docs — Iterator Types: https://docs.python.org/3/library/stdtypes.html#iterator-types
- PEP 525 — Asynchronous Generators: https://peps.python.org/pep-0525/
- contextlib — Utilities for context managers: https://docs.python.org/3/library/contextlib.html
- asyncio — Asynchronous I/O framework: https://docs.python.org/3/library/asyncio.html
- itertools — Fast, memory-efficient tools: https://docs.python.org/3/library/itertools.html