Practical Guide to Python Logging: Best Practices for Debugging and Monitoring Your Applications
Learn how to design reliable, performant logging for real-world Python applications. This guide walks you through core concepts, configuration patterns, and practical examples — from rotating files and structured JSON logs to integrating with Pandas for large-data processing, testing logs with Pytest, and configuring Scrapy spiders — with clear, line-by-line explanations.
Introduction
Why does logging matter? When things go wrong in production, logs are your primary source of truth. Good logs make debugging faster, enable effective monitoring, and power analytics and alerting pipelines. This post is a practical, example-driven guide to Python logging for intermediate developers. You’ll learn how to structure logs, configure handlers, avoid common pitfalls, and scale logging for data-heavy workflows and distributed systems.
We’ll cover:
- Core concepts: loggers, handlers, formatters, levels, and filters.
- Real-world patterns: rotating logs, structured (JSON) logs, async/multiprocess-safe logging.
- Integrations: logging during Pandas/NumPy large-data processing, testing with Pytest, and configuring logs for Scrapy spiders.
- Best practices and advanced tips for observability and performance.
Core Concepts (quick overview)
- Logger: the entry point you call (e.g., logger = logging.getLogger(__name__)).
- Handler: where log records go (console, file, socket).
- Formatter: how the log record is formatted (text, JSON).
- Level: severity (DEBUG, INFO, WARNING, ERROR, CRITICAL).
- Filter: optional additional filtering logic (a minimal example follows below).
Official docs: https://docs.python.org/3/library/logging.html
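Filters get less attention than the other pieces, so here is a minimal sketch of one in action; the class name and the rule it applies are made up for illustration:
# filter_example.py (minimal sketch of a logging.Filter)
import logging

class ExcludeHealthchecks(logging.Filter):
    def filter(self, record):
        # Return False to drop the record, True to keep it
        return "healthcheck" not in record.getMessage().lower()

logger = logging.getLogger("web")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.addFilter(ExcludeHealthchecks())
logger.addHandler(handler)

logger.info("GET /healthcheck 200")  # dropped by the filter
logger.info("GET /orders 500")       # emitted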
Step-by-Step Examples
Example 1 — Basic Logger Setup
A minimal, idiomatic logger for a module.
# basic_logger.py
import logging
logger = logging.getLogger(__name__) # 1
logger.setLevel(logging.INFO) # 2
handler = logging.StreamHandler() # 3
handler.setLevel(logging.INFO) # 4
formatter = logging.Formatter(
"%(asctime)s - %(name)s - %(levelname)s - %(message)s"
) # 5
handler.setFormatter(formatter) # 6
logger.addHandler(handler) # 7
def divide(a, b):
logger.debug("divide called with a=%s, b=%s", a, b) # 8
try:
result = a / b
logger.info("division result: %s", result) # 9
return result
except ZeroDivisionError:
logger.exception("Attempted division by zero") # 10
raise
Line-by-line explanation:
- Create a logger named after the module (recommended). Input: None. Output: Logger object.
- Set the logger level to INFO. The logger drops DEBUG records itself, so handlers never see them regardless of their own levels.
- Create a console (stream) handler, which writes to stderr by default (pass a stream such as sys.stdout to change the target).
- Set handler level to INFO.
- Create a formatter specifying timestamp, logger name, level, and message.
- Attach the formatter to the handler.
- Add the handler to the logger. Edge case: if this module is imported repeatedly in a long-lived process, guard against adding duplicate handlers (see the snippet after this list) or configure logging once at the entry point.
- Use lazy formatting (pass args, not f-strings) so interpolation only occurs when the message will actually be emitted, which improves performance.
- Log informative result messages.
- logger.exception logs the stack trace at ERROR level — useful in except blocks.
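One way to handle the duplicate-handler edge case from the list above is to attach a handler only when the logger has none yet. A minimal sketch; configuring logging once in the entry point (see Best Practices) is the more robust approach:
# Guard: attach the handler only if none is configured yet
if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)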
Call to action: save this as basic_logger.py, import divide from another script, and invoke it to see the formatted console output.
Example 2 — Rotating File Handler (practical for production)
Use RotatingFileHandler to prevent log files growing without bounds.
# rotating_logger.py
import logging
from logging.handlers import RotatingFileHandler
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
# Rotate when the file reaches 5 MB, keep 3 backups
handler = RotatingFileHandler("myapp.log", maxBytes=5 * 1024 * 1024, backupCount=3)
handler.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
Example usage:
for i in range(1000):
logger.info("Processing item %d", i)
Explain:
- Creating a RotatingFileHandler prevents disk exhaustion by rotating logs at size threshold.
- Inputs: file path, maxBytes, backupCount. Output: managed log files myapp.log, myapp.log.1, etc.
- Edge cases: rotation is not safe when several processes write to the same file; route records through a single writer (see the multiprocessing example below) or ship them to a central logging service. For time-based rotation (e.g. daily files), use TimedRotatingFileHandler, sketched below.
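If time-based rotation fits better (for example, one file per day), a minimal sketch with TimedRotatingFileHandler looks like this; the filename and retention are illustrative:
# timed_rotating_logger.py
import logging
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

# Rotate at midnight, keep 7 dated backups (myapp.log, myapp.log.2024-01-01, ...)
handler = TimedRotatingFileHandler("myapp.log", when="midnight", backupCount=7)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)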
Example 3 — Structured JSON Logging (for central logging/ELK)
Structured logs are machine-parseable and ideal for indexing in ELK or other log stores.
# json_logger.py
import json
import logging
from datetime import datetime, timezone
class JsonFormatter(logging.Formatter):
def format(self, record):
payload = {
"timestamp": datetime.utcfromtimestamp(record.created).isoformat() + "Z",
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
"module": record.module,
"line": record.lineno,
}
if record.exc_info:
payload["exception"] = self.formatException(record.exc_info)
return json.dumps(payload)
logger = logging.getLogger("service")
logger.setLevel(logging.INFO)
h = logging.StreamHandler()
h.setFormatter(JsonFormatter())
logger.addHandler(h)
logger.info("User created", extra={"user_id": 123})
Line-by-line:
- Define a JsonFormatter subclass; format builds a dict with useful fields.
- Use record.getMessage() to get the final message.
- Include exception stack traces when present.
- Logging call: logger.info("User created", extra={"user_id": 123}) — note: extra adds attributes to the LogRecord; the sketch after this example shows one way to include them in the JSON payload.
Why structured logs? They let you query fields like log.level, user_id, or request_id in a log index.
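The formatter above ignores fields passed via extra. One way to include them (a sketch that assumes anything not present on a default LogRecord came from extra=) is to copy the non-standard attributes into the payload:
# Attributes that exist on every LogRecord; anything else was injected via `extra=`
_STANDARD_ATTRS = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message", "asctime"}

def extra_fields(record):
    return {k: v for k, v in record.__dict__.items() if k not in _STANDARD_ATTRS}

# Inside JsonFormatter.format(), after building the payload dict:
#     payload.update(extra_fields(record))
With that in place, the user_id from the call above shows up as a top-level JSON field you can index on.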
Example 4 — Logging While Processing Large Datasets with Pandas/NumPy
Common pattern: process large CSVs in chunks and log progress. This integrates with "Efficient Data Processing with Python: Leveraging Pandas and NumPy for Large Datasets".
# data_processing.py
import logging
import pandas as pd
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
def process_chunk(df):
# Dummy processing using NumPy/pandas
    df['value'] = df['value'] * 2  # vectorized operation (NumPy under the hood)
return df
def process_large_csv(path):
total_rows = 0
for i, chunk in enumerate(pd.read_csv(path, chunksize=100_000)):
logger.info("Processing chunk %d with %d rows", i, len(chunk))
result = process_chunk(chunk)
total_rows += len(result)
# Write chunk to output, save, or aggregate
logger.info("Processing complete, total rows=%d", total_rows)
Line-by-line:
- Use pandas.read_csv(..., chunksize) to avoid loading the whole dataset — memory efficient.
- Log progress at chunk boundaries — provides visibility without overwhelming logs.
- process_chunk uses vectorized operations (NumPy under the hood) for performance.
Performance tip: Use logger.isEnabledFor(logging.DEBUG) before constructing expensive debug messages (e.g., serializing a DataFrame sample).
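For example, inside the chunk loop you might guard an expensive DataFrame sample like this (a sketch):
if logger.isEnabledFor(logging.DEBUG):
    # Serialize a small sample only when DEBUG output will actually be emitted
    logger.debug("Chunk %d sample:\n%s", i, chunk.head(3).to_string())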
Example 5 — Testing Logging with Pytest (Advanced Testing Techniques)
Pytest provides caplog to capture logs during tests.
# test_logging.py
import logging
from basic_logger import divide  # the divide() from Example 1
def test_divide_by_zero_logs_exception(caplog):
caplog.set_level(logging.ERROR)
with caplog.at_level(logging.ERROR):
try:
divide(1, 0)
except ZeroDivisionError:
pass
# Assert that an error log with "division by zero" was emitted
assert any("division by zero" in rec.message.lower() or "division" in rec.message.lower() for rec in caplog.records)
Explain:
- caplog fixture captures logs; set_level controls which levels are captured.
- The test ensures divide logs an exception on ZeroDivisionError.
- Edge cases: if your code configures loggers at import time, you may need test isolation (reset or remove handlers between tests); a fixture sketch follows below.
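A minimal isolation fixture for that edge case (a sketch; autouse and the logger name basic_logger are assumptions based on Example 1):
# conftest.py
import logging
import pytest

@pytest.fixture(autouse=True)
def isolated_logger():
    logger = logging.getLogger("basic_logger")  # the logger Example 1 configures at import time
    saved = logger.handlers[:]
    logger.handlers.clear()
    yield
    logger.handlers[:] = saved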
Example 6 — Logging in Scrapy (Developing a Python Web Scraper with Scrapy)
Scrapy uses Python logging; you can set settings to control verbosity and log format.
Scrapy settings snippet (settings.py):
# settings.py (Scrapy)
LOG_LEVEL = 'INFO' # 1
LOG_STDOUT = False # 2
LOG_FILE = 'scrapy_run.log' # 3
Inside your spider:
# myspider.py
import logging
from scrapy import Spider
logger = logging.getLogger(__name__)
class MySpider(Spider):
name = "myspider"
def parse(self, response):
count = len(response.css("a"))
logger.info("Found %d links on %s", count, response.url)
# ... parsing logic ...
Explain:
- Set LOG_LEVEL globally for the crawl.
- LOG_STDOUT controls whether prints are redirected to the log.
- LOG_FILE writes logs to a file.
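Scrapy also attaches a per-spider logger as self.logger (named after the spider), so inside spider methods you can log without creating a module-level logger:
    # Inside MySpider, using the spider's built-in logger instead of the module-level one
    def parse(self, response):
        self.logger.info("Found %d links on %s", len(response.css("a")), response.url)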
Example 7 — Multiprocessing-Safe Logging
In multi-process apps, file handlers can clash. Use QueueHandler and QueueListener.
# mp_logging.py
import logging
import logging.handlers
from multiprocessing import Process, Queue
def worker(queue, idx):
    logger = logging.getLogger(f"worker-{idx}")
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.propagate = False  # avoid duplicate writes through handlers inherited from the parent (fork)
    logger.setLevel(logging.INFO)
    logger.info("Worker started")
def listener_configurer(logfile):
root = logging.getLogger()
fh = logging.FileHandler(logfile)
formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
fh.setFormatter(formatter)
root.addHandler(fh)
def listener_process(queue, logfile):
listener_configurer(logfile)
    listener = logging.handlers.QueueListener(queue, *logging.getLogger().handlers)
listener.start()
return listener
if __name__ == "__main__":
q = Queue(-1)
listener = listener_process(q, "mp.log")
ps = [Process(target=worker, args=(q, i)) for i in range(4)]
for p in ps:
p.start()
for p in ps:
p.join()
    listener.stop()  # flush queued records and stop the listener thread
Explain:
- Workers push LogRecords into a multiprocessing.Queue via QueueHandler.
- A QueueListener (here a background thread started in the main process) pulls records off the queue and writes them through a single FileHandler, so only one writer ever touches the file.
Best Practices
- Use logger = logging.getLogger(__name__) inside modules — helps filter by module.
- Prefer lazy interpolation: logger.debug("Value: %s", expensive()) — prevents unnecessary work.
- Avoid logging secrets (API keys, passwords). Sanitize logs.
- Set appropriate log levels and adjust in production via configuration — don't log DEBUG in high-throughput production.
- Use structured logging for production (JSON). Tools: python-json-logger, structlog.
- Correlate requests with request IDs or trace IDs for distributed tracing (a sketch follows after this list).
- Combine logs with metrics: emit counters/timers to Prometheus in addition to logs.
- For data-heavy jobs (Pandas/NumPy), log coarse-grained progress (per chunk), sample data carefully, and never log entire DataFrames unless for debugging.
- For tests, use Pytest caplog to assert logging behavior and to ensure logs don't leak.
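For the request-ID correlation point above, one lightweight standard-library approach (a sketch; the field and variable names are illustrative) is a contextvars-backed filter:
# request_id_logging.py
import contextvars
import logging

request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get()  # expose the ID to formatters
        return True

handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s [%(request_id)s] %(name)s: %(message)s"))
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)

# In your web middleware (framework-specific), set the ID at the start of each request:
# request_id_var.set(incoming_request_id)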
Common Pitfalls
- Double logging: configuring logging in multiple modules adds duplicate handlers — configure logging once in entrypoint.
- Using f-strings inside logger calls (logger.debug(f"... {x}")) defeats lazy evaluation.
- Writing logs to disk without rotation will eventually fill the disk.
- Logging heavy objects (DataFrames/NumPy arrays) without serialization can crash JSON formatters; log a cheap summary instead (see the snippet after this list).
- Ignoring performance: logging in tight loops without checks can degrade throughput.
- Bad: logger.debug(f"Result: {compute()}") — compute() runs even if DEBUG is off.
- Good: logger.debug("Result: %s", compute()) or if compute() is expensive:
Advanced Tips
- Use logging configuration files or dictConfig for reproducible setups — great for complex apps.
- Integrate logs with APM (OpenTelemetry), ELK, Fluentd, or Graylog for central analysis.
- Use tracing correlation (OpenTelemetry trace_id) to link logs with traces.
- Consider third-party libraries: structlog (for structured, composable logs), python-json-logger, or loguru (developer-friendly).
- For async frameworks (asyncio), consider non-blocking handlers or offload heavy formatting to a background thread.
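For example, a minimal dictConfig that sends INFO and above to the console: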
# config_logging.py
from logging.config import dictConfig
dictConfig({
"version": 1,
"disable_existing_loggers": False,
"formatters": {
"default": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"}
},
"handlers": {
"console": {"class": "logging.StreamHandler", "formatter": "default", "level": "INFO"}
},
"root": {"handlers": ["console"], "level": "INFO"}
})
Conclusion
Logging is an essential skill: with the right setup you’ll reduce debugging time, improve incident response, and create logs that can feed into metrics and analytics systems. Start with a sensible default config, prefer structured logs for production, and always pay attention to performance and security (no secrets in logs).
Try the examples:
- Configure a rotating log for your app.
- Add JSON structured logs and ingest a sample into a local ELK stack or Kibana.
- Use Pandas chunked reading with logging for your next large dataset.
- Add tests using Pytest caplog to assert error logging behavior.
Further Reading
- Python logging docs: https://docs.python.org/3/library/logging.html
- Logging Cookbook: https://docs.python.org/3/howto/logging-cookbook.html
- structlog: https://www.structlog.org
- python-json-logger: https://github.com/madzak/python-json-logger
- Scrapy logging documentation: https://docs.scrapy.org/en/latest/topics/logging.html
- Pytest caplog: https://docs.pytest.org/en/stable/how-to/capture-logging.html