Practical Guide to Python Logging: Best Practices for Debugging and Monitoring Your Applications


Learn how to design reliable, performant logging for real-world Python applications. This guide walks you through core concepts, configuration patterns, and practical examples — from rotating files and structured JSON logs to integrating with Pandas for large-data processing, testing logs with Pytest, and configuring Scrapy spiders — with clear, line-by-line explanations.

Introduction

Why does logging matter? When things go wrong in production, logs are your primary source of truth. Good logs make debugging faster, enable effective monitoring, and power analytics and alerting pipelines. This post is a practical, example-driven guide to Python logging for intermediate developers. You’ll learn how to structure logs, configure handlers, avoid common pitfalls, and scale logging for data-heavy workflows and distributed systems.

We’ll cover:

  • Core concepts: loggers, handlers, formatters, levels, and filters.
  • Real-world patterns: rotating logs, structured (JSON) logs, async/multiprocess-safe logging.
  • Integrations: logging during Pandas/NumPy large-data processing, testing with Pytest, and configuring logs for Scrapy spiders.
  • Best practices and advanced tips for observability and performance.
Prerequisites: Python 3.x, basic familiarity with modules and package structure. Optional: Pandas, NumPy, Pytest, Scrapy for the integration examples.

Core Concepts (quick overview)

  • Logger: the entry point you call (e.g., logger = logging.getLogger(__name__)).
  • Handler: where log records go (console, file, socket).
  • Formatter: how the log record is formatted (text, JSON).
  • Level: severity (DEBUG, INFO, WARNING, ERROR, CRITICAL).
  • Filter: optional additional filtering logic (a minimal sketch follows the docs link below).
Analogy: Think of a logger as a faucet, handlers as pipes to sinks, and formatters as the labels printed on the water droplets.

Official docs: https://docs.python.org/3/library/logging.html
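
Filters are the one concept above that the later examples do not demonstrate, so here is a minimal sketch. The filter class, the "/healthz" message pattern, and the "webapp" logger name are illustrative assumptions, not part of any example below:

import logging

class DropHealthChecks(logging.Filter):
    """Illustrative filter: drop access-log noise for a hypothetical /healthz endpoint."""
    def filter(self, record):
        # Return False to drop the record, True to let it through.
        return "/healthz" not in record.getMessage()

handler = logging.StreamHandler()
handler.addFilter(DropHealthChecks())
logging.getLogger("webapp").addHandler(handler)  # "webapp" is an illustrative logger name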

Step-by-Step Examples

Example 1 — Basic Logger Setup

A minimal, idiomatic logger for a module.

# basic_logger.py
import logging

logger = logging.getLogger(__name__)  # 1
logger.setLevel(logging.INFO)  # 2

handler = logging.StreamHandler()  # 3
handler.setLevel(logging.INFO)  # 4

formatter = logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)  # 5
handler.setFormatter(formatter)  # 6

logger.addHandler(handler)  # 7

def divide(a, b):
    logger.debug("divide called with a=%s, b=%s", a, b)  # 8
    try:
        result = a / b
        logger.info("division result: %s", result)  # 9
        return result
    except ZeroDivisionError:
        logger.exception("Attempted division by zero")  # 10
        raise

Line-by-line explanation:

  1. Create a logger named after the module (recommended). Input: None. Output: Logger object.
  2. Set the logger level to INFO; DEBUG messages are discarded by the logger itself before any handler sees them.
  3. Create a console (stream) handler — sends logs to stdout/stderr.
  4. Set handler level to INFO.
  5. Create a formatter specifying timestamp, module name, level, and message.
  6. Attach the formatter to the handler.
  7. Add the handler to the logger. Edge case: if you run this module multiple times in a long-lived process, avoid adding duplicate handlers (use checks or configure once; a minimal guard is sketched after this list).
  8. Use lazy formatting (pass args, not f-strings) so interpolation only occurs when message will actually be emitted — improves performance.
  9. Log informative result messages.
  10. logger.exception logs the stack trace at ERROR level — useful in except blocks.
Why lazy formatting? logger.debug("x=%s", expensive()) will call expensive() regardless. Instead pass the raw value or a cheap repr; or guard with if logger.isEnabledFor(logging.DEBUG).
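
For the duplicate-handler edge case in step 7, a minimal guard (reusing the logger and handler objects from the example above) looks like this:

# Only attach the handler if this logger does not have one yet,
# so re-running the setup code in a long-lived process will not duplicate output.
if not logger.handlers:
    logger.addHandler(handler)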

Call to action: Try this file, import divide from another script and invoke it to see structured console output.
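
For example, a small driver script (the file name run_demo.py is just an illustration) could look like this:

# run_demo.py (hypothetical driver script)
from basic_logger import divide

divide(10, 2)      # emits an INFO line with the result

try:
    divide(1, 0)   # emits an ERROR line with the traceback, then re-raises
except ZeroDivisionError:
    pass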

Example 2 — Rotating File Handler (practical for production)

Use RotatingFileHandler to prevent log files growing without bounds.

# rotating_logger.py
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("myapp") logger.setLevel(logging.INFO)

1. Rotate when file reaches 5 MB, keep 3 backups

handler = RotatingFileHandler("myapp.log", maxBytes=5 1024 1024, backupCount=3) handler.setLevel(logging.INFO) formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s") handler.setFormatter(formatter) logger.addHandler(handler)

Example usage:

for i in range(1000):
    logger.info("Processing item %d", i)

Explain:

  • Creating a RotatingFileHandler prevents disk exhaustion by rotating logs at size threshold.
  • Inputs: file path, maxBytes, backupCount. Output: managed log files myapp.log, myapp.log.1, etc.
  • Edge cases: rotation in multi-process apps can conflict; prefer a single writer process (see the QueueHandler example later), a process-safe logging library, or central log collection. For time-based rotation, TimedRotatingFileHandler is sketched just below.
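
A minimal time-based rotation sketch (daily rollover at midnight, keeping the last seven files), shown here for comparison with the size-based handler above:

# timed_rotating_logger.py
import logging
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

# Roll over at midnight and keep the 7 most recent daily files.
handler = TimedRotatingFileHandler("myapp.log", when="midnight", backupCount=7)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)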

Example 3 — Structured JSON Logging (for central logging/ELK)

Structured logs are machine-parseable and ideal for indexing in ELK or other log stores.

# json_logger.py
import json
import logging
from datetime import datetime

class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "timestamp": datetime.utcfromtimestamp(record.created).isoformat() + "Z",
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "line": record.lineno,
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

logger = logging.getLogger("service")
logger.setLevel(logging.INFO)
h = logging.StreamHandler()
h.setFormatter(JsonFormatter())
logger.addHandler(h)

logger.info("User created", extra={"user_id": 123})

Line-by-line:

  • Define a JsonFormatter subclass; format builds a dict with useful fields.
  • Use record.getMessage() to get the final message.
  • Include exception stack traces when present.
  • Logging call: logger.info("User created", extra={"user_id": 123}). Note that extra adds attributes to the LogRecord; you can surface them in the payload if needed (see the sketch after this list).
Edge cases: JSON must be serializable — avoid passing numpy arrays or pandas DataFrame directly. Convert to primitive types or strings.
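
If you do want fields passed via extra (such as user_id above) to appear in the output, one approach is a small subclass that copies a known set of attributes off the record. This is a sketch; the field names in EXTRA_FIELDS are illustrative assumptions:

class JsonFormatterWithExtras(JsonFormatter):
    # Fields callers are expected to pass via extra={...}; names are illustrative.
    EXTRA_FIELDS = ("user_id", "request_id")

    def format(self, record):
        payload = json.loads(super().format(record))
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)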

Why structured logs? They let you query fields like log.level, user_id, or request_id in a log index.

Example 4 — Logging While Processing Large Datasets with Pandas/NumPy

Common pattern: process large CSVs in chunks and log progress. This integrates with "Efficient Data Processing with Python: Leveraging Pandas and NumPy for Large Datasets".

# data_processing.py
import logging
import pandas as pd

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

def process_chunk(df):
    # Dummy processing using NumPy/pandas
    df['value'] = df['value'] * 2  # vectorized operation
    return df

def process_large_csv(path):
    total_rows = 0
    for i, chunk in enumerate(pd.read_csv(path, chunksize=100_000)):
        logger.info("Processing chunk %d with %d rows", i, len(chunk))
        result = process_chunk(chunk)
        total_rows += len(result)
        # Write chunk to output, save, or aggregate
    logger.info("Processing complete, total rows=%d", total_rows)

Line-by-line:

  • Use pandas.read_csv(..., chunksize) to avoid loading the whole dataset — memory efficient.
  • Log progress at chunk boundaries — provides visibility without overwhelming logs.
  • process_chunk uses vectorized operations (NumPy under the hood) for performance.
Edge cases: If you need to log occasional samples of rows, don't log entire DataFrames on every chunk (too verbose and heavy).

Performance tip: Use logger.isEnabledFor(logging.DEBUG) before constructing expensive debug messages (e.g., serializing a DataFrame sample).
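
A minimal sketch of that guard, assuming it sits inside the chunk loop of process_large_csv above:

# Only build the (relatively expensive) sample payload when DEBUG is enabled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Sample rows: %s", chunk.head(3).to_dict(orient="records"))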

Example 5 — Testing Logging with Pytest (Advanced Testing Techniques)

Pytest provides caplog to capture logs during tests.

# test_logging.py
import logging
from basic_logger import divide  # from Example 1

def test_divide_by_zero_logs_exception(caplog):
    caplog.set_level(logging.ERROR)
    with caplog.at_level(logging.ERROR):
        try:
            divide(1, 0)
        except ZeroDivisionError:
            pass
    # Assert that an error log about division by zero was emitted
    assert any(
        "division by zero" in rec.message.lower() or "division" in rec.message.lower()
        for rec in caplog.records
    )

Explain:

  • caplog fixture captures logs; set_level controls which levels are captured.
  • The test ensures divide logs an exception on ZeroDivisionError.
  • Edge cases: If your code configures loggers at import time, test isolation may be needed (reset logger handlers between tests).
Tip: Combine Pytest's caplog with assertions on structured logs (JSON) by parsing the logged JSON.
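
For example, a test against the JsonFormatter from Example 3 (json_logger.py) can format a captured record and parse the result. This is a sketch, assuming the "service" logger is configured as shown there:

# test_json_logging.py
import json
import logging
from json_logger import JsonFormatter  # from Example 3

def test_json_formatter_emits_parseable_json(caplog):
    with caplog.at_level(logging.INFO, logger="service"):
        logging.getLogger("service").info("User created")
    # Run the captured record through our formatter and parse the JSON output.
    record = caplog.records[0]
    payload = json.loads(JsonFormatter().format(record))
    assert payload["message"] == "User created"
    assert payload["level"] == "INFO"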

Example 6 — Logging in Scrapy (Developing a Python Web Scraper with Scrapy)

Scrapy uses Python logging; you can set settings to control verbosity and log format.

Scrapy settings snippet (settings.py):

# settings.py (Scrapy)
LOG_LEVEL = 'INFO'              # 1
LOG_STDOUT = False              # 2
LOG_FILE = 'scrapy_run.log'     # 3

Inside your spider:

# myspider.py
import logging
from scrapy import Spider

logger = logging.getLogger(__name__)

class MySpider(Spider):
    name = "myspider"

    def parse(self, response):
        count = len(response.css("a"))
        logger.info("Found %d links on %s", count, response.url)
        # ... parsing logic ...

Explain:

  1. Set LOG_LEVEL globally for the crawl.
  2. LOG_STDOUT controls whether prints are redirected to the log.
  3. LOG_FILE writes logs to a file.
Edge cases: Scrapy logs can be noisy — tune LOG_LEVEL and use module-specific loggers.
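
Because Scrapy components log under their module names, you can quiet Scrapy's own machinery without lowering your spider's verbosity. A minimal sketch (relying on the standard logger hierarchy; run it at startup, e.g. in the spider module):

import logging

# Keep the crawl at INFO overall, but only show warnings from Scrapy internals.
logging.getLogger("scrapy").setLevel(logging.WARNING)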

Example 7 — Multiprocessing-Safe Logging

In multi-process apps, file handlers can clash. Use QueueHandler and QueueListener.

# mp_logging.py
import logging
import logging.handlers
from multiprocessing import Process, Queue

def worker(queue, idx):
    logger = logging.getLogger(f"worker-{idx}")
    qh = logging.handlers.QueueHandler(queue)
    logger.addHandler(qh)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid also writing via handlers inherited on the root logger
    logger.info("Worker started")

def listener_configurer(logfile):
    root = logging.getLogger()
    fh = logging.FileHandler(logfile)
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    fh.setFormatter(formatter)
    root.addHandler(fh)

def listener_process(queue, logfile):
    listener_configurer(logfile)
    # Hand the root logger's handlers to the listener (note the * unpacking).
    listener = logging.handlers.QueueListener(queue, *logging.getLogger().handlers)
    listener.start()
    return listener

if __name__ == "__main__":
    q = Queue(-1)
    listener = listener_process(q, "mp.log")
    ps = [Process(target=worker, args=(q, i)) for i in range(4)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    listener.stop()  # enqueues a sentinel and waits for the listener thread to finish

Explain:

  • Workers push LogRecords into a multiprocessing.Queue via QueueHandler.
  • A listening process pulls records and writes them via a FileHandler — avoids file locking issues.
Edge cases: ensure a clean shutdown: listener.stop() enqueues a sentinel record and waits for the listener thread to drain the queue before returning.

Best Practices

  • Use logger = logging.getLogger(__name__) inside modules — helps filter by module.
  • Prefer lazy interpolation: logger.debug("Value: %s", value) defers string formatting until a record is actually emitted. Note that argument expressions are still evaluated, so guard truly expensive ones with logger.isEnabledFor().
  • Avoid logging secrets (API keys, passwords). Sanitize logs.
  • Set appropriate log levels and adjust in production via configuration — don't log DEBUG in high-throughput production.
  • Use structured logging for production (JSON). Tools: python-json-logger, structlog.
  • Correlate requests with request IDs or trace IDs for distributed tracing (a filter-based sketch follows this list).
  • Combine logs with metrics: emit counters/timers to Prometheus in addition to logs.
  • For data-heavy jobs (Pandas/NumPy), log coarse-grained progress (per chunk), sample data carefully, and never log entire DataFrames unless for debugging.
  • For tests, use Pytest caplog to assert logging behavior and to ensure logs don't leak.
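
One lightweight way to attach a request or trace ID to every record is a logging.Filter that copies it from a contextvars variable. This is a sketch; the variable and filter names are illustrative:

# request_context.py (sketch)
import contextvars
import logging

request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        # Attach the current request ID so formatters can reference %(request_id)s.
        record.request_id = request_id_var.get()
        return True

handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s [%(request_id)s] %(message)s"))

# At the start of each request: request_id_var.set(generated_id)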

Common Pitfalls

  • Double logging: configuring logging in multiple modules adds duplicate handlers — configure logging once in entrypoint.
  • Using f-strings inside logger calls (logger.debug(f"... {x}")) defeats lazy evaluation.
  • Writing logs to disk without rotation leads to disk full.
  • Logging heavy objects (DataFrames/Numpy arrays) without serialization can crash JSON formatters.
  • Ignoring performance: logging in tight loops without checks can degrade throughput.
Quick fix for f-strings problem:
  • Bad: logger.debug(f"Result: {compute()}") — compute() runs even if DEBUG is off.
  • Good: logger.debug("Result: %s", compute()) skips the string formatting when DEBUG is off, but compute() itself still runs; if compute() is expensive, guard the call:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Result: %s", compute())

Advanced Tips

  • Use logging configuration files or dictConfig for reproducible setups — great for complex apps.
  • Integrate logs with APM (OpenTelemetry), ELK, Fluentd, or Graylog for central analysis.
  • Use tracing correlation (OpenTelemetry trace_id) to link logs with traces.
  • Consider third-party libraries: structlog (for structured, composable logs), python-json-logger, or loguru (developer-friendly).
  • For async frameworks (asyncio), consider non-blocking handlers or offload heavy formatting and I/O to a background thread (see the QueueHandler sketch after this list).
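
A minimal single-process sketch of that offloading idea, using the stdlib QueueHandler/QueueListener pair so the hot path only enqueues records while a background thread does the formatting and file I/O (file name app.log is illustrative):

import logging
import logging.handlers
import queue

log_queue = queue.SimpleQueue()

# The hot path (e.g. the event loop) only enqueues records, which is cheap.
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(logging.handlers.QueueHandler(log_queue))

# A background thread owned by QueueListener performs formatting and file I/O.
file_handler = logging.FileHandler("app.log")
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()
# ... run your application / event loop ...
listener.stop()
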
Example: dictConfig skeleton (a single call to reconfigure logging app-wide):
# config_logging.py
from logging.config import dictConfig

dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"}
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "default", "level": "INFO"}
    },
    "root": {"handlers": ["console"], "level": "INFO"},
})

Conclusion

Logging is an essential skill: with the right setup you’ll reduce debugging time, improve incident response, and create logs that can feed into metrics and analytics systems. Start with a sensible default config, prefer structured logs for production, and always pay attention to performance and security (no secrets in logs).

Try the examples:

  • Configure a rotating log for your app.
  • Add JSON structured logs and ingest a sample into a local ELK stack or Kibana.
  • Use Pandas chunked reading with logging for your next large dataset.
  • Add tests using Pytest caplog to assert error logging behavior.

Further Reading

If you enjoyed this guide, try instrumenting one of your services with structured logging and send a sample to a log aggregator — then write a small Pytest that asserts a key event was logged. Happy logging!
