Practical Guide to Python Logging: Best Practices for Debugging and Monitoring Your Applications
Learn how to design reliable, performant logging for real-world Python applications. This guide walks you through core concepts, configuration patterns, and practical examples — from rotating files and structured JSON logs to integrating with Pandas for large-data processing, testing logs with Pytest, and configuring Scrapy spiders — with clear, line-by-line explanations.
Introduction
Why does logging matter? When things go wrong in production, logs are your primary source of truth. Good logs make debugging faster, enable effective monitoring, and power analytics and alerting pipelines. This post is a practical, example-driven guide to Python logging for intermediate developers. You’ll learn how to structure logs, configure handlers, avoid common pitfalls, and scale logging for data-heavy workflows and distributed systems.
We’ll cover:
- Core concepts: loggers, handlers, formatters, levels, and filters.
- Real-world patterns: rotating logs, structured (JSON) logs, async/multiprocess-safe logging.
- Integrations: logging during Pandas/NumPy large-data processing, testing with Pytest, and configuring logs for Scrapy spiders.
- Best practices and advanced tips for observability and performance.
Core Concepts (quick overview)
- Logger: the entry point you call (e.g., logger = logging.getLogger(__name__)).
- Handler: where log records go (console, file, socket).
- Formatter: how the log record is formatted (text, JSON).
- Level: severity (DEBUG, INFO, WARNING, ERROR, CRITICAL).
- Filter: optional additional filtering logic (a minimal example follows below).
Official docs: https://docs.python.org/3/library/logging.html
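Filters get less attention than the other pieces, so here is a minimal sketch of one in action; the class name and the rule it applies are made up for illustration:
# filter_example.py (minimal sketch of a logging.Filter)
import logging

class ExcludeHealthchecks(logging.Filter):
    def filter(self, record):
        # Return False to drop the record, True to keep it
        return "healthcheck" not in record.getMessage().lower()

logger = logging.getLogger("web")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.addFilter(ExcludeHealthchecks())
logger.addHandler(handler)

logger.info("GET /healthcheck 200")  # dropped by the filter
logger.info("GET /orders 500")       # emitted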
Step-by-Step Examples
Example 1 — Basic Logger Setup
A minimal, idiomatic logger for a module.
# basic_logger.py
import logging
logger = logging.getLogger(__name__) # 1
logger.setLevel(logging.INFO) # 2
handler = logging.StreamHandler() # 3
handler.setLevel(logging.INFO) # 4
formatter = logging.Formatter(
"%(asctime)s - %(name)s - %(levelname)s - %(message)s"
) # 5
handler.setFormatter(formatter) # 6
logger.addHandler(handler) # 7
def divide(a, b):
logger.debug("divide called with a=%s, b=%s", a, b) # 8
try:
result = a / b
logger.info("division result: %s", result) # 9
return result
except ZeroDivisionError:
logger.exception("Attempted division by zero") # 10
raise
Line-by-line explanation:
- Create a logger named after the module (recommended). Input: None. Output: Logger object.
- Set the logger level to INFO. The logger drops DEBUG records itself, so handlers never see them regardless of their own levels.
- Create a console (stream) handler, which writes to stderr by default (pass a stream such as sys.stdout to change the target).
- Set handler level to INFO.
- Create a formatter specifying timestamp, logger name, level, and message.
- Attach the formatter to the handler.
- Add the handler to the logger. Edge case: if this module is imported repeatedly in a long-lived process, guard against adding duplicate handlers (see the snippet after this list) or configure logging once at the entry point.
- Use lazy formatting (pass args, not f-strings) so interpolation only occurs when the message will actually be emitted, which improves performance.
- Log informative result messages.
- logger.exception logs the stack trace at ERROR level — useful in except blocks.
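One way to handle the duplicate-handler edge case from the list above is to attach a handler only when the logger has none yet. A minimal sketch; configuring logging once in the entry point (see Best Practices) is the more robust approach:
# Guard: attach the handler only if none is configured yet
if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)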
Call to action: save this as basic_logger.py, import divide from another script, and invoke it to see the formatted console output.
Example 2 — Rotating File Handler (practical for production)
Use RotatingFileHandler to prevent log files growing without bounds.
# rotating_logger.py
import logging
from logging.handlers import RotatingFileHandler
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
# Rotate when the file reaches 5 MB, keep 3 backups
handler = RotatingFileHandler("myapp.log", maxBytes=5 * 1024 * 1024, backupCount=3)
handler.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
Example usage:
for i in range(1000):
logger.info("Processing item %d", i)
Explain:
- Creating a RotatingFileHandler prevents disk exhaustion by rotating logs at size threshold.
- Inputs: file path, maxBytes, backupCount. Output: managed log files myapp.log, myapp.log.1, etc.
- Edge cases: rotation is not safe when several processes write to the same file; route records through a single writer (see the multiprocessing example below) or ship them to a central logging service. For time-based rotation (e.g. daily files), use TimedRotatingFileHandler, sketched below.
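If time-based rotation fits better (for example, one file per day), a minimal sketch with TimedRotatingFileHandler looks like this; the filename and retention are illustrative:
# timed_rotating_logger.py
import logging
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

# Rotate at midnight, keep 7 dated backups (myapp.log, myapp.log.2024-01-01, ...)
handler = TimedRotatingFileHandler("myapp.log", when="midnight", backupCount=7)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)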
Example 3 — Structured JSON Logging (for central logging/ELK)
Structured logs are machine-parseable and ideal for indexing in ELK or other log stores.
# json_logger.py
import json
import logging
from datetime import datetime, timezone
class JsonFormatter(logging.Formatter):
def format(self, record):
payload = {
"timestamp": datetime.utcfromtimestamp(record.created).isoformat() + "Z",
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
"module": record.module,
"line": record.lineno,
}
if record.exc_info:
payload["exception"] = self.formatException(record.exc_info)
return json.dumps(payload)
logger = logging.getLogger("service")
logger.setLevel(logging.INFO)
h = logging.StreamHandler()
h.setFormatter(JsonFormatter())
logger.addHandler(h)
logger.info("User created", extra={"user_id": 123})
Line-by-line:
- Define a JsonFormatter subclass; format builds a dict with useful fields.
- Use record.getMessage() to get the final message.
- Include exception stack traces when present.
- Logging call: logger.info("User created", extra={"user_id": 123}) — note: extra adds attributes to the LogRecord; the sketch after this example shows one way to include them in the JSON payload.
Why structured logs? They let you query fields like log.level, user_id, or request_id in a log index.
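The formatter above ignores fields passed via extra. One way to include them (a sketch that assumes anything not present on a default LogRecord came from extra=) is to copy the non-standard attributes into the payload:
# Attributes that exist on every LogRecord; anything else was injected via `extra=`
_STANDARD_ATTRS = set(logging.LogRecord("", 0, "", 0, "", (), None).__dict__) | {"message", "asctime"}

def extra_fields(record):
    return {k: v for k, v in record.__dict__.items() if k not in _STANDARD_ATTRS}

# Inside JsonFormatter.format(), after building the payload dict:
#     payload.update(extra_fields(record))
With that in place, the user_id from the call above shows up as a top-level JSON field you can index on.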
Example 4 — Logging While Processing Large Datasets with Pandas/NumPy
Common pattern: process large CSVs in chunks and log progress. This integrates with "Efficient Data Processing with Python: Leveraging Pandas and NumPy for Large Datasets".
# data_processing.py
import logging
import pandas as pd
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
def process_chunk(df):
# Dummy processing using NumPy/pandas
    df['value'] = df['value'] * 2  # vectorized operation (NumPy under the hood)
return df
def process_large_csv(path):
total_rows = 0
for i, chunk in enumerate(pd.read_csv(path, chunksize=100_000)):
logger.info("Processing chunk %d with %d rows", i, len(chunk))
result = process_chunk(chunk)
total_rows += len(result)
# Write chunk to output, save, or aggregate
logger.info("Processing complete, total rows=%d", total_rows)
Line-by-line:
- Use pandas.read_csv(..., chunksize) to avoid loading the whole dataset — memory efficient.
- Log progress at chunk boundaries — provides visibility without overwhelming logs.
- process_chunk uses vectorized operations (NumPy under the hood) for performance.
Performance tip: Use logger.isEnabledFor(logging.DEBUG) before constructing expensive debug messages (e.g., serializing a DataFrame sample).
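For example, inside the chunk loop you might guard an expensive DataFrame sample like this (a sketch):
if logger.isEnabledFor(logging.DEBUG):
    # Serialize a small sample only when DEBUG output will actually be emitted
    logger.debug("Chunk %d sample:\n%s", i, chunk.head(3).to_string())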
Example 5 — Testing Logging with Pytest (Advanced Testing Techniques)
Pytest provides caplog to capture logs during tests.
# test_logging.py
import logging
from basic_logger import divide  # the divide() from Example 1
def test_divide_by_zero_logs_exception(caplog):
caplog.set_level(logging.ERROR)
with caplog.at_level(logging.ERROR):
try:
divide(1, 0)
except ZeroDivisionError:
pass
# Assert that an error log with "division by zero" was emitted
assert any("division by zero" in rec.message.lower() or "division" in rec.message.lower() for rec in caplog.records)
Explain:
- caplog fixture captures logs; set_level controls which levels are captured.
- The test ensures divide logs an exception on ZeroDivisionError.
- Edge cases: if your code configures loggers at import time, you may need test isolation (reset or remove handlers between tests); a fixture sketch follows below.
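A minimal isolation fixture for that edge case (a sketch; autouse and the logger name basic_logger are assumptions based on Example 1):
# conftest.py
import logging
import pytest

@pytest.fixture(autouse=True)
def isolated_logger():
    logger = logging.getLogger("basic_logger")  # the logger Example 1 configures at import time
    saved = logger.handlers[:]
    logger.handlers.clear()
    yield
    logger.handlers[:] = saved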
Example 6 — Logging in Scrapy (Developing a Python Web Scraper with Scrapy)
Scrapy uses Python logging; you can set settings to control verbosity and log format.
Scrapy settings snippet (settings.py):
# settings.py (Scrapy)
LOG_LEVEL = 'INFO' # 1
LOG_STDOUT = False # 2
LOG_FILE = 'scrapy_run.log' # 3
Inside your spider:
# myspider.py
import logging
from scrapy import Spider
logger = logging.getLogger(__name__)
class MySpider(Spider):
name = "myspider"
def parse(self, response):
count = len(response.css("a"))
logger.info("Found %d links on %s", count, response.url)
# ... parsing logic ...
Explain:
- Set LOG_LEVEL globally for the crawl.
- LOG_STDOUT controls whether prints are redirected to the log.
- LOG_FILE writes logs to a file.
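Scrapy also attaches a per-spider logger as self.logger (named after the spider), so inside spider methods you can log without creating a module-level logger:
    # Inside MySpider, using the spider's built-in logger instead of the module-level one
    def parse(self, response):
        self.logger.info("Found %d links on %s", len(response.css("a")), response.url)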
Example 7 — Multiprocessing-Safe Logging
In multi-process apps, file handlers can clash. Use QueueHandler and QueueListener.
# mp_logging.py
import logging
import logging.handlers
from multiprocessing import Process, Queue
def worker(queue, idx):
    logger = logging.getLogger(f"worker-{idx}")
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.propagate = False  # avoid duplicate writes through handlers inherited from the parent (fork)
    logger.setLevel(logging.INFO)
    logger.info("Worker started")
def listener_configurer(logfile):
root = logging.getLogger()
fh = logging.FileHandler(logfile)
formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
fh.setFormatter(formatter)
root.addHandler(fh)
def listener_process(queue, logfile):
listener_configurer(logfile)
    listener = logging.handlers.QueueListener(queue, *logging.getLogger().handlers)
listener.start()
return listener
if __name__ == "__main__":
q = Queue(-1)
listener = listener_process(q, "mp.log")
ps = [Process(target=worker, args=(q, i)) for i in range(4)]
for p in ps:
p.start()
for p in ps:
p.join()
    listener.stop()  # flush queued records and stop the listener thread
Explain:
- Workers push LogRecords into a multiprocessing.Queue via QueueHandler.
- A QueueListener (here a background thread started in the main process) pulls records off the queue and writes them through a single FileHandler, so only one writer ever touches the file.
Best Practices
- Use logger = logging.getLogger(__name__) inside modules — helps filter by module.
- Prefer lazy interpolation: logger.debug("Value: %s", expensive()) — prevents unnecessary work.
- Avoid logging secrets (API keys, passwords). Sanitize logs.
- Set appropriate log levels and adjust in production via configuration — don't log DEBUG in high-throughput production.
- Use structured logging for production (JSON). Tools: python-json-logger, structlog.
- Correlate requests with request IDs or trace IDs for distributed tracing (a sketch follows after this list).
- Combine logs with metrics: emit counters/timers to Prometheus in addition to logs.
- For data-heavy jobs (Pandas/NumPy), log coarse-grained progress (per chunk), sample data carefully, and never log entire DataFrames unless for debugging.
- For tests, use Pytest caplog to assert logging behavior and to ensure logs don't leak.
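For the request-ID correlation point above, one lightweight standard-library approach (a sketch; the field and variable names are illustrative) is a contextvars-backed filter:
# request_id_logging.py
import contextvars
import logging

request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get()  # expose the ID to formatters
        return True

handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s [%(request_id)s] %(name)s: %(message)s"))
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)

# In your web middleware (framework-specific), set the ID at the start of each request:
# request_id_var.set(incoming_request_id)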
Common Pitfalls
- Double logging: configuring logging in multiple modules adds duplicate handlers — configure logging once in entrypoint.
- Using f-strings inside logger calls (logger.debug(f"... {x}")) defeats lazy evaluation.
- Writing logs to disk without rotation will eventually fill the disk.
- Logging heavy objects (DataFrames/NumPy arrays) without serialization can crash JSON formatters; log a cheap summary instead (see the snippet after this list).
- Ignoring performance: logging in tight loops without checks can degrade throughput.
- Bad: logger.debug(f"Result: {compute()}") — compute() runs even if DEBUG is off.
- Good: logger.debug("Result: %s", compute()) or if compute() is expensive:
Advanced Tips
- Use logging configuration files or dictConfig for reproducible setups — great for complex apps.
- Integrate logs with APM (OpenTelemetry), ELK, Fluentd, or Graylog for central analysis.
- Use tracing correlation (OpenTelemetry trace_id) to link logs with traces.
- Consider third-party libraries: structlog (for structured, composable logs), python-json-logger, or loguru (developer-friendly).
- For async frameworks (asyncio), consider non-blocking handlers or offload heavy formatting to a background thread.
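For example, a minimal dictConfig that sends INFO and above to the console: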
# config_logging.py
from logging.config import dictConfig
dictConfig({
"version": 1,
"disable_existing_loggers": False,
"formatters": {
"default": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"}
},
"handlers": {
"console": {"class": "logging.StreamHandler", "formatter": "default", "level": "INFO"}
},
"root": {"handlers": ["console"], "level": "INFO"}
})
Conclusion
Logging is an essential skill: with the right setup you’ll reduce debugging time, improve incident response, and create logs that can feed into metrics and analytics systems. Start with a sensible default config, prefer structured logs for production, and always pay attention to performance and security (no secrets in logs).
Try the examples:
- Configure a rotating log for your app.
- Add JSON structured logs and ingest a sample into a local ELK stack or Kibana.
- Use Pandas chunked reading with logging for your next large dataset.
- Add tests using Pytest caplog to assert error logging behavior.
Further Reading
- Python logging docs: https://docs.python.org/3/library/logging.html
- Logging Cookbook: https://docs.python.org/3/howto/logging-cookbook.html
- structlog: https://www.structlog.org
- python-json-logger: https://github.com/madzak/python-json-logger
- Scrapy logging documentation: https://docs.scrapy.org/en/latest/topics/logging.html
- Pytest caplog: https://docs.pytest.org/en/stable/how-to/capture-logging.html