
Creating a Python CLI Tool: Best Practices for User Input and Output Handling
Command-line tools remain essential for automation, ETL tasks, and developer workflows. This guide walks intermediate Python developers through building robust CLI tools with practical examples, covering input parsing, I/O patterns, error handling, logging, packaging, and Docker deployment. Learn best practices and real-world patterns to make your CLI reliable, user-friendly, and production-ready.
Introduction
Command-line interfaces (CLIs) are the glue of automation: they trigger data pipelines, run tests, orchestrate deployments, and serve as the backbone for many developer workflows. But building a reliable, user-friendly CLI requires thoughtful handling of user input and program output, careful error handling, and sensible defaults.
Have you ever written a script that works locally but fails when users pipe input into it, or when it runs inside a Docker container with different locale settings? This post teaches the practical patterns and best practices to avoid those pitfalls.
We'll cover:
- Core concepts and prerequisites
- Practical, working examples using argparse and click
- Patterns for stdin/stdout/stderr handling, logging and exit codes
- Error handling with custom exceptions and validation
- Packaging and running your CLI in Docker
- Real-world use-case: invoking ETL processes from a CLI
Prerequisites
You should be comfortable with:
- Python 3.7+ (3.11 recommended)
- Basic standard library modules (argparse, logging, sys, subprocess)
- Virtual environments and pip
- click (for richer CLI UX)
- Docker (for containerization)
Core Concepts
Before writing code, let's break down the problem:
- Input sources: command-line arguments, environment variables, config files, stdin (piped data), interactive prompts.
- Output sinks: stdout (normal output), stderr (diagnostics), log files, structured outputs (JSON/CSV).
- Exit codes: 0 for success, non-zero for different error classes.
- Error handling: validate early, raise meaningful exceptions, expose helpful messages to users.
- UX: helpful --help, sensible defaults, progress indication, verbosity flags.
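A common convention for the input sources listed above is an explicit precedence: command-line flag beats environment variable beats built-in default. A minimal sketch of that resolution (the `ETL_SOURCE` variable name and helper are illustrative, not part of the examples below):

```python
import argparse
import os

def resolve_source(cli_value, env_var="ETL_SOURCE", default="stdin"):
    """Resolve a setting with typical precedence: CLI flag > env var > default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(env_var, default)

parser = argparse.ArgumentParser()
parser.add_argument("--source")
args = parser.parse_args(["--source", "s3://bucket/data"])
print(resolve_source(args.source))  # prints s3://bucket/data
```

Documenting this precedence in `--help` text saves users a trip to the source code.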
Step-by-Step Examples
We'll build a small real-world tool, etl-cli: a simplified CLI to kick off ETL jobs. It accepts commands (extract, transform, load), can read input from a file or stdin, writes results to stdout or a file, and supports verbosity and dry-run flags.
Minimal approach: argparse and functions
First, a straightforward implementation using argparse.
```python
# etl_argparse.py
import argparse
import sys
import json
import logging

logger = logging.getLogger("etl")


def extract(source):
    """Simulate extraction: read JSON lines from source (file-like)."""
    for line in source:
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)


def transform(records):
    """Simple transform: add a field."""
    for r in records:
        r["transformed"] = True
        yield r


def load(records, dest):
    """Write JSON lines to dest (file-like)."""
    count = 0
    for r in records:
        dest.write(json.dumps(r) + "\n")
        count += 1
    return count


def main(argv=None):
    parser = argparse.ArgumentParser(prog="etl-cli")
    parser.add_argument("--input", "-i", help="Input file (defaults to stdin)")
    parser.add_argument("--output", "-o", help="Output file (defaults to stdout)")
    parser.add_argument("--dry-run", action="store_true", help="Don't write output")
    parser.add_argument("--verbose", "-v", action="count", default=0)
    args = parser.parse_args(argv)

    # Each -v lowers the threshold by 10 (WARNING -> INFO -> DEBUG); clamp at DEBUG.
    logging.basicConfig(level=max(logging.DEBUG, logging.WARNING - args.verbose * 10))

    in_f = open(args.input, "r", encoding="utf-8") if args.input else sys.stdin
    out_f = open(args.output, "w", encoding="utf-8") if args.output else sys.stdout
    try:
        records = extract(in_f)
        transformed = transform(records)
        if args.dry_run:
            for r in transformed:
                logger.info("DRY: %s", r)
            print("Dry run complete", file=sys.stderr)
            return 0
        count = load(transformed, out_f)
        logger.info("Loaded %d records", count)
        return 0
    except Exception as exc:
        logger.exception("ETL failed: %s", exc)
        return 2
    finally:
        # Only close files we opened ourselves; never close sys.stdin/sys.stdout.
        if args.input:
            in_f.close()
        if args.output:
            out_f.close()


if __name__ == "__main__":
    raise SystemExit(main())
```
Line-by-line explanation:
- import argparse, sys, json, logging: the core modules we need.
- logger = logging.getLogger("etl"): a module logger for consistent messages.
- extract(source): generator that yields parsed JSON lines; works with any file-like object, including stdin.
- transform(records): generator that demonstrates a transformation by adding a flag.
- load(records, dest): writes JSON lines to the destination and returns the count.
- main(argv=None): parses arguments and derives the logging level from verbosity (each -v increases it).
- if __name__ == "__main__": raise SystemExit(main()) ensures proper exit codes.

Notes and edge cases:
- Input encoding: stdin/stdout use the locale encoding; for deterministic behavior, open files with an explicit encoding="utf-8".
- Binary vs text mode: this example treats input as text JSON lines.
- Large files: generators keep processing streaming and memory usage low.

Try it:

```shell
echo '{"a":1}' | python etl_argparse.py
python etl_argparse.py -i data.jsonl -o out.jsonl
python etl_argparse.py -i data.jsonl --dry-run -v
```
Interactive prompts and validation
What if you need prompts? Use input() carefully — detect non-interactive sessions.
```python
# prompt_example.py
import sys


def ask_confirm(prompt="Continue? (y/n): "):
    if not sys.stdin.isatty():
        raise RuntimeError("Interactive prompt required but stdin is not a TTY")
    resp = input(prompt).strip().lower()
    return resp in ("y", "yes")
```
Example use:
```python
if __name__ == "__main__":
    try:
        if ask_confirm():
            print("Proceeding...")
        else:
            print("Aborted.")
    except RuntimeError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        raise SystemExit(1)
```
Explanation:
- sys.stdin.isatty() ensures prompts are only shown when input is interactive; when piping data, this avoids stalling.
- When non-interactive, raise a clear error and exit non-zero so calling scripts can detect failure.
Advanced CLI: click for UX and subcommands
For more structured CLIs, click simplifies subcommands, auto-help, and prompts.
```python
# etl_click.py
import click
import json
import logging

logger = logging.getLogger("etl")


@click.group()
@click.version_option("1.0")
def cli():
    logging.basicConfig(level=logging.INFO)
    logger.debug("CLI started")


@cli.command()
@click.option("--input", "-i", type=click.File("r"), default="-", help="Input file (defaults to stdin)")
@click.option("--output", "-o", type=click.File("w"), default="-", help="Output file (defaults to stdout)")
@click.option("--dry-run", is_flag=True, help="Don't write output")
def run(input, output, dry_run):
    """Run a simple ETL pipeline."""
    def extract(src):
        for line in src:
            line = line.strip()
            if not line:
                continue
            yield json.loads(line)

    def transform(records):
        for r in records:
            r["ts"] = "2025-01-01"
            yield r

    try:
        records = transform(extract(input))
        if dry_run:
            for r in records:
                click.echo(f"DRY: {r}", err=True)
            click.echo("Dry run complete", err=True)
            return
        for r in records:
            output.write(json.dumps(r) + "\n")
    except Exception as exc:
        logger.exception("ETL failed")
        raise click.ClickException(str(exc))


if __name__ == "__main__":
    cli()
```
Explanation:
- click.File handles opening/closing files and supports "-" for stdin/stdout.
- click.group and subcommands make adding more commands (status, validate, etc.) easy.
- click.ClickException prints a clean, colored error message and sets a non-zero exit code.

Why choose click?
- Better user experience, built-in validation, and composable subcommands.
- Useful for CLIs that will be extended.
I/O Patterns and Best Practices
- Prefer file-like objects and generators: accept file handles rather than filenames internally. This allows piping and better testing.
- Respect stdin/stdout: default to them instead of forcing filenames.
- Do not close sys.stdin/sys.stdout — only close files you opened.
- Support both line-oriented and streaming modes for large data (avoid loading entire files into memory).
- Add a --format option (json/csv) to make output machine-friendly.
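The "default to stdin/stdout" and "only close what you opened" rules above combine nicely into a small context manager. A stdlib-only sketch (the helper name open_or_std is our own, not a standard API):

```python
import sys
from contextlib import contextmanager

@contextmanager
def open_or_std(path, mode="r"):
    """Yield an open file for path, or stdin/stdout when path is None or "-".

    Only files opened here are closed on exit; sys.stdin and sys.stdout
    are left alone so the surrounding process can keep using them.
    """
    if path is None or path == "-":
        yield sys.stdin if "r" in mode else sys.stdout
        return
    f = open(path, mode, encoding="utf-8")
    try:
        yield f
    finally:
        f.close()
```

Call sites then stop caring where data comes from, e.g. `with open_or_std(args.input) as src, open_or_std(args.output, "w") as dst: ...`.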
```python
# chunked_loader.py
import json


def batched(iterable, batch_size=100):
    """Yield lists of up to batch_size items from iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_in_batches(records, writer, batch_size=100):
    for batch in batched(records, batch_size):
        # Write one batch, then flush so progress is visible downstream.
        for r in batch:
            writer.write(json.dumps(r) + "\n")
        writer.flush()
```
This pattern:
- Reduces memory pressure and enables checkpointing or transactional batch operations.
- Integrates well with progress bars (tqdm) and backpressure.
Error Handling and Custom Exceptions
Robust CLI tools must distinguish user errors (invalid input), system errors (IO failures), and transient errors (network/timeouts).
Create a small exception hierarchy:
```python
# errors.py
class CLIError(Exception):
    """Base class for CLI errors that should be shown to the user."""
    exit_code = 2


class ValidationError(CLIError):
    exit_code = 3


class UnexpectedError(CLIError):
    exit_code = 1
```
Use these in your main logic:
```python
# main_with_errors.py
import sys

from errors import ValidationError, UnexpectedError


def main():
    try:
        # Validate args; fail fast with a user-facing message.
        raise ValidationError("Invalid configuration: missing --source")
    except ValidationError as e:
        print(f"Error: {e}", file=sys.stderr)
        return e.exit_code
    except Exception:
        print("An unexpected error occurred", file=sys.stderr)
        return UnexpectedError().exit_code


if __name__ == "__main__":
    raise SystemExit(main())
```
Best practices:
- Map exceptions to meaningful exit codes.
- Separate logging level from user-facing messages. Log debug info to files when -v used.
- For networked ETL tasks, implement retries with backoff and surface retryable vs non-retryable errors.
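The retry point above can take one minimal shape, assuming transient failures are classified with a dedicated exception type (RetryableError and with_retries are illustrative names, not a library API):

```python
import time

class RetryableError(Exception):
    """Transient failure (network, timeout) worth retrying."""

def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn(), retrying RetryableError with exponential backoff.

    Non-retryable exceptions propagate immediately; the last
    RetryableError is re-raised once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Keeping the classification in the exception hierarchy means callers never have to inspect error strings to decide whether to retry.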
Packaging and Entry Points
Make your CLI installable so users can run it directly. Use setuptools entry_points:
setup.cfg (snippet)

```ini
[options.entry_points]
console_scripts =
    etl-cli = etl_argparse:main
```

This creates an executable etl-cli that calls main(); the generated script passes main()'s return value to sys.exit(), so prefer returning an int or raising SystemExit for explicit exit codes.
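If you manage the project with pyproject.toml instead of setup.cfg, the equivalent declaration lives under [project.scripts]; the metadata below is illustrative:

```toml
[project]
name = "etl-cli"
version = "1.0.0"

[project.scripts]
etl-cli = "etl_argparse:main"
```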
Integrating with Docker
Integrating Python CLIs with Docker streamlines deployment and ensures consistent environments.
Example Dockerfile:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Copy the project source, then install it so the etl-cli entry point is on PATH.
COPY . /app
RUN pip install --no-cache-dir .
ENTRYPOINT ["etl-cli"]
CMD ["--help"]
```
Tips:
- Keep images small (slim/base images), use multi-stage builds for compiled dependencies.
- Use environment variables and volumes for config and data rather than baking large datasets into images.
- For local development, mount the source directory and use a thin wrapper to run inside container to mirror production.
Performance and UX Considerations
- Streaming > loading: for large data, process incrementally.
- Use compiled libraries (ujson, orjson) for heavy JSON workloads.
- Provide a --quiet/--verbose toggle and structured logging (JSON logs option) for machine consumption.
- Keep help text concise; provide examples in --help or on a README.
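The structured-logging option mentioned above needs no third-party dependency; a sketch of a JSON formatter using the stdlib alone (the field selection here is a matter of taste):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("etl.json")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("loaded %d records", 42)
```

One JSON object per line keeps the output greppable and trivially parseable by log shippers.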
Common Pitfalls
- Blocking on input: prompting in scripts without checking for a TTY.
- Closing sys.stdout or sys.stdin inadvertently (only close files you opened yourself).
- Relying on the locale encoding: pass an explicit encoding="utf-8" when deterministic behavior matters.
- Not returning proper exit codes: other scripts cannot detect failures.
- Mixed responsibilities: avoid mixing UI logic and business logic; keep core processing functions pure for easier testing.
Advanced Tips
- Testing CLIs: use click.testing.CliRunner for click tools, and pytest with monkeypatch to simulate stdin and environment variables.
- Subcommand composition: design small focused commands (extract, transform, load) that can be chained.
- Instrumentation: integrate with metrics systems (Prometheus pushgateway) or add telemetry for long-running ETL jobs.
- Security: sanitize file paths and guard against shell injection when invoking subprocesses.
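For stdlib-only tests, in-memory streams often suffice: because the core functions accept file-like objects, you can feed them io.StringIO instead of real stdin/stdout. A sketch (run_pipeline is a tiny stand-in for the earlier examples, not their actual code):

```python
import io
import json

def run_pipeline(src, dst):
    """Stand-in ETL core: JSON lines in, transformed JSON lines out."""
    for line in src:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        record["transformed"] = True
        dst.write(json.dumps(record) + "\n")

# Simulate piped stdin/stdout without touching the real streams.
fake_in = io.StringIO('{"a": 1}\n')
fake_out = io.StringIO()
run_pipeline(fake_in, fake_out)
assert json.loads(fake_out.getvalue()) == {"a": 1, "transformed": True}
```

This is exactly why keeping core processing functions pure pays off: no monkeypatching of sys.stdin is needed to exercise the I/O path.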
Real-world Scenario: Triggering an ETL Pipeline
Imagine you maintain an ETL pipeline. You want a CLI that:
- Validates configs
- Runs an extraction from a database or S3
- Runs transform scripts
- Loads results and reports stats
A clean design:
- Separate steps into commands: etl-cli validate, etl-cli run --step extract, etl-cli status.
- Support dry runs and a --snapshot toggle.
- Output machine-readable status on stdout (e.g., JSON) and human-readable logs on stderr.
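The stdout/stderr split in that last point might look like this (report_status is an illustrative helper, not part of the earlier examples):

```python
import json
import sys
import logging

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("etl")

def report_status(step, ok, stats):
    """Human-readable log on stderr; machine-readable JSON status on stdout."""
    log.info("step %s finished (ok=%s)", step, ok)
    print(json.dumps({"step": step, "ok": ok, "stats": stats}))

report_status("extract", True, {"records": 120})
```

A caller can then pipe stdout into jq or a monitoring hook while the stderr log stays visible in the terminal.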
Conclusion
Creating effective Python CLIs requires more than argument parsing. Handle user input gracefully (support stdin, files, prompts), structure output for both humans and machines, implement robust error handling with clear exit codes, and consider packaging and deployment via Docker.
Try the code examples above:
- Run them with pip-installed dependencies or execute them directly.
- Experiment with piping input and running inside Docker.

From there, some natural next steps:
- Add JSON Schema validation for inputs.
- Integrate a job queue to run ETL steps asynchronously.
- Add unit and integration tests to cover I/O edge cases.
Further Reading
- argparse — the Python standard library documentation
- click — documentation for building composable CLIs
- Official logging docs — configuring complex logging setups
- Effective Error Handling in Python: Custom Exceptions and Best Practices
- Building Data Pipelines with Python: A Step-by-Step Guide to ETL Processes
- Integrating Python with Docker: Streamlining Your Development Workflow