
Creating a Python CLI Tool: Best Practices for User Input and Output Handling
Command-line tools remain essential for automation, ETL tasks, and developer workflows. This guide walks intermediate Python developers through building robust CLI tools with practical examples, covering input parsing, I/O patterns, error handling, logging, packaging, and Docker deployment. Learn best practices and real-world patterns to make your CLI reliable, user-friendly, and production-ready.
Introduction
Command-line interfaces (CLIs) are the glue of automation: they trigger data pipelines, run tests, orchestrate deployments, and serve as the backbone for many developer workflows. But building a reliable, user-friendly CLI requires thoughtful handling of user input and program output, careful error handling, and sensible defaults.
Have you ever written a script that works locally but fails when users pipe input into it, or when it runs inside a Docker container with different locale settings? This post teaches the practical patterns and best practices to avoid those pitfalls.
We'll cover:
- Core concepts and prerequisites
- Practical, working examples using argparse and click
- Patterns for stdin/stdout/stderr handling, logging and exit codes
- Error handling with custom exceptions and validation
- Packaging and running your CLI in Docker
- Real-world use-case: invoking ETL processes from a CLI
Prerequisites
You should be comfortable with:
- Python 3.7+ (3.11 recommended)
- Basic standard library modules (argparse, logging, sys, subprocess)
- Virtual environments and pip
- click (for richer CLI UX)
- Docker (for containerization)
Core Concepts
Before writing code, let's break down the problem:
- Input sources: command-line arguments, environment variables, config files, stdin (piped data), interactive prompts.
- Output sinks: stdout (normal output), stderr (diagnostics), log files, structured outputs (JSON/CSV).
- Exit codes: 0 for success, non-zero for different error classes.
- Error handling: validate early, raise meaningful exceptions, expose helpful messages to users.
- UX: helpful --help, sensible defaults, progress indication, verbosity flags.
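A common convention for the input sources listed above is an explicit precedence: command-line flag beats environment variable beats built-in default. A minimal sketch of that resolution (the `ETL_SOURCE` variable name and helper are illustrative, not part of the examples below):

```python
import argparse
import os

def resolve_source(cli_value, env_var="ETL_SOURCE", default="stdin"):
    """Resolve a setting with typical precedence: CLI flag > env var > default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(env_var, default)

parser = argparse.ArgumentParser()
parser.add_argument("--source")
args = parser.parse_args(["--source", "s3://bucket/data"])
print(resolve_source(args.source))  # prints s3://bucket/data
```

Documenting this precedence in `--help` text saves users a trip to the source code.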
Step-by-Step Examples
We'll build a small real-world tool, etl-cli: a simplified CLI to kick off ETL jobs. It accepts commands (extract, transform, load), can read input from a file or stdin, writes results to stdout or a file, and supports verbosity and dry-run flags.
Minimal approach: argparse and functions
First, a straightforward implementation using argparse.
```python
# etl_argparse.py
import argparse
import sys
import json
import logging

logger = logging.getLogger("etl")


def extract(source):
    """Simulate extraction: read JSON lines from source (file-like)."""
    for line in source:
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)


def transform(records):
    """Simple transform: add a field."""
    for r in records:
        r["transformed"] = True
        yield r


def load(records, dest):
    """Write JSON lines to dest (file-like)."""
    count = 0
    for r in records:
        dest.write(json.dumps(r) + "\n")
        count += 1
    return count


def main(argv=None):
    parser = argparse.ArgumentParser(prog="etl-cli")
    parser.add_argument("--input", "-i", help="Input file (defaults to stdin)")
    parser.add_argument("--output", "-o", help="Output file (defaults to stdout)")
    parser.add_argument("--dry-run", action="store_true", help="Don't write output")
    parser.add_argument("--verbose", "-v", action="count", default=0)
    args = parser.parse_args(argv)

    # Each -v lowers the threshold by 10 (WARNING -> INFO -> DEBUG); clamp at DEBUG.
    logging.basicConfig(level=max(logging.DEBUG, logging.WARNING - args.verbose * 10))

    in_f = open(args.input, "r", encoding="utf-8") if args.input else sys.stdin
    out_f = open(args.output, "w", encoding="utf-8") if args.output else sys.stdout
    try:
        records = extract(in_f)
        transformed = transform(records)
        if args.dry_run:
            for r in transformed:
                logger.info("DRY: %s", r)
            print("Dry run complete", file=sys.stderr)
            return 0
        count = load(transformed, out_f)
        logger.info("Loaded %d records", count)
        return 0
    except Exception as exc:
        logger.exception("ETL failed: %s", exc)
        return 2
    finally:
        # Only close files we opened ourselves; never close sys.stdin/sys.stdout.
        if args.input:
            in_f.close()
        if args.output:
            out_f.close()


if __name__ == "__main__":
    raise SystemExit(main())
```
Line-by-line explanation:
- import argparse, sys, json, logging: the core modules we need.
- logger = logging.getLogger("etl"): a module logger for consistent messages.
- extract(source): generator that yields parsed JSON lines; works with any file-like object, including stdin.
- transform(records): generator that demonstrates a transformation by adding a flag.
- load(records, dest): writes JSON lines to the destination and returns the count.
- main(argv=None): parses arguments and derives the logging level from verbosity (each -v increases it).
- if __name__ == "__main__": raise SystemExit(main()) ensures proper exit codes.

Notes and edge cases:
- Input encoding: stdin/stdout use the locale encoding; for deterministic behavior, open files with an explicit encoding="utf-8".
- Binary vs text mode: this example treats input as text JSON lines.
- Large files: generators keep processing streaming and memory usage low.

Try it:

```shell
echo '{"a":1}' | python etl_argparse.py
python etl_argparse.py -i data.jsonl -o out.jsonl
python etl_argparse.py -i data.jsonl --dry-run -v
```
Interactive prompts and validation
What if you need prompts? Use input() carefully — detect non-interactive sessions.
```python
# prompt_example.py
import sys


def ask_confirm(prompt="Continue? (y/n): "):
    if not sys.stdin.isatty():
        raise RuntimeError("Interactive prompt required but stdin is not a TTY")
    resp = input(prompt).strip().lower()
    return resp in ("y", "yes")
```
Example use:
```python
if __name__ == "__main__":
    try:
        if ask_confirm():
            print("Proceeding...")
        else:
            print("Aborted.")
    except RuntimeError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        raise SystemExit(1)
```
Explanation:
- sys.stdin.isatty() ensures prompts are only shown when input is interactive; when piping data, this avoids stalling.
- When non-interactive, raise a clear error and exit non-zero so calling scripts can detect failure.
Advanced CLI: click for UX and subcommands
For more structured CLIs, click simplifies subcommands, auto-help, and prompts.
```python
# etl_click.py
import click
import json
import logging

logger = logging.getLogger("etl")


@click.group()
@click.version_option("1.0")
def cli():
    logging.basicConfig(level=logging.INFO)
    logger.debug("CLI started")


@cli.command()
@click.option("--input", "-i", type=click.File("r"), default="-", help="Input file (defaults to stdin)")
@click.option("--output", "-o", type=click.File("w"), default="-", help="Output file (defaults to stdout)")
@click.option("--dry-run", is_flag=True, help="Don't write output")
def run(input, output, dry_run):
    """Run a simple ETL pipeline."""
    def extract(src):
        for line in src:
            line = line.strip()
            if not line:
                continue
            yield json.loads(line)

    def transform(records):
        for r in records:
            r["ts"] = "2025-01-01"
            yield r

    try:
        records = transform(extract(input))
        if dry_run:
            for r in records:
                click.echo(f"DRY: {r}", err=True)
            click.echo("Dry run complete", err=True)
            return
        for r in records:
            output.write(json.dumps(r) + "\n")
    except Exception as exc:
        logger.exception("ETL failed")
        raise click.ClickException(str(exc))


if __name__ == "__main__":
    cli()
```
Explanation:
- click.File handles opening/closing files and supports "-" for stdin/stdout.
- click.group and subcommands make adding more commands (status, validate, etc.) easy.
- click.ClickException prints a clean, colored error message and sets a non-zero exit code.

Why choose click?
- Better user experience, built-in validation, and composable subcommands.
- Useful for CLIs that will be extended.
I/O Patterns and Best Practices
- Prefer file-like objects and generators: accept file handles rather than filenames internally. This allows piping and better testing.
- Respect stdin/stdout: default to them instead of forcing filenames.
- Do not close sys.stdin/sys.stdout — only close files you opened.
- Support both line-oriented and streaming modes for large data (avoid loading entire files into memory).
- Add a --format option (json/csv) to make output machine-friendly.
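The "default to stdin/stdout" and "only close what you opened" rules above combine nicely into a small context manager. A stdlib-only sketch (the helper name open_or_std is our own, not a standard API):

```python
import sys
from contextlib import contextmanager

@contextmanager
def open_or_std(path, mode="r"):
    """Yield an open file for path, or stdin/stdout when path is None or "-".

    Only files opened here are closed on exit; sys.stdin and sys.stdout
    are left alone so the surrounding process can keep using them.
    """
    if path is None or path == "-":
        yield sys.stdin if "r" in mode else sys.stdout
        return
    f = open(path, mode, encoding="utf-8")
    try:
        yield f
    finally:
        f.close()
```

Call sites then stop caring where data comes from, e.g. `with open_or_std(args.input) as src, open_or_std(args.output, "w") as dst: ...`.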
```python
# chunked_loader.py
import json


def batched(iterable, batch_size=100):
    """Yield lists of up to batch_size items from iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_in_batches(records, writer, batch_size=100):
    for batch in batched(records, batch_size):
        # Write one batch, then flush so progress is visible downstream.
        for r in batch:
            writer.write(json.dumps(r) + "\n")
        writer.flush()
```
This pattern:
- Reduces memory pressure and enables checkpointing or transactional batch operations.
- Integrates well with progress bars (tqdm) and backpressure.
Error Handling and Custom Exceptions
Robust CLI tools must distinguish user errors (invalid input), system errors (IO failures), and transient errors (network/timeouts).
Create a small exception hierarchy:
```python
# errors.py
class CLIError(Exception):
    """Base class for CLI errors that should be shown to the user."""
    exit_code = 2


class ValidationError(CLIError):
    exit_code = 3


class UnexpectedError(CLIError):
    exit_code = 1
```
Use these in your main logic:
```python
# main_with_errors.py
import sys

from errors import ValidationError, UnexpectedError


def main():
    try:
        # Validate args; fail fast with a user-facing message.
        raise ValidationError("Invalid configuration: missing --source")
    except ValidationError as e:
        print(f"Error: {e}", file=sys.stderr)
        return e.exit_code
    except Exception:
        print("An unexpected error occurred", file=sys.stderr)
        return UnexpectedError().exit_code


if __name__ == "__main__":
    raise SystemExit(main())
```
Best practices:
- Map exceptions to meaningful exit codes.
- Separate logging level from user-facing messages. Log debug info to files when -v used.
- For networked ETL tasks, implement retries with backoff and surface retryable vs non-retryable errors.
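The retry point above can take one minimal shape, assuming transient failures are classified with a dedicated exception type (RetryableError and with_retries are illustrative names, not a library API):

```python
import time

class RetryableError(Exception):
    """Transient failure (network, timeout) worth retrying."""

def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn(), retrying RetryableError with exponential backoff.

    Non-retryable exceptions propagate immediately; the last
    RetryableError is re-raised once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Keeping the classification in the exception hierarchy means callers never have to inspect error strings to decide whether to retry.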
Packaging and Entry Points
Make your CLI installable so users can run it directly. Use setuptools entry_points:
setup.cfg (snippet)

```ini
[options.entry_points]
console_scripts =
    etl-cli = etl_argparse:main
```

This creates an executable etl-cli that calls main(); the generated script passes main()'s return value to sys.exit(), so prefer returning an int or raising SystemExit for explicit exit codes.
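If you manage the project with pyproject.toml instead of setup.cfg, the equivalent declaration lives under [project.scripts]; the metadata below is illustrative:

```toml
[project]
name = "etl-cli"
version = "1.0.0"

[project.scripts]
etl-cli = "etl_argparse:main"
```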
Integrating with Docker
Integrating Python CLIs with Docker streamlines deployment and ensures consistent environments.
Example Dockerfile:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Copy the project source, then install it so the etl-cli entry point is on PATH.
COPY . /app
RUN pip install --no-cache-dir .
ENTRYPOINT ["etl-cli"]
CMD ["--help"]
```
Tips:
- Keep images small (slim/base images), use multi-stage builds for compiled dependencies.
- Use environment variables and volumes for config and data rather than baking large datasets into images.
- For local development, mount the source directory and use a thin wrapper to run inside container to mirror production.
Performance and UX Considerations
- Streaming > loading: for large data, process incrementally.
- Use compiled libraries (ujson, orjson) for heavy JSON workloads.
- Provide a --quiet/--verbose toggle and structured logging (JSON logs option) for machine consumption.
- Keep help text concise; provide examples in --help or on a README.
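The structured-logging option mentioned above needs no third-party dependency; a sketch of a JSON formatter using the stdlib alone (the field selection here is a matter of taste):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("etl.json")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("loaded %d records", 42)
```

One JSON object per line keeps the output greppable and trivially parseable by log shippers.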
Common Pitfalls
- Blocking on input: prompting in scripts without checking for a TTY.
- Closing sys.stdout or sys.stdin inadvertently (only close files you opened yourself).
- Relying on the locale encoding: pass an explicit encoding="utf-8" when deterministic behavior matters.
- Not returning proper exit codes: other scripts cannot detect failures.
- Mixed responsibilities: avoid mixing UI logic and business logic; keep core processing functions pure for easier testing.
Advanced Tips
- Testing CLIs: use click.testing.CliRunner for click tools, and pytest with monkeypatch to simulate stdin and environment variables.
- Subcommand composition: design small focused commands (extract, transform, load) that can be chained.
- Instrumentation: integrate with metrics systems (Prometheus pushgateway) or add telemetry for long-running ETL jobs.
- Security: sanitize file paths and guard against shell injection when invoking subprocesses.
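For stdlib-only tests, in-memory streams often suffice: because the core functions accept file-like objects, you can feed them io.StringIO instead of real stdin/stdout. A sketch (run_pipeline is a tiny stand-in for the earlier examples, not their actual code):

```python
import io
import json

def run_pipeline(src, dst):
    """Stand-in ETL core: JSON lines in, transformed JSON lines out."""
    for line in src:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        record["transformed"] = True
        dst.write(json.dumps(record) + "\n")

# Simulate piped stdin/stdout without touching the real streams.
fake_in = io.StringIO('{"a": 1}\n')
fake_out = io.StringIO()
run_pipeline(fake_in, fake_out)
assert json.loads(fake_out.getvalue()) == {"a": 1, "transformed": True}
```

This is exactly why keeping core processing functions pure pays off: no monkeypatching of sys.stdin is needed to exercise the I/O path.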
Real-world Scenario: Triggering an ETL Pipeline
Imagine you maintain an ETL pipeline. You want a CLI that:
- Validates configs
- Runs an extraction from a database or S3
- Runs transform scripts
- Loads results and reports stats
A clean design:
- Separate steps into commands: etl-cli validate, etl-cli run --step extract, etl-cli status.
- Support dry runs and a --snapshot toggle.
- Output machine-readable status on stdout (e.g., JSON) and human-readable logs on stderr.
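The stdout/stderr split in that last point might look like this (report_status is an illustrative helper, not part of the earlier examples):

```python
import json
import sys
import logging

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("etl")

def report_status(step, ok, stats):
    """Human-readable log on stderr; machine-readable JSON status on stdout."""
    log.info("step %s finished (ok=%s)", step, ok)
    print(json.dumps({"step": step, "ok": ok, "stats": stats}))

report_status("extract", True, {"records": 120})
```

A caller can then pipe stdout into jq or a monitoring hook while the stderr log stays visible in the terminal.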
Conclusion
Creating effective Python CLIs requires more than argument parsing. Handle user input gracefully (support stdin, files, prompts), structure output for both humans and machines, implement robust error handling with clear exit codes, and consider packaging and deployment via Docker.
Try the code examples above:
- Run them with pip-installed dependencies or execute them directly.
- Experiment with piping input and running inside Docker.

From there, some natural next steps:
- Add JSON Schema validation for inputs.
- Integrate a job queue to run ETL steps asynchronously.
- Add unit and integration tests to cover I/O edge cases.
Further Reading
- argparse — the Python standard library documentation
- click — documentation for building composable CLIs
- Official logging docs — configuring complex logging setups
- Effective Error Handling in Python: Custom Exceptions and Best Practices
- Building Data Pipelines with Python: A Step-by-Step Guide to ETL Processes
- Integrating Python with Docker: Streamlining Your Development Workflow