Docker Best Practices for Production Deployments in 2026: The Complete Engineering Guide
I once audited a production Docker setup where the base image was ubuntu:latest, the application ran as root, the image was 3.4GB, and the Dockerfile had the application code copied before the dependencies — meaning every single code change triggered a full dependency reinstall in CI. The team had been deploying this way for two years. The security scanner flagged 847 CVEs. Build time was 18 minutes per commit. None of this was malicious — it was simply a team that had learned Docker by copying examples from the internet without understanding the principles behind them. This guide is the resource I wish they had found first.
Who Is This Guide For?
- Developers who know how to write a Dockerfile but are not confident their images are production-ready
- DevOps engineers auditing existing Docker configurations for security and performance issues
- Engineering leads establishing container standards for a team or organization
- Full-stack developers containerizing their first production application and wanting to do it correctly from the start
Practice 1: Multi-Stage Builds — The Single Biggest Improvement
If you take only one practice from this guide, make it this one. Multi-stage builds separate your build environment from your runtime environment — resulting in production images that are dramatically smaller, faster to pull, and contain zero build tooling that could be exploited.
❌ Single-stage build

```dockerfile
FROM python:3.13
WORKDIR /app
# Copies everything including dev deps,
# test files, .git, documentation
COPY . .
RUN pip install -r requirements.txt
# Runs as root — security risk
CMD ["python", "main.py"]

# Result: 1.2GB image
# Contains: gcc, pip cache, test files
# Running as: root (UID 0)
```

✅ Multi-stage build

```dockerfile
FROM python:3.13-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir \
        -r requirements.txt

FROM python:3.13-slim AS runtime
WORKDIR /app
# Non-root user for security
RUN useradd -m -u 1001 appuser
COPY --from=builder /root/.local /home/appuser/.local
COPY --chown=appuser:appuser src/ .
USER appuser
CMD ["python", "main.py"]

# Result: 142MB image
# No build tools, no pip cache
# Running as: appuser (UID 1001)
```
Practice 2: Layer Ordering for Maximum Cache Efficiency
Docker builds images layer by layer, caching each one. When a layer changes, all subsequent layers are invalidated and rebuilt. That sounds like an implementation detail, but the practical impact is enormous: incorrect layer ordering is the most common reason CI pipelines are unnecessarily slow. I have seen 18-minute builds drop to under 3 minutes by simply reordering four lines.
The rule is simple: order from least-changed to most-changed.
```dockerfile
FROM python:3.13-slim
WORKDIR /app

# Layer 1: System dependencies (rarely changes)
RUN apt-get update && apt-get install -y \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*  # Clean cache in same layer!

# Layer 2: Python dependencies (changes occasionally)
# Copy ONLY requirements.txt first — not the entire app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Layer 3: Application code (changes every commit)
# A change here invalidates only this one layer
COPY src/ .

# Metadata
LABEL maintainer="brandteam@bioquro.com"
LABEL version="1.0"

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Critical detail: `rm -rf /var/lib/apt/lists/*` must be in the same `RUN` command as the `apt-get install`. If you put it in a separate `RUN` command, Docker has already committed the package lists as a layer — the cleanup layer saves zero bytes.
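The failure mode is easy to see side by side. A minimal sketch of the anti-pattern versus the fix:

```dockerfile
# WRONG — the package lists are committed in the first layer;
# the second RUN adds a new layer but reclaims nothing
RUN apt-get update && apt-get install -y libpq-dev
RUN rm -rf /var/lib/apt/lists/*

# RIGHT — install and clean up inside one layer, so the
# package lists are never committed at all
RUN apt-get update && apt-get install -y libpq-dev \
    && rm -rf /var/lib/apt/lists/*
```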
Practice 3: Security Hardening — Never Run as Root
By default, processes inside Docker containers run as root (UID 0). If an attacker exploits a vulnerability in your application and escapes the container, they have root access on the host. This is not theoretical — container escape vulnerabilities are discovered regularly. The fix is one line in your Dockerfile.
```dockerfile
FROM python:3.13-slim

# Create non-root user and group
RUN groupadd -r -g 1001 appgroup && \
    useradd -r -u 1001 -g appgroup -m -d /app appuser

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy files with correct ownership
COPY --chown=appuser:appgroup src/ .

# Drop to non-root before running
USER appuser

# Expose non-privileged port (>1024)
EXPOSE 8000

# Use exec form (not shell form) — proper signal handling
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
CMD shell form vs exec form: `CMD python main.py` (shell form) runs your app as a child of `/bin/sh`. Kubernetes and Docker cannot send signals directly to your process — graceful shutdown breaks. Always use exec form: `CMD ["python", "main.py"]`. This single change fixes silent shutdown failures in production.
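The mechanics can be demonstrated in plain Python. This is a minimal sketch, not FastAPI-specific: it installs the kind of SIGTERM handler a server like uvicorn registers internally, then simulates the orchestrator signalling the process. The handler only fires if the signal reaches the Python process itself — with shell form, SIGTERM goes to `/bin/sh` (PID 1) and your application never sees it.

```python
import os
import signal

shutdown_requested = False

def handle_sigterm(signum, frame):
    """Mark shutdown so the app can finish in-flight work before exiting."""
    global shutdown_requested
    shutdown_requested = True

# Servers register a handler like this internally. It only runs if
# SIGTERM reaches THIS process — which is why the container must run
# Python as PID 1 (exec form), not as a child of /bin/sh.
signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the orchestrator sending SIGTERM to the container's PID 1
os.kill(os.getpid(), signal.SIGTERM)

print(shutdown_requested)  # → True
```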
Practice 4: The .dockerignore File — Smaller Context, Faster Builds
Every docker build command sends a "build context" — the entire directory — to the Docker daemon before building starts. Without a .dockerignore, this includes your .git folder, test files, virtual environments, and documentation. I have seen build contexts of 800MB for applications whose actual source was 2MB.
```
# Version control
.git
.gitignore
.github

# Python artifacts
__pycache__
*.pyc
*.pyo
*.pyd
.Python
*.egg-info
dist/
build/
.eggs/

# Virtual environments
.venv
venv
env
.env

# Testing
tests/
.pytest_cache
.coverage
coverage.xml
htmlcov/

# Development tools
.mypy_cache
.ruff_cache
.pre-commit-config.yaml

# Documentation
docs/
*.md
*.rst

# IDE configs
.vscode
.idea
*.swp

# OS artifacts
.DS_Store
Thumbs.db

# Secrets — never in image
.env.local
.env.production
secrets/
```
Practice 5: Health Checks — Kubernetes Needs to Know
Without a HEALTHCHECK, Kubernetes and Docker Swarm have no way to distinguish a container that is running from one that is running but completely broken. I have debugged production incidents where containers were showing as "running" in the dashboard while returning 500 errors to every single request — because there was no health check to trigger a restart.
```dockerfile
# In Dockerfile
HEALTHCHECK --interval=30s \
            --timeout=10s \
            --start-period=15s \
            --retries=3 \
    CMD curl -f http://localhost:8000/health/live || exit 1
```

```python
# In your FastAPI application (main.py)
from datetime import datetime, timezone

from fastapi import FastAPI, HTTPException

app = FastAPI()


@app.get("/health/live")
async def liveness():
    """Is the process alive? Used by Docker HEALTHCHECK."""
    return {"status": "alive", "timestamp": datetime.now(timezone.utc).isoformat()}


@app.get("/health/ready")
async def readiness():
    """Is the app ready to serve traffic? Used by Kubernetes readiness probe."""
    try:
        # `db` and `redis` are your application's own connection objects
        await db.execute("SELECT 1")  # check database connectivity
        await redis.ping()            # check Redis connectivity
        return {"status": "ready"}
    except Exception as e:
        # Return 503 — Kubernetes removes the pod from the load balancer
        raise HTTPException(status_code=503, detail=str(e))
```
Liveness vs Readiness: Liveness checks whether the process should be restarted. Readiness checks whether it should receive traffic. A container can be alive (process running) but not ready (database unavailable). Kubernetes uses both separately — always implement both endpoints.
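On the Kubernetes side, the two endpoints map onto two separate probes in the pod spec. A sketch, assuming the container listens on port 8000 as above — the container name and image reference are illustrative:

```yaml
# Fragment of a Deployment pod template
containers:
  - name: app
    image: registry.example.com/app:1.0.0
    ports:
      - containerPort: 8000
    livenessProbe:              # failure -> kubelet restarts the container
      httpGet:
        path: /health/live
        port: 8000
      initialDelaySeconds: 15
      periodSeconds: 30
    readinessProbe:             # failure -> pod removed from Service endpoints
      httpGet:
        path: /health/ready
        port: 8000
      periodSeconds: 10
      failureThreshold: 3
```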
Practice 6: Secrets Management — What Never Goes in a Dockerfile
Docker image layers are permanent and inspectable. Anything you put in a Dockerfile — including secrets passed as build arguments — becomes part of the image history and can be extracted with docker history. Every week, security researchers find API keys and database passwords in public Docker images on Docker Hub.
| Secret Type | Wrong Approach | Correct Approach 2026 |
|---|---|---|
| Database credentials | ENV DB_PASSWORD=secret123 in Dockerfile | Runtime environment variable injected by orchestrator |
| API keys | ARG API_KEY build argument | Docker secrets or Kubernetes secrets mounted as files |
| SSL certificates | COPY into image | Volume mount or Kubernetes cert-manager |
| SSH keys | COPY id_rsa into image | SSH agent forwarding during build (BuildKit) |
| Cloud credentials | .aws/credentials in image | OIDC / IAM instance roles — no stored credentials |
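For build-time secrets specifically, BuildKit's secret mounts keep the value out of every layer. A sketch — the secret id `pip_token` and the private index URL are placeholders for illustration:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
# The secret is mounted at /run/secrets/pip_token for THIS command only;
# it never appears in a layer or in `docker history`
RUN --mount=type=secret,id=pip_token \
    PIP_INDEX_URL="https://token:$(cat /run/secrets/pip_token)@pypi.example.com/simple" \
    pip install --no-cache-dir -r requirements.txt
```

Build with `docker build --secret id=pip_token,src=./pip_token.txt .` — the host file is read at build time and discarded.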
Practice 7: Choosing the Right Base Image
The base image choice has the largest single impact on image size and security surface area. In 2026, the options from largest to smallest attack surface are:
| Base Image | Size | CVEs (typical) | Use Case |
|---|---|---|---|
| `python:3.13` | ~1GB | 200-400 | Development only |
| `python:3.13-slim` | ~130MB | 20-50 | Most production workloads ✅ |
| `python:3.13-alpine` | ~50MB | 5-15 | Use carefully — musl libc breaks some packages |
| `gcr.io/distroless/python3` | ~30MB | 0-5 | Maximum security — no shell, no package manager |
Alpine warning: Alpine uses musl libc instead of glibc. Many Python packages with C extensions (NumPy, psycopg2, Pillow) either fail to install or require compilation from source — dramatically slowing builds. Use python:3.13-slim as your default production base unless you have a specific reason for Alpine.
The Production-Ready Dockerfile — Complete Reference
```dockerfile
# ── Stage 1: Build dependencies ──────────────────────────────────────
FROM python:3.13-slim AS builder
WORKDIR /build

# Install build-time system dependencies
RUN apt-get update && apt-get install -y \
        gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies into user directory
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# ── Stage 2: Production runtime ───────────────────────────────────────
FROM python:3.13-slim AS production

# Metadata
LABEL maintainer="brandteam@bioquro.com"
LABEL org.opencontainers.image.source="https://github.com/bioquro/app"

# Install runtime system dependencies only
RUN apt-get update && apt-get install -y \
        libpq5 curl \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd -r -g 1001 appgroup \
    && useradd -r -u 1001 -g appgroup -m -d /app appuser

WORKDIR /app

# Copy built dependencies from builder stage.
# appuser's home is /app, so Python's user site-packages resolve to /app/.local
COPY --from=builder --chown=appuser:appgroup /root/.local /app/.local

# Put user-installed console scripts (uvicorn) on PATH
ENV PATH="/app/.local/bin:${PATH}"

# Copy application source with correct ownership
COPY --chown=appuser:appgroup src/ .

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
    CMD curl -f http://localhost:8000/health/live || exit 1

# Drop to non-root user
USER appuser

# Expose application port
EXPOSE 8000

# Exec form — critical for proper signal handling
CMD ["uvicorn", "main:app", \
     "--host", "0.0.0.0", \
     "--port", "8000", \
     "--workers", "1", \
     "--no-access-log"]
```
Production Readiness Checklist
- ✅ Multi-stage build separates build and runtime environments
- ✅ .dockerignore excludes .git, tests, venv, and documentation
- ✅ Dependencies copied before source code — correct layer cache order
- ✅ Base image is slim or distroless — not full OS
- ✅ Application runs as non-root user (UID 1001+)
- ✅ CMD uses exec form — not shell form
- ✅ HEALTHCHECK configured with appropriate intervals
- ✅ No secrets, credentials, or API keys in Dockerfile or image layers
- ✅ apt/pip caches cleared in the same RUN layer that installs them
- ❌ Never use the `:latest` tag in production — always pin exact versions
- ❌ Never run `docker build` without scanning the result with Trivy
Frequently Asked Questions
What is a multi-stage Docker build?
A multi-stage build uses multiple FROM instructions in a single Dockerfile to separate the build environment from the runtime environment. The build stage installs compilers and dev dependencies. The final stage copies only the compiled artifacts — resulting in images that are 60-90% smaller, faster to pull, and contain no build tools that could be exploited in production.
How do I make my Docker images smaller?
Use multi-stage builds, start from slim or distroless base images, combine RUN commands to minimize layers, use .dockerignore to exclude test files and documentation, and remove package manager caches in the same RUN command that installs packages. A typical Python application drops from 1.2GB to under 150MB with these techniques combined.
What are the most important Docker security practices?
Never run containers as root — use the USER directive with a non-root UID. Use read-only root filesystems where possible. Scan images for CVEs with Trivy before deployment. Never store secrets in Dockerfile or image layers — inject them at runtime via orchestrator secrets. Use distroless or slim base images to reduce attack surface.
What is the difference between COPY and ADD?
COPY simply copies files from the build context into the image — predictable and explicit. ADD does everything COPY does plus it can extract tar archives and download files from URLs. Best practice is to always use COPY unless you specifically need ADD's extraction capability. ADD's URL download feature is particularly risky in production as it fetches content at build time without verification.
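A two-line sketch of the difference — the archive name is illustrative:

```dockerfile
# COPY: the archive lands in the image exactly as-is
COPY vendor.tar.gz /opt/vendor.tar.gz

# ADD: the same archive is silently extracted into /opt/vendor/ —
# convenient when intended, surprising when not
ADD vendor.tar.gz /opt/vendor/
```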
How large is your current production Docker image?
Run docker images and leave a comment with the size. I will tell you exactly which practices will have the biggest impact on your specific setup.
