
Docker Best Practices for Production Deployments in 2026

Docker Best Practices for Production Deployments in 2026: The Complete Engineering Guide

I once audited a production Docker setup where the base image was ubuntu:latest, the application ran as root, the image was 3.4GB, and the Dockerfile had the application code copied before the dependencies — meaning every single code change triggered a full dependency reinstall in CI. The team had been deploying this way for two years. The security scanner flagged 847 CVEs. Build time was 18 minutes per commit. None of this was malicious — it was simply a team that had learned Docker by copying examples from the internet without understanding the principles behind them. This guide is the resource I wish they had found first.

Who Is This Guide For?

  • Developers who know how to write a Dockerfile but are not confident their images are production-ready
  • DevOps engineers auditing existing Docker configurations for security and performance issues
  • Engineering leads establishing container standards for a team or organization
  • Full-stack developers containerizing their first production application and wanting to do it correctly from the start

Practice 1: Multi-Stage Builds — The Single Biggest Improvement

If you take only one practice from this guide, make it this one. Multi-stage builds separate your build environment from your runtime environment — resulting in production images that are dramatically smaller, faster to pull, and contain zero build tooling that could be exploited.

❌ Single-stage build

1.2 GB
Includes: compiler, pip, build tools, test deps, __pycache__, .git

✅ Multi-stage build

142 MB
Includes: Python runtime, app code, production deps only
❌ Before — Single Stage
FROM python:3.13
WORKDIR /app

# Copies everything including dev deps,
# test files, .git, documentation
COPY . .
RUN pip install -r requirements.txt

# Runs as root — security risk
CMD ["python", "main.py"]

# Result: 1.2GB image
# Contains: gcc, pip cache, test files
# Running as: root (UID 0)
✅ After — Multi-Stage
FROM python:3.13-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir \
    -r requirements.txt

FROM python:3.13-slim AS runtime
WORKDIR /app

# Non-root user for security
RUN useradd -m -u 1001 appuser

COPY --from=builder /root/.local /home/appuser/.local
COPY --chown=appuser:appuser src/ .

USER appuser
CMD ["python", "main.py"]

# Result: 142MB image
# No build tools, no pip cache
# Running as: appuser (UID 1001)

Practice 2: Layer Ordering for Maximum Cache Efficiency

Docker builds images layer by layer, caching each one. When a layer changes, all subsequent layers are invalidated and rebuilt. This sounds technical — but the practical impact is enormous. To be honest, incorrect layer ordering is the most common reason CI pipelines are unnecessarily slow. I have seen 18-minute builds drop to under 3 minutes by simply reordering four lines.

The rule is simple: order from least-changed to most-changed.

Layer 1: Base image (python:3.13-slim)
~50MB
↓ changes: almost never
Layer 2: System packages (apt-get)
~15MB
↓ changes: rarely
Layer 3: requirements.txt + pip install
~60MB
↓ changes: occasionally
Layer 4: Application source code
~5MB
↓ changes: every commit
Dockerfile — correct layer order
FROM python:3.13-slim

WORKDIR /app

# Layer 2: System packages (rarely changes)
RUN apt-get update && apt-get install -y \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*   # Clean cache in same layer!

# Layer 3: Python dependencies (changes occasionally)
# Copy ONLY requirements.txt first — not the entire app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Layer 4: Application code (changes every commit)
# When this layer changes, only this layer is rebuilt
COPY src/ .

# Metadata
LABEL maintainer="brandteam@bioquro.com"
LABEL version="1.0"

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Critical detail: rm -rf /var/lib/apt/lists/* must be in the same RUN command as the apt-get install. If you put it in a separate RUN command, Docker has already committed the cache as a layer — the cleanup layer saves zero bytes.
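To make the failure mode concrete, here is a sketch of the broken variant. Each RUN commits its own layer, so the later cleanup removes the files from the filesystem view but not from the layer that already contains them:

```dockerfile
# WRONG: two RUN commands, two layers.
RUN apt-get update && apt-get install -y libpq-dev
# This deletes the files logically, but the apt cache is already
# baked into the previous layer and still ships with the image.
RUN rm -rf /var/lib/apt/lists/*
```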

Practice 3: Security Hardening — Never Run as Root

By default, processes inside Docker containers run as root (UID 0). If an attacker exploits a vulnerability in your application and escapes the container, they have root access on the host. This is not theoretical — container escape vulnerabilities are discovered regularly. The fix is one line in your Dockerfile.

Dockerfile.secure
FROM python:3.13-slim

# Create non-root user and group
RUN groupadd -r -g 1001 appgroup && \
    useradd -r -u 1001 -g appgroup -m -d /app appuser

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy files with correct ownership
COPY --chown=appuser:appgroup src/ .

# Drop to non-root before running
USER appuser

# Expose non-privileged port (>1024)
EXPOSE 8000

# Use exec form (not shell form) — proper signal handling
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

CMD shell form vs exec form: CMD python main.py (shell form) runs your app as a child of /bin/sh. Kubernetes and Docker cannot send signals directly to your process — graceful shutdown breaks. Always use exec form: CMD ["python", "main.py"]. This single change fixes silent shutdown failures in production.

Practice 4: The .dockerignore File — Smaller Context, Faster Builds

Every docker build command sends a "build context" — the entire directory — to the Docker daemon before building starts. Without a .dockerignore, this includes your .git folder, test files, virtual environments, and documentation. I have seen build contexts of 800MB for applications whose actual source was 2MB.

.dockerignore — production template
# Version control
.git
.gitignore
.github

# Python artifacts
__pycache__
*.pyc
*.pyo
*.pyd
.Python
*.egg-info
dist/
build/
.eggs/

# Virtual environments
.venv
venv
env
.env

# Testing
tests/
.pytest_cache
.coverage
coverage.xml
htmlcov/

# Development tools
.mypy_cache
.ruff_cache
.pre-commit-config.yaml

# Documentation
docs/
*.md
*.rst

# IDE configs
.vscode
.idea
*.swp

# OS artifacts
.DS_Store
Thumbs.db

# Secrets — never in image
.env.local
.env.production
secrets/

Build context without .dockerignore: 800MB. With .dockerignore: 2MB, a 400x reduction.
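A quick way to eyeball your build context before building, sketched with tar. Note that plain tar does not honor .dockerignore, so this shows the unfiltered size; compare it against the context transfer size BuildKit reports during docker build.

```shell
# Approximate the unfiltered build context size in bytes.
tar -cf - . | wc -c
```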

Practice 5: Health Checks — Kubernetes Needs to Know

Without a HEALTHCHECK, Kubernetes and Docker Swarm have no way to distinguish a container that is running from one that is running but completely broken. I have debugged production incidents where containers were showing as "running" in the dashboard while returning 500 errors to every single request — because there was no health check to trigger a restart.

Dockerfile + FastAPI health endpoint
# In Dockerfile
HEALTHCHECK --interval=30s \
            --timeout=10s \
            --start-period=15s \
            --retries=3 \
    CMD curl -f http://localhost:8000/health/live || exit 1

# ─────────────────────────────────────────
# In your FastAPI application (main.py)
from fastapi import FastAPI, HTTPException
from datetime import datetime, timezone

app = FastAPI()

@app.get("/health/live")
async def liveness():
    """Is the process alive? Used by Docker HEALTHCHECK."""
    return {"status": "alive", "timestamp": datetime.now(timezone.utc).isoformat()}

@app.get("/health/ready")
async def readiness():
    """Is the app ready to serve traffic? Used by Kubernetes readiness probe."""
    try:
        # Check database connectivity (db is your application's database client)
        await db.execute("SELECT 1")
        # Check Redis connectivity (redis is your application's Redis client)
        await redis.ping()
        return {"status": "ready"}
    except Exception as e:
        # Return 503 so Kubernetes removes the pod from the load balancer
        raise HTTPException(status_code=503, detail=str(e))

Liveness vs Readiness: Liveness checks whether the process should be restarted. Readiness checks whether it should receive traffic. A container can be alive (process running) but not ready (database unavailable). Kubernetes uses both separately — always implement both endpoints.
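The split maps directly onto Kubernetes probe configuration. A sketch of the relevant pod spec fragment, with the endpoint paths matching the FastAPI app above (intervals are illustrative, not prescriptive):

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8000
  periodSeconds: 30
  failureThreshold: 3   # restart the container after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  periodSeconds: 10     # remove the pod from Service endpoints while failing
```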

Practice 6: Secrets Management — What Never Goes in a Dockerfile

Docker image layers are permanent and inspectable. Anything you put in a Dockerfile — including secrets passed as build arguments — becomes part of the image history and can be extracted with docker history. Every week, security researchers find API keys and database passwords in public Docker images on Docker Hub.

Secret Type | Wrong Approach | Correct Approach (2026)
Database credentials | ENV DB_PASSWORD=secret123 in Dockerfile | Runtime environment variable injected by orchestrator
API keys | ARG API_KEY build argument | Docker secrets or Kubernetes secrets mounted as files
SSL certificates | COPY into image | Volume mount or Kubernetes cert-manager
SSH keys | COPY id_rsa into image | SSH agent forwarding during build (BuildKit)
Cloud credentials | .aws/credentials in image | OIDC / IAM instance roles — no stored credentials
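For the build-time cases in the table, BuildKit secret mounts are the standard mechanism. A sketch, where the secret id pip_token and the private index URL are assumptions for illustration:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
# The secret is mounted only for the duration of this RUN step and is
# never written to a layer, so `docker history` cannot recover it.
RUN --mount=type=secret,id=pip_token \
    PIP_INDEX_URL="https://__token__:$(cat /run/secrets/pip_token)@pypi.example.com/simple" \
    pip install --no-cache-dir -r requirements.txt
```

Build it with: docker build --secret id=pip_token,src=./pip_token.txt .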

Practice 7: Choosing the Right Base Image

The base image choice has the largest single impact on image size and security surface area. In 2026, the options from largest to smallest attack surface are:

Base Image | Size | CVEs (typical) | Use Case
python:3.13 | ~1GB | 200-400 | Development only
python:3.13-slim | ~130MB | 20-50 | Most production workloads ✅
python:3.13-alpine | ~50MB | 5-15 | Use carefully — musl libc breaks some packages
gcr.io/distroless/python3 | ~30MB | 0-5 | Maximum security — no shell, no package manager

Alpine warning: Alpine uses musl libc instead of glibc. Many Python packages with C extensions (NumPy, psycopg2, Pillow) either fail to install or require compilation from source — dramatically slowing builds. Use python:3.13-slim as your default production base unless you have a specific reason for Alpine.
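If you do reach for distroless, here is a minimal two-stage sketch. The --target/PYTHONPATH approach is one way to ship pure-Python dependencies; packages with C extensions may need wheels built against the matching Debian base.

```dockerfile
FROM python:3.13-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a plain directory we can copy wholesale.
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
COPY src/ .

FROM gcr.io/distroless/python3-debian12
WORKDIR /app
COPY --from=builder /app /app
ENV PYTHONPATH=/app/deps
# Distroless has no shell, so exec form is the only option.
ENTRYPOINT ["python", "main.py"]
```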

The Production-Ready Dockerfile — Complete Reference

Dockerfile.production — complete template
# ── Stage 1: Build dependencies ──────────────────────────────────────
FROM python:3.13-slim AS builder

WORKDIR /build

# Install build-time system dependencies
RUN apt-get update && apt-get install -y \
    gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies into user directory
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# ── Stage 2: Production runtime ───────────────────────────────────────
FROM python:3.13-slim AS production

# Metadata
LABEL maintainer="brandteam@bioquro.com"
LABEL org.opencontainers.image.source="https://github.com/bioquro/app"

# Install runtime system dependencies only
RUN apt-get update && apt-get install -y \
    libpq5 curl \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd -r -g 1001 appgroup \
    && useradd -r -u 1001 -g appgroup -m -d /app appuser

WORKDIR /app

# Copy built dependencies from builder stage
COPY --from=builder /root/.local /home/appuser/.local

# Copy application source with correct ownership
COPY --chown=appuser:appgroup src/ .

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
    CMD curl -f http://localhost:8000/health/live || exit 1

# Drop to non-root user
USER appuser

# Expose application port
EXPOSE 8000

# Exec form — critical for proper signal handling
CMD ["uvicorn", "main:app", \
     "--host", "0.0.0.0", \
     "--port", "8000", \
     "--workers", "1", \
     "--no-access-log"]

Production Readiness Checklist

  • Multi-stage build separates build and runtime environments
  • .dockerignore excludes .git, tests, venv, and documentation
  • Dependencies copied before source code — correct layer cache order
  • Base image is slim or distroless — not full OS
  • Application runs as non-root user (UID 1001+)
  • CMD uses exec form — not shell form
  • HEALTHCHECK configured with appropriate intervals
  • No secrets, credentials, or API keys in Dockerfile or image layers
  • apt/pip caches cleared in the same RUN layer that installs them
  • Exact version tags pinned (never :latest in production)
  • Image scanned for CVEs with Trivy before every deployment
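The last two checklist items can be enforced in CI rather than by discipline. A sketch of a scan gate, assuming Trivy is installed on the runner (the step name and image tag are illustrative):

```yaml
# Fail the pipeline on HIGH/CRITICAL CVEs before the image can ship.
- name: Scan image with Trivy
  run: trivy image --severity HIGH,CRITICAL --exit-code 1 my-app:1.4.2
```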

Frequently Asked Questions

What is a multi-stage Docker build and why should I use it?

A multi-stage build uses multiple FROM instructions in a single Dockerfile to separate the build environment from the runtime environment. The build stage installs compilers and dev dependencies. The final stage copies only the compiled artifacts — resulting in images that are 60-90% smaller, faster to pull, and contain no build tools that could be exploited in production.

How do I reduce Docker image size?

Use multi-stage builds, start from slim or distroless base images, combine RUN commands to minimize layers, use .dockerignore to exclude test files and documentation, and remove package manager caches in the same RUN command that installs packages. A typical Python application drops from 1.2GB to under 150MB with these techniques combined.

How do I run Docker containers securely in production?

Never run containers as root — use the USER directive with a non-root UID. Use read-only root filesystems where possible. Scan images for CVEs with Trivy before deployment. Never store secrets in Dockerfile or image layers — inject them at runtime via orchestrator secrets. Use distroless or slim base images to reduce attack surface.

What is the difference between Docker COPY and ADD?

COPY simply copies files from the build context into the image — predictable and explicit. ADD does everything COPY does plus it can extract tar archives and download files from URLs. Best practice is to always use COPY unless you specifically need ADD's extraction capability. ADD's URL download feature is particularly risky in production as it fetches content at build time without verification.

How large is your current production Docker image?

Run docker images and leave a comment with the size. I will tell you exactly which practices will have the biggest impact on your specific setup.


Tahar Maqawil

Senior Application Developer · DevOps & Containers Specialist · Bioquro

10+ years containerizing production applications — from 3.4GB insecure monoliths running as root, to hardened 140MB distroless images with zero critical CVEs. I write at Bioquro to give engineers the production-grade Docker knowledge that tutorials skip.
