Kubernetes Deployment Best Practices in 2026: What Actually Breaks in Production (And How We Fixed It)
Every Kubernetes best practices guide on the internet is a list. "Use resource limits. Configure health probes. Enable RBAC." They are not wrong. But they read like a pilot's checklist written by someone who has never crashed a plane. I have crashed the plane. Multiple times. This guide is written from the other direction — from the production incidents, the 3am pages, and the post-mortems that taught me why these practices exist. Each section starts with what broke, then explains the fix. Because in my experience, an engineer who understands why something fails remembers the fix permanently. An engineer who memorizes a checklist forgets it at the worst possible moment.
Who Is This Guide For?
This guide is for engineers already running Kubernetes in production — or preparing to. It is not a Kubernetes introduction. It assumes you know what a Pod, Deployment, and Service are. It is written for the engineer who has a working cluster and wants to know what will break it at scale — before it breaks.
- Platform and DevOps engineers managing Kubernetes clusters for multiple teams
- Backend engineers who own their deployment configuration and want production-grade YAML
- Engineering leads establishing Kubernetes standards for a growing organization
- Anyone who has ever been paged at 3am because of a Kubernetes issue they thought they had handled
The Production Readiness Score — A Framework for What Matters First
The Production Readiness Score is a 5-level maturity framework for Kubernetes deployments — from "it runs" to "it runs reliably at scale under adversarial conditions." The levels, in order: Level 1 (Deployed), Level 2 (Stable: resource limits, health probes, non-root execution), Level 3 (Resilient: PDBs, anti-affinity, pinned image tags, HPA), Level 4 (Secure: RBAC, NetworkPolicy, external secrets), and Level 5 (Production-Grade: GitOps, FinOps, SBOM, multi-cluster). Most teams skip Level 2 basics while obsessing over Level 5 concerns, which is why their clusters keep paging them.
The most common mistake I see: Teams jump directly to Level 5 — they implement GitOps with ArgoCD, signed images, and FinOps dashboards — while running pods with no resource limits and no liveness probes. Level 2 failures kill production. Level 5 omissions cause audit findings. Fix in order.
Practice 1: Resource Requests and Limits — The Mistake That Took Down an Entire Node
Resource requests tell the Kubernetes scheduler how much CPU and memory a pod needs to be placed on a node. Resource limits tell the kubelet the maximum a pod can consume. Missing requests causes the scheduler to place too many pods on a single node. Missing limits allows a single misbehaving pod to consume all node resources, evicting every other pod on that node.
In late 2024, a memory leak in a newly deployed service (no resource limits configured) consumed all 32GB of RAM on a 3-node cluster. The OOM killer started evicting pods across all namespaces — including pods from completely unrelated services. At 2am, we had 11 services down simultaneously because one pod had no memory limit. The fix took 20 minutes. The post-mortem took three hours. The "enforce resource limits" policy took one Slack message after that.
Every container gets explicit requests AND limits: requests at the measured average usage under load, limits at roughly 2x the measured peak. Never set the CPU limit to the same value as the CPU request — CPU throttling at the limit is worse than no limit for latency-sensitive services.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service                  # Must match the selector above
    spec:
      containers:
      - name: api
        image: ghcr.io/bioquro/api:sha-a1b2c3d   # Never :latest in production
        resources:
          requests:
            memory: "256Mi"               # Scheduler uses this for placement
            cpu: "100m"                   # 100 millicores = 0.1 CPU core
          limits:
            memory: "512Mi"               # OOM kill if exceeded — protects the node
            cpu: "500m"                   # Throttled if exceeded — set 4-5x above request

# Why different ratios?
# Memory: a leak that hits the limit should be killed fast.
# CPU: throttling is recoverable; set limit generously to avoid
# latency spikes from CPU throttle at limit.
How to set the right values: Run your service under realistic load for 30 minutes, then query: kubectl top pods -n production. Set requests to the observed average, limits to 2x the observed peak. Revisit every 90 days as traffic grows. [INTERNAL LINK: System Optimization Guide 2026 → bioquro.com/system-optimization-guide-2026]
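If you want a quick read on current consumption before picking numbers, the commands below are a minimal sketch; they assume the metrics-server is installed and that the pods carry an app: api-service label:

# Point-in-time usage for the service's pods
kubectl top pods -n production -l app=api-service
# Per-container breakdown, useful when a pod runs sidecars
kubectl top pods -n production -l app=api-service --containers
# Compare against what is currently requested in the Deployment
kubectl get deployment api-service -n production \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'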
Practice 2: Liveness vs Readiness Probes — The Confusion That Caused a 40-Minute Outage
A liveness probe answers: "Should Kubernetes restart this container?" A readiness probe answers: "Should Kubernetes send traffic to this container?" They are used by different Kubernetes components for different decisions. Confusing them — or using only one — is one of the most frequent sources of unnecessary restarts and traffic loss in production Kubernetes clusters.
A service took 45 seconds to initialize — it needed to load a 200MB ML model into memory before serving requests. The team had configured a liveness probe with an initial delay of 20 seconds. During high-load deployments, the pod would still be loading the model when the liveness probe fired. Kubernetes interpreted "not responding yet" as "unhealthy" and restarted the pod. Which started loading the model again. Which triggered the liveness probe again. The pod never successfully started during a rolling deployment under load — it just restarted endlessly for 40 minutes until someone manually increased the initialDelaySeconds.
containers:
- name: api
  image: ghcr.io/bioquro/api:sha-a1b2c3d
  # LIVENESS: restart the container if it deadlocks or crashes
  # Keep this SIMPLE — only check if the process is alive
  # Do NOT check downstream dependencies here
  livenessProbe:
    httpGet:
      path: /health/live              # Returns 200 if process is running
      port: 8000
    initialDelaySeconds: 60           # Give slow-starting apps time to init
    periodSeconds: 15
    failureThreshold: 3               # 3 failures = restart (45s total grace)
    timeoutSeconds: 5
  # READINESS: remove from Service endpoints if not ready for traffic
  # Check real dependencies here — DB, cache, downstream services
  readinessProbe:
    httpGet:
      path: /health/ready             # Returns 200 only when DB + cache connected
      port: 8000
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 3
    timeoutSeconds: 5
  # STARTUP PROBE: for slow-starting applications (K8s 1.16+)
  # Disables liveness probe until startup succeeds
  # Eliminates the need to set a huge initialDelaySeconds on liveness
  startupProbe:
    httpGet:
      path: /health/live
      port: 8000
    failureThreshold: 30              # Allow up to 5 minutes to start (30 x 10s)
    periodSeconds: 10
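When you suspect a restart loop like the one above, a few standard kubectl commands make probe failures visible; this is a minimal sketch assuming the pods live in the production namespace:

# A climbing RESTARTS column is the first clue
kubectl get pods -n production -l app=api-service
# "Liveness probe failed" / "Startup probe failed" events show up here
kubectl describe pod <pod-name> -n production
# Recent events across the namespace, newest last
kubectl get events -n production --sort-by=.lastTimestamp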
Practice 3: Never Use :latest — The Tag That Made Our Replicas Run Different Code
The :latest image tag is mutable — it points to whatever was pushed most recently. When Kubernetes reschedules a pod to a new node, it pulls whatever :latest currently resolves to — which may be a version you never approved for production. This silently breaks the guarantee that all replicas run identical code.
After a node failure, Kubernetes rescheduled two pods to a new node. The :latest tag had advanced during the node failure window — a developer had pushed a new build to fix an unrelated staging bug. The new node pulled the newer image. For 6 hours, our 5-replica deployment was running 3 pods on v1.8.2 and 2 pods on v1.9.0-beta. The v1.9.0-beta had a schema migration that had not been applied to production. Two replicas were crashing silently. We only noticed because the error rate was elevated — not because any alert fired on "replica inconsistency."
# WRONG: mutable tag — resolves to different images over time
image: myapp:latest
image: myapp:v1.8       # Also wrong — v1.8 could be overwritten

# CORRECT: immutable SHA digest — always resolves to exact same image
image: ghcr.io/bioquro/api@sha256:a1b2c3d4e5f6...

# PRACTICAL: semantic version pinned in CI, never pushed manually
image: ghcr.io/bioquro/api:v1.8.2

# In your GitHub Actions CI (from our CI/CD guide):
# [INTERNAL LINK: CI/CD Pipeline Guide → bioquro.com/cicd-pipeline-setup-2026]
- name: Update image in deployment
  run: |
    IMAGE_TAG="sha-${{ github.sha }}"
    kubectl set image deployment/api-service \
      api=ghcr.io/bioquro/api:${IMAGE_TAG} \
      -n production

# Or with GitOps — update the tag in Git and let ArgoCD sync
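To check whether every replica is actually running the same image after an incident like this one, a jsonpath query along these lines works; it assumes the pods are labeled app=api-service:

# Pod name plus the exact image each replica was started from
kubectl get pods -n production -l app=api-service \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
# Or the image IDs actually pulled, which include the resolved digest
kubectl get pods -n production -l app=api-service \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].imageID}{"\n"}{end}'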
Practice 4: Pod Disruption Budgets — What the Autoscaler Did to Our Production Traffic
A Pod Disruption Budget (PDB) defines the minimum number of pod replicas that must remain available during voluntary disruptions — node drains, cluster upgrades, or autoscaler scale-down events. Without a PDB, Kubernetes has no constraint on how many pods it can evict simultaneously from a Deployment.
We enabled the Cluster Autoscaler on a cost-optimization project. On a Sunday morning with low traffic, the autoscaler identified two underutilized nodes and began draining them. Both nodes happened to be running all 3 replicas of our authentication service. The autoscaler drained both nodes simultaneously. For 90 seconds, the authentication service had zero available replicas. Every user login across every service failed. No PDB was configured. The autoscaler was doing exactly what it was designed to do — we just had not told it the constraint.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: auth-service-pdb
  namespace: production
spec:
  # minAvailable: at least 1 replica must remain during any disruption
  # Use this for services where you can tolerate reduced capacity
  minAvailable: 1
  # Alternative: maxUnavailable
  # maxUnavailable: 1      # At most 1 replica can be disrupted at a time
  # Use maxUnavailable when you would rather cap how many replicas can be down at once
  selector:
    matchLabels:
      app: auth-service
---
# For critical services (payment, auth, etc.) — stricter budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
  namespace: production
spec:
  minAvailable: "80%"      # At least 80% of replicas must stay up
  selector:
    matchLabels:
      app: payment-service
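Once the PDBs exist, confirm the cluster actually has disruption headroom; kubectl shows how many pods can currently be evicted voluntarily:

# ALLOWED DISRUPTIONS of 0 means a node drain will block until more replicas are healthy
kubectl get pdb -n production
# Detailed status, including current vs desired healthy replicas
kubectl describe pdb auth-service-pdb -n production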
Practice 5: The HPA vs VPA vs KEDA Decision — Getting This Wrong Wastes Money or Causes Lag
Kubernetes offers three primary autoscaling mechanisms: HPA (Horizontal Pod Autoscaler) scales replica count based on CPU/memory metrics, VPA (Vertical Pod Autoscaler) adjusts resource requests for individual pods, and KEDA (Kubernetes Event-Driven Autoscaler) scales based on external event sources like queue depth or message lag. Using the wrong one for your workload wastes money at best and causes cascading failures at worst.
Most guides say "use HPA." After seeing it fail in three different scenarios, here is my actual decision framework:
- HPA: stateless, request-driven services whose load tracks CPU or memory. This is the default for API workloads.
- VPA: right-sizing resource requests for workloads with a stable replica count. Run it in recommendation mode and apply the numbers deliberately (see the sketch after the HPA manifest below).
- KEDA: event-driven workers where the real signal is queue depth or message lag, not CPU. A CPU-based HPA reacts too late for these, because the backlog grows before CPU does.
Never run HPA and VPA simultaneously on the same Deployment in automatic mode. They fight over the pod spec — HPA changes replica count, VPA changes resource requests, both trigger pod restarts. Use VPA in "Off" or "Initial" mode alongside HPA, or use them on different deployments entirely.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3                       # Never below 3 — single replica = single point of failure
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60         # Scale up at 60% CPU — not 80%
  # Why 60%? At 80%, you're already slow. Scaling takes 1-2 minutes.
  # By the time new pods are ready, you've already degraded.
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30   # React quickly to load spikes
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5min before scaling down
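If you do want VPA's numbers alongside this HPA, the recommendation-only mode mentioned above looks roughly like this; a sketch that assumes the VPA components are installed in the cluster:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"     # Recommendations only; never evicts or restarts pods

# Read the suggested requests with:
#   kubectl describe vpa api-service-vpa -n production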
Practice 6: GitOps Over kubectl apply — The Manual Deploy That Corrupted Our Production State
GitOps means Git is the single source of truth for your cluster state. ArgoCD or Flux continuously reconcile the live cluster against the desired state declared in Git. Manual kubectl apply commands in production create configuration drift — a gap between what Git says should be running and what is actually running.
A developer applied a "quick hotfix" directly with kubectl during an incident — changed a ConfigMap value to fix a broken feature flag. The incident was resolved. Two weeks later, during a planned deployment, our CI/CD pipeline applied the Git state to the cluster. The ConfigMap reverted to the old value. The feature flag broke again. Nobody remembered the kubectl hotfix. It was not in Git, not in the deployment log, not in the incident ticket. We spent 4 hours debugging a problem that had been "solved" two weeks earlier. [INTERNAL LINK: CI/CD Pipeline Setup 2026 → bioquro.com/cicd-pipeline-setup-2026]
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-service
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/bioquro/k8s-manifests
    targetRevision: main
    path: services/api-service/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true          # Delete resources removed from Git
      selfHeal: true       # Revert manual kubectl changes automatically
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2

# selfHeal: true is the key setting.
# It means any manual kubectl change to this namespace
# will be automatically reverted within 3 minutes.
# The cluster state always matches Git. Always.
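To see drift before selfHeal reverts it (or on apps where selfHeal is off), the argocd CLI can compare live state against Git; a minimal sketch, assuming you are logged in to the ArgoCD API server:

# Diff between the live cluster state and the Git revision the app targets
argocd app diff api-service
# Current sync and health status
argocd app get api-service
# Trigger a manual sync if automated sync is disabled for this app
argocd app sync api-service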
Practice 7: Namespace Isolation and RBAC — The Developer Who Deleted the Production Database Secret
Kubernetes RBAC (Role-Based Access Control) restricts what users and service accounts can do in the cluster. Namespaces provide isolation between environments and teams. Without proper RBAC, any developer with cluster access can accidentally (or intentionally) modify, delete, or read resources in any namespace — including production secrets.
A junior developer running kubectl commands to debug their staging environment accidentally ran the command against the production cluster context — they had switched contexts earlier and forgotten. They deleted a Secret while trying to recreate a stale one. The Secret contained the production database credentials. Every service that mounted that Secret immediately started failing to authenticate. Six services went down. Recovery required recreating the Secret from a secure backup and rolling all affected deployments. Proper RBAC that restricted the developer to their namespace would have made the delete command return "Forbidden" instead of executing. [INTERNAL LINK: Database Encryption Security → bioquro.com/database-encryption-security]
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer-role
  namespace: staging            # Scoped to staging only — not production
rules:
# Full access to read/debug workloads
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/exec"]
  verbs: ["get", "list", "watch", "create"]
# ConfigMaps: read-only
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
# Secrets: read-only, never delete
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]        # No create, update, delete, patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: staging
subjects:
- kind: User
  name: developer@bioquro.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role
  apiGroup: rbac.authorization.k8s.io
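Verify the role does what you think before anyone relies on it; kubectl auth can-i with impersonation is the quickest check (it assumes your own account is allowed to impersonate users):

# Should print "yes" — developers can inspect pods in staging
kubectl auth can-i get pods --as=developer@bioquro.com -n staging
# Should print "no" — no secret deletion, even in staging
kubectl auth can-i delete secrets --as=developer@bioquro.com -n staging
# Should print "no" — no access to production at all
kubectl auth can-i get pods --as=developer@bioquro.com -n production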
The Complete Production-Ready Deployment Manifest
Every practice above, combined into a single copy-paste-ready template. This is the baseline Deployment manifest Bioquro uses for every production service: [INTERNAL LINK: Docker Best Practices → bioquro.com/docker-best-practices-production-2026]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
  labels:
    app: api-service
    version: v1.8.2
    team: backend
    cost-center: platform                # FinOps tagging
  annotations:
    kubernetes.io/description: "Main API service for Bioquro platform"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0                  # Zero-downtime deploys
      maxSurge: 1
  template:
    metadata:
      labels:
        app: api-service
        version: v1.8.2
    spec:
      # Pod anti-affinity: spread replicas across nodes
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: api-service
            topologyKey: kubernetes.io/hostname
      # Security context: non-root execution
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api
        image: ghcr.io/bioquro/api:v1.8.2   # Pinned — never :latest
        ports:
        - containerPort: 8000
        # Resource management (Practice 1)
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        # Health probes (Practice 2)
        startupProbe:
          httpGet:
            path: /health/live
            port: 8000
          failureThreshold: 30
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8000
          initialDelaySeconds: 0
          periodSeconds: 15
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8000
          periodSeconds: 10
          failureThreshold: 3
        # Container security
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        # Secrets from external store — never hardcoded
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: database-url
        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]
      terminationGracePeriodSeconds: 30
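Before committing a template like this, let the API server validate it and show what would change; both commands below are standard kubectl, assuming the manifest is saved as deployment.yaml:

# Server-side validation without persisting anything
kubectl apply -f deployment.yaml --dry-run=server
# Diff against what is currently running in the cluster
kubectl diff -f deployment.yaml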
Production Readiness Checklist — Level by Level
| Level | Practice | Impact if Skipped |
|---|---|---|
| L2 | Resource requests + limits on every container | Node eviction cascade, OOM kills |
| L2 | Liveness + readiness + startup probes | Traffic to broken pods, restart loops |
| L2 | Non-root container execution | Container escape = host root access |
| L3 | Pod Disruption Budget on all Deployments | Autoscaler takes down all replicas |
| L3 | Pinned image tags (no :latest) | Replicas run different code versions |
| L3 | Pod anti-affinity rules | All replicas on same node = node failure = full outage |
| L3 | HPA with correct scale-up threshold (60%, not 80%) | Already degraded before scaling starts |
| L4 | RBAC scoped to namespaces per team | Accidental production resource deletion |
| L4 | Secrets from external KMS, not K8s Secrets | Base64 ≠ encryption — secrets readable by anyone with cluster access |
| L5 | GitOps with selfHeal enabled | Manual kubectl changes create invisible drift |
Frequently Asked Questions
What are the non-negotiable basics for running Kubernetes in production?
The five non-negotiable practices: set resource requests AND limits on every container, configure liveness and readiness probes separately, run containers as non-root users, pin image tags to specific versions (never :latest), and configure Pod Disruption Budgets before enabling any autoscaling. Teams that get these five right avoid 90% of production Kubernetes incidents.
What is the difference between a liveness probe and a readiness probe?
A liveness probe answers: "Is this container alive? Should Kubernetes restart it?" A readiness probe answers: "Is this container ready to receive traffic?" A container can be alive but not ready — for example, during startup while connecting to a database. Kubernetes removes unready pods from Service endpoints but does not restart them. Use a startup probe for slow-starting applications to prevent liveness probes from triggering restart loops during initialization.
Why should you never use the :latest image tag in production?
The :latest tag is mutable — it resolves to a different image every time a new build is pushed. When Kubernetes reschedules a pod after a node failure, it pulls whatever :latest currently points to, which may be a version never tested or approved for production. This silently breaks the guarantee that all replicas run identical code. Always tag images with a specific SHA digest or semantic version and update it deliberately through your CI/CD pipeline.
What is a Pod Disruption Budget and why does it matter?
A Pod Disruption Budget tells Kubernetes the minimum number of pod replicas that must remain available during voluntary disruptions — node drains, cluster upgrades, or autoscaler scale-down events. Without a PDB, the cluster autoscaler can drain multiple nodes simultaneously, taking all replicas of a Deployment offline at once. A PDB with minAvailable: 1 guarantees at least one replica stays running during any voluntary disruption.
What is the Production Readiness Score?
The Production Readiness Score is a 5-level maturity framework: Level 1 (Deployed) — app runs in a pod. Level 2 (Stable) — resource limits, health probes, non-root user. Level 3 (Resilient) — PDB, anti-affinity, pinned image tags, HPA. Level 4 (Secure) — RBAC, NetworkPolicy, external secrets. Level 5 (Production-Grade) — GitOps, FinOps, SBOM, multi-cluster. Most teams skip to Level 5 concerns while missing Level 2 basics — which causes the majority of production incidents.
What is your current Production Readiness Score?
Go through the Level 2 checklist right now — set a 20-minute timer. If any item is missing, that is your highest-priority Kubernetes task this week. Leave a comment with your score or the specific incident that sent you looking for this guide.
