PostgreSQL Performance Tuning in 2026: The 5 Production Disasters That Taught Me Everything
I want to be upfront about something: every PostgreSQL tuning guide you have ever read — including ones from people much smarter than me — gives you the right answers to the wrong question. They tell you what to set shared_buffers to. They tell you to enable pg_stat_statements. They give you a list of 20 parameters and a general formula. What they do not tell you is what it looks and feels like when your database is actively on fire at 11pm and you have to decide which of those 20 knobs to turn first. This guide is the one I wish I had before each of the five production disasters below. Some of them were genuinely my fault. One of them was caused by a "performance optimization" I had applied three weeks earlier that seemed great at the time. I learned more from those five incidents than from any book.
Who Is This Guide For?
This guide is for engineers who own a PostgreSQL database in production and have noticed it getting slower — or need to prevent it from getting slower. It assumes you know basic SQL and can edit a configuration file. It is not a beginner's introduction to databases.
- Backend engineers who own the database layer and are fielding complaints about slow queries
- DevOps engineers watching CPU and I/O metrics trend upward on a PostgreSQL instance
- CTOs and tech leads facing a "we need to scale the database" conversation and wanting to exhaust tuning options first
- Anyone who has googled "postgresql slow" at midnight and found answers that made sense individually but not as a coherent strategy
The PostgreSQL Performance Triage Protocol — Start Here, Not at the Config File
The PostgreSQL Performance Triage Protocol is a 3-level framework that prioritizes tuning work by impact and effort. Level 1 takes 30 minutes and delivers 80% of possible improvement. Level 2 takes a full day. Level 3 requires architectural changes. Always work the levels in order — skipping to Level 3 while ignoring Level 1 problems is the most expensive mistake in database performance work.
- Level 1, Quick Wins: shared_buffers, work_mem, effective_cache_size, pg_stat_statements. Gets you 80% of the way there.
- Level 2, Deep Tuning: autovacuum, index optimization, connection pooling, checkpoint tuning. For recurring problems.
- Level 3, Architecture: partitioning, read replicas, query rewrites, schema redesign. When config cannot save you.
The rule I follow without exception: Before touching a single configuration parameter, I run EXPLAIN (ANALYZE, BUFFERS) on the three slowest queries. Half the time, the problem is a missing index or a query written with SELECT * — and no amount of postgresql.conf tuning fixes a bad query. Config tuning amplifies good queries. It cannot rescue bad ones. [INTERNAL LINK: System Optimization Guide 2026 → bioquro.com/system-optimization-guide-2026]
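If pg_stat_statements is already loaded (the setup is shown in Disaster 2 below), a minimal first pass looks something like the sketch that follows. Ordering by mean execution time surfaces the queries users actually feel, rather than the ones that merely run most often; column names assume PostgreSQL 13 or newer.
-- A minimal Level 1 first pass, assuming pg_stat_statements is already loaded
SELECT round(mean_exec_time::numeric, 2) AS avg_ms,
       calls,
       left(query, 100) AS query_preview
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 3;
-- Then, for each of the three, run EXPLAIN (ANALYZE, BUFFERS) with real parameter values.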
Disaster 1: The Default shared_buffers That Made Our API 8x Slower Than It Should Have Been
shared_buffers controls how much memory PostgreSQL uses to cache data pages in shared memory. The default is 128MB — conservative enough to run on a 512MB server. On a production system with 16GB of RAM, this means PostgreSQL is caching approximately 0.8% of available memory, forcing it to read from disk for data it could trivially hold in memory.
We had deployed a new reporting feature that ran aggregation queries across 8 million rows. The queries worked fine in staging. In production — same code, same query — they took 12-18 seconds. Staging had 4GB of RAM. Production had 16GB. Both had default shared_buffers = 128MB.
Here is what I did not understand at the time: we had promoted the instance from a smaller server to a larger one for exactly this workload, but nobody had touched postgresql.conf. PostgreSQL was sitting on top of 16GB of RAM and using 128MB of it for caching. Every query was reading from disk. Every single time. The disk on this cloud instance was network-attached storage — meaning each miss added 2-5ms of latency. Multiply that across millions of row reads and you understand the 18-second queries.
The fix took literally one line and a restart. The emotional experience of staring at a slow query for four hours before checking the most obvious setting is something I have never fully recovered from.
Set shared_buffers to 25% of RAM. Set effective_cache_size to 75% of RAM (this is informational — it helps the query planner, does not allocate memory). After the restart, query time dropped from 18 seconds to 2.1 seconds. The remaining 2 seconds were query-level issues we addressed separately.
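A useful way to put numbers on this kind of cache starvation, before and after the change, is the buffer cache hit ratio from pg_stat_database. This is a generic health check rather than something specific to our incident; on an OLTP workload, a ratio persistently below roughly 99% usually means shared_buffers (or total RAM) is too small.
-- Buffer cache hit ratio per database (standard pg_stat_database view)
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY blks_read DESC;
-- blks_read counts reads that missed shared_buffers and went to the OS page cache or disk.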
# Memory — the most impactful section of postgresql.conf
# Replace [TOTAL_RAM] with your server's actual RAM
# shared_buffers: PostgreSQL's own data cache
# Start at 25% of RAM. For a 16GB server: 4GB
shared_buffers = 4GB # default: 128MB (absurdly low for prod)
# effective_cache_size: tells the query planner how much
# total memory is available for caching (RAM + OS page cache)
# This does NOT allocate memory — it only informs the planner
# Set to ~75% of total RAM
effective_cache_size = 12GB # default: 4GB (often too conservative)
# work_mem: memory per sort/hash operation PER QUERY
# WARNING: This multiplies. 100 connections x 3 operations = 300 * work_mem
# Set this conservatively for OLTP, higher for analytics
# For OLTP (many short queries): 4-16MB
# For Analytics (few complex queries): 64-256MB
work_mem = 16MB # default: 4MB
# maintenance_work_mem: used for VACUUM, CREATE INDEX, ALTER TABLE
# Higher = faster maintenance. Set to 256MB-1GB safely.
maintenance_work_mem = 512MB # default: 64MB (way too low)
work_mem danger: Setting work_mem too high on a busy OLTP system causes OOM kills. With 200 connections each running 3 sort operations simultaneously, 256MB work_mem = 150GB RAM consumption. I have seen this exact miscalculation take down a production server. Monitor with SELECT count(*) FROM pg_stat_activity and do the math before increasing work_mem on a high-connection database.
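Here is a rough version of that math as a query; the three-operations-per-query multiplier and the hypothetical 64MB value are assumptions to adjust for your own workload.
-- Worst-case memory estimate before raising work_mem.
-- The "3 sorts/hashes per query" multiplier and the 64MB figure are assumptions.
SELECT current_setting('work_mem') AS current_work_mem,
       count(*) AS active_connections,
       count(*) * 3 * 64 AS worst_case_mb_if_work_mem_were_64mb
FROM pg_stat_activity
WHERE state = 'active';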
Disaster 2: The Sequential Scan That Destroyed Our Response Times (And Why EXPLAIN Was the Only Tool That Mattered)
A sequential scan means PostgreSQL reads every row in a table to satisfy a query, even if only one row matches the filter. On a table with 10 million rows, a sequential scan reads all 10 million rows. Adding an index reduces this to a handful of reads. PostgreSQL chooses between sequential and index scans based on the query planner's cost estimates — which depend on table statistics being up to date.
Our orders table had grown from 200,000 rows to 4.2 million rows over eight months. Slowly, imperceptibly, the order status page went from loading in 180ms to 340ms to 900ms to eventually timing out entirely. We had an index on user_id. We had an index on created_at. What we did not have was a composite index on (user_id, status, created_at) — which was exactly what the most common query pattern required.
The maddening part: PostgreSQL was using the user_id index to fetch the ~2,000 rows for that user, then filtering those rows by status and sorting them by created_at in memory. As the user's order history grew, that 2,000 number grew too. The query degraded linearly with data growth and we did not notice until it was bad enough to alert.
EXPLAIN ANALYZE showed it immediately. Took about 40 seconds to diagnose and one line to fix. The eight months of gradual degradation were entirely preventable.
-- Step 1: Enable pg_stat_statements to find your worst queries
-- Add to postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
-- Then restart and run:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find top 10 queries by total execution time
SELECT
round(total_exec_time::numeric, 2) AS total_ms,
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
round((100 * total_exec_time /
sum(total_exec_time) OVER ())::numeric, 2) AS percentage,
left(query, 120) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Step 2: Run EXPLAIN on the worst offender
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, status, total, created_at
FROM orders
WHERE user_id = 12345
AND status IN ('pending', 'processing')
ORDER BY created_at DESC
LIMIT 20;
-- Look for these warning signs in output:
-- "Seq Scan" on a large table = missing index
-- "Rows Removed by Filter: 47823" = index not selective enough
-- "Buffers: read=4821" (vs "hit=4821") = data not in shared_buffers
-- "Sort Method: external merge" = work_mem too low for this sort
-- Step 3: Add the composite index the query actually needs
-- Index column order matters: equality filters first, range last
CREATE INDEX CONCURRENTLY idx_orders_user_status_date
ON orders (user_id, status, created_at DESC);
-- CONCURRENTLY = no table lock, safe to run in production
-- Takes longer but does not block reads or writes
Index bloat is real: Every index slows down INSERT, UPDATE, and DELETE operations because PostgreSQL must update the index alongside the table. I once added 11 indexes to a heavily written table and INSERT performance dropped by 60%. The rule: index the columns your queries actually filter on, not every column you might someday filter on. And periodically run SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0 to find indexes that are never used and should be dropped.
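A slightly more useful version of that unused-index check also shows how much disk each candidate occupies. One caveat worth keeping next to the query: idx_scan counts only since the last statistics reset, so an index that looks unused may belong to a rare monthly job.
-- Unused indexes with the disk space they occupy
-- idx_scan counts since the last stats reset; verify before dropping anything,
-- and never drop indexes that back PRIMARY KEY or UNIQUE constraints.
SELECT schemaname,
       relname AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;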
Disaster 3: Autovacuum Was Losing the Race — and Nobody Noticed Until the Table Was 3x Its Real Size
PostgreSQL uses MVCC (Multi-Version Concurrency Control) — when you UPDATE or DELETE a row, the old version is kept on disk as a "dead tuple" rather than immediately removed. Autovacuum's job is to reclaim this space. When autovacuum cannot keep up with the rate of updates and deletes, dead tuples accumulate. The table grows far larger than its actual data, sequential scans slow down, and index performance degrades.
We had a user_sessions table that received about 50,000 updates per minute — users' last-activity timestamps, mostly. Every update created a dead tuple. Autovacuum was running on its default settings, which were designed for a table receiving maybe a few thousand updates per minute. By the time we noticed performance issues, the table had 2.1 billion live rows and approximately 800 million dead tuples. The table was 47GB. It should have been 16GB.
The slowdown was not sudden. It was the classic boiling frog. Queries that took 40ms started taking 80ms, then 200ms, then 800ms. Every week, a few more users complained about the session page being slow. We kept adding indexes. The indexes helped briefly and then stopped helping as they grew bloated too.
The person who finally diagnosed it was a new team member running a routine health check query I am about to show you. They came to me with the output and said, with what I can only describe as polite horror, "did you know this table has more dead tuples than live ones?"
-- Detect bloat: tables with dangerous dead tuple ratios
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1)
AS dead_ratio_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_ratio_pct DESC;
-- Dead ratio > 20% on a frequently queried table = problem
-- last_autovacuum being NULL or very old = autovacuum is losing the race
-- Fix 1: Manual vacuum for immediate relief (run during low traffic)
VACUUM ANALYZE user_sessions;
-- VACUUM FULL reclaims disk space but LOCKS THE TABLE — never in prod peak hours
-- Fix 2: Tune autovacuum for high-write tables
-- Add to postgresql.conf OR use per-table storage parameters:
-- Option A: Global settings (affects all tables)
autovacuum_vacuum_scale_factor = 0.01 -- vacuum when 1% of rows are dead
-- default: 0.2 (20% — too high for big tables)
autovacuum_analyze_scale_factor = 0.005 -- analyze when 0.5% of rows change
autovacuum_vacuum_cost_delay = 2ms -- default: 2ms since PostgreSQL 12 (20ms before); lower = more aggressive
-- Option B: Per-table settings (preferred for write-heavy tables)
ALTER TABLE user_sessions SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000,
autovacuum_analyze_scale_factor = 0.005
);
Aggressive autovacuum competes with your workload for I/O: Setting autovacuum_vacuum_cost_delay = 0 makes autovacuum run at full speed — which on a busy system means it consumes I/O that your queries need. The cost-delay mechanism is a throttle. I set it to 2ms on write-heavy tables, which keeps autovacuum aggressive enough to stay current without noticeably impacting query latency. On SSDs, you can go lower. On network-attached storage, be more conservative.
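To see whether autovacuum is actually making progress on a big table rather than just being scheduled, pg_stat_progress_vacuum (available since PostgreSQL 9.6) shows running vacuum workers phase by phase. A quick sketch:
-- Running (auto)vacuum workers and how far along they are
SELECT pid,
       relid::regclass AS table_name,
       phase,
       heap_blks_scanned,
       heap_blks_total,
       round(100.0 * heap_blks_scanned / NULLIF(heap_blks_total, 0), 1) AS pct_scanned
FROM pg_stat_progress_vacuum;
-- No rows means no vacuum is running right now; compare with last_autovacuum
-- in pg_stat_user_tables to decide whether that is good news or bad news.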
Disaster 4: 400 Open Connections Eating 4GB of RAM — Before a Single Query Ran
Each PostgreSQL connection consumes approximately 5-10MB of RAM for its backend process, regardless of whether it is running a query. A web application with 400 simultaneous open connections is using 2-4GB of RAM just for connection overhead — before a single query executes. PgBouncer solves this by maintaining a small pool of real database connections shared across thousands of application connections.
We scaled our web application from 10 instances to 40 instances during a traffic spike. Each instance maintained a connection pool of 10 connections. That is 400 simultaneous connections to PostgreSQL. Each connection: approximately 9MB of RAM. That is 3.6GB of RAM used before any query ran on a server with 8GB total.
Combined with the working memory of actual query execution, PostgreSQL hit its memory limit at peak traffic. The OOM killer came for it. PostgreSQL crashed. The application started throwing connection errors. We scaled down the application to reduce connections. Performance degraded further because now we had too few application instances. It was a genuinely horrible 40 minutes — the kind where you are making decisions quickly and every option makes something else worse. Someone suggested just rebooting the database server. I am not proud to admit I briefly considered it.
The actual fix — PgBouncer in front of PostgreSQL — took about 20 minutes to deploy. We should have deployed it before we ever had more than 50 connections. [INTERNAL LINK: Server Performance High Traffic → bioquro.com/server-performance-high-traffic]
[databases]
# PgBouncer appears to the application as a regular PostgreSQL server
# Application connects to PgBouncer on port 6432
# PgBouncer maintains a real pool to PostgreSQL on port 5432
myapp_prod = host=localhost port=5432 dbname=myapp_production
[pgbouncer]
listen_port = 6432
listen_addr = *
# Pool mode — the most important setting:
# session: one server connection per client session (like no pooling)
# transaction: connection returned to pool after each transaction (RECOMMENDED)
# statement: connection returned after each statement (breaks multi-statement txns)
pool_mode = transaction
# Max connections to the actual PostgreSQL server
# Rule of thumb: 2-4 per CPU core, more if queries spend time waiting on I/O
# For a 4-core server: 16-32 real connections handle thousands of app connections
default_pool_size = 25
max_client_conn = 1000 # Application-facing connection limit
# Authentication
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
# Health and timeouts
server_idle_timeout = 600 # Close idle server connections after 10 min
client_idle_timeout = 0 # Keep client connections alive
log_connections = 0 # Reduce log noise in production
Transaction mode breaks prepared statements and advisory locks: PgBouncer's transaction pooling mode — the one you want — means the same client connection may use different backend connections for consecutive transactions. Prepared statements (which are connection-scoped in PostgreSQL) do not work. If your ORM uses prepared statements heavily (some versions of Django and Rails do), you will need to disable them or use session mode instead. Test this before deploying PgBouncer to production.
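Once PgBouncer is running, it is worth confirming that pooling is actually happening. PgBouncer exposes an admin console as a virtual database named pgbouncer; the SHOW commands below are built into PgBouncer itself, but connecting to the console requires a user listed in admin_users or stats_users, which the config above does not include.
-- Connect to the admin console first, for example:
--   psql -h localhost -p 6432 -U <admin_user> pgbouncer
-- (<admin_user> must appear in admin_users or stats_users in pgbouncer.ini)
SHOW POOLS;    -- cl_active vs sv_active: app connections vs real server connections
SHOW STATS;    -- per-database query/transaction counts and average times
SHOW CLIENTS;  -- every application-side connection PgBouncer is currently holding
-- Healthy pooling: cl_active can be in the hundreds while sv_active stays
-- at or below default_pool_size.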
Disaster 5: The checkpoint_completion_target "Optimization" That Made Everything Worse
PostgreSQL checkpoints flush dirty pages from shared_buffers to disk at regular intervals. If checkpoints happen too frequently, they cause I/O spikes that slow down queries. If they happen too infrequently, crash recovery takes longer because more WAL has to be replayed. The WAL (Write-Ahead Log) settings control this balance.
I had read that increasing checkpoint_completion_target from 0.5 to 0.9 would smooth out I/O by spreading checkpoint writes over a longer period. This is true. What I did not read carefully enough: I also needed to increase max_wal_size to give PostgreSQL enough WAL space to accumulate between checkpoints. Without it, PostgreSQL was hitting its WAL size limit and forcing checkpoints far more frequently than the target interval — defeating the entire purpose of the setting.
For three days after the "optimization," our database had worse I/O performance than before. I was confused because everything I had done was technically correct in isolation. The issue was understanding how the settings interact, which no quick-reference guide explains because it requires actually understanding the checkpoint mechanism. This one was entirely my fault and the lesson was humbling.
# WAL and Checkpoint settings
# These must be tuned TOGETHER — changing one without the other
# often makes things worse, as I learned the hard way.
# max_wal_size: maximum size of WAL files before forcing a checkpoint
# Increase this to reduce checkpoint frequency on write-heavy systems
# Default: 1GB — too low for systems writing > 100MB/s
max_wal_size = 4GB # Increase for high-write workloads
# min_wal_size: minimum WAL disk space reserved
min_wal_size = 1GB # Prevents constant WAL file creation/deletion
# checkpoint_completion_target: fraction of checkpoint interval
# used to spread checkpoint I/O (reducing spikes)
# MUST be paired with a large enough max_wal_size to be effective
checkpoint_completion_target = 0.9 # default: 0.9 in recent versions
# wal_buffers: memory for WAL data before writing to disk
# -1 = auto (1/32 of shared_buffers, capped at one 16MB WAL segment) — usually fine
# Set explicitly if pg_stat_wal shows wal_buffers_full climbing (PostgreSQL 14+)
wal_buffers = 64MB
# synchronous_commit: when to confirm a transaction to the client
# on (default): safest — waits for WAL written to disk
# off: faster writes; risk of losing the last few hundred ms of commits on crash
# For non-critical data or read-heavy workloads, 'off' is acceptable
synchronous_commit = on # Never set to 'off' for financial data
# Monitor checkpoint frequency with:
# SELECT checkpoints_timed, checkpoints_req,
#        checkpoint_write_time, checkpoint_sync_time
# FROM pg_stat_bgwriter;
# (On PostgreSQL 17+, checkpoint statistics moved to pg_stat_checkpointer.)
# checkpoints_req >> checkpoints_timed = max_wal_size too small
Setting synchronous_commit = off risks data loss: it speeds up writes by not waiting for WAL to flush to disk before confirming the transaction. On a crash, you could lose up to roughly the last 600ms of committed transactions with default settings (the window is about three times wal_writer_delay). Acceptable for session data, analytics events, or logs. Absolutely not acceptable for financial transactions, user profile changes, or any data the user explicitly saved. The performance gain is real — typically 2-5x write throughput — but the durability tradeoff is also real.
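There is a middle ground the callout above implies but does not show: synchronous_commit can be changed per session or per transaction, so the global default can stay on while individual low-value writes opt out. A minimal sketch, where analytics_events is a placeholder table name:
-- Relax durability for one transaction only; the global setting stays 'on'.
-- "analytics_events" is a placeholder table for non-critical writes.
BEGIN;
SET LOCAL synchronous_commit = off;  -- applies to this transaction only
INSERT INTO analytics_events (user_id, event_name, created_at)
VALUES (12345, 'page_view', now());
COMMIT;
-- COMMIT returns without waiting for the WAL flush for this transaction;
-- every other transaction keeps full durability.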
The Production-Tested postgresql.conf — What I Actually Run
This is the baseline configuration I apply to every new PostgreSQL instance, tuned for a 16GB RAM, 4-core production server running mixed OLTP workloads. Adjust the memory values proportionally for your hardware.
# ── CONNECTIONS ───────────────────────────────────────────────────────
# Use PgBouncer in front — keep this low
max_connections = 100 # Default 100 is fine WITH a connection pooler
# ── MEMORY ────────────────────────────────────────────────────────────
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM (planner hint, not allocation)
work_mem = 16MB # Per sort/hash — watch total connections
maintenance_work_mem = 512MB # For VACUUM, CREATE INDEX
huge_pages = try # Use Linux huge pages if available
# ── WAL ───────────────────────────────────────────────────────────────
wal_buffers = 64MB
max_wal_size = 4GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
synchronous_commit = on
# ── QUERY PLANNER ─────────────────────────────────────────────────────
# These help the planner choose better execution plans
random_page_cost = 1.1 # For SSD storage (default 4.0 is for HDD)
# Use 1.0-1.5 for NVMe, 2.0-3.0 for cloud SSD
effective_io_concurrency = 200 # For SSD — how many concurrent I/O ops
# Default 1 assumes spinning disk
max_parallel_workers_per_gather = 2 # Parallel query workers per query
max_parallel_workers = 4 # Total parallel workers (match CPU cores)
# ── AUTOVACUUM ────────────────────────────────────────────────────────
autovacuum_vacuum_scale_factor = 0.05 # Vacuum when 5% of rows are dead
autovacuum_analyze_scale_factor = 0.02 # Analyze when 2% of rows change
autovacuum_max_workers = 3 # Increase for many tables
autovacuum_vacuum_cost_delay = 2ms # Throttle to avoid I/O contention
# ── LOGGING ───────────────────────────────────────────────────────────
log_min_duration_statement = 100 # Log queries taking > 100ms
log_checkpoints = on # Monitor checkpoint frequency
log_lock_waits = on # Alert on lock contention
shared_preload_libraries = 'pg_stat_statements'
# ── MONITORING EXTENSION ──────────────────────────────────────────────
pg_stat_statements.track = all
pg_stat_statements.max = 5000
The Quick Wins — 30 Minutes to 80% Better
| Change | Default | Production Value | Impact |
|---|---|---|---|
| shared_buffers | 128MB | 25% of RAM | 🔥 Massive — cache hit rate |
| effective_cache_size | 4GB | 75% of RAM | ✅ Planner makes better choices |
| random_page_cost | 4.0 | 1.1 (SSD) | 🔥 Planner uses indexes more |
| work_mem | 4MB | 16-64MB | ✅ Sorts in memory vs disk |
| pg_stat_statements | disabled | enabled | 🔥 Find slow queries instantly |
| log_min_duration_statement | disabled | 100ms | ✅ Alerts on slow queries |
| maintenance_work_mem | 64MB | 512MB | ✅ Faster VACUUM + index builds |
PostgreSQL Performance Health Checklist
- ✅ pg_stat_statements enabled — you cannot tune what you cannot measure
- ✅ shared_buffers set to 25% of RAM (not the default 128MB)
- ✅ random_page_cost set to 1.1 for SSD storage
- ✅ EXPLAIN ANALYZE run on the 3 slowest queries before touching config
- ✅ Dead tuple ratio checked — no table above 20% dead tuples
- ✅ PgBouncer deployed if more than 50 concurrent connections
- ✅ autovacuum scale factor reduced for high-write tables
- ✅ log_min_duration_statement = 100ms to catch slow queries automatically
- ✅ Composite indexes match actual query patterns, not individual columns
- ✅ Unused indexes identified and dropped (pg_stat_user_indexes where idx_scan = 0)
Frequently Asked Questions
What is the single most important PostgreSQL parameter to tune?
shared_buffers is the single most impactful parameter — set it to 25% of system RAM. But the real answer depends on your bottleneck. Run EXPLAIN (ANALYZE, BUFFERS) on your slowest queries first. If you see 'Buffers: read' instead of 'Buffers: hit', increase shared_buffers. If you see sequential scans on large tables, add indexes. Fix the actual bottleneck, not the parameter someone told you to tune.
How do I find my slowest queries?
Enable pg_stat_statements and query it for top queries by total execution time. Then run EXPLAIN (ANALYZE, BUFFERS) on the worst offenders. Look for: Seq Scan on tables over 10,000 rows (missing index), large differences between estimated and actual rows (stale statistics — run ANALYZE), and 'Sort Method: external merge' (work_mem too low for this sort operation).
Why is my table so much bigger than the data it holds?
PostgreSQL's MVCC model keeps deleted and updated rows on disk as dead tuples until autovacuum reclaims them. If autovacuum cannot keep up with your write rate, dead tuples accumulate and bloat the table. Check with: SELECT relname, n_dead_tup, n_live_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC. A dead tuple ratio above 20% on a frequently queried table is a problem requiring autovacuum tuning or a manual VACUUM ANALYZE.
When should I put PgBouncer in front of PostgreSQL?
Use PgBouncer when your application opens more than 50-100 simultaneous database connections. Each PostgreSQL connection consumes 5-10MB of RAM regardless of whether it is running a query. PgBouncer in transaction mode maintains 10-30 real database connections shared across thousands of application connections, freeing gigabytes of memory and eliminating connection-related OOM incidents.
What is the PostgreSQL Performance Triage Protocol?
A 3-level framework: Level 1 (30 minutes) — shared_buffers, work_mem, effective_cache_size, pg_stat_statements. Delivers 80% of possible improvement. Level 2 (one day) — autovacuum tuning, index optimization, connection pooling, checkpoint settings. Level 3 (one week) — partitioning, read replicas, query rewrites, schema changes. Always work the levels in order. Skipping to Level 3 while ignoring Level 1 problems is the most expensive mistake in database performance work.
Which of these five disasters have you already experienced?
Leave a comment — or describe your current PostgreSQL symptom and what you have already tried. The most common patterns I see become the next Bioquro database guide.
