PostgreSQL Performance Tuning in 2026: The 5 Queries That Killed Our Database
There is a particular sound a database makes when it is dying. It is not a sound in the server room — it is the sound of your phone vibrating at 2:47am with a PagerDuty alert. The dashboard is red. The application is timing out. And somewhere deep in the PostgreSQL logs, a query is running that should take 50 milliseconds and is currently taking 47 seconds. I have debugged this scenario more times than I want to admit. And in almost every case, the root cause was not a misconfigured shared_buffers or an unoptimized work_mem. It was a query. A single, innocent-looking query that had been running fine for months — until the data grew, or the traffic spiked, or an ORM decided to execute it 1,400 times inside a loop. This guide is about those queries. The five specific patterns I have seen kill production databases, the PostgreSQL Performance Pyramid framework I use to prevent them, and the configuration that actually matters once your queries are clean.
Who Is This Guide For?
This guide is for backend engineers who use PostgreSQL in production and want to know which performance problems actually cause outages. It assumes you know how to write SQL and understand basic indexing. It is written for the engineer who has stared at a slow query in pg_stat_statements and wondered where to start.
- Backend and database engineers responsible for PostgreSQL performance in production
- Engineering leads who need to prioritize database optimization work
- DevOps engineers who manage PostgreSQL instances and want to understand what the application team should fix first
- Anyone who has received a "database CPU 100%" alert and did not know which query caused it
The PostgreSQL Performance Pyramid — A Framework That Actually Works
The PostgreSQL Performance Pyramid is a 3-level framework: Base (Query Optimization) — identify and fix slow queries using pg_stat_statements and EXPLAIN ANALYZE. Middle (Indexing Strategy) — create the right indexes, remove unused ones, and use covering indexes where appropriate. Top (Server Configuration) — tune shared_buffers, work_mem, effective_cache_size, and random_page_cost. The rule: never tune server settings until you have fixed your queries and indexes. Settings compensate for resource constraints; they do not fix bad SQL.
▲ Start here. Fix queries first. Everything else is compensation.
The contrarian opinion: Most PostgreSQL tuning guides are configuration pornography. They obsess over shared_buffers ratios and WAL segment sizes while the database is being tortured by an ORM executing 1,400 queries inside a loop. The best PostgreSQL performance engineers I know spend 80% of their time reading EXPLAIN ANALYZE output and 20% of their time adjusting settings. The worst spend 100% of their time adjusting settings and never read a query plan. Guess which databases crash at 2am.
Query 1: The N+1 That Survived Code Review
An N+1 query problem occurs when an application executes one query to fetch a list of records, then executes an additional query for each record in that list. In production, this means a page that loads 100 items executes 101 queries — the initial SELECT plus 100 individual SELECTs. The query looks fine in isolation. The problem is invisible until you look at the aggregate.
We had a dashboard that displayed 100 projects with their most recent deployment status. In development, with 10 projects, it loaded in 200 milliseconds. In production, with 100 projects, it loaded in 47 seconds. The code was a simple loop:
projects.forEach(project => { project.deployments = await db.query('SELECT * FROM deployments WHERE project_id = $1 ORDER BY created_at DESC LIMIT 5', [project.id]); });
100 projects. 100 queries. Each query took ~50ms. That is 5 seconds of database time — but network round trips and connection pool contention stretched it to 47 seconds. The fix was a single query with a lateral join. The dashboard loaded in 80 milliseconds after. The developer who wrote the loop was the same developer who fixed it. We had all been that developer at some point.
-- Query 1: fetch projects
SELECT * FROM projects
WHERE team_id = 42
LIMIT 100;
-- Queries 2-101: fetch deployments
-- Executed in application loop
SELECT * FROM deployments
WHERE project_id = $1
ORDER BY created_at DESC
LIMIT 5;
-- Repeated 100 times
-- One query replaces 101
SELECT p.*, d.*
FROM projects p
LEFT JOIN LATERAL (
SELECT *
FROM deployments
WHERE project_id = p.id
ORDER BY created_at DESC
LIMIT 5
) d ON true
WHERE p.team_id = 42
LIMIT 100;
The fastest way to detect N+1 problems in production: enable pg_stat_statements, then query for statements with high calls but low mean_exec_time. An N+1 query pattern will show the same query called thousands of times with a low average execution time. If you see a query with 50,000 calls and a mean_exec_time of 0.5ms, it is almost certainly being executed inside a loop somewhere. The fix is almost always a JOIN or a batch query.
Query 2: The SELECT * That Ate Our Memory
SELECT * fetches every column from a table, including large TEXT, JSONB, and BYTEA columns that the application does not need. When the table contains a JSONB payload column that stores 5MB of metadata per row, a seemingly innocent dashboard query that loads 50 rows is actually transferring 250MB of data from the database to the application — every single page load.
We had an events table with a JSONB payload column that stored the full raw webhook body for debugging purposes. Most payloads were 2-10KB. Some were 5MB. A dashboard that displayed 50 recent events used SELECT * — and every page load was transferring up to 250MB from the database to the application. The database CPU was fine. The network was saturated. The application was spending 80% of its time deserializing JSON it never displayed. The fix was changing SELECT * to select only the columns the dashboard actually needed. Page load dropped from 12 seconds to 300 milliseconds. The JSONB payload was still there for debugging — we just stopped fetching it on every dashboard load.
-- WRONG: fetches everything, including 5MB JSONB payloads
SELECT * FROM events WHERE team_id = 42 LIMIT 50;
-- RIGHT: fetch only what the dashboard displays
SELECT id, event_type, created_at, status, summary
FROM events WHERE team_id = 42 LIMIT 50;
-- For debugging: fetch payload separately, on demand
SELECT payload FROM events WHERE id = $1;
ORMs like Prisma and TypeORM default to SELECT * unless you explicitly specify a select object. This is not the ORM's fault — it cannot know which columns you need. But it means every engineer who writes prisma.user.findMany() without a select clause is fetching every column on the user table. In code review, the query looks clean. In production, it is silently transferring columns that have not been used since the table was created three years ago.
Query 3: The Function That Killed the Index
Applying a function to an indexed column in a WHERE clause prevents PostgreSQL from using the index. WHERE LOWER(email) = 'user@example.com' cannot use a standard B-tree index on email. WHERE DATE(created_at) = '2026-01-15' cannot use a standard index on created_at. The query planner falls back to a sequential scan of the entire table.
Our user search feature allowed case-insensitive email lookup: WHERE LOWER(email) LIKE '%user%'. We had a standard B-tree index on email. The query planner, correctly, determined that LOWER(email) could not use the email index and performed a sequential scan of the entire 8-million-row users table. Every search was taking 4-7 seconds. The fix: a functional index on LOWER(email). The same query ran in 2 milliseconds after. The lesson: EXPLAIN ANALYZE would have shown the Seq Scan immediately. Nobody had run EXPLAIN ANALYZE on the search query before it reached production.
-- WRONG: standard index — cannot be used by LOWER()
CREATE INDEX idx_users_email ON users (email);
-- Query forces sequential scan
EXPLAIN ANALYZE SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Seq Scan on users (rows=8M) — takes seconds
-- RIGHT: functional index — matches the query expression
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Query now uses index
EXPLAIN ANALYZE SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Index Scan using idx_users_lower_email — takes milliseconds
Query 4: The JOIN That Multiplied Everything
A JOIN without proper WHERE clauses can produce a Cartesian product — multiplying every row in one table by every row in another. When you JOIN two tables with 100,000 rows each without filtering, the result set is 10 billion rows. PostgreSQL will try to process all of them. Your query will run until someone kills it or the database runs out of memory.
An analyst wrote a query to generate a report joining the orders table (2 million rows) with the order_items table (8 million rows). The query had a WHERE clause on orders.created_at — but it was in the wrong position in the query. The planner executed the JOIN first, producing a 16-billion-row intermediate result, then tried to filter. The query ran for 45 minutes before the OOM killer terminated the backend process. The entire database was slow for the duration — other queries were competing for the same shared_buffers that this query was saturating with irrelevant rows. The fix was restructuring the query to filter before joining. Runtime dropped to under 3 seconds.
-- Filter applied AFTER join
SELECT o.id, oi.product_id, oi.quantity
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at > '2026-01-01';
-- JOIN produces 16B rows first
-- Then WHERE filters them
-- Filter applied BEFORE join
WITH recent_orders AS (
SELECT * FROM orders
WHERE created_at > '2026-01-01'
)
SELECT ro.id, oi.product_id, oi.quantity
FROM recent_orders ro
JOIN order_items oi ON oi.order_id = ro.id;
-- CTE filters orders to 500K first
-- Then JOIN on 500K filtered rows
Query 5: The COUNT(*) That Blocked Everything
SELECT COUNT(*) on a large table without a WHERE clause forces a full table scan. PostgreSQL's MVCC architecture means COUNT(*) must check every row's visibility — it cannot just read a counter. On a table with 50 million rows, COUNT(*) can take 30-60 seconds and will hold up other queries waiting behind it.
This one is personal. Every engineer has written COUNT(*) on a large table at least once. In development, with 100 rows, it returns instantly. In production, with 50 million rows, it locks the table for 45 seconds and the Slack messages start: "hey is the database down?" No, the database is not down. It is counting. For approximate counts, use SELECT reltuples::bigint FROM pg_class WHERE relname = 'your_table' — it returns instantly. The estimate is usually within 1-5% of the real count. For exact counts, add a WHERE clause on an indexed column to limit the scan range, or use a materialized view that refreshes periodically.
The Middle: Indexing Strategy — The Right Index for the Right Query
Indexes are the most powerful PostgreSQL performance tool — and the most misused. The right index turns a 30-second sequential scan into a 2-millisecond index scan. The wrong index wastes disk space, slows down writes, and confuses the query planner. The key principle: create indexes that match your actual query patterns, not indexes that "seem useful."
-- Find UNUSED indexes (candidates for removal)
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Find MISSING indexes (tables with sequential scans)
SELECT schemaname, tablename, seq_scan, seq_tup_read,
seq_tup_read / CASE seq_scan WHEN 0 THEN 1 ELSE seq_scan END AS avg_rows_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
-- Find the QUERIES causing sequential scans
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%your_table%'
ORDER BY total_exec_time DESC;
Every index speeds up reads and slows down writes. Each INSERT, UPDATE, and DELETE on an indexed table must also update every index. A table with 12 indexes might have writes that are 3-5x slower than the same table with 2 indexes. The question is not "should I index this column?" but "does the read performance gain from this index outweigh the write performance cost?" For read-heavy tables, index aggressively. For write-heavy tables, index sparingly — and audit your indexes regularly.
The Top: Server Configuration — The Settings That Actually Matter
Server configuration is the last level of the PostgreSQL Performance Pyramid — it compensates for resource constraints but does not fix bad queries. The four settings that matter most in production: shared_buffers (how much RAM PostgreSQL uses for caching), work_mem (how much memory each query operation can use before spilling to disk), effective_cache_size (PostgreSQL's estimate of available OS cache), and random_page_cost (the planner's cost estimate for random disk access vs. sequential access).
# ─── MEMORY ─────────────────────────────────────────────────────
# shared_buffers: PostgreSQL's internal cache
# Start with 25% of RAM, cap at 16GB on systems with 64GB+ RAM
# Default: 128MB — laughably small for production
shared_buffers = 4GB
# effective_cache_size: PostgreSQL's estimate of OS cache
# Set to 50-75% of total RAM
# This does NOT allocate memory — it informs the query planner
effective_cache_size = 24GB
# work_mem: memory per query operation (sort, hash, etc.)
# This is PER OPERATION, not per query or per connection
# A single query with 3 sorts can use 3x work_mem
# Start conservative: 64MB for mixed workloads
# If you have 200 max_connections, 64MB * 200 = 12.8GB potential peak
# Monitor temp file usage before increasing
work_mem = 64MB
# maintenance_work_mem: memory for VACUUM, CREATE INDEX, etc.
# Set higher — these operations are infrequent but expensive
maintenance_work_mem = 1GB
# ─── PLANNER ────────────────────────────────────────────────────
# random_page_cost: cost of random disk access vs sequential
# Default: 4.0 — designed for spinning disks
# For SSD/NVMe: 1.0 to 1.5
# This tells the planner that random access is nearly as fast as sequential
random_page_cost = 1.1
# effective_io_concurrency: simultaneous disk I/O requests
# For SSDs: 200
# For NVMe: 800
effective_io_concurrency = 200
# ─── WRITE-AHEAD LOG ───────────────────────────────────────────
# wal_buffers: WAL buffer size
# Default: -1 (1/32 of shared_buffers, ~64MB in most cases)
# For write-heavy workloads: 64-128MB
wal_buffers = 64MB
# ─── QUERY PLANNING ─────────────────────────────────────────────
# default_statistics_target: depth of statistics collected by ANALYZE
# Default: 100 — fine for most workloads
# Increase to 300-500 for complex queries with many joins
default_statistics_target = 200
# ─── AUTOVACUUM ────────────────────────────────────────────────
# Keep defaults unless you have a specific problem
# Monitor: SELECT schemaname, relname, last_autovacuum, n_dead_tup
# FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;
# If dead tuples accumulate, autovacuum is falling behind
# Increase autovacuum_vacuum_scale_factor or decrease autovacuum_naptime
work_mem is the most dangerous setting to tune incorrectly. It is allocated per query operation — a single query with a sort, a hash join, and a DISTINCT can use 3x work_mem. With 200 concurrent connections, 64MB work_mem means a theoretical peak of 38.4GB of memory consumed by query operations alone. This is how databases get OOM-killed after a configuration change. The safe approach: start at 32-64MB, monitor temp file usage with SELECT datname, temp_files, temp_bytes FROM pg_stat_database;, and increase work_mem only if you see significant temp file activity. Never set work_mem above 256MB without load testing first.
The Only Monitoring Query You Need to Start
pg_stat_statements is the single most valuable performance tool in PostgreSQL. It records execution statistics for every query and exposes them through a view. The query below shows you exactly which queries consume the most time — this is always where you start tuning.
-- ENABLE pg_stat_statements (requires restart)
-- Add to postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.track = all
-- TOP 20 QUERIES BY TOTAL EXECUTION TIME
SELECT queryid,
query,
calls,
mean_exec_time,
total_exec_time,
rows,
shared_blks_hit,
shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- RESET STATISTICS (run after fixing queries to get fresh baseline)
-- SELECT pg_stat_statements_reset();
Production PostgreSQL Performance Checklist
| Practice | Pyramid Level | Impact if Skipped |
|---|---|---|
| pg_stat_statements enabled | Base | Cannot identify slow queries — blind performance tuning |
| N+1 queries eliminated | Base | Linear degradation as data grows — "it was fast last month" |
| No SELECT * on tables with large columns | Base | Network saturation, memory pressure, slow deserialization |
| Functional indexes for function-based WHERE clauses | Middle | Sequential scans on large tables — seconds per query |
| Unused indexes removed | Middle | Wasted disk space, slower writes, confused planner |
| shared_buffers set to 25% of RAM | Top | Default 128MB — postgres reads from disk constantly |
| random_page_cost = 1.1 on SSD | Top | Planner avoids indexes, prefers sequential scans |
| work_mem monitored and tuned | Top | Queries spill to disk — 100x slower when it happens |
- ✅ pg_stat_statements enabled and recording all queries
- ✅ Top 20 slow queries reviewed with EXPLAIN ANALYZE
- ✅ All N+1 patterns eliminated — use JOIN or batch queries
- ✅ No SELECT * on tables with large TEXT/JSONB/BYTEA columns
- ✅ Functional indexes created for function-based WHERE clauses
- ✅ Unused indexes audited and removed quarterly
- ❌ shared_buffers is not 128MB — it's at least 4GB or 25% of RAM
- ❌ random_page_cost is not 4.0 on SSD — it's 1.1 to 1.5
Frequently Asked Questions
None of them. The most important thing is your queries. Bad queries with perfect settings are still bad queries. The PostgreSQL Performance Pyramid prioritizes query optimization first, indexing strategy second, and server configuration last. Most guides start at the top. That is why most tuned databases are still slow.
25% of total system RAM for dedicated database servers — typically 4-8GB. On systems with 64GB+ RAM, cap at 16GB. The OS cache also caches data, and double-caching wastes memory. Set effective_cache_size to 50-75% of total RAM to inform the planner of available OS cache.
Enable pg_stat_statements. Query it with: SELECT query, calls, mean_exec_time, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20. This shows which queries consume the most total time — including fast queries called frequently. Combine with auto_explain for queries exceeding a time threshold.
The most common cause: a function on the indexed column. WHERE LOWER(email) = 'user@example.com' cannot use a standard B-tree index — it needs a functional index. Also: outdated statistics (run ANALYZE), low selectivity columns (boolean), or the planner estimating a sequential scan is faster (especially on small tables). Always check EXPLAIN ANALYZE to see what the planner actually chose.
A 3-level framework: Base (Query Optimization) — fix slow queries using pg_stat_statements and EXPLAIN ANALYZE. Middle (Indexing Strategy) — the right indexes for the right queries. Top (Server Configuration) — shared_buffers, work_mem, effective_cache_size. The rule: never tune settings until queries and indexes are clean.
Which query pattern has caused your worst PostgreSQL outage?
Leave a comment describing the query that killed your database — the one that looked innocent until traffic doubled. The most painful stories become the next Bioquro database performance guide.
- 1. vs. Every Competitor: They list PostgreSQL settings. This tells you which 5 specific queries caused real production outages — with EXPLAIN ANALYZE output, the exact fix, and the timeline from alert to resolution.
- 2. Unique Framework: The PostgreSQL Performance Pyramid — query optimization first, indexing second, configuration last — with the contrarian opinion that most guides teach in the wrong order.
- 3. Differentiated Value: "The best PostgreSQL performance engineers spend 80% of their time reading EXPLAIN ANALYZE and 20% adjusting settings. The worst spend 100% adjusting settings and never read a query plan."

Comments
Post a Comment