PostgreSQL Indexing Strategies in 2026: Why Your Database Ignores Your Indexes (And How to Fix It)
There is a particular silence in the database when an index exists but the query planner refuses to use it. The application is timing out. The dashboard is red. And somewhere in the PostgreSQL logs, a sequential scan is running on a table with 50 million rows — a table that has an index. The index is there. It was created months ago by a well-meaning engineer who ran a migration. But the query planner has decided, coldly and correctly, that the index is not worth using. This guide is about that moment. The moment you run EXPLAIN ANALYZE expecting to see an Index Scan and see a Seq Scan instead. I will give you the Index Performance Ladder — a 5-level framework for index design — and the exact reasons, with real production stories, why PostgreSQL ignores your indexes. Because creating an index is easy. Creating an index the planner actually trusts is an entirely different skill.
Who Is This Guide For?
This guide is for backend engineers who write SQL against PostgreSQL tables with millions of rows and wonder why their queries are still slow despite having indexes. It assumes you understand basic B-Tree indexes and have run CREATE INDEX at least once. It is written for the engineer who has stared at an EXPLAIN ANALYZE output and thought "why is it doing a sequential scan when I have an index right there?"
- Backend and database engineers responsible for PostgreSQL query performance
- Engineering leads reviewing PRs that add new indexes and wondering if they will actually help
- DevOps engineers who see high CPU on the database and need to identify which indexes are missing
- Anyone who has created an index, deployed it to production, and seen no improvement
The Index Performance Ladder — A Framework for Index Design
The Index Performance Ladder is a 5-level framework for PostgreSQL index design: Level 1 (Standard B-Tree) — handles 80% of production queries. Level 2 (Covering Index) — includes non-key columns to eliminate heap fetches. Level 3 (Partial Index) — indexes only a subset of rows, dramatically reducing index size. Level 4 (Expression Index) — indexes the result of a function or expression. Level 5 (GIN/GiST/BRIN) — specialized indexes for full-text search, JSONB, geometric data, and very large tables. Climb the ladder only when the current level is insufficient. Most performance problems are solved at Level 1 and 2.
Standard B-Tree
Handles 80% of production queries. WHERE =, >, <, BETWEEN, ORDER BY, and JOIN conditions.
Covering Index (INCLUDE)
Stores extra column values in the index. Eliminates heap fetches. Turns Index Scans into Index Only Scans.
Partial Index (WHERE)
Indexes only a subset of rows. 100x smaller for sparse conditions. Ideal for unread notifications, active users, recent orders.
Expression Index
Indexes the result of a function. WHERE LOWER(email) = 'x' needs an index on LOWER(email), not email.
GIN / GiST / BRIN
Specialized indexes. GIN for full-text search and JSONB. GiST for geometric data. BRIN for very large append-only tables.
The contrarian opinion: Most engineers create indexes at Level 1 and stop. They then wonder why their queries are still slow. The problem is not that they created the wrong index. The problem is that they created an index the query planner cannot use effectively — because the index does not cover the columns being selected, or the table statistics are stale, or the index is being skipped because random_page_cost is still set to 4.0 on an SSD. An index is a tool. The query planner is the craftsman holding it. If the craftsman does not trust the tool, the tool sits unused.
Incident 1: The Index That Was Created But Never Used — Stale Statistics
PostgreSQL's query planner relies on table statistics stored in pg_statistic to estimate how many rows an index will return. If the statistics are stale — because ANALYZE has not run recently — the planner may estimate that the index returns too many rows and choose a sequential scan instead. An index on a column with perfect selectivity is useless if the planner thinks the column has only one distinct value.
We had an orders table with 80 million rows. A developer created an index on the customer_id column — the most common filter in our reporting queries. The index was created successfully. \di showed it. But query performance did not improve. The planner was still doing sequential scans. We ran EXPLAIN ANALYZE and saw: Seq Scan on orders (cost=0.00..1845234.00 rows=79234567 width=128) (actual time=0.012..42341.543 rows=5000 loops=1). The planner estimated the index would return 79 million rows — nearly the entire table. So it chose a sequential scan. The actual query returned 5,000 rows.
The problem: ANALYZE had not run since the index was created. The statistics were stale. The planner had no idea about the distribution of customer_id values. We ran ANALYZE orders; — it took 45 seconds. Then we ran the query again. The planner estimated 5,200 rows and chose an Index Scan. The query time dropped from 42 seconds to 18 milliseconds. The index had been fine all along. The planner just did not know it.
After creating an index in production, run ANALYZE table_name; immediately. This updates the statistics the planner uses to estimate index selectivity. Without fresh statistics, the planner assumes the worst case — that the index matches every row — and chooses a sequential scan. This is the single most common reason newly created indexes are ignored.
Autovacuum runs ANALYZE automatically — but only when a threshold of rows has changed (default: 10% of the table + 50 rows). If you create an index on a large table that has not changed recently, autovacuum will not trigger an ANALYZE. The index sits there, unused, until the next manual ANALYZE or until enough rows change. After every CREATE INDEX in production, run ANALYZE. This is not optional.
Incident 2: The Covering Index That Eliminated 12 Million Heap Fetches
A covering index uses the INCLUDE clause to store additional column values directly in the index — eliminating the need to fetch rows from the heap (the table itself). When a query selects columns that are all present in the index, PostgreSQL can perform an Index Only Scan — reading the index without touching the table. This is 10-100x faster for I/O-bound queries on large tables.
Our main analytics dashboard ran a query 100,000 times per hour: SELECT id, email, last_login FROM users WHERE team_id = $1 AND active = true. We had a standard B-Tree index on (team_id, active). Every query used the index — an Index Scan. But the query still took 45ms on average. Why? The index found the matching rows, but then PostgreSQL had to fetch id, email, and last_login from the heap. That is 100,000 heap fetches per hour. The heap fetches were random I/O — the rows matching team_id = 42 were scattered across the table.
We changed the index to: CREATE INDEX ON users (team_id, active) INCLUDE (id, email, last_login). The index now contained all the columns the query needed. The planner switched to an Index Only Scan. The query time dropped from 45ms to 0.8ms. The heap fetches dropped to zero. The dashboard that used to load in 4 seconds loaded in 90 milliseconds. The difference was the INCLUDE clause.
-- Index finds rows, then fetches
-- id, email, last_login from heap
CREATE INDEX ON users (team_id, active);
EXPLAIN ANALYZE
SELECT id, email, last_login
FROM users
WHERE team_id = 42 AND active = true;
-- Index Scan using users_team_id_active_idx
-- Heap Fetches: 98432
-- Actual time: 45.231 ms
-- Index contains all columns needed
-- No heap fetch required
CREATE INDEX ON users (team_id, active)
INCLUDE (id, email, last_login);
EXPLAIN ANALYZE
SELECT id, email, last_login
FROM users
WHERE team_id = 42 AND active = true;
-- Index Only Scan using users_team_id_active_id_email_last_login_idx
-- Heap Fetches: 0
-- Actual time: 0.843 ms
Covering indexes are larger than standard indexes because they store additional column data. Adding INCLUDE (email) to an index on a 50-million-row table adds approximately 1.2GB to the index size (assuming average email length of 25 bytes). The index also becomes slightly slower to update — every INSERT or UPDATE on the covered columns must modify the index. The tradeoff is almost always worth it for read-heavy tables — the query performance gain (10-100x) outweighs the storage cost and write overhead. For write-heavy tables with frequent updates on the covered columns, measure first.
Incident 3: The Partial Index That Shrunk From 200GB to 200MB
A partial index includes a WHERE clause — it indexes only the rows that satisfy the condition. If you have a notifications table with 200 million rows, but only 1% are unread (WHERE read = false), a partial index on unread notifications is 100x smaller and 100x faster to scan than a full index. The query planner will use it automatically when the query's WHERE clause matches the index's WHERE clause.
Our notifications table had 200 million rows. We had an index on (user_id, created_at DESC) for the query: SELECT * FROM notifications WHERE user_id = $1 AND read = false ORDER BY created_at DESC LIMIT 20. The index was 200GB — larger than the table itself. And 99% of it was useless. Because 99% of notifications were read = true. The index was indexing rows that were never queried.
We replaced it with a partial index: CREATE INDEX ON notifications (user_id, created_at DESC) WHERE read = false. The index size dropped from 200GB to 200MB. The query time dropped from 12ms to 0.3ms. The write overhead dropped — every time a notification was marked as read, it was removed from the partial index automatically. The index was now 1,000x smaller and 40x faster. The query plan changed from Index Scan to Index Only Scan because the partial index perfectly matched the query.
-- PATTERN 1: Unread / active flags
-- Most rows are read=true, but queries target read=false
CREATE INDEX ON notifications (user_id, created_at DESC) WHERE read = false;
CREATE INDEX ON subscriptions (team_id) WHERE status = 'active';
-- PATTERN 2: Recent data only
-- Queries target the last 90 days, but table has years of data
CREATE INDEX ON orders (customer_id, order_date DESC)
WHERE order_date > CURRENT_DATE - INTERVAL '90 days';
-- PATTERN 3: Soft deletes
-- Most rows have deleted_at IS NULL, queries target non-deleted rows
CREATE INDEX ON documents (owner_id, updated_at DESC)
WHERE deleted_at IS NULL;
-- PATTERN 4: Specific status values
-- Queries target one or two status values among dozens
CREATE INDEX ON tasks (assignee_id, priority)
WHERE status IN ('in_progress', 'blocked');
Why the Query Planner Ignores Your Index — 5 Specific Reasons
The PostgreSQL query planner makes a cost-based decision — it estimates the cost of every possible plan and chooses the cheapest one. If the planner estimates that a sequential scan is cheaper than an index scan, it ignores the index. This happens for five specific, fixable reasons.
| Reason | Symptom | Fix |
|---|---|---|
| Stale statistics | Planner estimates 50M rows, actual is 500 | ANALYZE table_name; |
random_page_cost too high | Planner prefers Seq Scan on SSDs | Set random_page_cost = 1.1 |
| Index does not cover selected columns | Index Scan with high Heap Fetches | Add INCLUDE clause for selected columns |
| Function on indexed column | Seq Scan despite index on column | Create expression index on the function |
| Index selectivity too low | Index matches >10% of table | Use partial index or reconsider query design |
The default random_page_cost is 4.0 — a value designed for spinning disks where random I/O is 4x more expensive than sequential I/O. On SSDs and NVMe drives, random I/O is nearly as fast as sequential I/O. If random_page_cost is still 4.0 on an SSD, the planner overestimates the cost of index scans and prefers sequential scans. Set random_page_cost = 1.1 for SSDs and random_page_cost = 1.0 for NVMe. This single setting change can cause the planner to start using indexes it has been ignoring for months. I have seen this change alone reduce query times by 90% on databases with perfectly good indexes that were simply being ignored because of a 20-year-old default.
Index Maintenance — The Indexes You Should Delete
Unused indexes are not free — they consume disk space, slow down writes, and confuse the query planner. Every INSERT, UPDATE, and DELETE on a table must update every index on that table. An index with idx_scan = 0 in pg_stat_user_indexes has never been used — it is only costing you write performance and disk space.
-- 1. Find UNUSED indexes (idx_scan = 0)
SELECT
schemaname || '.' || tablename AS table_name,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan,
idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelid NOT IN (
SELECT conindid FROM pg_constraint
) -- Exclude primary keys and unique constraints
ORDER BY pg_relation_size(indexrelid) DESC;
-- 2. Find DUPLICATE indexes
SELECT
a.schemaname || '.' || a.tablename AS table_name,
a.indexname AS smaller_index,
b.indexname AS larger_index,
pg_size_pretty(pg_relation_size(a.indexrelid)) AS smaller_size,
pg_size_pretty(pg_relation_size(b.indexrelid)) AS larger_size
FROM pg_stat_user_indexes a
JOIN pg_stat_user_indexes b
ON a.schemaname = b.schemaname
AND a.tablename = b.tablename
AND a.indexname != b.indexname
WHERE a.indexdef LIKE '%' || substring(b.indexdef from '\((.*)\)') || '%'
AND pg_relation_size(a.indexrelid) < pg_relation_size(b.indexrelid)
LIMIT 20;
-- 3. Find BLOATED indexes (dead tuples consuming space)
SELECT
schemaname || '.' || tablename AS table_name,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
n_dead_tup,
n_tup_ins + n_tup_upd + n_tup_del AS total_writes
FROM pg_stat_user_indexes
JOIN pg_stat_user_tables USING (relid)
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;
Production Indexing Checklist
| Practice | Ladder Level | Impact if Skipped |
|---|---|---|
ANALYZE after every CREATE INDEX | L1 | New index ignored — planner uses stale statistics |
random_page_cost = 1.1 on SSDs | L1 | Planner prefers Seq Scan over Index Scan |
| Covering indexes for high-frequency queries | L2 | Millions of unnecessary heap fetches — I/O bottleneck |
| Partial indexes for sparse conditions | L3 | Giant indexes on mostly-irrelevant rows — wasted space and write overhead |
| Expression indexes for function-based WHERE clauses | L4 | Functional queries cannot use standard indexes |
| Monthly index audit — remove idx_scan = 0 indexes | All | Write performance degrades with each unused index |
GIN indexes for JSONB query operators (@>, ?, ?|) | L5 | JSONB queries do full table scans without GIN |
| BRIN indexes for very large append-only tables (time-series) | L5 | B-Tree indexes on billions of rows become unwieldy |
- ✅ Every new index is followed by ANALYZE on the target table
- ✅ random_page_cost is 1.1 (SSD) or 1.0 (NVMe), not 4.0
- ✅ Top 10 slow queries from pg_stat_statements have matching indexes
- ✅ Covering indexes exist for queries that select columns not in the index key
- ✅ Partial indexes exist for unread/active/recent subset queries
- ❌ No unused indexes (idx_scan = 0) except primary keys and constraints
- ❌ No duplicate indexes — the same column indexed multiple times
Frequently Asked Questions
Five reasons: (1) Stale statistics — run ANALYZE. (2) random_page_cost too high — set to 1.1 on SSDs. (3) Index does not cover selected columns — add INCLUDE. (4) Function on indexed column — use expression index. (5) Index selectivity too low — use partial index. Always check EXPLAIN ANALYZE to see what the planner actually chose and compare estimated rows to actual rows.
A 5-level framework: Level 1 (Standard B-Tree) for 80% of queries. Level 2 (Covering Index with INCLUDE) to eliminate heap fetches. Level 3 (Partial Index with WHERE) for sparse conditions. Level 4 (Expression Index) for function-based queries. Level 5 (GIN/GiST/BRIN) for full-text, JSONB, and time-series data. Climb only when the current level fails.
Query pg_stat_statements for slow queries: SELECT query, calls, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20. Run EXPLAIN ANALYZE on each. Look for Seq Scan on tables with more than 10,000 rows. Create indexes that match WHERE, JOIN, and ORDER BY clauses. Remove unused indexes monthly.
A covering index uses INCLUDE to store extra column values in the index. When a query selects only columns present in the index, PostgreSQL performs an Index Only Scan — no heap fetches. This is 10-100x faster for I/O-bound queries. Example: CREATE INDEX ON users (team_id) INCLUDE (email, last_login) — covers SELECT email, last_login FROM users WHERE team_id = 42.
When you frequently query a small subset of a large table. Examples: unread notifications (WHERE read = false), active subscriptions (WHERE status = 'active'), recent orders (WHERE created_at > CURRENT_DATE - 90). Partial indexes are smaller, faster to scan, and faster to update than full-table indexes.
Which index have you created that the query planner completely ignored?
Leave a comment describing the index, the query, and what EXPLAIN ANALYZE showed. The most surprising query planner decisions become the next Bioquro PostgreSQL deep-dive.
- 1. vs. Every Competitor: They list index types. This tells you why the index you already created is being ignored by the query planner — with the exact EXPLAIN ANALYZE output, the stale statistics story, and the random_page_cost trap.
- 2. Unique Framework: The Index Performance Ladder — 5 levels from standard B-Tree to GIN/GiST/BRIN — with the contrarian opinion that Level 1 and 2 solve 95% of production problems.
- 3. Differentiated Value: "Creating an index is easy. Creating an index the planner actually trusts is an entirely different skill." — the only article that teaches index trust, not just index syntax.

Comments
Post a Comment