Mastering PostgreSQL Indexing Strategies
B-tree, Hash, GIN, GiST—when to use which index. Learn composite indexes, partial indexes, and how to diagnose missing indexes.
Mastering PostgreSQL Indexing Strategies
Last Updated: November 2025
Scope: PostgreSQL 14+
The Cost of Getting It Wrong
A missing index can turn a 10ms query into a 10-second query. Too many indexes slow down writes and waste disk space.
The Audit (Find Missing Indexes)
Step 1: Enable Query Logging
-- Add to postgresql.conf
log_min_duration_statement = 100 -- Log queries >100ms
log_line_prefix = '%t [%p]: '
Step 2: Find Sequential Scans
-- Find tables with high sequential scan counts
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;
🚩 Red Flag: High seq_scan with low idx_scan on large tables.
Step 3: Use EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
🚩 Red Flags:
Seq Scanon tables with >10k rows- High
actual timevalues rowsestimate wildly different from actual
Index Types (When to Use What)
1. B-Tree Index (Default, 95% of Cases)
Use for:
- Equality comparisons (
=) - Range queries (
<,>,BETWEEN) - Sorting (
ORDER BY) - Pattern matching (
LIKE 'prefix%')
-- Single column
CREATE INDEX idx_users_email ON users(email);
-- Composite (order matters!)
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, created_at);
-- This index helps:
-- ✅ WHERE customer_id = 123
-- ✅ WHERE customer_id = 123 AND created_at > '2025-01-01'
-- ✅ WHERE customer_id = 123 ORDER BY created_at
-- This index does NOT help:
-- ❌ WHERE created_at > '2025-01-01' (created_at is not first)
2. Hash Index (Equality Only)
Use for:
- Only equality comparisons (
=) - Never used for ranges or sorting
CREATE INDEX idx_sessions_token ON sessions USING HASH(token);
-- Good for:
-- ✅ WHERE token = 'abc123'
-- Bad for:
-- ❌ WHERE token LIKE 'abc%'
-- ❌ ORDER BY token
Note: B-tree is usually better. Hash indexes are rarely needed.
3. GIN Index (Full-Text Search, Arrays, JSONB)
Use for:
- Full-text search
- Array containment (
@>,<@) - JSONB queries
-- Full-text search
CREATE INDEX idx_articles_content
ON articles USING GIN(to_tsvector('english', content));
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');
-- Array containment
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];
-- JSONB queries
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
SELECT * FROM users WHERE metadata @> '{"plan": "premium"}';
4. GiST Index (Geometric, Full-Text, Ranges)
Use for:
- Geometric data
- Full-text search (alternative to GIN)
- Range types
- Nearest neighbor searches
-- Geometric search
CREATE INDEX idx_locations_point ON locations USING GIST(coordinates);
-- Range overlap
CREATE INDEX idx_bookings_dates ON bookings USING GIST(date_range);
SELECT * FROM bookings
WHERE date_range && '[2025-01-01, 2025-01-31]'::daterange;
Advanced Strategies
1. Composite Indexes (Column Order Matters)
Rule: Most selective column first? Wrong. Most frequently queried column first.
-- Bad: Less common filter first
CREATE INDEX idx_orders_status_customer
ON orders(status, customer_id);
-- Good: Most common filter first
CREATE INDEX idx_orders_customer_status
ON orders(customer_id, status);
-- Why? Most queries filter by customer_id:
-- ✅ WHERE customer_id = 123
-- ✅ WHERE customer_id = 123 AND status = 'pending'
2. Partial Indexes (Index Only What You Query)
-- Instead of indexing all rows:
CREATE INDEX idx_orders_created ON orders(created_at);
-- Index only active orders:
CREATE INDEX idx_active_orders_created
ON orders(created_at)
WHERE status IN ('pending', 'processing');
-- Smaller index, faster queries for active orders
-- 90% smaller if only 10% of orders are active
3. Covering Indexes (Include Non-Key Columns)
-- Query: SELECT name, email FROM users WHERE company_id = 123;
-- Basic index (requires table lookup)
CREATE INDEX idx_users_company ON users(company_id);
-- Covering index (no table lookup needed)
CREATE INDEX idx_users_company_covering
ON users(company_id) INCLUDE (name, email);
Result: Index-only scan (much faster).
4. Expression Indexes
-- Query often uses lowercase comparison:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Index the expression:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
Index Maintenance
1. Find Unused Indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;
Action: Drop indexes with idx_scan = 0 after confirming they’re truly unused.
2. Find Duplicate/Redundant Indexes
-- If you have:
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- The first index is redundant! Drop it.
3. Reindex Bloated Indexes
-- Check bloat
SELECT
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild if bloated
REINDEX INDEX CONCURRENTLY idx_users_email;
Common Mistakes
Mistake 1: Over-Indexing
-- Don't do this:
CREATE INDEX idx_users_id ON users(id); -- ❌ Redundant (PRIMARY KEY)
CREATE INDEX idx_users_created ON users(created_at);
CREATE INDEX idx_users_updated ON users(updated_at);
CREATE INDEX idx_users_name ON users(name);
CREATE INDEX idx_users_email ON users(email);
-- Each index slows down INSERT/UPDATE/DELETE
Mistake 2: Wrong Column Order
-- Query: WHERE status = 'active' AND created_at > '2025-01-01'
-- Bad: Status has low cardinality (only 3 values)
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
-- Good: Date first (high cardinality)
CREATE INDEX idx_orders_created_status ON orders(created_at, status);
Mistake 3: Not Using Partial Indexes
-- Bad: Index all rows
CREATE INDEX idx_users_last_login ON users(last_login_at);
-- Good: Index only active users (if that's what you query)
CREATE INDEX idx_active_users_last_login
ON users(last_login_at)
WHERE deleted_at IS NULL;
The Decision Tree
Start here: What are you querying?
- Equality on single column → B-tree index
- Range queries (dates, numbers) → B-tree index
- Full-text search → GIN index on
to_tsvector - JSONB queries → GIN index on JSONB column
- Array containment → GIN index
- Geometric data → GiST index
- Only active/recent rows → Partial B-tree index
- Need to avoid table lookup → Covering index (INCLUDE)
The Checklist
- All WHERE clause columns indexed (or first column in composite)
- Composite indexes ordered by query frequency
- Partial indexes for filtered queries (e.g., active records only)
- No redundant indexes (check for overlaps)
- Unused indexes dropped (idx_scan = 0)
- GIN indexes for JSONB/array/full-text queries
- Covering indexes for frequently-queried columns
The Bottom Line
Good indexing is about understanding your query patterns. Use EXPLAIN ANALYZE, monitor pg_stat_user_indexes, and remember: an index is only helpful if your queries actually use it.
Let's Build Something Scalable
We apply these same engineering principles to client projects. Ready to upgrade your stack?