Mastering PostgreSQL Indexing Strategies

Last Updated: November 2025
Scope: PostgreSQL 14+

The Cost of Getting It Wrong

A missing index can turn a 10ms query into a 10-second query. Too many indexes slow down writes and waste disk space.

The Audit (Find Missing Indexes)

Step 1: Enable Query Logging

-- Add to postgresql.conf
log_min_duration_statement = 100  -- Log queries >100ms
log_line_prefix = '%t [%p]: '

Step 2: Find Sequential Scans

-- Find tables with high sequential scan counts
SELECT
  schemaname,
  tablename,
  seq_scan,
  seq_tup_read,
  idx_scan,
  seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;

🚩 Red Flag: High seq_scan with low idx_scan on large tables.

Step 3: Use EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';

🚩 Red Flags:

Seq Scan on tables with >10k rows
High actual time values
rows estimate wildly different from actual

Index Types (When to Use What)

1. B-Tree Index (Default, 95% of Cases)

Use for:

Equality comparisons (=)
Range queries (<, >, BETWEEN)
Sorting (ORDER BY)
Pattern matching (LIKE 'prefix%')

-- Single column
CREATE INDEX idx_users_email ON users(email);

-- Composite (order matters!)
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, created_at);

-- This index helps:
-- ✅ WHERE customer_id = 123
-- ✅ WHERE customer_id = 123 AND created_at > '2025-01-01'
-- ✅ WHERE customer_id = 123 ORDER BY created_at

-- This index does NOT help:
-- ❌ WHERE created_at > '2025-01-01' (created_at is not first)

2. Hash Index (Equality Only)

Use for:

Only equality comparisons (=)
Never used for ranges or sorting

CREATE INDEX idx_sessions_token ON sessions USING HASH(token);

-- Good for:
-- ✅ WHERE token = 'abc123'

-- Bad for:
-- ❌ WHERE token LIKE 'abc%'
-- ❌ ORDER BY token

Note: B-tree is usually better. Hash indexes are rarely needed.

3. GIN Index (Full-Text Search, Arrays, JSONB)

Use for:

Full-text search
Array containment (@>, <@)
JSONB queries

-- Full-text search
CREATE INDEX idx_articles_content
ON articles USING GIN(to_tsvector('english', content));

SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');

-- Array containment
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);

SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];

-- JSONB queries
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);

SELECT * FROM users WHERE metadata @> '{"plan": "premium"}';

4. GiST Index (Geometric, Full-Text, Ranges)

Use for:

Geometric data
Full-text search (alternative to GIN)
Range types
Nearest neighbor searches

-- Geometric search
CREATE INDEX idx_locations_point ON locations USING GIST(coordinates);

-- Range overlap
CREATE INDEX idx_bookings_dates ON bookings USING GIST(date_range);

SELECT * FROM bookings
WHERE date_range && '[2025-01-01, 2025-01-31]'::daterange;

Advanced Strategies

1. Composite Indexes (Column Order Matters)

Rule: Most selective column first? Wrong. Most frequently queried column first.

-- Bad: Less common filter first
CREATE INDEX idx_orders_status_customer
ON orders(status, customer_id);

-- Good: Most common filter first
CREATE INDEX idx_orders_customer_status
ON orders(customer_id, status);

-- Why? Most queries filter by customer_id:
-- ✅ WHERE customer_id = 123
-- ✅ WHERE customer_id = 123 AND status = 'pending'

2. Partial Indexes (Index Only What You Query)

-- Instead of indexing all rows:
CREATE INDEX idx_orders_created ON orders(created_at);

-- Index only active orders:
CREATE INDEX idx_active_orders_created
ON orders(created_at)
WHERE status IN ('pending', 'processing');

-- Smaller index, faster queries for active orders
-- 90% smaller if only 10% of orders are active

3. Covering Indexes (Include Non-Key Columns)

-- Query: SELECT name, email FROM users WHERE company_id = 123;

-- Basic index (requires table lookup)
CREATE INDEX idx_users_company ON users(company_id);

-- Covering index (no table lookup needed)
CREATE INDEX idx_users_company_covering
ON users(company_id) INCLUDE (name, email);

Result: Index-only scan (much faster).

4. Expression Indexes

-- Query often uses lowercase comparison:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Index the expression:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Index Maintenance

1. Find Unused Indexes

SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

Action: Drop indexes with idx_scan = 0 after confirming they’re truly unused.

2. Find Duplicate/Redundant Indexes

-- If you have:
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

-- The first index is redundant! Drop it.

3. Reindex Bloated Indexes

-- Check bloat
SELECT
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild if bloated
REINDEX INDEX CONCURRENTLY idx_users_email;

Common Mistakes

Mistake 1: Over-Indexing

-- Don't do this:
CREATE INDEX idx_users_id ON users(id);          -- ❌ Redundant (PRIMARY KEY)
CREATE INDEX idx_users_created ON users(created_at);
CREATE INDEX idx_users_updated ON users(updated_at);
CREATE INDEX idx_users_name ON users(name);
CREATE INDEX idx_users_email ON users(email);

-- Each index slows down INSERT/UPDATE/DELETE

Mistake 2: Wrong Column Order

-- Query: WHERE status = 'active' AND created_at > '2025-01-01'

-- Bad: Status has low cardinality (only 3 values)
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- Good: Date first (high cardinality)
CREATE INDEX idx_orders_created_status ON orders(created_at, status);

Mistake 3: Not Using Partial Indexes

-- Bad: Index all rows
CREATE INDEX idx_users_last_login ON users(last_login_at);

-- Good: Index only active users (if that's what you query)
CREATE INDEX idx_active_users_last_login
ON users(last_login_at)
WHERE deleted_at IS NULL;

The Decision Tree

Start here: What are you querying?

Equality on single column → B-tree index
Range queries (dates, numbers) → B-tree index
Full-text search → GIN index on to_tsvector
JSONB queries → GIN index on JSONB column
Array containment → GIN index
Geometric data → GiST index
Only active/recent rows → Partial B-tree index
Need to avoid table lookup → Covering index (INCLUDE)

The Checklist

All WHERE clause columns indexed (or first column in composite)
Composite indexes ordered by query frequency
Partial indexes for filtered queries (e.g., active records only)
No redundant indexes (check for overlaps)
Unused indexes dropped (idx_scan = 0)
GIN indexes for JSONB/array/full-text queries
Covering indexes for frequently-queried columns

The Bottom Line

Good indexing is about understanding your query patterns. Use EXPLAIN ANALYZE, monitor pg_stat_user_indexes, and remember: an index is only helpful if your queries actually use it.

Mastering PostgreSQL Indexing Strategies

Mastering PostgreSQL Indexing Strategies

The Cost of Getting It Wrong

The Audit (Find Missing Indexes)

Step 1: Enable Query Logging

Step 2: Find Sequential Scans

Step 3: Use EXPLAIN ANALYZE

Index Types (When to Use What)

1. B-Tree Index (Default, 95% of Cases)

2. Hash Index (Equality Only)

3. GIN Index (Full-Text Search, Arrays, JSONB)

4. GiST Index (Geometric, Full-Text, Ranges)

Advanced Strategies

1. Composite Indexes (Column Order Matters)

2. Partial Indexes (Index Only What You Query)

3. Covering Indexes (Include Non-Key Columns)

4. Expression Indexes

Index Maintenance

1. Find Unused Indexes

2. Find Duplicate/Redundant Indexes

3. Reindex Bloated Indexes

Common Mistakes

Mistake 1: Over-Indexing

Mistake 2: Wrong Column Order

Mistake 3: Not Using Partial Indexes

The Decision Tree

The Checklist

The Bottom Line

Let's Build Something Scalable