PostgreSQL Performance Tuning for Production Workloads
PostgreSQL's default configuration is designed for compatibility, not performance. Out of the box it targets minimal hardware — shared_buffers defaults to just 128MB — and assumes a handful of concurrent connections and light workloads. If you're running production traffic on default settings, you're leaving most of your hardware idle.
We manage Postgres clusters handling everything from SaaS backends to time-series data. Here's how we tune them.
Memory Configuration
The two most impactful settings in PostgreSQL:
shared_buffers
PostgreSQL's internal cache. The default is 128MB. That's insane for a production server.
-- Default: 128MB
-- Recommendation: 25% of total RAM
-- For a 64GB server:
shared_buffers = '16GB'
Why 25% and not more? PostgreSQL also relies on the OS page cache. Setting shared_buffers too high (>40% of RAM) can actually hurt performance because you're double-caching and leaving less room for the OS.
effective_cache_size
Tells the query planner how much total cache is available (shared_buffers + OS cache). This doesn't allocate memory — it just helps the planner make better decisions:
-- Recommendation: 75% of total RAM
-- For a 64GB server:
effective_cache_size = '48GB'
work_mem
Memory allocated per-operation for sorts, hash joins, and similar operations. Be careful — this is per-operation, not per-connection. A complex query with 5 hash joins uses 5× work_mem.
-- Default: 4MB
-- Recommendation: depends on (RAM / max_connections / expected_operations)
-- For 64GB RAM, 200 connections:
work_mem = '64MB'
-- For analytical queries (batch jobs, reporting):
-- Set per-session: SET work_mem = '512MB';
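A practical way to find queries that outgrow work_mem is to log temp-file usage — any sort or hash that spills to disk gets logged (log_temp_files is a standard setting; 0 means log every spill):

```sql
-- Log every query that spills sorts/hashes to disk
ALTER SYSTEM SET log_temp_files = 0;  -- threshold in kB; 0 = log all temp files
SELECT pg_reload_conf();              -- reloadable: no restart needed
```

Queries that show up here repeatedly are candidates for a per-session work_mem bump.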
maintenance_work_mem
Memory for maintenance operations: VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY.
-- Default: 64MB
-- Recommendation: 1-2GB (these run infrequently but benefit from more memory)
maintenance_work_mem = '2GB'
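To sanity-check what a running server actually uses (before and after a change), query pg_settings:

```sql
-- Current values and where they came from (default, config file, ALTER SYSTEM)
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size',
               'work_mem', 'maintenance_work_mem');
```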
Connection Management
The Connection Problem
Every PostgreSQL connection is a process (not a thread). Each process uses ~5-10MB of RAM. At 500 connections, that's 2.5-5GB just for connection overhead. At 1000, you're in trouble.
-- Don't set this higher than necessary
max_connections = 200
But what if you need 1000 concurrent connections? Use a connection pooler.
PgBouncer
PgBouncer sits between your application and PostgreSQL. 1000 application connections map to 50 actual database connections:
; pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_port = 6432
pool_mode = transaction ; release connection after each transaction
default_pool_size = 50 ; 50 actual Postgres connections
max_client_conn = 1000 ; accept up to 1000 app connections
Transaction pooling (pool_mode = transaction) is the most efficient mode. The connection is returned to the pool after each transaction, not after the client disconnects.
Caveat: transaction pooling breaks session-level state: SET commands, LISTEN/NOTIFY, advisory locks, and prepared statements that span transactions (PgBouncer 1.21+ can track protocol-level prepared statements in transaction mode via max_prepared_statements). If you need those features, use session mode for the specific pools that do.
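To verify the pooler is actually multiplexing, connect to PgBouncer's admin console (psql -h 127.0.0.1 -p 6432 pgbouncer) and inspect the pools:

```
SHOW POOLS;   -- cl_active vs sv_active: client connections vs real server connections
SHOW STATS;   -- per-database transaction and query throughput
```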
Vacuum and Autovacuum
PostgreSQL's MVCC (Multi-Version Concurrency Control) means UPDATE and DELETE don't actually remove old row versions. VACUUM cleans them up.
If autovacuum falls behind, your table bloats, queries slow down, and eventually you risk transaction ID wraparound — the scariest PostgreSQL failure mode.
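Because wraparound is the failure mode to fear, watch transaction ID age directly — autovacuum forces an anti-wraparound vacuum once age(datfrozenxid) passes autovacuum_freeze_max_age (default 200 million):

```sql
-- How close each database is to transaction ID wraparound (~2 billion hard limit)
SELECT datname,
       age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```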
-- More aggressive autovacuum for high-write tables
autovacuum_vacuum_scale_factor = 0.05 -- default 0.2 (vacuum at 5% dead tuples vs 20%)
autovacuum_analyze_scale_factor = 0.025 -- default 0.1
autovacuum_vacuum_cost_delay = 2 -- default 2ms since PG 12 (was 20ms); how long vacuum sleeps between cost-limited rounds
autovacuum_max_workers = 6 -- default 3 (more workers for more tables)
autovacuum_naptime = 15 -- default 60s (check more frequently)
For large, heavily-updated tables, set per-table overrides:
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 1000
);
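Per-table storage parameters are stored in pg_class.reloptions; to confirm which tables carry overrides:

```sql
-- List tables with per-table autovacuum (or other storage) overrides
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL
  AND relkind = 'r';
```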
Monitor vacuum status:
SELECT schemaname, relname,
n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / greatest(n_live_tup, 1) * 100, 1) as dead_pct,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
Indexing Strategy
The Missing Index Problem
The #1 performance issue we find in audits: missing indexes. PostgreSQL tells you about this — you just have to look:
-- Find sequential scans on large tables (missing indexes)
SELECT schemaname, relname, seq_scan, seq_tup_read,
idx_scan, idx_tup_fetch,
pg_size_pretty(pg_relation_size(relid)) as size
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND pg_relation_size(relid) > 10485760 -- > 10MB
ORDER BY seq_tup_read DESC;
Partial Indexes
Don't index everything. If 95% of your queries filter by status = 'active', use a partial index:
-- Instead of indexing all 10M rows:
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Index only the rows that matter (probably 500K):
CREATE INDEX idx_orders_active ON orders(created_at)
WHERE status = 'active';
Smaller index → fits in memory → faster lookups.
Covering Indexes (INCLUDE)
If a query only needs data that's in the index, PostgreSQL can skip reading the table entirely:
-- Query: SELECT email, name FROM users WHERE email = 'user@example.com';
-- Regular index: index lookup → table lookup (2 reads)
CREATE INDEX idx_users_email ON users(email);
-- Covering index (PostgreSQL 11+): index lookup only (1 read)
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name);
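To confirm the heap read is actually skipped, check the plan: you want an Index Only Scan with Heap Fetches: 0. Index-only scans depend on the visibility map, so a recently VACUUMed table matters here:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT email, name FROM users WHERE email = 'user@example.com';
-- Look for: "Index Only Scan" and "Heap Fetches: 0"
-- Nonzero heap fetches usually mean a stale visibility map — run VACUUM
```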
Index Maintenance
Indexes degrade over time, especially with heavy UPDATE/DELETE workloads. Drop the indexes nobody uses, and reindex the rest periodically:
-- Find unused indexes (candidates for removal)
SELECT indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
idx_scan as times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND pg_relation_size(indexrelid) > 1048576
ORDER BY pg_relation_size(indexrelid) DESC;
-- Reindex without blocking writes (PostgreSQL 12+; takes longer than plain REINDEX)
REINDEX INDEX CONCURRENTLY idx_orders_active;
Query Optimization
EXPLAIN ANALYZE Is Your Best Friend
Never guess. Always measure:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
AND o.created_at > now() - interval '7 days';
What to look for:
- Seq Scan on large tables → missing index
- Nested Loop with high row counts → consider a hash join (increase work_mem)
- Buffers: shared read (high) → data not cached; you need more shared_buffers, or the query touches too much data
- Sort Method: external merge → work_mem too low for this query
pg_stat_statements
The most valuable extension. Tracks every query's performance:
CREATE EXTENSION pg_stat_statements;
-- Top 10 queries by total time
SELECT query, calls,
round(total_exec_time::numeric, 2) as total_ms,
round(mean_exec_time::numeric, 2) as avg_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
We review this weekly for every production database we manage. The top 5 queries by total time are always where the biggest wins are hiding.
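The counters accumulate since the last reset, so for a weekly review it helps to start each window fresh:

```sql
-- Begin a new measurement window (requires appropriate privileges)
SELECT pg_stat_statements_reset();
```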
WAL and Checkpoints
Write-Ahead Log (WAL) settings affect write performance and crash recovery time:
-- Larger WAL buffers for write-heavy workloads
wal_buffers = '64MB'
-- Spread checkpoints over time (reduce I/O spikes)
checkpoint_completion_target = 0.9
-- Larger checkpoint distance (fewer, larger checkpoints)
max_wal_size = '4GB'
min_wal_size = '1GB'
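To check whether max_wal_size is big enough, compare timed checkpoints against requested ones — frequent requested checkpoints mean WAL fills up before checkpoint_timeout elapses. (In PostgreSQL 17+ these counters moved to pg_stat_checkpointer; the query below is the pre-17 view.)

```sql
-- Checkpoints triggered on schedule vs forced by WAL volume
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
-- Many checkpoints_req relative to checkpoints_timed → raise max_wal_size
```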
Our Production Baseline
For a 64GB RAM, 16-core server with NVMe storage:
-- Memory
shared_buffers = '16GB'
effective_cache_size = '48GB'
work_mem = '64MB'
maintenance_work_mem = '2GB'
-- Connections (use PgBouncer)
max_connections = 200
-- WAL
wal_buffers = '64MB'
max_wal_size = '4GB'
checkpoint_completion_target = 0.9
-- Query planner
random_page_cost = 1.1 -- NVMe: nearly same as seq read
effective_io_concurrency = 200 -- NVMe: high parallelism
-- Autovacuum
autovacuum_max_workers = 6
autovacuum_vacuum_scale_factor = 0.05
autovacuum_naptime = 15
-- Logging
log_min_duration_statement = 500 -- Log queries > 500ms
log_checkpoints = on
log_lock_waits = on
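Most of these can be applied with ALTER SYSTEM (written to postgresql.auto.conf) plus a reload; shared_buffers, max_connections, and wal_buffers take effect only after a full restart:

```sql
ALTER SYSTEM SET effective_cache_size = '48GB';
SELECT pg_reload_conf();  -- applies all reloadable settings
-- shared_buffers, max_connections, wal_buffers still require a restart
```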
Don't Guess, Measure
Every recommendation here is a starting point. Your workload is unique. Use pg_stat_statements, EXPLAIN ANALYZE, and pg_stat_user_tables to make data-driven tuning decisions.
Running Postgres on our infrastructure? We include performance tuning as part of our managed database offering. Running it elsewhere? We do audits too.