PostgreSQL Performance Tuning for Production Workloads

Kicked Team · January 16, 2026 · 7 min read

PostgreSQL's default configuration is designed for compatibility, not performance. It assumes 128MB of RAM, a few concurrent connections, and light workloads. If you're running production traffic on default settings, you're leaving most of your hardware's capacity unused.

We manage Postgres clusters handling everything from SaaS backends to time-series data. Here's how we tune them.

Memory Configuration

The four most impactful memory settings in PostgreSQL:

shared_buffers

PostgreSQL's internal buffer cache. The default is 128MB, far too small for a production server.

-- Default: 128MB
-- Recommendation: 25% of total RAM
-- For a 64GB server:
shared_buffers = '16GB'

Why 25% and not more? PostgreSQL also relies on the OS page cache. Setting shared_buffers too high (>40% of RAM) can actually hurt performance because you're double-caching and leaving less room for the OS.

effective_cache_size

Tells the query planner how much total cache is available (shared_buffers + OS cache). This doesn't allocate memory — it just helps the planner make better decisions:

-- Recommendation: 75% of total RAM
-- For a 64GB server:
effective_cache_size = '48GB'

work_mem

Memory allocated per-operation for sorts, hash joins, and similar operations. Be careful — this is per-operation, not per-connection. A complex query with 5 hash joins can use 5× work_mem, and since PostgreSQL 13 hash-based operations can each use up to hash_mem_multiplier × work_mem.

-- Default: 4MB
-- Recommendation: depends on (RAM / max_connections / expected_operations)
-- For 64GB RAM, 200 connections:
work_mem = '64MB'

-- For analytical queries (batch jobs, reporting):
-- Set per-session: SET work_mem = '512MB';

maintenance_work_mem

Memory for maintenance operations: VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY.

-- Default: 64MB
-- Recommendation: 1-2GB (these run infrequently but benefit from more memory)
maintenance_work_mem = '2GB'

Connection Management

The Connection Problem

Every PostgreSQL connection is a process (not a thread). Each process uses ~5-10MB of RAM. At 500 connections, that's 2.5-5GB just for connection overhead. At 1000, you're in trouble.
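
A quick way to see where you stand (connection counts come from pg_stat_activity):

-- Current connections vs. the configured limit
SELECT count(*) AS current_connections,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;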

-- Don't set this higher than necessary
max_connections = 200

But what if you need 1000 concurrent connections? Use a connection pooler.

PgBouncer

PgBouncer sits between your application and PostgreSQL. 1000 application connections map to 50 actual database connections:

; pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
pool_mode = transaction    ; release connection after each transaction
default_pool_size = 50     ; 50 actual Postgres connections
max_client_conn = 1000     ; accept up to 1000 app connections

Transaction pooling (pool_mode = transaction) is the most efficient mode. The connection is returned to the pool after each transaction, not after the client disconnects.

Caveat: transaction pooling breaks SET commands, LISTEN/NOTIFY, and prepared statements that span transactions. If you need those, use session mode for specific pools.
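
Once PgBouncer is running, its admin console shows pool utilization. Connect to the built-in pgbouncer database on the listen port and use the SHOW commands:

-- From psql: psql -p 6432 -U pgbouncer pgbouncer
SHOW POOLS;   -- cl_active, cl_waiting, sv_active, sv_idle per pool
SHOW STATS;   -- per-database query counts and timings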

Vacuum and Autovacuum

PostgreSQL's MVCC (Multi-Version Concurrency Control) means UPDATE and DELETE don't actually remove old row versions. VACUUM cleans them up.

If autovacuum falls behind, your table bloats, queries slow down, and eventually you risk transaction ID wraparound — the scariest PostgreSQL failure mode.
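
Wraparound risk is easy to monitor: age(datfrozenxid) reports how many transaction IDs old each database's frozen point is, and autovacuum forces an anti-wraparound vacuum once it reaches autovacuum_freeze_max_age (default 200 million):

-- Transaction ID age per database
SELECT datname,
       age(datfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;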

-- More aggressive autovacuum for high-write tables
autovacuum_vacuum_scale_factor = 0.05     -- default 0.2 (vacuum at 5% dead tuples vs 20%)
autovacuum_analyze_scale_factor = 0.025   -- default 0.1
autovacuum_vacuum_cost_delay = 2          -- default 2ms since PG 12 (was 20ms); lower means less throttling
autovacuum_max_workers = 6                -- default 3 (more workers for more tables)
autovacuum_naptime = 15                   -- default 60s (check more frequently)

For large, heavily-updated tables, set per-table overrides:

ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_threshold = 1000
);

Monitor vacuum status:

SELECT schemaname, relname, 
       n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / greatest(n_live_tup, 1) * 100, 1) as dead_pct,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables 
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

Indexing Strategy

The Missing Index Problem

The #1 performance issue we find in audits: missing indexes. PostgreSQL tells you about this — you just have to look:

-- Find sequential scans on large tables (missing indexes)
SELECT schemaname, relname, seq_scan, seq_tup_read,
       idx_scan, idx_tup_fetch,
       pg_size_pretty(pg_relation_size(relid)) as size
FROM pg_stat_user_tables
WHERE seq_scan > 100 
  AND pg_relation_size(relid) > 10485760  -- > 10MB
ORDER BY seq_tup_read DESC;

Partial Indexes

Don't index everything. If 95% of your queries filter by status = 'active', use a partial index:

-- Instead of indexing all 10M rows:
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Index only the rows that matter (probably 500K):
CREATE INDEX idx_orders_active ON orders(created_at)
WHERE status = 'active';

Smaller index → fits in memory → faster lookups.
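
One caveat: the planner only uses a partial index when the query's WHERE clause implies the index predicate. EXPLAIN confirms whether it's being used (table and index names here match the example above):

-- Includes the predicate → eligible for the partial index
EXPLAIN SELECT * FROM orders
WHERE status = 'active' AND created_at > now() - interval '1 day';

-- No status filter → the partial index can't be used
EXPLAIN SELECT * FROM orders
WHERE created_at > now() - interval '1 day';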

Covering Indexes (INCLUDE)

If a query only needs data that's in the index, PostgreSQL can skip reading the table entirely:

-- Query: SELECT email, name FROM users WHERE email = 'user@example.com';

-- Regular index: index lookup → table lookup (2 reads)
CREATE INDEX idx_users_email ON users(email);

-- Covering index: index lookup only (1 read)
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name);
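
To verify the table lookup is actually skipped, check EXPLAIN for an Index Only Scan. Index-only scans also depend on the visibility map, so a recently vacuumed table matters:

EXPLAIN SELECT email, name FROM users
WHERE email = 'user@example.com';
-- Look for: Index Only Scan (not a plain Index Scan) in the plan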

Index Maintenance

Indexes bloat over time, especially with heavy UPDATE/DELETE workloads, and unused indexes add overhead to every write. Audit and rebuild them periodically:

-- Find unused indexes larger than 1MB (candidates for removal)
SELECT indexrelname, 
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
       idx_scan as times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND pg_relation_size(indexrelid) > 1048576
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_active;

Query Optimization

EXPLAIN ANALYZE Is Your Best Friend

Never guess. Always measure:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
  AND o.created_at > now() - interval '7 days';

What to look for:

  • Seq Scan on large tables → missing index
  • Nested Loop with high row counts → consider hash join (increase work_mem)
  • Buffers: shared read (high) → data not cached, need more shared_buffers or the query touches too much data
  • Sort Method: external merge → work_mem too low for this query

pg_stat_statements

The most valuable extension. Tracks every query's performance:

CREATE EXTENSION pg_stat_statements;

-- Top 10 queries by total time
SELECT query, calls, 
       round(total_exec_time::numeric, 2) as total_ms,
       round(mean_exec_time::numeric, 2) as avg_ms,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

We review this weekly for every production database we manage. The top 5 queries by total time are always where the biggest wins are hiding.
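
Statistics accumulate since the last reset, so after a deploy or a config change it helps to start from a clean slate:

-- Clear accumulated statistics (requires appropriate privileges)
SELECT pg_stat_statements_reset();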

WAL and Checkpoints

Write-Ahead Log (WAL) settings affect write performance and crash recovery time:

-- Larger WAL buffers for write-heavy workloads
wal_buffers = '64MB'

-- Spread checkpoints over time (reduce I/O spikes)
checkpoint_completion_target = 0.9

-- Larger checkpoint distance (fewer, larger checkpoints)
max_wal_size = '4GB'
min_wal_size = '1GB'
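
To check whether max_wal_size is large enough, compare timed vs. requested checkpoints. Frequent requested checkpoints mean WAL fills up before checkpoint_timeout expires (these columns live in pg_stat_bgwriter through PostgreSQL 16; in 17+ they moved to pg_stat_checkpointer):

-- Mostly timed checkpoints is healthy; many requested → raise max_wal_size
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;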

Our Production Baseline

For a 64GB RAM, 16-core server with NVMe storage:

-- Memory
shared_buffers = '16GB'
effective_cache_size = '48GB'
work_mem = '64MB'
maintenance_work_mem = '2GB'

-- Connections (use PgBouncer)
max_connections = 200

-- WAL
wal_buffers = '64MB'
max_wal_size = '4GB'
checkpoint_completion_target = 0.9

-- Query planner
random_page_cost = 1.1        -- NVMe: nearly same as seq read
effective_io_concurrency = 200 -- NVMe: high parallelism

-- Autovacuum
autovacuum_max_workers = 6
autovacuum_vacuum_scale_factor = 0.05
autovacuum_naptime = 15

-- Logging
log_min_duration_statement = 500  -- Log queries > 500ms
log_checkpoints = on
log_lock_waits = on

Don't Guess, Measure

Every recommendation here is a starting point. Your workload is unique. Use pg_stat_statements, EXPLAIN ANALYZE, and pg_stat_user_tables to make data-driven tuning decisions.
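
A good first measurement is the buffer cache hit ratio from pg_stat_database; healthy OLTP workloads typically sit above 99%:

-- Cache hit percentage per database
SELECT datname,
       round(blks_hit::numeric / greatest(blks_hit + blks_read, 1) * 100, 2) AS cache_hit_pct
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;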

Running Postgres on our infrastructure? We include performance tuning as part of our managed database offering. Running it elsewhere? We do audits too.