Memory Management
What is it?
PostgreSQL uses a sophisticated memory management system with both shared and private memory areas. Understanding how PostgreSQL allocates and uses memory is critical for performance tuning and avoiding out-of-memory (OOM) situations.
Key Memory Areas
Shared Memory (shared across all processes)
shared_buffers
- PostgreSQL’s main data cache
- Stores frequently accessed table and index pages
- Shared by all backend processes
- Set once at server start (requires restart to change)
- Sizing depends on workload (OLTP, OLAP, mixed) and total system memory
- Balance between shared_buffers and OS page cache is workload-dependent
- Use tools like PGConfig.io for initial sizing recommendations
- Fine-tune based on cache hit ratios and workload testing
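As a minimal sketch (assuming a dedicated database host with 16GB of RAM; the value and host size are illustrative, not a recommendation), a starting value can be set with ALTER SYSTEM and takes effect after a restart:
-- Hypothetical starting point for a dedicated 16GB host; fine-tune from
-- cache hit ratios afterwards. shared_buffers has 'postmaster' context,
-- so pg_reload_conf() is not enough - a full server restart is required.
ALTER SYSTEM SET shared_buffers = '4GB';
-- then restart, e.g.: sudo systemctl restart postgresql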
Private Memory (per backend process)
work_mem
- Used for sorting, hash tables, and merge joins
- Defines the maximum memory per operation - actual usage can be less
- Operations allocate memory on-demand up to the work_mem limit
- If data exceeds work_mem, PostgreSQL spills to temporary disk files (slow!)
- Each backend can use multiple work_mem allocations (one per operation)
- There is NO hard limit on allocations per query - complex queries can use 10+ work_mem buffers
- Hash operations can use hash_mem_multiplier (default: 2.0) × work_mem
- Parallel workers each get their own work_mem allocation
- Critical: Most dangerous memory parameter - see “Memory Multiplication Danger” section below
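To see whether an individual sort stays within work_mem or spills to disk, EXPLAIN (ANALYZE, BUFFERS) reports the sort method. A minimal sketch (the table and column names are hypothetical):
-- Temporarily lower work_mem in this session so a spill is easy to observe
SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;
-- "Sort Method: external merge  Disk: ..." -> spilled to temp files
-- "Sort Method: quicksort  Memory: ..."    -> fit within work_mem
RESET work_mem;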
maintenance_work_mem
- Used for maintenance operations: VACUUM, CREATE INDEX, ALTER TABLE
- Can be set much higher than work_mem since fewer processes use it
- Typical size: 1-2GB (or higher for large databases)
temp_buffers
- Used for temporary tables within a session
- Default: 8MB per session
- Rarely needs adjustment
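If a session works heavily with temporary tables, temp_buffers can be raised for that session only; note it must be set before the first temporary table is accessed. A small sketch (the table is hypothetical):
-- Must run before any temporary table is touched in this session
SET temp_buffers = '64MB';
CREATE TEMP TABLE staging_rows (id bigint, payload text);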
Planning Parameters
effective_cache_size
- Not an allocation - this is a hint to the query planner
- Tells the planner how much memory is available for caching (including OS cache)
- Should be set to ~50-75% of total system RAM
- Does NOT reserve or allocate memory
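Because it is only a planner hint, effective_cache_size can be changed without a restart. A sketch assuming a host with 32GB of RAM (the figure is illustrative):
-- Planner hint only - no memory is reserved; roughly 50-75% of total RAM
ALTER SYSTEM SET effective_cache_size = '24GB';
SELECT pg_reload_conf();
SHOW effective_cache_size;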
Why it matters
Performance Impact
Insufficient shared_buffers
- More disk I/O (cache misses)
- Slower query performance
- Storage latency becomes a bottleneck - misses that also miss the OS page cache pay full storage latency (especially impactful with high-latency storage like cloud EBS or network-attached storage)
Too large shared_buffers
- Less memory available for OS page cache (kernel cache)
- PostgreSQL uses both shared_buffers and OS page cache - a miss in shared_buffers can still hit OS cache (avoiding disk I/O)
- The optimal split between shared_buffers and OS cache depends on workload (OLTP vs OLAP, working set size, etc.)
- Very large shared_buffers (>100GB) can slow PostgreSQL startup and make checkpoints heavier
- Use tools like PGConfig.io for workload-specific recommendations
- See effective_cache_size docs which accounts for both kernel cache and shared_buffers
Incorrect work_mem
- Too small: Queries spill to disk (temp files), dramatically slower
- Too large: Risk of OOM kills, especially in containerized environments
- The multiplication effect makes this particularly dangerous
Memory Multiplication Danger
The most common mistake is not accounting for the maximum potential memory usage:
Maximum potential memory = work_mem × operations_per_query × parallel_workers × connections
(Note: hash operations can use up to hash_mem_multiplier × work_mem, default 2.0)
This is the worst-case scenario. Actual usage depends on data size - operations use memory on-demand up to their work_mem limit. However, planning for worst-case is critical to avoid OOM kills.
Example (worst-case):
- work_mem = 128MB
- hash_mem_multiplier = 2.0 (default)
- Query has 3 operations (2 sorts + 1 hash join)
- 2 parallel workers per query
- 100 active connections
Calculation (maximum potential):
- 2 sorts: up to 128MB × 2 = 256MB
- 1 hash join: up to 128MB × 2.0 (hash_mem_multiplier) = 256MB
- Per query per worker: up to 512MB
- With 2 workers: up to 512MB × 2 = 1024MB per query
- With 100 connections: up to 1024MB × 100 = 102.4GB
In practice, not all queries will hit these limits simultaneously, but with high concurrency and large datasets, you can approach this worst-case scenario, easily triggering OOM kills in Kubernetes pods or cloud instances with memory limits.
Key takeaway: hash_mem_multiplier makes hash operations even more memory-hungry. Check this setting:
SHOW hash_mem_multiplier; -- Default: 2.0
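To get a rough worst-case figure from your own settings, a query along these lines can help. It is a sketch: it assumes about 3 work_mem-sized operations per query, counts the leader process alongside parallel workers, treats max_connections as the concurrency ceiling, and ignores hash_mem_multiplier:
SELECT
    current_setting('work_mem')        AS work_mem,
    current_setting('max_connections') AS max_connections,
    pg_size_pretty(
        (SELECT setting::bigint * 1024 FROM pg_settings WHERE name = 'work_mem')  -- work_mem in bytes
        * 3                                                                       -- assumed operations per query
        * (current_setting('max_parallel_workers_per_gather')::int + 1)           -- workers + leader
        * current_setting('max_connections')::int
    ) AS rough_worst_case;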
Interaction with OS Cache
PostgreSQL’s “double buffering” architecture:
- shared_buffers: PostgreSQL’s cache
- OS page cache: Operating system’s file cache
Data can exist in both caches, which is why extremely large shared_buffers isn’t always beneficial. The OS cache is very efficient, and PostgreSQL relies on it.
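To see what is actually resident in shared_buffers (as opposed to the OS page cache), the pg_buffercache contrib extension can be queried. A sketch:
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Top relations by space occupied in shared_buffers
SELECT
    c.relname,
    pg_size_pretty(count(*) * 8192) AS buffered,
    ROUND(100.0 * count(*) /
          (SELECT setting::bigint FROM pg_settings WHERE name = 'shared_buffers'), 2) AS pct_of_buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY count(*) DESC
LIMIT 10;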
How to monitor
Check Current Memory Settings
SELECT
name,
setting,
unit,
context
FROM pg_settings
WHERE name IN (
'shared_buffers',
'work_mem',
'maintenance_work_mem',
'effective_cache_size',
'temp_buffers',
'hash_mem_multiplier'
)
ORDER BY name;
Example output:
name | setting | unit | context
-----------------------+---------+-------+-----------
effective_cache_size | 524288 | 8kB | user
hash_mem_multiplier | 2 | | user
maintenance_work_mem | 262144 | kB | user
shared_buffers | 131072 | 8kB | postmaster
temp_buffers | 1024 | 8kB | user
work_mem | 4096 | kB | user
What to look for:
- hash_mem_multiplier: Default is 2.0, meaning hash operations can use 2× work_mem
- This multiplier significantly increases memory usage for queries with hash joins or hash aggregates
Converting to human-readable sizes:
SELECT
name,
setting,
unit,
pg_size_pretty(
CASE
WHEN unit = '8kB' THEN setting::bigint * 8 * 1024
WHEN unit = 'kB' THEN setting::bigint * 1024
WHEN unit = 'B' THEN setting::bigint
ELSE NULL
END
) AS size
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'effective_cache_size');
Example output:
name | setting | unit | size
-----------------------+---------+------+--------
effective_cache_size | 524288 | 8kB | 4096 MB
maintenance_work_mem | 262144 | kB | 256 MB
shared_buffers | 131072 | 8kB | 1024 MB
work_mem | 4096 | kB | 4096 kB
Monitor shared_buffers Usage (Cache Hit Ratio)
High cache hit ratio = data served from memory instead of disk.
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
ROUND(100.0 * sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) as cache_hit_ratio
FROM pg_statio_user_tables;
Example output:
heap_read | heap_hit | cache_hit_ratio
-----------+----------+-----------------
45623 | 8234561 | 99.45
What to look for:
- cache_hit_ratio > 99%: Excellent - most data served from cache
- cache_hit_ratio 95-99%: Good, but might benefit from more shared_buffers
- cache_hit_ratio < 95%: Increase shared_buffers or investigate query patterns
- Note: Low ratio on a freshly started database is normal (cold cache)
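The same check is worth running for indexes, which have their own statistics view:
SELECT
    sum(idx_blks_read) AS idx_read,
    sum(idx_blks_hit) AS idx_hit,
    ROUND(100.0 * sum(idx_blks_hit) / NULLIF(sum(idx_blks_hit) + sum(idx_blks_read), 0), 2) AS idx_cache_hit_ratio
FROM pg_statio_user_indexes;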
Detect Queries Using Temp Files (work_mem too small)
When work_mem is insufficient, PostgreSQL spills to disk (creates temp files). This is SLOW.
SELECT
query,
calls,
total_exec_time,
temp_blks_written,
temp_blks_read,
pg_size_pretty(temp_blks_written * 8192::bigint) as temp_space_used
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 10;
Example output:
query | calls | total_exec_time | temp_blks_written | temp_space_used
----------------------------------------------+-------+-----------------+-------------------+-----------------
SELECT * FROM large_table ORDER BY created | 142 | 45623.23 | 1245632 | 9.5 GB
SELECT * FROM users u JOIN orders o ON ... | 89 | 23451.12 | 456789 | 3.5 GB
What to look for:
- temp_blks_written > 0: Queries spilling to disk due to insufficient work_mem
- High temp_blks_written with slow total_exec_time: Strong candidate for a work_mem increase (or query optimization)
- WARNING: Don't blindly increase work_mem globally - consider per-query tuning with SET work_mem = '256MB' for specific sessions
Enable temp file logging to catch these in real-time:
-- Log temp files larger than 10MB
ALTER SYSTEM SET log_temp_files = 10240; -- in kB
SELECT pg_reload_conf();
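The pg_stat_statements query above requires that extension to be installed. Even without it, cumulative temp-file counters are available per database:
-- Temp file usage per database since the last statistics reset
SELECT
    datname,
    temp_files,
    pg_size_pretty(temp_bytes) AS temp_space
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY temp_bytes DESC;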
Monitor Current Memory Usage (OS Level)
# Linux: See PostgreSQL memory usage
ps aux --sort=-%mem | grep postgres | head -20
# Total PostgreSQL memory usage
ps -C postgres -o pid,user,vsz,rss,cmd --sort=-rss | head -20
Example output:
USER PID %CPU %MEM VSZ RSS STAT STARTED TIME COMMAND
postgres 1234 2.1 8.5 2456789 876543 Ss 10:00 1:23 /usr/bin/postgres -D /var/lib/postgresql/data
postgres 5432 1.2 3.2 2345678 345678 Ss 10:23 0:45 postgres: app_user mydb SELECT (sorting)
postgres 5433 0.8 2.1 2234567 234567 Ss 10:24 0:12 postgres: app_user mydb SELECT (hashing)
What to look for:
- RSS column: Actual physical memory usage per process
- Backends using significantly more RSS than typical: Likely consuming multiple work_mem buffers
- Sum of all RSS values approaching system limits: Risk of OOM
Check for OOM Kills (Linux)
# Check system logs for OOM killer activity
dmesg -T | grep -i "killed process"
journalctl -k | grep -i "out of memory"
# Check PostgreSQL logs for unexpected process terminations
grep -i "unexpected" /var/log/postgresql/postgresql-*.log
Common problems
Problem: Database cache hit ratio is low (<95%)
Symptom: Queries are slower than expected, high disk I/O
Diagnosis:
-- Check cache hit ratio per table
SELECT
schemaname,
relname,
heap_blks_read,
heap_blks_hit,
ROUND(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) as cache_hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read + heap_blks_hit > 0
ORDER BY cache_hit_ratio ASC, heap_blks_read DESC
LIMIT 10;
Solutions:
- Increase shared_buffers (requires restart)
- Check if the database just started (cold cache takes time to warm)
- Investigate if working set is larger than available memory
- Consider table partitioning to reduce working set size
Problem: Queries creating large temp files
Symptom: Slow query performance, disk I/O spikes, temp file warnings in logs
Diagnosis: See “Detect Queries Using Temp Files” query above
Solutions:
- Per-session tuning (preferred): SET work_mem = '256MB'; then run your query (see the transaction-scoped sketch below)
- Optimize the query (add indexes, rewrite joins)
- Increase work_mem globally - DANGEROUS, calculate the worst case first
- Use connection pooling to limit the number of concurrent queries
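A sketch of per-query tuning that scopes the larger work_mem to a single transaction so it cannot leak into the rest of the session (the query and table names are hypothetical):
BEGIN;
SET LOCAL work_mem = '256MB';   -- reverts automatically at COMMIT/ROLLBACK
SELECT customer_id, sum(amount)
FROM orders
GROUP BY customer_id;
COMMIT;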
Problem: Out of memory (OOM) kills
Symptom: PostgreSQL processes killed unexpectedly, “postgres: … killed” in dmesg
Causes:
- work_mem set too high with many concurrent connections
- Parallel queries multiplying work_mem usage
- Memory leaks in extensions or functions
- Insufficient system memory for workload
Investigation:
-- Find the most expensive queries - likely candidates for parallel plans
-- (pg_stat_statements does not record whether a plan actually ran in
-- parallel, so confirm candidates with EXPLAIN)
SELECT
query,
calls,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Check current parallel settings
SHOW max_parallel_workers_per_gather;
SHOW max_worker_processes;
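To see whether parallel workers are in use right now (rather than historically), pg_stat_activity labels them by backend_type:
-- Parallel workers running at this moment
SELECT COUNT(*) AS active_parallel_workers
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';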
Solutions:
- Reduce work_mem significantly
- Lower max_parallel_workers_per_gather (see the sketch below)
- Implement connection pooling
- Set per-query memory limits: SET work_mem = '64MB';
- Increase system memory or use a larger instance/pod
- Set memory limits in postgresql.conf conservatively
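max_parallel_workers_per_gather can be lowered without a restart; a sketch (pick a value appropriate to your workload):
ALTER SYSTEM SET max_parallel_workers_per_gather = 1;
SELECT pg_reload_conf();
SHOW max_parallel_workers_per_gather;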
Problem: shared_buffers change not taking effect
Symptom: Changed shared_buffers in postgresql.conf but value unchanged
Cause: shared_buffers requires a full server restart (context = ‘postmaster’)
Solution:
# Check current value
psql -c "SHOW shared_buffers;"
# Edit postgresql.conf
vim /var/lib/postgresql/data/postgresql.conf
# Reload won't work - need full restart
sudo systemctl restart postgresql
# Or in Kubernetes
kubectl rollout restart statefulset postgres
Problem: Maintenance operations (VACUUM, CREATE INDEX) running slowly
Symptom: VACUUM or index creation taking hours on large tables
Diagnosis:
SHOW maintenance_work_mem;
-- Check if autovacuum is competing for resources
SELECT COUNT(*) FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';
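For long-running index builds, progress can also be watched directly (PostgreSQL 12+):
-- Progress of any CREATE INDEX / REINDEX currently running
SELECT pid, phase, blocks_done, blocks_total
FROM pg_stat_progress_create_index;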
Solutions:
- Increase maintenance_work_mem (can be set per-session): SET maintenance_work_mem = '2GB'; CREATE INDEX CONCURRENTLY idx_name ON table_name(column);
- Run maintenance during off-peak hours
- Use CREATE INDEX CONCURRENTLY to avoid blocking writes
- Consider partitioning very large tables