Your PostgreSQL server starts swapping at 2 AM during batch processing, but by the time you notice the performance impact, user queries are already timing out. Database-level monitoring tools like pg_stat_activity add their own overhead precisely when memory pressure is highest.
System-level monitoring catches PostgreSQL memory exhaustion before it affects application performance, without adding query overhead when your database is already struggling.
Step 1: Identify Your PostgreSQL Process Memory Footprint
Start by establishing baseline memory usage patterns for your PostgreSQL processes. Use ps with specific formatting to track shared memory and private memory separately:
```shell
ps -eo pid,ppid,comm,rss,vsz,pmem --sort=-rss | grep postgres
```
Record the resident set size (RSS) and virtual memory (VSZ) for your postmaster process and typical worker processes during normal operations. Create a baseline during off-peak hours, then compare during high-activity periods.
The postmaster's RSS reflects the shared_buffers pages it has touched plus process overhead, while individual backend processes show work_mem usage patterns. Document these ranges for threshold setting.
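A minimal sketch of a baseline sampler along these lines (the log path and the pattern-matching helper are assumptions, not part of any standard tooling):

```shell
# Hypothetical baseline helper: total RSS in kB of all processes whose
# command name matches a pattern. Sample during off-peak and peak windows.
rss_total_kb() {
  ps -eo rss,comm | awk -v pat="$1" '$2 ~ pat { sum += $1 } END { print sum + 0 }'
}

# Append a timestamped sample to a baseline log (path is an assumption):
echo "$(date +%s) $(rss_total_kb postgres)" >> /tmp/pg_rss_baseline.log
```

Run it from cron at a fixed interval and you get a time series you can compare against during high-activity periods.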
Step 2: Configure work_mem Usage Detection
Monitor work_mem exhaustion by tracking the memory growth of individual PostgreSQL backend processes. Keep in mind that work_mem is a per-operation limit, not a per-backend one: a single query can run several sorts or hashes at once, each entitled to its own work_mem allocation. A backend that hits the limit spills sorts and hashes to temporary files on disk, while a backend whose RSS climbs well past work_mem is running multiple memory-hungry operations at once.
Create a monitoring script that compares current backend RSS against your work_mem configuration. Extract your work_mem setting with SHOW work_mem once, then monitor system processes rather than repeatedly querying the database.
Set your alert threshold at 80% of work_mem per backend process. Multiple backends approaching this threshold simultaneously indicates you need either more memory or work_mem tuning.
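One way to implement the 80% check (a sketch; the kilobyte units and function name are assumptions, and note that raw RSS is a coarse signal because it also counts shared_buffers pages the backend has touched):

```shell
# Sketch of the 80%-of-work_mem check described above.
check_backend() {
  work_mem_kb=$1   # from a one-time `SHOW work_mem`, converted to kB
  rss_kb=$2        # e.g. from `ps -o rss= -p <backend pid>`
  threshold=$(( work_mem_kb * 80 / 100 ))
  if [ "$rss_kb" -gt "$threshold" ]; then echo ALERT; else echo OK; fi
}

check_backend 65536 60000   # work_mem 64MB, backend RSS ~60MB: prints ALERT
```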
Step 3: Track shared_buffers Hit Rates Through /proc
Monitor shared_buffers effectiveness by tracking PostgreSQL's total memory allocation against configured limits. Check /proc/meminfo for overall memory pressure indicators that affect buffer cache performance.
The PostgreSQL documentation on shared memory explains how shared_buffers interacts with system memory. Monitor for increasing swap usage when PostgreSQL memory allocation approaches system limits.
Watch /proc/vmstat for the pgpgin/pgpgout counters. A sudden jump in these paging rates during otherwise normal database operations suggests shared_buffers pressure is forcing reads and writes to disk.
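Since pgpgin/pgpgout are cumulative counters, you need to sample twice and take the difference to get a rate. A minimal Linux-only sketch (the 5-second default interval is an arbitrary choice):

```shell
# Sum pgpgin + pgpgout from /proc/vmstat, sample twice, print the delta.
pgpg_delta() {
  interval=${1:-5}
  a=$(awk '/^pgpgin |^pgpgout / { s += $2 } END { print s + 0 }' /proc/vmstat)
  sleep "$interval"
  b=$(awk '/^pgpgin |^pgpgout / { s += $2 } END { print s + 0 }' /proc/vmstat)
  echo $(( b - a ))
}

pgpg_delta 5   # pages paged in/out over 5 seconds; watch for sudden jumps
```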
Step 4: Set Up Memory Pressure Early Warning
Configure monitoring that detects memory pressure before PostgreSQL performance degrades. Track these system indicators:
- Swap usage trends rather than absolute values. Any swap allocation by PostgreSQL processes indicates memory pressure, regardless of total system swap percentage.
- OOM killer activity in /var/log/kern.log or dmesg. PostgreSQL memory exhaustion often triggers OOM conditions before you notice performance issues.
- Memory allocation rate-of-change. Gradual memory increases over hours indicate potential leaks, while sudden spikes suggest runaway queries exceeding work_mem limits.
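The per-process swap check can be sketched as follows, using the VmSwap field in /proc/&lt;pid&gt;/status (the helper name is an assumption; VmSwap requires a reasonably modern kernel):

```shell
# Early-warning sweep (a sketch): list PIDs of matching processes that
# currently have pages in swap, read from VmSwap in /proc/<pid>/status.
swapped_procs() {
  for pid in $(pgrep "$1"); do
    kb=$(awk '/^VmSwap:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
    if [ -n "$kb" ] && [ "$kb" -gt 0 ]; then
      echo "$pid ${kb}kB"
    fi
  done
}

# Any output here means a postgres process is swapping - alert immediately:
swapped_procs postgres
```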
Step 5: Build Automated Memory Usage Baselines
Establish dynamic baselines that adapt to your PostgreSQL workload patterns. Track memory usage during different operational periods - batch processing, peak user activity, maintenance windows.
Record typical memory usage ranges for different times of day and days of the week. Your monitoring system should understand that Sunday night batch processing legitimately uses more memory than Tuesday afternoon user queries.
Implement percentage-based alerts rather than fixed thresholds. Alert when current memory usage exceeds 150% of the historical average for the same time period, accounting for growth trends.
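The 150%-of-baseline rule reduces to a simple comparison once you have the historical average for a time slot (how you store those averages is up to you; the helper below is a sketch):

```shell
# Compare a current sample against the stored average for the same time slot.
# The 150% multiplier matches the rule described above.
over_baseline() {
  current_kb=$1
  average_kb=$2
  limit=$(( average_kb * 150 / 100 ))
  if [ "$current_kb" -gt "$limit" ]; then echo ALERT; else echo OK; fi
}

over_baseline 9000 5000   # 9000 kB vs a 5000 kB average: prints ALERT
```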
Step 6: Integrate with Production Monitoring
Connect your PostgreSQL memory monitoring to your existing alerting infrastructure. Server Scout's plugin system enables custom PostgreSQL monitoring through bash scripts that track these system-level metrics without database query overhead.
Configure escalating alerts - warning at 75% of memory thresholds, critical at 90%, and immediate notification for any swap usage by PostgreSQL processes. Include recovery notifications so you know when memory pressure resolves.
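The escalation logic amounts to a threshold ladder; a minimal sketch using the percentages above (the function name is an assumption):

```shell
# Map a memory-usage percentage to an alert severity, using the
# 75% warning / 90% critical thresholds described above.
severity() {
  pct=$1
  if   [ "$pct" -ge 90 ]; then echo critical
  elif [ "$pct" -ge 75 ]; then echo warning
  else echo ok
  fi
}

severity 95   # prints critical
```

Swap usage by a PostgreSQL process bypasses this ladder entirely and goes straight to an immediate notification.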
This approach gives you 15-30 minutes warning before memory exhaustion affects query performance, compared to reactive monitoring that only alerts after users report slowness. The zero-dependency monitoring approach ensures your monitoring system remains responsive even during PostgreSQL memory pressure events.
Step 7: Test and Validate Your Monitoring
Validate your monitoring setup by intentionally creating controlled memory pressure. Temporarily reduce work_mem settings or run memory-intensive queries during off-peak hours.
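One hedged way to create that controlled pressure: force a large sort under a deliberately low work_mem in a session, so the backend spills to disk and your paging and RSS alerts get exercised. The row count, database name, and psql invocation below are assumptions to adapt to your environment:

```shell
# Emit SQL that forces a large sort under a deliberately low work_mem.
# Run only against a test database during off-peak hours.
pressure_test_sql() {
  cat <<'SQL'
SET work_mem = '1MB';
SELECT count(*) FROM (
  SELECT g FROM generate_series(1, 5000000) AS g ORDER BY g DESC
) t;
SQL
}

# Example usage (connection details are assumptions):
# pressure_test_sql | psql -d testdb
```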
Verify that your alerts fire before application performance degrades. Test that recovery notifications work properly so you know when memory pressure resolves.
Document the correlation between your system-level metrics and actual PostgreSQL performance. This correlation data helps you fine-tune alert thresholds and provides context during production incidents.
Your PostgreSQL memory monitoring now provides early warning of exhaustion scenarios through system metrics, giving you time to address issues before they affect applications. This system-level approach avoids adding database query overhead during the exact moments when memory pressure is highest.
FAQ
How accurate is system-level memory monitoring compared to PostgreSQL internal metrics?
System metrics provide earlier warning signs since they detect memory pressure before it affects database performance. Internal PostgreSQL metrics are more precise but add query overhead when the database is already struggling with memory constraints.
Should I monitor individual backend processes or just the overall PostgreSQL memory usage?
Monitor both - overall usage shows shared_buffers pressure, while individual backend monitoring reveals work_mem exhaustion patterns. Backends consistently exceeding work_mem indicate configuration tuning needs.
What's the ideal alert threshold for PostgreSQL memory usage?
Set warnings at 75% of your memory limits and critical alerts at 90%. Any swap usage by PostgreSQL processes should trigger immediate alerts, regardless of total system swap percentage.