Virtual memory management is one of Linux's most sophisticated features, allowing the system to efficiently handle memory allocation, sharing, and swapping. The Server Scout agent monitors four key virtual memory metrics that provide crucial insights into your system's memory behaviour and potential performance bottlenecks.
These metrics help you distinguish between normal memory operations and concerning memory pressure that could degrade application performance.
Understanding Virtual Memory Metrics
Server Scout collects virtual memory statistics from /proc/vmstat every 30 seconds as part of the medium monitoring tier. All four metrics are cumulative counters, meaning they continuously increase over time. The dashboard charts convert these into rates (events per second) by calculating the difference between consecutive readings.
| Metric | Description | Source |
|---|---|---|
| page_faults | Total page faults (minor + major) | /proc/vmstat pgfault |
| page_faults_major | Major page faults requiring disk I/O | /proc/vmstat pgmajfault |
| swap_in_pages | Pages read from swap space | /proc/vmstat pswpin |
| swap_out_pages | Pages written to swap space | /proc/vmstat pswpout |
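The counter-to-rate conversion described above can be sketched in a few lines of Python. This is an illustrative snippet, not Server Scout's actual code; the parse_vmstat and rates helpers are hypothetical names, while the field names (pgfault, pgmajfault, pswpin, pswpout) and the 30-second interval come straight from the table and the medium-tier description:

```python
# Sketch: parse /proc/vmstat-style text and convert cumulative
# counters into per-second rates between two consecutive readings.
FIELDS = {"pgfault", "pgmajfault", "pswpin", "pswpout"}

def parse_vmstat(text):
    """Extract the four counters from /proc/vmstat contents."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in FIELDS:
            counters[name] = int(value)
    return counters

def rates(previous, current, interval_seconds=30):
    """Per-second rate from two cumulative snapshots."""
    return {name: (current[name] - previous[name]) / interval_seconds
            for name in previous}

# Two synthetic snapshots taken 30 seconds apart:
t0 = parse_vmstat("pgfault 1000000\npgmajfault 500\npswpin 0\npswpout 0")
t1 = parse_vmstat("pgfault 1090000\npgmajfault 530\npswpin 0\npswpout 0")
print(rates(t0, t1))  # pgfault: 3000.0 faults/sec, pgmajfault: 1.0/sec
```

Because the counters are cumulative, a single snapshot tells you little; it's the delta between readings that carries the signal.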
Page Faults: The Foundation of Virtual Memory
Minor Page Faults: Normal and Harmless
A page fault occurs whenever a process attempts to access a memory page that isn't currently present in its page tables. The vast majority of page faults are minor faults, which are resolved entirely within RAM without any disk I/O.
Minor page faults happen constantly during normal system operation for several legitimate reasons:
- Memory mapping: When a process maps a file into memory using mmap(), the pages aren't immediately loaded. They're faulted in on first access.
- Shared libraries: Multiple processes sharing the same library code fault in the same physical pages.
- Copy-on-write: When a process forks, the parent and child initially share memory pages. A fault occurs when either tries to modify a shared page.
- Demand paging: Even program code is loaded on-demand as functions are first called.
A minor page fault typically resolves in microseconds. The kernel simply updates the process's page table to point to an existing page in physical memory. A rate of 1,000-10,000 minor page faults per second is completely normal on an active system.
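You can watch minor faults accumulate from within a process using its own accounting. A minimal sketch with Python's standard resource module (Unix-only; ru_minflt is the process's cumulative minor-fault count, and the exact delta depends on the allocator and page size):

```python
import resource

def minor_faults():
    """Cumulative minor page faults for the current process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

before = minor_faults()
# Touch ~8 MB of freshly allocated memory; the first write to each
# new page is typically resolved by a minor fault, entirely in RAM.
buf = bytearray(8 * 1024 * 1024)
for i in range(0, len(buf), 4096):  # one write per 4 KiB page
    buf[i] = 1
after = minor_faults()
print(f"minor faults while touching pages: {after - before}")
```

Despite the thousands of faults this loop can trigger, it completes almost instantly, which is exactly why minor faults are harmless.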
Major Page Faults: The Performance Killers
A major page fault occurs when the requested page isn't in physical memory at all and must be read from disk. This might be program code that hasn't been loaded yet, or data that was previously swapped out to disk.
Major page faults are expensive because they require disk I/O. Each major fault can stall the requesting process for milliseconds whilst the kernel reads the page from storage. Even on fast NVMe drives, this is thousands of times slower than a minor fault.
Sustained major fault rates above 100 per second typically indicate memory pressure. The system is thrashing—constantly reading pages from disk because there isn't enough RAM to keep frequently-accessed pages in memory.
Swap Activity: Understanding Memory Pressure
Swap space allows Linux to extend available memory by writing rarely-used pages to disk. However, it's crucial to distinguish between swap usage and swap activity.
Swap Out: Freeing Memory Under Pressure
The swap_out_pages metric tracks how many pages the kernel writes to swap space. This happens when:
- Physical memory becomes scarce
- The kernel needs to free RAM for more urgent uses
- Pages haven't been accessed recently and are good candidates for eviction
Occasional swap-out activity is normal, especially after periods of high memory usage. The kernel proactively swaps out genuinely unused pages to keep more RAM available for active processes and file system caches.
However, sustained swap-out activity indicates the system is under memory pressure and struggling to keep working sets in RAM.
Swap In: The Cost of Insufficient Memory
The swap_in_pages metric tracks pages being read back from swap space into RAM. Any non-zero swap-in rate means processes are accessing data that was previously evicted to disk.
Light swap-in activity after a period of low activity is normal—the system might swap in a few pages as you resume work on idle applications. But sustained swap-in activity is a clear sign of memory exhaustion. The system lacks sufficient RAM for current workloads and is constantly shuttling data between memory and disk.
Swap Activity vs Swap Usage
It's important to understand the relationship between these rate metrics and the point-in-time mem_swap_used_mb metric from Server Scout's fast tier:
- High swap usage, low swap I/O: Memory was tight in the past, rarely-used pages were swapped out, but the situation has stabilised. This is generally acceptable.
- Low swap usage, high swap I/O: The system is actively thrashing with frequent swap operations, even if the total amount in swap is small. This indicates severe memory pressure.
- High swap usage, high swap I/O: Classic memory exhaustion. The system has both a large swap footprint and active memory pressure.
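The three patterns above amount to a simple two-axis decision. A sketch of that logic as a function (classify_swap and both thresholds are illustrative assumptions, not Server Scout alert rules; tune the cut-offs to your RAM size and workload):

```python
def classify_swap(swap_used_mb, swap_io_pages_per_sec,
                  used_threshold_mb=512, io_threshold=50):
    """Map swap usage vs swap activity onto the patterns above.

    Thresholds are illustrative placeholders only.
    """
    high_usage = swap_used_mb > used_threshold_mb
    high_io = swap_io_pages_per_sec > io_threshold
    if high_usage and high_io:
        return "memory exhaustion: large swap footprint and active pressure"
    if high_io:
        return "thrashing: active swap traffic despite a small footprint"
    if high_usage:
        return "stabilised: past pressure, currently acceptable"
    return "healthy"

print(classify_swap(swap_used_mb=2048, swap_io_pages_per_sec=5))
# → the "stabilised" case: big footprint, but no current swap traffic
```

Note that the swap I/O rate, not the swap usage figure, is the axis that signals an active problem.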
Reading the Dashboard: Identifying Memory Problems
The Server Scout dashboard shows these metrics as rate charts (events per second) across different time ranges. Here's how to interpret the patterns:
Normal Behaviour Patterns
- Steady minor page fault rate: 1,000-10,000 faults/sec during normal operation
- Occasional major fault spikes: Brief increases when applications start or open large files
- Low background swap activity: Occasional swap-out during idle periods, minimal swap-in
Memory Pressure Warning Signs
Look for these correlated patterns that indicate developing memory problems:
- Rising major fault rate: Sustained increases above 100/sec
- Increasing swap-out activity: The kernel is aggressively freeing memory
- Decreasing mem_available_mb: Available memory is falling (visible in memory charts)
- Rising application response times: As major faults increase, applications become less responsive
Critical Memory Exhaustion
The classic memory exhaustion pattern shows:
- High major fault rate: Often 1,000+ faults/sec
- Active swap-in and swap-out: Constant bidirectional swap traffic
- Very low mem_available_mb: Less than 5-10% of total memory available
- High I/O wait times: Visible in cpu_iowait as the system waits for swap I/O
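The warning and critical patterns can be folded into a rough severity check. A hypothetical sketch using the thresholds quoted in this article (100 and 1,000 major faults/sec, under 5-10% memory available); real alerting should also smooth over time so a single spike doesn't page anyone:

```python
def memory_pressure_level(major_faults_per_sec, swap_in_per_sec,
                          swap_out_per_sec, available_mb, total_mb):
    """Rough severity rating using the thresholds discussed above.

    Sketch only: sustained-rate smoothing and alert hysteresis
    are deliberately left out for brevity.
    """
    available_pct = 100.0 * available_mb / total_mb
    bidirectional_swap = swap_in_per_sec > 0 and swap_out_per_sec > 0
    if (major_faults_per_sec >= 1000 and bidirectional_swap
            and available_pct < 10):
        return "critical"
    if major_faults_per_sec > 100 or swap_out_per_sec > 0:
        return "warning"
    return "normal"

# 1,500 major faults/sec, bidirectional swap, ~2.4% memory available:
print(memory_pressure_level(1500, 200, 300, available_mb=400, total_mb=16384))
```

The key idea is that no single metric is decisive; it's the combination of high major faults, bidirectional swap traffic, and low available memory that marks true exhaustion.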
Correlating with Other Server Scout Metrics
Virtual memory metrics don't exist in isolation. Effective monitoring requires understanding their relationships with other system metrics:
Memory Metrics Connection
| Memory Metric | Relationship to VMstat |
|---|---|
| mem_available_mb | Low values correlate with high major faults and swap activity |
| mem_swap_used_mb | Shows current swap usage; compare with swap I/O rates |
| mem_cached_mb | May decrease as kernel frees cache to satisfy memory pressure |
CPU and I/O Impact
Memory pressure affects other system resources:
- cpu_iowait: Increases as processes wait for major fault resolution and swap I/O
- disk_io_read_bytes: Swap-in activity contributes to disk read load
- disk_io_write_bytes: Swap-out activity increases disk writes
- load_1m: Processes blocked on memory I/O contribute to system load
Optimising Based on VMstat Metrics
Understanding these metrics helps guide system optimisation:
When Major Faults Are High
- Add more RAM: The most direct solution for memory pressure
- Optimise application memory usage: Identify memory-hungry processes
- Adjust swappiness: Lower /proc/sys/vm/swappiness to prefer freeing cache over swapping
When Swap Activity Is Excessive
- Increase swap space: If you can't add RAM immediately, more swap space can reduce I/O pressure
- Improve swap storage: Place swap on faster storage (SSD rather than HDD)
- Consider zswap: Compress pages in RAM before writing to disk swap
Capacity Planning
Use historical trends in these metrics to plan capacity:
- Gradual increases: Growing major fault rates over weeks or months indicate growing memory requirements
- Periodic spikes: Regular patterns might indicate batch jobs or backups that need memory tuning
- Correlation with workload: Compare memory pressure metrics with application-specific metrics
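Spotting those gradual increases can be as simple as fitting a line to the historical rates. A minimal pure-Python sketch (trend_slope is a hypothetical helper; the sample data is invented for illustration):

```python
def trend_slope(samples):
    """Ordinary least-squares slope of evenly spaced samples.

    A positive slope on, say, weekly average major-fault rates
    suggests memory requirements are growing.
    """
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Illustrative weekly average major faults/sec, creeping upward:
weekly_major_faults = [12, 15, 14, 19, 22, 25, 28, 31]
print(f"trend: {trend_slope(weekly_major_faults):+.2f} faults/sec per week")
```

A persistently positive slope over weeks is a stronger capacity signal than any single day's reading, which may just reflect a one-off workload spike.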
Monitoring Strategy
For effective virtual memory monitoring with Server Scout:
- Set up alerting: Monitor major fault rates and swap I/O for sustained increases
- Establish baselines: Understand normal patterns for your workloads
- Correlate metrics: Always examine memory pressure in context with CPU, disk, and application metrics
- Track trends: Use Server Scout's historical data to identify gradual degradation
The virtual memory subsystem is complex, but these four key metrics provide a clear window into your system's memory health. By understanding the difference between normal paging activity and concerning memory pressure, you can maintain optimal system performance and plan capacity upgrades before problems impact your applications.