Virtual Memory (VMstat) Metrics Explained

Virtual memory management is one of Linux's most sophisticated features, allowing the system to efficiently handle memory allocation, sharing, and swapping. The Server Scout agent monitors four key virtual memory metrics that provide crucial insights into your system's memory behaviour and potential performance bottlenecks.

These metrics help you distinguish between normal memory operations and concerning memory pressure that could degrade application performance.

Understanding Virtual Memory Metrics

Server Scout collects virtual memory statistics from /proc/vmstat every 30 seconds as part of the medium monitoring tier. All four metrics are cumulative counters that increase monotonically, resetting only when the system reboots. The dashboard charts convert them into rates (events per second) by dividing the difference between consecutive readings by the sampling interval.

Metric              Description                            Source
page_faults         Total page faults (minor + major)      /proc/vmstat pgfault
page_faults_major   Major page faults requiring disk I/O   /proc/vmstat pgmajfault
swap_in_pages       Pages read from swap space             /proc/vmstat pswpin
swap_out_pages      Pages written to swap space            /proc/vmstat pswpout
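As a minimal sketch of how an agent might derive these rates, assuming the /proc/vmstat field names in the table above and a 30-second sampling interval (the helper names here are illustrative, not Server Scout's actual implementation):

```python
FIELDS = {
    "pgfault": "page_faults",
    "pgmajfault": "page_faults_major",
    "pswpin": "swap_in_pages",
    "pswpout": "swap_out_pages",
}

def read_vmstat(path="/proc/vmstat"):
    """Parse the four cumulative counters out of /proc/vmstat."""
    counters = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(" ")
            if key in FIELDS:
                counters[FIELDS[key]] = int(value)
    return counters

def to_rates(prev, curr, interval_s=30):
    """Convert two cumulative snapshots into events-per-second rates."""
    return {name: (curr[name] - prev[name]) / interval_s for name in prev}

# Synthetic snapshots taken 30 seconds apart:
prev = {"page_faults": 1_000_000, "page_faults_major": 500,
        "swap_in_pages": 0, "swap_out_pages": 0}
curr = {"page_faults": 1_090_000, "page_faults_major": 620,
        "swap_in_pages": 0, "swap_out_pages": 300}
print(to_rates(prev, curr))
# 3,000 faults/s, 4 major faults/s, 10 pages/s swapped out
```

Because the counters are cumulative, a single reading tells you little; it is the delta between readings, divided by the interval, that reveals current behaviour.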

Page Faults: The Foundation of Virtual Memory

Minor Page Faults: Normal and Harmless

A page fault occurs whenever a process accesses a virtual memory page that has no valid entry in its page tables, usually because the page isn't yet backed by physical memory. The vast majority of page faults are minor faults, which are resolved entirely within RAM without any disk I/O.

Minor page faults happen constantly during normal system operation for several legitimate reasons:

  • Memory mapping: When a process maps a file into memory using mmap(), the pages aren't immediately loaded. They're faulted in on first access.
  • Shared libraries: Multiple processes sharing the same library code fault in the same physical pages.
  • Copy-on-write: When a process forks, the parent and child initially share memory pages. A fault occurs when either tries to modify a shared page.
  • Demand paging: Even program code is loaded on-demand as functions are first called.

A minor page fault typically resolves in microseconds. The kernel simply updates the process's page table to point to an existing page in physical memory. A rate of 1,000-10,000 minor page faults per second is completely normal on an active system.
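The cost asymmetry is easy to observe from a process's own counters. Below is a small Linux-only sketch using Python's standard resource module: it touches freshly allocated memory to trigger minor faults, while the major-fault counter normally stays flat because no disk I/O is involved.

```python
import resource

def fault_counts():
    """Return (minor, major) cumulative page-fault counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

minor_before, major_before = fault_counts()
buf = bytearray(16 * 1024 * 1024)  # zero-filling 16 MiB touches every new page
minor_after, major_after = fault_counts()

print("minor faults incurred:", minor_after - minor_before)
print("major faults incurred:", major_after - major_before)  # typically 0
```

The exact minor-fault count depends on page size and allocator behaviour, but the pattern holds: ordinary allocation produces plenty of minor faults and, on a healthy system, essentially no major ones.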

Major Page Faults: The Performance Killers

A major page fault occurs when the requested page isn't in physical memory at all and must be read from disk. This might be program code that hasn't been loaded yet, or data that was previously swapped out to disk.

Major page faults are expensive because they require disk I/O. Each major fault can stall the requesting process for milliseconds whilst the kernel reads the page from storage. Even on fast NVMe drives, this is thousands of times slower than a minor fault.

Sustained major fault rates above 100 per second typically indicate memory pressure. The system is thrashing—constantly reading pages from disk because there isn't enough RAM to keep frequently-accessed pages in memory.

Swap Activity: Understanding Memory Pressure

Swap space allows Linux to extend available memory by writing rarely-used pages to disk. However, it's crucial to distinguish between swap usage and swap activity.

Swap Out: Freeing Memory Under Pressure

The swap_out_pages metric tracks how many pages the kernel writes to swap space. This happens when:

  • Physical memory becomes scarce
  • The kernel needs to free RAM for more urgent uses
  • Pages haven't been accessed recently and are good candidates for eviction

Occasional swap-out activity is normal, especially after periods of high memory usage. The kernel proactively swaps out genuinely unused pages to keep more RAM available for active processes and file system caches.

However, sustained swap-out activity indicates the system is under memory pressure and struggling to keep working sets in RAM.

Swap In: The Cost of Insufficient Memory

The swap_in_pages metric tracks pages being read back from swap space into RAM. Any non-zero swap-in rate means processes are accessing data that was previously evicted to disk.

Light swap-in activity after a period of low activity is normal—the system might swap in a few pages as you resume work on idle applications. But sustained swap-in activity is a clear sign of memory exhaustion. The system lacks sufficient RAM for current workloads and is constantly shuttling data between memory and disk.

Swap Activity vs Swap Usage

It's important to understand the relationship between these rate metrics and the static mem_swap_used_mb metric from Server Scout's fast tier:

  • High swap usage, low swap I/O: Memory was tight in the past, rarely-used pages were swapped out, but the situation has stabilised. This is generally acceptable.
  • Low swap usage, high swap I/O: The system is actively thrashing with frequent swap operations, even if the total amount in swap is small. This indicates severe memory pressure.
  • High swap usage, high swap I/O: Classic memory exhaustion. The system has both a large swap footprint and active memory pressure.
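The three scenarios above can be expressed as a tiny classification helper. The thresholds below are illustrative placeholders, not Server Scout defaults; set them from your own baselines.

```python
def classify_swap(swap_used_mb, swap_io_pages_per_s,
                  used_threshold_mb=256, io_threshold_pps=50):
    """Map swap usage vs swap activity to the scenarios described above.

    Thresholds are illustrative; tune them to your workload's baseline.
    """
    high_used = swap_used_mb > used_threshold_mb
    high_io = swap_io_pages_per_s > io_threshold_pps

    if high_used and high_io:
        return "memory exhaustion: large swap footprint and active pressure"
    if high_io:
        return "thrashing: active swap traffic despite a small footprint"
    if high_used:
        return "stabilised: past pressure, currently acceptable"
    return "healthy"

print(classify_swap(1024, 200))  # exhaustion case
print(classify_swap(32, 200))    # thrashing case
print(classify_swap(1024, 0))    # stabilised case
```

The key design point mirrors the prose: swap I/O rate, not swap occupancy, is the signal that demands immediate attention.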

Reading the Dashboard: Identifying Memory Problems

The Server Scout dashboard shows these metrics as rate charts (events per second) across different time ranges. Here's how to interpret the patterns:

Normal Behaviour Patterns

  • Steady minor page fault rate: 1,000-10,000 faults/sec during normal operation
  • Occasional major fault spikes: Brief increases when applications start or open large files
  • Low background swap activity: Occasional swap-out during idle periods, minimal swap-in

Memory Pressure Warning Signs

Look for these correlated patterns that indicate developing memory problems:

  1. Rising major fault rate: Sustained increases above 100/sec
  2. Increasing swap-out activity: The kernel is aggressively freeing memory
  3. Decreasing mem_available_mb: Available memory is falling (visible in memory charts)
  4. Rising application response times: As major faults increase, applications become less responsive

Critical Memory Exhaustion

The classic memory exhaustion pattern shows:

  • High major fault rate: Often 1,000+ faults/sec
  • Active swap-in and swap-out: Constant bidirectional swap traffic
  • Very low mem_available_mb: Less than 5-10% of total memory available
  • High I/O wait times: Visible in cpu_iowait as the system waits for swap I/O
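The warning-sign and exhaustion patterns above can be combined into one simple grading rule. This is a sketch only: the metric names mirror this article, and the thresholds (100/s warning, 1,000/s critical, 10% available) follow its rules of thumb rather than any Server Scout defaults.

```python
def memory_pressure_level(major_faults_per_s, swap_in_pps, swap_out_pps,
                          mem_available_pct):
    """Grade memory pressure from the correlated signals described above.

    Thresholds are the article's rules of thumb; adjust to your baselines.
    """
    # Critical: high major faults plus bidirectional swap traffic
    # plus very little available memory.
    if (major_faults_per_s >= 1000 and swap_in_pps > 0 and swap_out_pps > 0
            and mem_available_pct < 10):
        return "critical"
    # Warning: sustained major faults or active swap-out.
    if major_faults_per_s >= 100 or swap_out_pps > 0:
        return "warning"
    return "normal"

print(memory_pressure_level(5, 0, 0, 60))       # quiet system
print(memory_pressure_level(250, 0, 20, 30))    # developing pressure
print(memory_pressure_level(2000, 80, 120, 4))  # classic exhaustion
```

Requiring several signals to agree before declaring "critical" is deliberate: any single metric can spike briefly for benign reasons.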

Correlating with Other Server Scout Metrics

Virtual memory metrics don't exist in isolation. Effective monitoring requires understanding their relationships with other system metrics:

Memory Metrics Connection

Memory Metric       Relationship to VMstat
mem_available_mb    Low values correlate with high major faults and swap activity
mem_swap_used_mb    Shows current swap usage; compare with swap I/O rates
mem_cached_mb       May decrease as the kernel frees cache to relieve memory pressure

CPU and I/O Impact

Memory pressure affects other system resources:

  • cpu_iowait: Increases as processes wait for major fault resolution and swap I/O
  • disk_io_read_bytes: Swap-in activity contributes to disk read load
  • disk_io_write_bytes: Swap-out activity increases disk writes
  • load_1m: Processes blocked on memory I/O contribute to system load

Optimising Based on VMstat Metrics

Understanding these metrics helps guide system optimisation:

When Major Faults Are High

  • Add more RAM: The most direct solution for memory pressure
  • Optimise application memory usage: Identify memory-hungry processes
  • Adjust swappiness: Lower /proc/sys/vm/swappiness to prefer freeing cache over swapping
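As a sketch, the current value can be read straight from procfs (Linux-specific; writing it back requires root, so that part is shown only as a comment):

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value (0-100, or up to 200 on kernels >= 5.8)."""
    with open(path) as f:
        return int(f.read().strip())

print("vm.swappiness =", read_swappiness())
# Lowering it (as root): sysctl vm.swappiness=10
# Persist across reboots: add "vm.swappiness = 10" to /etc/sysctl.conf
```

Lower values bias the kernel toward reclaiming page cache instead of swapping out anonymous memory; verify the effect against your swap-out rate before treating it as a fix.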

When Swap Activity Is Excessive

  • Increase swap space: If you can't add RAM immediately, more swap space can reduce I/O pressure
  • Improve swap storage: Place swap on faster storage (SSD rather than HDD)
  • Consider zswap: Compress pages in RAM before writing to disk swap

Capacity Planning

Use historical trends in these metrics to plan capacity:

  • Gradual increases: Growing major fault rates over weeks or months indicate growing memory requirements
  • Periodic spikes: Regular patterns might indicate batch jobs or backups that need memory tuning
  • Correlation with workload: Compare memory pressure metrics with application-specific metrics

Monitoring Strategy

For effective virtual memory monitoring with Server Scout:

  1. Set up alerting: Monitor major fault rates and swap I/O for sustained increases
  2. Establish baselines: Understand normal patterns for your workloads
  3. Correlate metrics: Always examine memory pressure in context with CPU, disk, and application metrics
  4. Track trends: Use Server Scout's historical data to identify gradual degradation
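Point 1 above needs care: these counters spike briefly during application starts and backups, so an alert should require the threshold to be breached across a whole window of samples. A minimal sketch (the class and thresholds are illustrative, not a built-in Server Scout feature):

```python
from collections import deque

class SustainedRateAlert:
    """Fire only when every one of the last `window` samples exceeds the
    threshold, so brief spikes don't trigger false alarms."""

    def __init__(self, threshold, window=10):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def observe(self, rate):
        """Record a sample; return True when the breach is sustained."""
        self.samples.append(rate)
        return (len(self.samples) == self.samples.maxlen
                and all(s > self.threshold for s in self.samples))

# Major-fault rate samples at 30 s intervals; alert after 3 sustained breaches:
alert = SustainedRateAlert(threshold=100, window=3)
print([alert.observe(r) for r in [50, 150, 150, 150, 150, 20]])
# → [False, False, False, True, True, False]
```

With a 30-second sampling interval, a window of 10 means the condition must hold for five minutes before an alert fires, which filters out ordinary bursts.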

The virtual memory subsystem is complex, but these four key metrics provide a clear window into your system's memory health. By understanding the difference between normal paging activity and concerning memory pressure, you can maintain optimal system performance and plan capacity upgrades before problems impact your applications.


Frequently Asked Questions

What are major page faults in Linux?

Major page faults (page_faults_major) occur when a process accesses memory that must be loaded from disk, typically because the page was swapped out or is being accessed for the first time from a memory-mapped file. They require disk I/O and are significantly slower than minor page faults (which are served from memory). High major page fault rates indicate memory pressure or heavy memory-mapped file usage.

What is the difference between page faults and swap activity?

Page faults (page_faults) include both minor faults (resolved from memory, very fast) and major faults (require disk I/O). Swap activity (swap_in_pages, swap_out_pages) specifically tracks pages moved between RAM and swap space. Sustained swap activity is concerning because it means RAM is insufficient for the working set, whereas occasional swap-out of idle pages is normal. Page faults alone are also normal, because minor faults happen constantly during everyday memory allocation.

When should I worry about swap activity?

Active swapping (non-zero rates for swap_in_pages and swap_out_pages) indicates the system is moving data between RAM and disk. Occasional bursts are normal, but sustained swap activity causes significant performance degradation because disk access is orders of magnitude slower than RAM. Investigate immediately if swap rates stay elevated, and consider adding RAM or reducing the workload.

How do VMstat metrics relate to memory and disk metrics?

VMstat metrics provide the mechanism-level view: page_faults show how the kernel is handling memory access, while swap pages show memory-to-disk movement. Memory metrics (mem_percent, mem_available_mb) show capacity. Disk I/O metrics show total throughput. Together they tell a complete story: low available memory causes swapping, which causes disk I/O, which increases iowait. VMstat connects the dots between memory and disk.
