Load averages are one of the most fundamental Linux performance metrics, yet they're often misunderstood. Unlike simple CPU percentage, load averages capture the broader picture of system activity—including processes waiting for both CPU time and I/O operations. This comprehensive metric makes load averages particularly valuable for understanding overall system health.
Server Scout collects four key metrics in this category: load_1m, load_5m, load_15m, and uptime. These metrics are gathered every 5 minutes as part of the slow tier collection, reading directly from /proc/loadavg and /proc/uptime.
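Server Scout's own parser isn't shown here, but reading these files is straightforward. A minimal sketch in Python, with field layouts taken from the proc(5) man page:

```python
def parse_loadavg(text):
    """Parse the first three fields of /proc/loadavg.

    The file looks like: "0.52 0.58 0.59 1/467 12345"
    (three load averages, running/total tasks, last PID).
    """
    load_1m, load_5m, load_15m = (float(x) for x in text.split()[:3])
    return load_1m, load_5m, load_15m

def parse_uptime(text):
    """Parse uptime in seconds from /proc/uptime.

    The file looks like: "3891162.41 15070003.72"
    (uptime, cumulative idle time summed across all cores).
    """
    return float(text.split()[0])

# On a Linux host a collector would simply do:
#   with open("/proc/loadavg") as f: loads = parse_loadavg(f.read())
#   with open("/proc/uptime") as f: uptime = parse_uptime(f.read())
```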
Understanding Load Averages
Load average represents the number of processes that are either running on a CPU or waiting in the run queue for an available core. On Linux, it additionally counts processes in uninterruptible sleep, which are typically blocked on disk I/O.
The three load metrics provide different time perspectives:
| Metric | Description | Use Case |
|---|---|---|
| load_1m | 1-minute load average | Most responsive indicator, shows current activity |
| load_5m | 5-minute load average | Smooths short-term spikes, better for alerts |
| load_15m | 15-minute load average | Long-term trend, shows sustained load patterns |
These aren't simple averages—they're exponentially damped moving averages. The 1-minute load gives more weight to recent activity, whilst the 15-minute load provides a smoother, longer-term view.
The CPU Core Reference Point
The critical reference point for interpreting load averages is your server's CPU core count. Server Scout reports this as cpu_cores in the fast tier metrics. Here's the fundamental relationship:
- Load < CPU cores: System has spare capacity
- Load = CPU cores: System is at full utilisation
- Load > CPU cores: System is overloaded with processes queuing
For example, on a 4-core server:
- Load of 2.0 = 50% utilisation
- Load of 4.0 = 100% utilisation (fully loaded but not overloaded)
- Load of 6.0 = 150% utilisation (overloaded—processes are waiting)
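The per-core arithmetic above is trivial but worth making explicit (cpu_cores here stands in for the fast-tier metric of the same name):

```python
def load_per_core(load, cpu_cores):
    """Express load as a fraction of total CPU capacity: 1.0 == fully loaded."""
    return load / cpu_cores

# The 4-core examples above:
load_per_core(2.0, 4)  # 0.5 -> 50% utilisation, spare capacity
load_per_core(6.0, 4)  # 1.5 -> 150%, processes are queuing
```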
However, this interpretation has important nuances on Linux systems.
Linux Load Averages Include I/O Wait
Unlike most other Unix systems, Linux includes processes in uninterruptible sleep (typically waiting on disk I/O) in the load average calculation. This means high load doesn't necessarily indicate CPU saturation; it might indicate I/O bottlenecks.
Consider these scenarios:
CPU-bound load: High load average with high cpu_percent and low cpu_iowait
- Processes are actively consuming CPU cycles
- Adding CPU cores or optimising code helps
I/O-bound load: High load average with moderate cpu_percent and high cpu_iowait
- Processes are waiting for disk or network I/O
- Storage optimisation or faster disks help more than additional CPU cores
This is why Server Scout's 5-second CPU metrics (cpu_iowait, cpu_percent) are essential for properly interpreting load averages. Always cross-reference high load with these CPU breakdown metrics.
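A monitoring script might encode this cross-check roughly as follows. The 20% iowait and 80% CPU cutoffs are illustrative assumptions, not Server Scout defaults:

```python
def classify_high_load(cpu_percent, cpu_iowait):
    """Rough triage for a high-load condition using the 5-second CPU metrics.

    Cutoffs are illustrative; tune them for your workload.
    """
    if cpu_iowait > 20.0:
        return "io-bound"   # processes stuck waiting on disk/network
    if cpu_percent > 80.0:
        return "cpu-bound"  # cores genuinely saturated
    return "mixed"          # check memory pressure and process counts next
```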
Comparing Load Average Timeframes
The relationship between 1-minute, 5-minute, and 15-minute load averages reveals system trends:
| Comparison | Interpretation | Action |
|---|---|---|
| load_1m >> load_15m | Load is increasing rapidly | Investigate developing problem |
| load_1m << load_15m | Load is decreasing | System recovering from spike |
| load_1m ≈ load_5m ≈ load_15m | Load is stable | Normal steady-state operation |
For example, if you see load_1m = 6.2, load_5m = 4.8, load_15m = 2.1 on a 4-core system, this indicates a developing performance problem. The load has been climbing steadily and is now significantly above the CPU core count.
Conversely, load_1m = 1.5, load_5m = 3.2, load_15m = 5.8 suggests the system is recovering from a recent spike—perhaps a batch job has completed or traffic has subsided.
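The same comparison can be scripted. The 1.5× ratio used here to call a trend significant is an illustrative assumption:

```python
def load_trend(load_1m, load_15m, ratio=1.5):
    """Classify the trend by comparing short- and long-term averages."""
    if load_1m > load_15m * ratio:
        return "rising"   # investigate a developing problem
    if load_1m * ratio < load_15m:
        return "falling"  # recovering from a spike
    return "stable"

load_trend(6.2, 2.1)  # "rising"  -- the 4-core example above
load_trend(1.5, 5.8)  # "falling" -- batch job finished, traffic subsided
```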
Practical Load Average Thresholds
Whilst load interpretation depends heavily on your specific workload, these general thresholds provide a starting point:
| Threshold | Formula | Interpretation |
|---|---|---|
| Normal | Load < cores × 0.7 | Healthy operation with headroom |
| Warning | Load > cores × 0.7 | Approaching capacity, monitor closely |
| Critical | Load > cores × 1.5 | Overloaded, performance degradation likely |
Many monitoring setups use the 5-minute load average for alerting, as it balances responsiveness with stability. A brief spike in 1-minute load might not warrant an alert, but sustained high 5-minute load indicates a genuine problem.
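As a sketch, a severity check built on the 5-minute average might look like this. The multipliers are illustrative starting points, not Server Scout defaults:

```python
def load_severity(load_5m, cpu_cores):
    """Map the 5-minute load average to a severity level.

    Multipliers are illustrative; tune them per server type.
    """
    if load_5m > cpu_cores * 1.5:
        return "critical"  # overloaded, degradation likely
    if load_5m > cpu_cores * 0.7:
        return "warning"   # approaching capacity
    return "normal"        # healthy headroom
```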
However, these thresholds must be tuned per server type:
- Web servers: Consistently high load may indicate insufficient capacity for traffic
- Build servers: Brief spikes to 2× CPU cores during compilation are normal
- Database servers: Sustained load above CPU count often indicates query optimisation needs
- Batch processing servers: High load during scheduled jobs is expected
Load Spikes vs Sustained Load
The multi-timeframe view helps distinguish between different types of load issues:
Transient spikes appear as high load_1m but normal load_5m and load_15m. These might be:
- Cron jobs or scheduled tasks
- Brief traffic surges
- Application startup processes
Sustained overload shows high values across all three timeframes, indicating:
- Insufficient server capacity
- Runaway processes
- I/O bottlenecks affecting multiple processes
Gradual degradation appears as load_1m > load_5m > load_15m holding steadily across successive collections, suggesting:
- Memory leaks causing increasing resource usage
- Growing datasets outpacing server capacity
- Progressive I/O subsystem degradation
System Uptime Context
The uptime metric provides crucial context for interpreting load patterns. Server Scout reports this as a human-readable string (e.g., "45 days, 3:22"), parsed from /proc/uptime.
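The raw value in /proc/uptime is seconds. A sketch of how such a string might be produced, with the exact output format assumed from the example above:

```python
def format_uptime(seconds):
    """Render uptime seconds in a "45 days, 3:22" style (days, then H:MM)."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    if days:
        return f"{days} days, {hours}:{minutes:02d}"
    return f"{hours}:{minutes:02d}"
```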
Long uptimes (weeks or months) combined with stable load averages indicate:
- System stability and proper capacity planning
- Successful long-term operation
- Well-tuned applications and configurations
Short uptimes warrant investigation, especially if unexpected:
- Check the reboot_required metric for planned maintenance
- Cross-reference with oom_kills for memory pressure issues
- Review system logs for kernel panics or hardware problems
Frequent reboots with high pre-reboot load might indicate:
- Out-of-memory conditions triggering the OOM killer
- System instability under load
- Hardware issues causing crashes
Integration with Other Server Scout Metrics
Load averages become most valuable when interpreted alongside other Server Scout metrics:
Memory pressure: High load with increasing mem_percent and swap usage suggests memory constraints are causing I/O wait
Disk I/O: High load with elevated disk_io_read_bytes and disk_io_write_bytes rates indicates storage bottlenecks
Network activity: High load coinciding with network transfer spikes might indicate network I/O wait
Process counts: Unusual processes_running or processes_blocked values help explain load anomalies
Context switches: Extremely high context_switches rates can contribute to load without proportional work completion
Monitoring and Alerting Best Practices
When configuring load average monitoring:
- Use 5-minute load for primary alerts—it's responsive but not overly sensitive
- Set thresholds based on your CPU core count—a 16-core server has very different capacity than a 2-core VPS
- Include trend analysis—alert when 1-minute load significantly exceeds 15-minute load
- Context matters—high load during known batch processing windows might be acceptable
- Correlate with other metrics—load alerts should prompt checking CPU breakdown, memory usage, and I/O rates
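Putting these practices together, one plausible alert predicate looks like this; the thresholds and ratio are illustrative assumptions:

```python
def should_alert(load_1m, load_5m, load_15m, cpu_cores):
    """Alert on sustained 5-minute overload, or a sharp rise past core count."""
    over_capacity = load_5m > cpu_cores * 0.7
    rising_fast = load_1m > load_15m * 1.5 and load_1m > cpu_cores
    return over_capacity or rising_fast
```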
Server Scout's dashboard time ranges (1h raw data, 6h with 30s averaging, 24h with 2min averaging, 7d with 15min averaging) provide excellent visibility into both immediate load issues and longer-term capacity trends.
Load averages remain one of the most reliable indicators of overall system health. By understanding their nuances—particularly Linux's inclusion of I/O wait—and interpreting them within the context of your server's CPU core count and workload patterns, you gain powerful insight into system performance and capacity planning needs.