Load averages are one of the most fundamental Linux performance metrics, yet they're often misunderstood. Unlike simple CPU percentage, load averages capture the broader picture of system activity—including processes waiting for both CPU time and I/O operations. This comprehensive metric makes load averages particularly valuable for understanding overall system health.
Server Scout collects four key metrics in this category: load_1m, load_5m, load_15m, and uptime. These metrics are gathered every 5 minutes as part of the slow tier collection, reading directly from /proc/loadavg and /proc/uptime.
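Server Scout's own parser isn't shown here, but reading these files is straightforward. A minimal sketch in Python, with field layouts taken from the proc(5) man page:

```python
def parse_loadavg(text):
    """Parse the first three fields of /proc/loadavg.

    The file looks like: "0.52 0.58 0.59 1/467 12345"
    (three load averages, running/total tasks, last PID).
    """
    load_1m, load_5m, load_15m = (float(x) for x in text.split()[:3])
    return load_1m, load_5m, load_15m

def parse_uptime(text):
    """Parse uptime in seconds from /proc/uptime.

    The file looks like: "3891162.41 15070003.72"
    (uptime, cumulative idle time summed across all cores).
    """
    return float(text.split()[0])

# On a Linux host a collector would simply do:
#   with open("/proc/loadavg") as f: loads = parse_loadavg(f.read())
#   with open("/proc/uptime") as f: uptime = parse_uptime(f.read())
```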
Understanding Load Averages
Load average represents the number of processes that are either running on a CPU or waiting in the run queue for an available core. On Linux, it additionally counts processes in uninterruptible sleep, which are typically blocked on disk I/O.
The three load metrics provide different time perspectives:
| Metric | Description | Use Case |
|---|---|---|
| load_1m | 1-minute load average | Most responsive indicator, shows current activity |
| load_5m | 5-minute load average | Smooths short-term spikes, better for alerts |
| load_15m | 15-minute load average | Long-term trend, shows sustained load patterns |
These aren't simple averages—they're exponentially damped moving averages. The 1-minute load gives more weight to recent activity, whilst the 15-minute load provides a smoother, longer-term view.
The CPU Core Reference Point
The critical reference point for interpreting load averages is your server's CPU core count. Server Scout reports this as cpu_cores in the fast tier metrics. Here's the fundamental relationship:
- Load < CPU cores: System has spare capacity
- Load = CPU cores: System is at full utilisation
- Load > CPU cores: System is overloaded with processes queuing
For example, on a 4-core server:
- Load of 2.0 = 50% utilisation
- Load of 4.0 = 100% utilisation (fully loaded but not overloaded)
- Load of 6.0 = 150% utilisation (overloaded—processes are waiting)
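The per-core arithmetic above is trivial but worth making explicit (cpu_cores here stands in for the fast-tier metric of the same name):

```python
def load_per_core(load, cpu_cores):
    """Express load as a fraction of total CPU capacity: 1.0 == fully loaded."""
    return load / cpu_cores

# The 4-core examples above:
load_per_core(2.0, 4)  # 0.5 -> 50% utilisation, spare capacity
load_per_core(6.0, 4)  # 1.5 -> 150%, processes are queuing
```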
However, this interpretation has important nuances on Linux systems.
Linux Load Averages Include I/O Wait
Unlike most other Unix systems, Linux includes processes in uninterruptible sleep (typically waiting on disk I/O) in the load average calculation. This means high load doesn't necessarily indicate CPU saturation; it might indicate I/O bottlenecks.
Consider these scenarios:
CPU-bound load: High load average with high cpu_percent and low cpu_iowait
- Processes are actively consuming CPU cycles
- Adding CPU cores or optimising code helps
I/O-bound load: High load average with moderate cpu_percent and high cpu_iowait
- Processes are waiting for disk or network I/O
- Storage optimisation or faster disks help more than additional CPU cores
This is why Server Scout's 5-second CPU metrics (cpu_iowait, cpu_percent) are essential for properly interpreting load averages. Always cross-reference high load with these CPU breakdown metrics.
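A monitoring script might encode this cross-check roughly as follows. The 20% iowait and 80% CPU cutoffs are illustrative assumptions, not Server Scout defaults:

```python
def classify_high_load(cpu_percent, cpu_iowait):
    """Rough triage for a high-load condition using the 5-second CPU metrics.

    Cutoffs are illustrative; tune them for your workload.
    """
    if cpu_iowait > 20.0:
        return "io-bound"   # processes stuck waiting on disk/network
    if cpu_percent > 80.0:
        return "cpu-bound"  # cores genuinely saturated
    return "mixed"          # check memory pressure and process counts next
```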
Comparing Load Average Timeframes
The relationship between 1-minute, 5-minute, and 15-minute load averages reveals system trends:
| Comparison | Interpretation | Action |
|---|---|---|
| load_1m >> load_15m | Load is increasing rapidly | Investigate developing problem |
| load_1m << load_15m | Load is decreasing | System recovering from spike |
| load_1m ≈ load_5m ≈ load_15m | Load is stable | Normal steady-state operation |
For example, if you see load_1m = 6.2, load_5m = 4.8, load_15m = 2.1 on a 4-core system, this indicates a developing performance problem. The load has been climbing steadily and is now significantly above the CPU core count.
Conversely, load_1m = 1.5, load_5m = 3.2, load_15m = 5.8 suggests the system is recovering from a recent spike—perhaps a batch job has completed or traffic has subsided.
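The same comparison can be scripted. The 1.5× ratio used here to call a trend significant is an illustrative assumption:

```python
def load_trend(load_1m, load_15m, ratio=1.5):
    """Classify the trend by comparing short- and long-term averages."""
    if load_1m > load_15m * ratio:
        return "rising"   # investigate a developing problem
    if load_1m * ratio < load_15m:
        return "falling"  # recovering from a spike
    return "stable"

load_trend(6.2, 2.1)  # "rising"  -- the 4-core example above
load_trend(1.5, 5.8)  # "falling" -- batch job finished, traffic subsided
```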
Practical Load Average Thresholds
Whilst load interpretation depends heavily on your specific workload, these general thresholds provide a starting point:
| Threshold | Formula | Interpretation |
|---|---|---|
| Normal | Load < cores × 0.7 | Healthy operation with headroom |
| Warning | Load > cores × 0.7 | Approaching capacity, monitor closely |
| Critical | Load > cores × 1.5 | Overloaded, performance degradation likely |
Many monitoring setups use the 5-minute load average for alerting, as it balances responsiveness with stability. A brief spike in 1-minute load might not warrant an alert, but sustained high 5-minute load indicates a genuine problem.
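As a sketch, a severity check built on the 5-minute average might look like this. The multipliers are illustrative starting points, not Server Scout defaults:

```python
def load_severity(load_5m, cpu_cores):
    """Map the 5-minute load average to a severity level.

    Multipliers are illustrative; tune them per server type.
    """
    if load_5m > cpu_cores * 1.5:
        return "critical"  # overloaded, degradation likely
    if load_5m > cpu_cores * 0.7:
        return "warning"   # approaching capacity
    return "normal"        # healthy headroom
```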
However, these thresholds must be tuned per server type:
- Web servers: Consistently high load may indicate insufficient capacity for traffic
- Build servers: Brief spikes to 2× CPU cores during compilation are normal
- Database servers: Sustained load above CPU count often indicates query optimisation needs
- Batch processing servers: High load during scheduled jobs is expected
Load Spikes vs Sustained Load
The multi-timeframe view helps distinguish between different types of load issues:
Transient spikes appear as high load_1m but normal load_5m and load_15m. These might be:
- Cron jobs or scheduled tasks
- Brief traffic surges
- Application startup processes
Sustained overload shows high values across all three timeframes, indicating:
- Insufficient server capacity
- Runaway processes
- I/O bottlenecks affecting multiple processes
Gradual degradation appears as load_1m > load_5m > load_15m holding steadily across successive collections, suggesting:
- Memory leaks causing increasing resource usage
- Growing datasets outpacing server capacity
- Progressive I/O subsystem degradation
System Uptime Context
The uptime metric provides crucial context for interpreting load patterns. Server Scout reports this as a human-readable string (e.g., "45 days, 3:22"), parsed from /proc/uptime.
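The raw value in /proc/uptime is seconds. A sketch of how such a string might be produced, with the exact output format assumed from the example above:

```python
def format_uptime(seconds):
    """Render uptime seconds in a "45 days, 3:22" style (days, then H:MM)."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    if days:
        return f"{days} days, {hours}:{minutes:02d}"
    return f"{hours}:{minutes:02d}"
```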
Long uptimes (weeks or months) combined with stable load averages indicate:
- System stability and proper capacity planning
- Successful long-term operation
- Well-tuned applications and configurations
Short uptimes warrant investigation, especially if unexpected:
- Check the reboot_required metric for planned maintenance
- Cross-reference with oom_kills for memory pressure issues
- Review system logs for kernel panics or hardware problems
Frequent reboots with high pre-reboot load might indicate:
- Out-of-memory conditions triggering the OOM killer
- System instability under load
- Hardware issues causing crashes
Integration with Other Server Scout Metrics
Load averages become most valuable when interpreted alongside other Server Scout metrics:
Memory pressure: High load with increasing mem_percent and swap usage suggests memory constraints are causing I/O wait
Disk I/O: High load with elevated disk_io_read_bytes and disk_io_write_bytes rates indicates storage bottlenecks
Network activity: High load coinciding with network transfer spikes might indicate network I/O wait
Process counts: Unusual processes_running or processes_blocked values help explain load anomalies
Context switches: Extremely high context_switches rates can contribute to load without proportional work completion
Monitoring and Alerting Best Practices
When configuring load average monitoring:
- Use 5-minute load for primary alerts—it's responsive but not overly sensitive
- Set thresholds based on your CPU core count—a 16-core server has very different capacity than a 2-core VPS
- Include trend analysis—alert when 1-minute load significantly exceeds 15-minute load
- Context matters—high load during known batch processing windows might be acceptable
- Correlate with other metrics—load alerts should prompt checking CPU breakdown, memory usage, and I/O rates
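Putting these practices together, one plausible alert predicate looks like this; the thresholds and ratio are illustrative assumptions:

```python
def should_alert(load_1m, load_5m, load_15m, cpu_cores):
    """Alert on sustained 5-minute overload, or a sharp rise past core count."""
    over_capacity = load_5m > cpu_cores * 0.7
    rising_fast = load_1m > load_15m * 1.5 and load_1m > cpu_cores
    return over_capacity or rising_fast
```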
Server Scout's dashboard time ranges (1h raw data, 6h with 30s averaging, 24h with 2min averaging, 7d with 15min averaging) provide excellent visibility into both immediate load issues and longer-term capacity trends.
Load averages remain one of the most reliable indicators of overall system health. By understanding their nuances—particularly Linux's inclusion of I/O wait—and interpreting them within the context of your server's CPU core count and workload patterns, you gain powerful insight into system performance and capacity planning needs.