Why Server Scout Uses Tiered Collection
The Server Scout agent uses a five-tier data collection system that many customers initially find surprising. Why does CPU usage update every 5 seconds whilst package updates appear only once daily? The answer lies in the agent's core design philosophy: near-zero-footprint monitoring that balances detection speed against resource cost.
Understanding these tiers will help you interpret your dashboard data more effectively and appreciate why certain metrics appear to update at different intervals.
The Resource Cost Problem
A naive monitoring approach would collect every possible metric every 5 seconds. This seems logical—more frequent data means better visibility, right? In practice, this creates substantial overhead:
- Dozens of command forks per collection cycle
- Measurable CPU impact from constant subprocess creation
- Unnecessary strain on system resources
- Identical data points for metrics that rarely change
Server Scout takes a different approach. The agent is a pure Bash script that reads primarily from /proc and /sys virtual filesystems—kernel-served data with zero disk I/O. Only when necessary does it fork external commands, and then only at intervals appropriate to each metric's rate of change.
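As an illustration of this zero-fork style, here is a minimal Bash sketch (not the shipped agent's code) that derives a memory-usage percentage purely from /proc/meminfo-format input, with no external commands:

```shell
#!/usr/bin/env bash
# Sketch only: compute a mem_percent figure from meminfo-format input
# using pure Bash parsing, matching the zero-fork approach described above.
mem_percent() {
  local key val _ total=0 avail=0
  while read -r key val _; do
    case "$key" in
      MemTotal:)     total=$val ;;
      MemAvailable:) avail=$val ;;
    esac
  done
  echo $(( (total - avail) * 100 / total ))
}

# On a real Linux host: mem_percent < /proc/meminfo
mem_percent <<< $'MemTotal: 8000000 kB\nMemAvailable: 2000000 kB'   # prints 75
```

Because /proc/meminfo is served by the kernel, the read itself costs no disk I/O.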
The Five Collection Tiers
Fast Tier: Every 5 Seconds
The Fast Tier captures the most volatile and critical metrics—those that change rapidly and require immediate alerting capabilities.
What's collected:
| Metric Category | Metrics | Source |
|---|---|---|
| CPU Usage | cpu_percent, cpu_user, cpu_system, cpu_iowait, cpu_steal, cpu_nice, cpu_irq, cpu_softirq | /proc/stat |
| CPU Information | cpu_cores, cpu_model, cpu_temp | /proc/cpuinfo, /sys/class/thermal |
| Memory Core | mem_percent, mem_used_gb, mem_total_gb, mem_available_mb, mem_cached_mb, mem_buffers_mb | /proc/meminfo |
| Memory Detail | mem_swap_used_mb, mem_swap_total_mb, mem_dirty_mb, mem_shmem_mb, mem_slab_reclaimable_mb | /proc/meminfo |
Why these metrics are Fast Tier:
- CPU utilisation can spike from 5% to 95% within seconds during traffic bursts or batch jobs
- Memory pressure can escalate rapidly, especially in containerised environments
- These metrics are essential for real-time alerting on performance issues
Resource cost: ~50-100ms CPU per 5-second cycle. All data comes from reading two virtual files (/proc/stat and /proc/meminfo) served directly by the kernel—no disk I/O, no command forks.
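The cpu_percent figure has to be derived from two consecutive /proc/stat samples, because the kernel only exposes cumulative tick counters. A sketch of that delta calculation (field layout per proc(5); illustrative, not the agent's actual code):

```shell
#!/usr/bin/env bash
# Sketch: overall CPU utilisation percent from two aggregate "cpu" lines
# sampled from /proc/stat. Fields after "cpu" are: user nice system idle
# iowait irq softirq steal ... -- we treat idle + iowait as idle time.
cpu_percent() {
  local -a a=($1) b=($2)
  local v t1=0 t2=0 idle1 idle2
  for v in "${a[@]:1}"; do t1=$((t1 + v)); done
  for v in "${b[@]:1}"; do t2=$((t2 + v)); done
  idle1=$(( a[4] + a[5] ))
  idle2=$(( b[4] + b[5] ))
  echo $(( ( (t2 - t1) - (idle2 - idle1) ) * 100 / (t2 - t1) ))
}

s1="cpu 100 0 50 800 50 0 0 0 0 0"
s2="cpu 150 0 70 850 60 0 0 0 0 0"
cpu_percent "$s1" "$s2"   # prints 53 (70 busy ticks out of 130 elapsed)
```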
Medium Tier: Every 30 Seconds
The Medium Tier covers metrics that change frequently but don't require 5-second granularity for effective monitoring.
What's collected:
| Metric Category | Metrics | Source |
|---|---|---|
| Network I/O | net_rx_bytes, net_tx_bytes, net_rx_errors, net_tx_errors, net_rx_dropped, net_tx_dropped | /proc/net/dev |
| Network Identity | net_interface, net_ip, net_mac | /proc/net/dev, system interfaces |
| Disk I/O | disk_io_read_bytes, disk_io_write_bytes | /proc/diskstats |
| Virtual Memory | page_faults, page_faults_major, swap_in_pages, swap_out_pages | /proc/vmstat |
| TCP Connections | tcp_connections, tcp_established, tcp_time_wait, tcp_close_wait, tcp_listen | /proc/net/tcp, /proc/net/tcp6 |
| System Activity | context_switches, open_fds, oom_kills, entropy | Various /proc files |
Why these metrics are Medium Tier:
- Network and disk I/O counters are cumulative—30-second intervals provide sufficient granularity for rate calculations
- TCP connection states change relatively frequently but don't require instant detection
- Page faults and context switches trend over minutes rather than seconds
Resource cost: ~10-50ms CPU per 30-second cycle. Still purely virtual filesystem reads with no external commands.
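For example, the cumulative network counters come straight out of /proc/net/dev, whose first and ninth data fields per interface are RX and TX bytes. A sketch of that parse (illustrative, not the agent's own parser):

```shell
#!/usr/bin/env bash
# Sketch: pull cumulative RX/TX byte counters for one interface from
# /proc/net/dev-format input (field positions per proc(5)).
net_bytes() {
  local iface=$1 line
  while read -r line; do
    case "$line" in
      "$iface:"*)
        set -- ${line#*:}        # split the counter fields after "iface:"
        echo "rx=$1 tx=$9"
        ;;
    esac
  done
}

# On a real host: net_bytes eth0 < /proc/net/dev
net_bytes eth0 <<< "  eth0: 1000000 10 0 0 0 0 0 0 500000 5 0 0 0 0 0 0"
# prints rx=1000000 tx=500000
```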
Slow Tier: Every 5 Minutes
The Slow Tier handles metrics that change gradually or represent inherently averaged data.
What's collected:
| Metric Category | Metrics | Source |
|---|---|---|
| System Load | load_1m, load_5m, load_15m | /proc/loadavg |
| Process Counts | processes_running, processes_blocked, processes_zombie, processes_total | /proc/stat |
| Disk Usage | disk_percent, disk_used_gb, disk_total_gb | df command |
| Mount Details | disk_mounts array with mount points, devices, filesystems, usage | df command, /proc/mounts |
Why these metrics are Slow Tier:
- Load averages are kernel-calculated averages over 1, 5, and 15 minutes—collecting them every 5 seconds adds no information
- Disk space changes gradually; 5-minute intervals catch storage issues well before they become critical
- Process counts typically trend over minutes
Resource cost: ~100-200ms CPU per 5-minute cycle. This tier requires forking the df command but only once per cycle.
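That single df fork can feed several metrics at once by parsing its POSIX (-P) output in Bash. A sketch under that assumption:

```shell
#!/usr/bin/env bash
# Sketch: turn one "df -P" data line into Slow Tier disk metrics.
# Real use would be something like:  df -P / | tail -n 1 | disk_usage
disk_usage() {
  local dev total used avail pct mnt
  read -r dev total used avail pct mnt
  echo "disk_percent=${pct%\%} disk_used_kb=$used disk_total_kb=$total"
}

echo "/dev/sda1 100000000 40000000 60000000 40% /" | disk_usage
# prints disk_percent=40 disk_used_kb=40000000 disk_total_kb=100000000
```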
Glacial Tier: Every Hour
The Glacial Tier covers metrics that rarely change but have high collection overhead.
What's collected:
| Metric Category | Metrics | Source |
|---|---|---|
| Services | services array, services_running, services_total, failed_units | systemctl commands |
| Time Sync | ntp_synced | timedatectl or NTP status |
| Updates | package_updates, reboot_required | apt/dnf/zypper commands |
Why these metrics are Glacial Tier:
- Service states typically change only during deployments or maintenance
- New package updates typically become available on a weekly or monthly cadence
- These checks require multiple external command forks with non-trivial overhead
- Checking service status every 5 seconds would consume significant CPU for metrics that change perhaps once per month
Resource cost: ~500ms-2s CPU per hour. Multiple systemctl forks plus package manager queries.
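A hedged sketch of how a count such as failed_units might be derived (the systemctl flags are standard systemd ones, but the exact commands the agent runs are not documented here):

```shell
#!/usr/bin/env bash
# Sketch: count non-empty output lines, e.g. to turn
#   systemctl list-units --state=failed --no-legend
# into a failed_units count. Illustrative, not the agent's own pipeline.
count_units() { grep -c '[^[:space:]]' || true; }

# On a systemd host:
#   failed_units=$(systemctl list-units --state=failed --no-legend | count_units)
printf 'foo.service loaded failed failed Foo daemon\n' | count_units   # prints 1
```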
Daily Tier: Every 24 Hours
The Daily Tier captures essentially static system information.
What's collected:
| Metric Category | Metrics | Source |
|---|---|---|
| System Identity | os, kernel, arch, virtualization, hostname, agent_version, device_type | Various system commands |
| Security Status | selinux_status, firewall_status | getenforce, firewall status commands |
Why these metrics are Daily Tier:
- OS version and kernel change only during major updates
- Hostname and architecture are effectively static
- Security configurations change infrequently
- These checks involve multiple command forks acceptable only at daily intervals
Resource cost: ~200ms-1s CPU per day. Several forks for system detection commands.
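Some Daily Tier data needs no forks at all; OS identity, for instance, is available from /etc/os-release. An illustrative sketch (the file format is the freedesktop standard; the agent's exact detection logic is an assumption):

```shell
#!/usr/bin/env bash
# Sketch: read the OS name from an os-release-format file. The subshell
# function body keeps the sourced variables from leaking into the caller.
os_name() (
  . "$1"
  echo "${PRETTY_NAME:-${NAME:-unknown}}"
)

# On a real host: os_name /etc/os-release
```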
Understanding Counter Metrics
Several metrics (net_rx_bytes, net_tx_bytes, disk_io_read_bytes, disk_io_write_bytes, context_switches) are cumulative counters. The agent reports the raw cumulative values, but the dashboard calculates and displays rates per second by computing deltas between consecutive data points.
For example:
- Agent reports network RX bytes: 1,000,000 then 1,010,000 (30 seconds later)
- Dashboard calculates: (1,010,000 - 1,000,000) ÷ 30 ≈ 333 bytes/second
- You see the rate, not the cumulative counter
This approach is more accurate than attempting rate calculations within the agent and aligns with industry-standard monitoring practices.
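The worked example above reduces to a one-line delta calculation; a Bash sketch of the same arithmetic (the dashboard's actual implementation is not shown here):

```shell
#!/usr/bin/env bash
# Sketch: per-second rate from two cumulative counter samples taken
# dt seconds apart (integer arithmetic, as in the worked example).
rate_per_sec() {
  local prev=$1 curr=$2 dt=$3
  echo $(( (curr - prev) / dt ))
}

rate_per_sec 1000000 1010000 30   # prints 333
```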
Data Retention and Downsampling
Server Scout stores and displays data at different granularities depending on the time range:
| Time Range | Data Points | Granularity | Source |
|---|---|---|---|
| 1 hour | ~720 points | Raw 5-second data | Direct from Fast/Medium tiers |
| 6 hours | ~720 points | 30-second averages | Downsampled |
| 24 hours | ~720 points | 2-minute averages | Downsampled |
| 7 days | ~672 points | 15-minute averages | Downsampled |
Raw 5-second data is retained for 24 hours, then automatically pruned. Averaged data provides historical context whilst maintaining reasonable storage requirements and dashboard performance.
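Downsampling of this kind is a plain bucketed average; a sketch (bucket boundaries and storage details here are assumptions, not the product's documented internals):

```shell
#!/usr/bin/env bash
# Sketch: average a stream of raw samples into fixed-size buckets,
# e.g. bucket=6 turns 5-second data into 30-second averages.
downsample() {
  local bucket=$1 sum=0 n=0 v
  while read -r v; do
    sum=$((sum + v)); n=$((n + 1))
    if [ "$n" -eq "$bucket" ]; then
      echo $(( sum / n )); sum=0; n=0
    fi
  done
  if [ "$n" -gt 0 ]; then echo $(( sum / n )); fi   # flush a partial bucket
}

printf '10\n20\n30\n40\n50\n60\n' | downsample 3   # prints 20 then 50
```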
Handling Network Outages
The agent includes sophisticated data spooling to ensure no metrics are lost during connectivity issues:
- When unable to reach the dashboard, payloads are stored locally in /opt/scout-agent/spool/
- Up to 720 spool files are retained (approximately 1 hour of Fast Tier data)
- When connectivity returns, spooled data is automatically replayed with historical timestamps
- The dashboard processes replayed data to fill gaps in your charts
This ensures continuous monitoring even during network outages or dashboard maintenance.
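A minimal sketch of how such capped spooling can work in Bash (the file naming, pruning logic, and GNU `head -n -N` usage are assumptions for illustration; the shipped agent's internals may differ):

```shell
#!/usr/bin/env bash
# Sketch: write each undelivered payload to a timestamped spool file and
# prune the oldest files beyond a fixed cap (720 ~= 1 hour of 5s data).
SPOOL_DIR=${SPOOL_DIR:-/opt/scout-agent/spool}
MAX_SPOOL=${MAX_SPOOL:-720}

spool_payload() {
  mkdir -p "$SPOOL_DIR"
  printf '%s\n' "$1" > "$SPOOL_DIR/$(date +%s%N).json"
  # Timestamped names sort chronologically; drop all but the newest cap.
  ls -1 "$SPOOL_DIR" | head -n -"$MAX_SPOOL" | while read -r old; do
    rm -f "$SPOOL_DIR/$old"
  done
}
```

Replay on reconnect would then simply iterate the spool directory in sorted order and resend each file with its stored timestamp.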
Total Resource Footprint
The tiered approach achieves remarkable efficiency:
- Memory usage: <3 MB RSS
- CPU usage: <100ms total per 5-second cycle (average <0.1% on modern hardware)
- Disk I/O: Virtually zero (except brief spool writes during outages)
- Network traffic: ~2-5 KB per payload, compressed
Compare this to traditional monitoring agents that often consume 50-100 MB RAM and measurable CPU even when idle.
Force Collection for Troubleshooting
You can trigger all collection tiers immediately using:
```shell
/opt/scout-agent/scout-agent.sh --refresh
```
This is useful:
- After configuration changes (new services, mount points)
- When troubleshooting specific metrics
- To verify the agent can collect all metric types
The --refresh flag bypasses normal timing intervals and executes all five tiers in sequence.
Practical Implications
Understanding the tier system helps you interpret your dashboard effectively:
- Immediate issues (CPU spikes, memory exhaustion) appear within 5 seconds
- Performance trends (network throughput, disk I/O) update every 30 seconds
- Capacity planning (disk space, load averages) updates every 5 minutes
- Configuration changes (new services, updates) appear within an hour
- System changes (kernel updates, hostname changes) appear daily
This tiered approach ensures you get rapid alerting on critical issues whilst maintaining the lightest possible footprint on your servers. The agent intelligently matches collection frequency to each metric's characteristics—volatile metrics get frequent attention, stable metrics get occasional checks.
The result is comprehensive monitoring that's virtually invisible to your server's performance, proving that effective monitoring doesn't require heavy resource consumption.