Understanding Metric Collection Tiers

Why Server Scout Uses Tiered Collection

The Server Scout agent employs a sophisticated 5-tier data collection system that many customers initially find surprising. Why does CPU usage update every 5 seconds whilst package updates appear only once daily? The answer lies in the fundamental design philosophy: near-zero footprint monitoring that balances detection speed against resource cost.

Understanding these tiers will help you interpret your dashboard data more effectively and appreciate why certain metrics appear to update at different intervals.

The Resource Cost Problem

A naive monitoring approach would collect every possible metric every 5 seconds. This seems logical—more frequent data means better visibility, right? In practice, this creates substantial overhead:

  • Dozens of command forks per collection cycle
  • Measurable CPU impact from constant subprocess creation
  • Unnecessary strain on system resources
  • Identical data points for metrics that rarely change

Server Scout takes a different approach. The agent is a pure Bash script that reads primarily from /proc and /sys virtual filesystems—kernel-served data with zero disk I/O. Only when necessary does it fork external commands, and then only at intervals appropriate to each metric's rate of change.
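To illustrate the zero-fork style (this is a minimal sketch, not the actual Server Scout source), Bash builtins alone can read kernel-served files without creating a single subprocess:

```shell
#!/usr/bin/env bash
# Sketch: reading kernel-served data with zero command forks,
# using only the Bash `read` builtin and redirection.

# Load averages: three space-separated fields at the start of /proc/loadavg.
read -r load_1m load_5m load_15m _ < /proc/loadavg

# Memory: pick fields out of /proc/meminfo without forking grep or awk.
while read -r key value _; do
    case "$key" in
        MemTotal:)     mem_total_kb=$value ;;
        MemAvailable:) mem_available_kb=$value ;;
    esac
done < /proc/meminfo

echo "load_1m=$load_1m mem_total_kb=$mem_total_kb mem_available_kb=$mem_available_kb"
```

Because `read` and `case` are shell builtins and `/proc` files are served from kernel memory, this pattern touches no disk and forks no processes.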

The Five Collection Tiers

Fast Tier: Every 5 Seconds

The Fast Tier captures the most volatile and critical metrics—those that change rapidly and require immediate alerting capabilities.

What's collected:

Metric Category | Metrics | Source
CPU Usage | cpu_percent, cpu_user, cpu_system, cpu_iowait, cpu_steal, cpu_nice, cpu_irq, cpu_softirq | /proc/stat
CPU Information | cpu_cores, cpu_model, cpu_temp | /proc/cpuinfo, /sys/class/thermal
Memory Core | mem_percent, mem_used_gb, mem_total_gb, mem_available_mb, mem_cached_mb, mem_buffers_mb | /proc/meminfo
Memory Detail | mem_swap_used_mb, mem_swap_total_mb, mem_dirty_mb, mem_shmem_mb, mem_slab_reclaimable_mb | /proc/meminfo

Why these metrics are Fast Tier:

  • CPU utilisation can spike from 5% to 95% within seconds during traffic bursts or batch jobs
  • Memory pressure can escalate rapidly, especially in containerised environments
  • These metrics are essential for real-time alerting on performance issues

Resource cost: ~50-100ms CPU per 5-second cycle. All data comes from reading two virtual files (/proc/stat and /proc/meminfo) served directly by the kernel—no disk I/O, no command forks.
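A CPU-percent figure of this kind can be derived from two snapshots of `/proc/stat`. The sketch below assumes the standard kernel field order (user, nice, system, idle, iowait, irq, softirq, steal) and is an illustration, not the agent's actual code:

```shell
#!/usr/bin/env bash
# Sketch: CPU utilisation from deltas between two /proc/stat snapshots.
# Variable names are illustrative, not the agent's own.

read_cpu() {
    # First line: "cpu  user nice system idle iowait irq softirq steal ..."
    read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
    busy=$((user + nice + system + irq + softirq + steal))
    total=$((busy + idle + iowait))
}

read_cpu; busy1=$busy; total1=$total
sleep 1          # the agent would use its 5-second cycle here
read_cpu; busy2=$busy; total2=$total

cpu_percent=$(( (busy2 - busy1) * 100 / (total2 - total1) ))
echo "cpu_percent=$cpu_percent"
```

The delta approach matters: the raw `/proc/stat` counters are cumulative since boot, so a single read says nothing about current load.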

Medium Tier: Every 30 Seconds

The Medium Tier covers metrics that change frequently but don't require 5-second granularity for effective monitoring.

What's collected:

Metric Category | Metrics | Source
Network I/O | net_rx_bytes, net_tx_bytes, net_rx_errors, net_tx_errors, net_rx_dropped, net_tx_dropped | /proc/net/dev
Network Identity | net_interface, net_ip, net_mac | /proc/net/dev, system interfaces
Disk I/O | disk_io_read_bytes, disk_io_write_bytes | /proc/diskstats
Virtual Memory | page_faults, page_faults_major, swap_in_pages, swap_out_pages | /proc/vmstat
TCP Connections | tcp_connections, tcp_established, tcp_time_wait, tcp_close_wait, tcp_listen | /proc/net/tcp, /proc/net/tcp6
System Activity | context_switches, open_fds, oom_kills, entropy | Various /proc files

Why these metrics are Medium Tier:

  • Network and disk I/O counters are cumulative—30-second intervals provide sufficient granularity for rate calculations
  • TCP connection states change relatively frequently but don't require instant detection
  • Page faults and context switches trend over minutes rather than seconds

Resource cost: ~10-50ms CPU per 30-second cycle. Still purely virtual filesystem reads with no external commands.
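Reading the network counters follows the same zero-fork pattern. The sketch below (not the agent's actual code) parses `/proc/net/dev`, whose first two lines are column headers and whose interface names are terminated by a colon:

```shell
#!/usr/bin/env bash
# Sketch: cumulative RX/TX byte counters from /proc/net/dev, no forks.

lines=0
{
    read -r _ && read -r _      # skip the two header lines
    # Per-interface format: iface: rx_bytes packets errs drop fifo
    #                       frame compressed multicast tx_bytes ...
    while IFS=' :' read -r iface rx_bytes _ _ _ _ _ _ _ tx_bytes _; do
        echo "$iface rx_bytes=$rx_bytes tx_bytes=$tx_bytes"
        lines=$((lines + 1))
    done
} < /proc/net/dev
```

These values are cumulative since boot; the dashboard turns them into per-second rates by differencing consecutive samples, as described under "Understanding Counter Metrics" below.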

Slow Tier: Every 5 Minutes

The Slow Tier handles metrics that change gradually or represent inherently averaged data.

What's collected:

Metric Category | Metrics | Source
System Load | load_1m, load_5m, load_15m | /proc/loadavg
Process Counts | processes_running, processes_blocked, processes_zombie, processes_total | /proc/stat
Disk Usage | disk_percent, disk_used_gb, disk_total_gb | df command
Mount Details | disk_mounts array with mount points, devices, filesystems, usage | df command, /proc/mounts

Why these metrics are Slow Tier:

  • Load averages are kernel-calculated averages over 1, 5, and 15 minutes—collecting them every 5 seconds adds no information
  • Disk space changes gradually; 5-minute intervals catch storage issues well before they become critical
  • Process counts typically trend over minutes

Resource cost: ~100-200ms CPU per 5-minute cycle. This tier requires forking the df command but only once per cycle.
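The single `df` fork per cycle can be parsed entirely with Bash builtins. This sketch (illustrative, not the agent's source) uses `df -Pk` because the POSIX `-P` flag guarantees one line per filesystem, which keeps the field positions stable:

```shell
#!/usr/bin/env bash
# Sketch: one df fork per slow-tier cycle, output parsed in Bash.

root_seen=""
while read -r device _ _ _ pct mount; do
    [ "$device" = "Filesystem" ] && continue   # skip the header row
    echo "mount=$mount device=$device disk_percent=${pct%\%}"
    [ "$mount" = "/" ] && root_seen=yes
done < <(df -Pk)
```

Mount points containing spaces would need extra care in real code; the sketch relies on the mount point being the final field.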

Glacial Tier: Every Hour

The Glacial Tier covers metrics that rarely change but have high collection overhead.

What's collected:

Metric Category | Metrics | Source
Services | services array, services_running, services_total, failed_units | systemctl commands
Time Sync | ntp_synced | timedatectl or NTP status
Updates | package_updates, reboot_required | apt/dnf/zypper commands

Why these metrics are Glacial Tier:

  • Service states typically change only during deployments or maintenance
  • Package updates are discovered weekly or monthly
  • These checks require multiple external command forks with non-trivial overhead
  • Checking service status every 5 seconds would consume significant CPU for metrics that change perhaps once per month

Resource cost: ~500ms-2s CPU per hour. Multiple systemctl forks plus package manager queries.

Daily Tier: Every 24 Hours

The Daily Tier captures essentially static system information.

What's collected:

Metric Category | Metrics | Source
System Identity | os, kernel, arch, virtualization, hostname, agent_version, device_type | Various system commands
Security Status | selinux_status, firewall_status | getenforce, firewall status commands

Why these metrics are Daily Tier:

  • OS version and kernel change only during major updates
  • Hostname and architecture are effectively static
  • Security configurations change infrequently
  • These checks involve multiple command forks, a cost that is acceptable only at daily intervals

Resource cost: ~200ms-1s CPU per day. Several forks for system detection commands.

Understanding Counter Metrics

Several metrics (net_rx_bytes, net_tx_bytes, disk_io_read_bytes, disk_io_write_bytes, context_switches) are cumulative counters. The agent reports the raw cumulative values, but the dashboard calculates and displays rates per second by computing deltas between consecutive data points.

For example:

  • Agent reports network RX bytes: 1,000,000 then 1,010,000 (30 seconds later)
  • Dashboard calculates: (1,010,000 - 1,000,000) ÷ 30 = 333 bytes/second
  • You see the rate, not the cumulative counter

This approach is more accurate than attempting rate calculations within the agent and aligns with industry-standard monitoring practices.
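The worked example above reduces to a simple delta calculation. The sketch below also guards against a counter reset (for example after a reboot), a case any rate calculation must handle:

```shell
#!/usr/bin/env bash
# The dashboard-side rate calculation from the example above (sketch):
# rate = (current_counter - previous_counter) / interval_seconds

prev=1000000
curr=1010000
interval=30

if [ "$curr" -lt "$prev" ]; then
    rate=""    # counter reset (e.g. reboot): skip this sample
else
    rate=$(( (curr - prev) / interval ))
fi
echo "net_rx rate: ${rate} bytes/second"   # → net_rx rate: 333 bytes/second
```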

Data Retention and Downsampling

Server Scout stores and displays data at different granularities depending on the time range:

Time Range | Data Points | Granularity | Source
1 hour | ~720 points | Raw 5-second data | Direct from Fast/Medium tiers
6 hours | ~720 points | 30-second averages | Downsampled
24 hours | ~720 points | 2-minute averages | Downsampled
7 days | ~672 points | 15-minute averages | Downsampled

Raw 5-second data is retained for 24 hours, then automatically pruned. Averaged data provides historical context whilst maintaining reasonable storage requirements and dashboard performance.

Handling Network Outages

The agent includes sophisticated data spooling to ensure no metrics are lost during connectivity issues:

  • When unable to reach the dashboard, payloads are stored locally in /opt/scout-agent/spool/
  • Up to 720 spool files are retained (approximately 1 hour of Fast Tier data)
  • When connectivity returns, spooled data is automatically replayed with historical timestamps
  • The dashboard processes replayed data to fill gaps in your charts

This ensures continuous monitoring even during network outages or dashboard maintenance.
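The spool-and-replay pattern can be sketched as follows. The spool directory and 720-file cap come from the article; the function names and the `send_payload` stand-in are hypothetical, not the agent's real code:

```shell
#!/usr/bin/env bash
# Sketch of spool-and-replay. SPOOL_DIR and MAX_SPOOL match the article;
# everything else is illustrative.

SPOOL_DIR=${SPOOL_DIR:-/opt/scout-agent/spool}
MAX_SPOOL=720

send_payload() { :; }   # stand-in for the real upload; always succeeds here

spool_payload() {
    mkdir -p "$SPOOL_DIR"
    # Nanosecond timestamp filenames preserve ordering for replay.
    printf '%s\n' "$1" > "$SPOOL_DIR/$(date +%s%N).json"
    # Enforce the file cap by removing the oldest files beyond it.
    ls -1t "$SPOOL_DIR" | tail -n +"$((MAX_SPOOL + 1))" | while read -r old; do
        rm -f "$SPOOL_DIR/$old"
    done
}

replay_spool() {
    for f in "$SPOOL_DIR"/*.json; do
        [ -e "$f" ] || break                    # glob matched nothing
        send_payload "$(cat "$f")" && rm -f "$f"
    done
}
```

Deleting a spool file only after a successful send is what guarantees no data loss: a payload that fails to upload simply stays queued for the next replay attempt.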

Total Resource Footprint

The tiered approach achieves remarkable efficiency:

  • Memory usage: <3 MB RSS
  • CPU usage: <100ms total per 5-second cycle (average <0.1% on modern hardware)
  • Disk I/O: Virtually zero (except brief spool writes during outages)
  • Network traffic: ~2-5 KB per payload, compressed

Compare this to traditional monitoring agents that often consume 50-100 MB RAM and measurable CPU even when idle.

Force Collection for Troubleshooting

You can trigger all collection tiers immediately using:

/opt/scout-agent/scout-agent.sh --refresh

This is useful:

  • After configuration changes (new services, mount points)
  • When troubleshooting specific metrics
  • To verify the agent can collect all metric types

The --refresh flag bypasses normal timing intervals and executes all five tiers in sequence.

Practical Implications

Understanding the tier system helps you interpret your dashboard effectively:

  • Immediate issues (CPU spikes, memory exhaustion) appear within 5 seconds
  • Performance trends (network throughput, disk I/O) update every 30 seconds
  • Capacity planning (disk space, load averages) updates every 5 minutes
  • Configuration changes (new services, updates) appear within an hour
  • System changes (kernel updates, hostname changes) appear daily

This tiered approach ensures you get rapid alerting on critical issues whilst maintaining the lightest possible footprint on your servers. The agent intelligently matches collection frequency to each metric's characteristics—volatile metrics get frequent attention, stable metrics get occasional checks.

The result is comprehensive monitoring that's virtually invisible to your server's performance, proving that effective monitoring doesn't require heavy resource consumption.

Frequently Asked Questions

What are Server Scout metric collection tiers?

Server Scout uses a 5-tier collection schedule to balance monitoring granularity with resource efficiency. Fast (5 seconds) collects CPU and memory from /proc. Medium (30 seconds) covers network, TCP, and VMstat. Slow (5 minutes) handles load, processes, and disk. Glacial (1 hour) checks services. Daily (24 hours) gathers system identity. Each tier is optimised for how frequently that data typically changes.

Why does Server Scout use different collection intervals?

Different metrics change at different rates and have different overhead costs. CPU and memory can fluctuate rapidly and are cheap to read from /proc, so they are collected every 5 seconds. Service states rarely change and the systemd query is heavier, so hourly collection is appropriate. System identity (OS, kernel) changes only on upgrades, so daily collection is sufficient. This tiered approach keeps the agent lightweight.

How does the collection tier affect dashboard time ranges?

The dashboard shows data at different resolutions depending on the time range: 1 hour shows raw data points, 6 hours shows 30-second averages, 24 hours shows 2-minute averages, and 7 days shows 15-minute averages. Metrics collected less frequently than the display resolution appear as individual points. For example, hourly service data shows as one point per hour even in the 7-day view.

What is the performance impact of the Server Scout agent?

The agent uses less than 3 MB of RAM and under 100ms of CPU time per 5-second collection cycle. The fast tier reads only from /proc virtual filesystems, requiring no disk I/O. This near-zero footprint means the agent does not measurably affect the performance of monitored servers, even on small instances. The tiered collection ensures heavier operations run infrequently.