Server Scout Metrics — Complete Reference Index

This reference index provides complete details for all metrics collected by Server Scout's agent. Use this as a quick-reference cheat sheet to understand what each metric means and identify potential issues.

CPU Metrics

Server Scout collects comprehensive CPU statistics every 5 seconds from /proc/stat and thermal sensors.

Metric	Description	Healthy Range	Unit
`cpu_percent`	Overall CPU utilisation across all cores	<70% sustained	%
`cpu_user`	Time spent in user-space processes	Varies by workload	%
`cpu_system`	Time spent in kernel/system calls	<20% typically	%
`cpu_iowait`	Time spent waiting for I/O operations	<10% ideally	%
`cpu_steal`	Time stolen by hypervisor (VMs only)	<5% normally	%
`cpu_nice`	Time spent on low-priority processes	<5% typically	%
`cpu_irq`	Time handling hardware interrupts	<2% usually	%
`cpu_softirq`	Time handling software interrupts	<5% normally	%
`cpu_temp`	Processor temperature	<80°C safe	°C
`cpu_cores`	Number of CPU cores	Static value	count
`cpu_model`	Processor model string	Static identifier	text

High cpu_iowait often indicates storage bottlenecks. Elevated cpu_steal suggests the hypervisor is overcommitted. Rising cpu_temp may indicate cooling problems and can lead to thermal throttling.
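The percentages above are derived from jiffy counters in /proc/stat, which only make sense as differences between two readings. The following is a minimal sketch of that delta method, assuming the agent follows the conventional approach (its exact formula is not documented here). The function names are illustrative.

```python
# Sketch: derive CPU percentages from two snapshots of /proc/stat.
# Assumption: "busy" = everything except idle and iowait, the usual convention.

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' jiffy counters from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # guest/guest_nice are already included in user/nice, so skip them.
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Percent of elapsed jiffies spent in each state between two snapshots."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        result = {k: 0.0 for k in FIELDS}
        result["busy"] = 0.0
        return result
    pct = {k: 100.0 * d / total for k, d in deltas.items()}
    # iowait is idle time with I/O outstanding, so it is not counted as busy.
    pct["busy"] = 100.0 - pct["idle"] - pct["iowait"]
    return pct
```

In practice you would read /proc/stat twice, a few seconds apart, and pass both parsed results to cpu_percentages.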

Memory Metrics

Memory statistics are gathered every 5 seconds from /proc/meminfo, providing detailed insight into RAM usage patterns.

Metric	Description	Healthy Range	Unit
`mem_percent`	Percentage of total RAM in use	<85% warning	%
`mem_used_gb`	Total RAM currently in use	Depends on capacity	GB
`mem_total_gb`	Total installed RAM capacity	Hardware limit	GB
`mem_available_mb`	RAM available for new allocations	>20% of total	MB
`mem_cached_mb`	File system cache (page cache)	High is normal	MB
`mem_buffers_mb`	Kernel buffers for block devices	Usually small	MB
`mem_swap_used_mb`	Swap space currently in use	<500MB ideally	MB
`mem_swap_total_mb`	Total configured swap space	System dependent	MB
`mem_dirty_mb`	Pages waiting to be written to disk	<100MB normally	MB
`mem_shmem_mb`	Shared memory and tmpfs usage	Application dependent	MB
`mem_slab_reclaimable_mb`	Reclaimable kernel data structures	Normal variation	MB
`mem_slab_unreclaimable_mb`	Non-reclaimable kernel structures	Watch for growth	MB
`mem_anon_pages_mb`	Anonymous pages (application memory)	Application usage	MB
`mem_page_tables_mb`	Memory management overhead	Usually small	MB
`mem_committed_mb`	Total committed virtual memory	Can exceed RAM	MB
`mem_hugepages_total`	Total huge pages allocated	Application specific	count
`mem_hugepages_free`	Unused huge pages	Depends on allocation	count

Linux uses available RAM for caching, so high mem_cached_mb is normal and healthy. The mem_available_mb metric accounts for reclaimable cache when determining actual available memory.
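To make the relationship between cache and available memory concrete, here is a minimal sketch that parses /proc/meminfo and derives a few of the metrics above. It assumes mem_percent is computed as (MemTotal - MemAvailable) / MemTotal; Server Scout's actual formula is not stated, so treat the mapping as illustrative.

```python
# Sketch: derive memory metrics from /proc/meminfo text.
# Assumption: mem_percent = (MemTotal - MemAvailable) / MemTotal.


def parse_meminfo(text):
    """Map /proc/meminfo keys to integers (kB, except HugePages_* counts)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Compute a few headline metrics; all values in /proc/meminfo are kB."""
    total_kb = info["MemTotal"]
    avail_kb = info["MemAvailable"]  # already accounts for reclaimable cache
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "mem_percent": 100.0 * (total_kb - avail_kb) / total_kb,
        "mem_available_mb": avail_kb / 1024,
        "mem_cached_mb": info.get("Cached", 0) / 1024,
        "mem_swap_used_mb": swap_used_kb / 1024,
    }
```

Because MemAvailable already includes reclaimable page cache, a large Cached value does not by itself push mem_percent up.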

Disk Metrics

Disk usage and I/O statistics are collected every 5 minutes for capacity and every 30 seconds for throughput.

Metric	Description	Healthy Range	Unit
`disk_percent`	Primary mount point usage	<80% warning	%
`disk_used_gb`	Used space on primary mount	Monitor growth	GB
`disk_total_gb`	Total capacity of primary mount	Hardware limit	GB
`disk_io_read_bytes`	Cumulative bytes read from all disks	Counter, rates shown	bytes
`disk_io_write_bytes`	Cumulative bytes written to all disks	Counter, rates shown	bytes

The disk_mounts array provides per-mount details including device, filesystem type, usage percentage, inode usage, and read-only status. Watch for high inode usage on filesystems with many small files.
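Per-mount usage, inode usage and read-only status can all be read with a single statvfs call. This is a minimal sketch, not the agent's implementation; the function name is hypothetical, and it assumes "GB" means 1024³ bytes.

```python
import os


def mount_usage(path="/"):
    """df-style space and inode usage for the filesystem containing `path`."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    avail = st.f_bavail * st.f_frsize  # space usable by non-root users
    used = total - free
    # Like df, compute Use% against used + avail (root-reserved blocks excluded).
    denom = used + avail
    percent = 100.0 * used / denom if denom else 0.0
    # Some filesystems report zero inodes; treat inode usage as not applicable.
    inode_percent = None
    if st.f_files:
        inode_percent = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    gib = 1024 ** 3
    return {
        "disk_percent": percent,
        "disk_used_gb": used / gib,
        "disk_total_gb": total / gib,
        "inode_percent": inode_percent,
        "read_only": bool(st.f_flag & os.ST_RDONLY),
    }
```

A filesystem full of small files can hit 100% inode_percent while disk_percent still looks comfortable, which is why both are worth alerting on.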

Network Metrics

Network interface statistics are collected every 30 seconds from /proc/net/dev and interface configuration.

Metric	Description	Healthy Range	Unit
`net_rx_bytes`	Cumulative bytes received	Counter, rates shown	bytes
`net_tx_bytes`	Cumulative bytes transmitted	Counter, rates shown	bytes
`net_rx_errors`	Receive errors on interface	Should be 0	count
`net_tx_errors`	Transmit errors on interface	Should be 0	count
`net_rx_dropped`	Dropped incoming packets	Should be 0	count
`net_tx_dropped`	Dropped outgoing packets	Should be 0	count
`net_interface`	Primary network interface name	Interface identifier	text
`net_ip`	Primary IP address	Network configuration	IP
`net_mac`	MAC address of primary interface	Hardware identifier	MAC

Network errors or drops often indicate hardware issues, driver problems, or network congestion. The dashboard shows network throughput as rates calculated from the cumulative byte counters.
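The counters in the table correspond to columns of /proc/net/dev. Below is a minimal sketch that parses that file and flags interfaces with non-zero errors or drops. The mapping from columns to Server Scout metric names is an assumption for illustration.

```python
# Sketch: parse /proc/net/dev and flag interfaces reporting errors or drops.
# Column order per interface: 8 receive fields, then 8 transmit fields.


def parse_net_dev(text):
    """Return {interface: {metric: value}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(x) for x in data.split()]
        stats[name.strip()] = {
            "net_rx_bytes": f[0], "net_rx_errors": f[2], "net_rx_dropped": f[3],
            "net_tx_bytes": f[8], "net_tx_errors": f[10], "net_tx_dropped": f[11],
        }
    return stats


def interfaces_with_problems(stats):
    """Names of interfaces with any errors or drops (these should be 0)."""
    keys = ("net_rx_errors", "net_tx_errors", "net_rx_dropped", "net_tx_dropped")
    return sorted(name for name, s in stats.items() if any(s[k] for k in keys))
```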

Load & Uptime

System load averages are collected every 5 minutes from /proc/loadavg.

Metric	Description	Healthy Range	Unit
`load_1m`	1-minute load average	<CPU cores	load
`load_5m`	5-minute load average	<CPU cores	load
`load_15m`	15-minute load average	<CPU cores	load
`uptime`	System uptime since last boot	Stability indicator	seconds

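Load average is an exponentially weighted moving average of the number of processes that are either runnable (running or waiting for a CPU) or in uninterruptible sleep, which on Linux usually means waiting on disk I/O. Values consistently above the number of CPU cores indicate that the system is overloaded.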
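Comparing load against the core count is straightforward once /proc/loadavg is parsed. This minimal sketch uses load_15m as a proxy for "consistently above", which is an assumption rather than a documented Server Scout rule; the function names are hypothetical.

```python
# Sketch: parse /proc/loadavg (e.g. "0.52 0.58 0.59 2/1234 56789")
# and compare a chosen load average with the number of CPU cores.


def parse_loadavg(text):
    parts = text.split()
    runnable, total = parts[3].split("/")
    return {
        "load_1m": float(parts[0]),
        "load_5m": float(parts[1]),
        "load_15m": float(parts[2]),
        "runnable": int(runnable),    # currently runnable scheduling entities
        "total_tasks": int(total),    # scheduling entities that exist
    }


def is_overloaded(load, cpu_cores, key="load_15m"):
    """True when the selected load average exceeds the core count."""
    return load[key] > cpu_cores
```

On a live system, cpu_cores could come from os.cpu_count() or from the cpu_cores metric.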

Process & System

Process and system statistics are gathered from various /proc sources every 30 seconds to 5 minutes.

Metric	Description	Healthy Range	Unit
`processes_running`	Currently running processes	<cores × 2	count
`processes_blocked`	Processes blocked on I/O	0-2 typically	count
`processes_zombie`	Zombie processes awaiting cleanup	Should be 0	count
`processes_total`	Total processes on system	System dependent	count
`context_switches`	Cumulative context switches	Counter, rates shown	count
`open_fds`	Open file descriptors	<80% of ulimit	count

High numbers of running processes may indicate system overload. Persistent zombie processes suggest application bugs. Excessive context switching can impact performance.
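Several of these values can be read directly: /proc/stat exposes procs_running, procs_blocked and ctxt, while zombies show state Z in /proc/&lt;pid&gt;/stat. This is a minimal sketch under the assumption that the agent uses these sources; note that procs_running counts runnable threads, not just processes.

```python
# Sketch: process counters from /proc/stat and zombie detection
# from the contents of /proc/<pid>/stat files.


def parse_proc_stat_counters(text):
    """Extract running/blocked task counts and context switches."""
    wanted = {
        "procs_running": "processes_running",
        "procs_blocked": "processes_blocked",
        "ctxt": "context_switches",
    }
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            out[wanted[parts[0]]] = int(parts[1])
    return out


def count_zombies(stat_lines):
    """Count tasks whose state field is 'Z'."""
    zombies = 0
    for line in stat_lines:
        # The command name sits in parentheses and may itself contain spaces
        # or ')', so the state is the first field after the LAST ')'.
        state = line[line.rfind(")") + 1:].split()[0]
        if state == "Z":
            zombies += 1
    return zombies
```

On a real system, stat_lines would be the contents of each /proc/[0-9]*/stat file, skipping processes that exit while you read them.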

TCP Connections

TCP connection states are parsed from /proc/net/tcp and /proc/net/tcp6 every 30 seconds.

Metric	Description	Healthy Range	Unit
`tcp_connections`	Total TCP connections	Application dependent	count
`tcp_established`	Active established connections	Normal variation	count
`tcp_time_wait`	Connections in TIME_WAIT state	<5000 typically	count
`tcp_close_wait`	Connections closed by the remote peer but not yet closed locally	Should be near 0	count
`tcp_listen`	Listening sockets	Service dependent	count

High tcp_close_wait counts often indicate applications not properly closing connections. Excessive tcp_time_wait may require kernel tuning for high-traffic servers.
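Connection states appear as two-digit hexadecimal codes in the st column of /proc/net/tcp and /proc/net/tcp6. This minimal sketch counts them across both files. Whether Server Scout's tcp_connections includes listening sockets is not stated; this sketch counts every row.

```python
from collections import Counter

# Hex codes used in the 'st' column of /proc/net/tcp and /proc/net/tcp6.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(*tables):
    """Count socket states across the text of one or more /proc/net/tcp* files."""
    counts = Counter()
    for text in tables:
        for line in text.splitlines()[1:]:  # first line is the column header
            fields = line.split()
            if len(fields) > 3:
                counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return {
        "tcp_connections": sum(counts.values()),  # includes LISTEN rows here
        "tcp_established": counts["ESTABLISHED"],
        "tcp_time_wait": counts["TIME_WAIT"],
        "tcp_close_wait": counts["CLOSE_WAIT"],
        "tcp_listen": counts["LISTEN"],
    }
```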

Virtual Memory

Virtual memory statistics are collected every 30 seconds from /proc/vmstat.

Metric	Description	Healthy Range	Unit
`page_faults`	Cumulative page faults (minor + major)	Counter, rates shown	count
`page_faults_major`	Cumulative major page faults	Should be low rate	count
`swap_in_pages`	Cumulative pages swapped in	Near 0 on healthy systems	count
`swap_out_pages`	Cumulative pages swapped out	Near 0 on healthy systems	count

Major page faults require disk I/O and impact performance. Swap activity indicates memory pressure and can severely degrade performance.
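/proc/vmstat is a list of "name value" pairs. The sketch below assumes the metrics map to the kernel's pgfault, pgmajfault, pswpin and pswpout counters (a plausible mapping, not one the agent documents), and reports how much each changed between two snapshots.

```python
# Sketch: memory-pressure signals from two /proc/vmstat snapshots.
# Assumed mapping: page_faults=pgfault, page_faults_major=pgmajfault,
# swap_in_pages=pswpin, swap_out_pages=pswpout.

KEYS = {
    "page_faults": "pgfault",
    "page_faults_major": "pgmajfault",
    "swap_in_pages": "pswpin",
    "swap_out_pages": "pswpout",
}


def parse_vmstat(text):
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            out[parts[0]] = int(parts[1])
    return out


def vm_deltas(before, after):
    """Change in each counter, plus a flag for any swap traffic."""
    d = {name: after.get(k, 0) - before.get(k, 0) for name, k in KEYS.items()}
    d["swapping"] = d["swap_in_pages"] > 0 or d["swap_out_pages"] > 0
    return d
```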

Services

Service status is collected hourly using systemd state information.

Metric	Description	Healthy Range	Unit
`services_running`	Currently running services	Depends on system role	count
`services_total`	Total configured services	System configuration	count
`failed_units`	Failed systemd units	Should be 0	count

The services array provides detailed per-service information including name, state, sub-state, and enabled status for monitoring critical services.
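One way to build similar per-service data by hand is to parse the output of `systemctl list-units --type=service --all --no-legend --plain`, whose columns are UNIT, LOAD, ACTIVE, SUB and DESCRIPTION. This is a sketch only: the agent's own collection method is not documented, this counts failed services rather than all failed unit types, and enabled status would need a separate `systemctl list-unit-files` call.

```python
# Sketch: summarise `systemctl list-units --type=service --all --no-legend --plain`.


def summarise_units(listing):
    services = []
    for line in listing.splitlines():
        fields = line.split(None, 4)
        # Defensive: some output modes prefix failed units with a bullet marker.
        if fields and fields[0] == "\u25cf":
            fields = line.split(None, 5)[1:]
        if len(fields) < 4:
            continue
        unit, _load, active, sub = fields[:4]
        services.append({"name": unit, "state": active, "sub_state": sub})
    return {
        "services_total": len(services),
        "services_running": sum(s["sub_state"] == "running" for s in services),
        "failed_units": sum(s["state"] == "failed" for s in services),
        "services": services,
    }
```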

Health & Security

Various system health and security indicators are collected at different intervals.

Metric	Description	Healthy Range	Unit
`entropy`	Available entropy for cryptography	>256 minimum	bits
`oom_kills`	Out-of-memory killer activations	Should be 0	count
`ntp_synced`	NTP synchronisation status	Should be true	boolean
`reboot_required`	System requires reboot	Monitor after updates	boolean
`package_updates`	Available package updates	Regular maintenance	count
`selinux_status`	SELinux enforcement status	Security policy	text
`firewall_status`	Firewall status	Security requirement	text
`integrity`	Agent integrity verification	Tamper detection	text

Low entropy can impact cryptographic operations. OOM kills indicate severe memory pressure. Regular package updates are essential for security.
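Two of these indicators have well-known kernel sources: /proc/sys/kernel/random/entropy_avail for entropy, and the cumulative oom_kill counter in /proc/vmstat, which is present on newer kernels (4.13 and later). This minimal sketch assumes the agent reads those files and applies the thresholds from the table above; the function name is hypothetical.

```python
# Sketch: entropy and OOM-kill checks from kernel-provided text.
# Inputs: contents of /proc/sys/kernel/random/entropy_avail and /proc/vmstat.


def health_flags(entropy_avail_text, vmstat_text, min_entropy=256):
    entropy = int(entropy_avail_text.strip())
    oom_kills = 0  # stays 0 if the kernel does not expose oom_kill
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "oom_kill":
            oom_kills = int(parts[1])  # cumulative since boot
    return {
        "entropy": entropy,
        "low_entropy": entropy < min_entropy,
        "oom_kills": oom_kills,
        "oom_detected": oom_kills > 0,
    }
```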

Identity

Static system identification data is collected daily.

Metric	Description	Healthy Range	Unit
`os`	Operating system distribution	System identifier	text
`kernel`	Linux kernel version	Version tracking	text
`arch`	System architecture	Hardware platform	text
`virtualization`	Virtualisation platform	Deployment info	text
`hostname`	System hostname	Network identity	text
`device_type`	Device classification	System categorisation	text
`agent_version`	Server Scout agent version	Update tracking	text

These metrics help identify systems and track software versions for maintenance and security purposes.
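Most identity fields are easy to gather. The source Server Scout uses for os is not stated; /etc/os-release is the conventional place to find the distribution name. This is a minimal sketch under that assumption, using Python's platform module for the rest.

```python
import platform

# Sketch: identity fields from /etc/os-release plus the platform module.


def parse_os_release(text):
    """Parse KEY=VALUE lines, ignoring comments and stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"').strip("'")
    return info


def identity(os_release_text):
    rel = parse_os_release(os_release_text)
    return {
        "os": rel.get("PRETTY_NAME", rel.get("NAME", "unknown")),
        "kernel": platform.release(),   # e.g. the running kernel version
        "arch": platform.machine(),     # e.g. x86_64 or aarch64
        "hostname": platform.node(),
    }
```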

Understanding Cumulative Counters

Several metrics are cumulative counters that continuously increase: net_rx_bytes, net_tx_bytes, disk_io_read_bytes, disk_io_write_bytes, page_faults, context_switches, and others. The Server Scout dashboard automatically converts these to per-second rates by dividing the difference between consecutive data points by the time elapsed between them.
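Converting a counter to a rate means dividing the change in value by the change in time. This minimal sketch also skips intervals where the counter went backwards (for example after a reboot), on the assumption that such deltas are meaningless; how the dashboard itself handles resets is not documented.

```python
# Sketch: per-second rates from (timestamp, value) samples of a counter.


def counter_rate(prev_value, prev_ts, cur_value, cur_ts):
    """Per-second rate, or None if the counter reset (value went backwards)."""
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = cur_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed


def rates(samples):
    """Convert [(timestamp, value), ...] into [(timestamp, rate), ...]."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        r = counter_rate(v0, t0, v1, t1)
        if r is not None:
            out.append((t1, r))
    return out
```

For example, net_rx_bytes rising from 0 to 3000 over a 30-second interval yields 100 bytes per second.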

Healthy Range Guidelines

The "healthy range" values provided are general guidelines that depend heavily on your specific workload and system role. A database server will have different normal ranges compared to a web server or development machine. Use these values as starting points for alerting thresholds, then adjust based on your system's baseline behaviour and performance requirements.

Monitor trends over time rather than focusing on instantaneous values. Gradual increases in memory usage, consistent high CPU utilisation, or growing numbers of failed services often indicate issues requiring attention before they become critical problems.
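Two simple ways to act on this advice are to alert only when a threshold is exceeded for several consecutive samples (matching ranges such as "<70% sustained") and to watch the slope of a series rather than single readings. This is a minimal sketch of both; the sample counts and thresholds are illustrative assumptions, not Server Scout defaults.

```python
# Sketch: sustained-threshold check and least-squares trend of a series.


def sustained_breach(values, threshold, min_samples):
    """True if each of the last `min_samples` values exceeds `threshold`."""
    if len(values) < min_samples:
        return False
    return all(v > threshold for v in values[-min_samples:])


def trend_per_sample(values):
    """Least-squares slope against sample index (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

With 5-second CPU samples, for instance, min_samples=12 would require a full minute above the threshold before alerting, and a steadily positive slope on mem_percent can flag a leak long before the 85% warning level is reached.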

Frequently Asked Questions

What metrics does the Server Scout agent collect?

The Server Scout agent collects over 80 metrics across CPU, memory, disk, network, load, processes, TCP connections, virtual memory, services, health/security, and system identity. All metrics are gathered from the /proc and /sys virtual filesystems with near-zero overhead. This reference index lists every metric with descriptions, units, and healthy range guidelines.

How often are Server Scout metrics collected?

Metrics are collected on a 5-tier schedule: Fast (every 5 seconds) for CPU and memory, Medium (every 30 seconds) for network and TCP, Slow (every 5 minutes) for load and disk, Glacial (every hour) for services, and Daily (every 24 hours) for system identity. This tiered approach balances real-time visibility with minimal resource usage.
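The five tiers can be modelled as a table of intervals, with a tier due whenever its interval divides the time since the agent started. This is a minimal sketch assuming a single shared clock; the real agent may use independent timers.

```python
# Sketch: which collection tiers are due at a given number of seconds.

TIERS = {
    "fast": 5,        # CPU, memory
    "medium": 30,     # network, TCP
    "slow": 300,      # load, disk
    "glacial": 3600,  # services
    "daily": 86400,   # system identity
}


def tiers_due(elapsed_s):
    """Names of tiers whose interval evenly divides `elapsed_s`."""
    return [name for name, interval in TIERS.items() if elapsed_s % interval == 0]
```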

What do cumulative counter metrics mean in Server Scout?

Cumulative counters like net_rx_bytes, disk_io_read_bytes, and context_switches continuously increase from system boot. The Server Scout dashboard automatically converts these to per-second rates by dividing the delta between consecutive data points by the time elapsed between them. You see meaningful throughput rates in the charts, not raw counter values.

How do I know if a metric value is healthy?

Each metric in this reference index includes a "Healthy Range" guideline. These are general starting points that depend on your workload and server role. A database server has different normal ranges than a web server. Monitor trends over time and adjust thresholds based on your system's baseline behaviour.

Where does Server Scout read metrics from?

The agent reads exclusively from Linux virtual filesystems: /proc/stat for CPU, /proc/meminfo for memory, /proc/net/dev for network, /proc/diskstats for disk I/O, /proc/loadavg for load, /proc/vmstat for virtual memory, and /proc/net/tcp and /proc/net/tcp6 for TCP states. No disk I/O is required in the fast collection tier, keeping the agent lightweight.
