This reference index describes every metric collected by the Server Scout agent. Use it as a quick-reference cheat sheet to understand what each metric means and to spot potential issues.
CPU Metrics
Server Scout collects comprehensive CPU statistics every 5 seconds from /proc/stat and thermal sensors.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
cpu_percent | Overall CPU utilisation across all cores | <70% sustained | % |
cpu_user | Time spent in user-space processes | Varies by workload | % |
cpu_system | Time spent in kernel/system calls | <20% typically | % |
cpu_iowait | Time spent waiting for I/O operations | <10% ideally | % |
cpu_steal | Time stolen by hypervisor (VMs only) | <5% normally | % |
cpu_nice | Time spent on low-priority processes | <5% typically | % |
cpu_irq | Time handling hardware interrupts | <2% usually | % |
cpu_softirq | Time handling software interrupts | <5% normally | % |
cpu_temp | Processor temperature | <80°C safe | °C |
cpu_cores | Number of CPU cores | Static value | count |
cpu_model | Processor model string | Static identifier | text |
High cpu_iowait often indicates storage bottlenecks. Elevated cpu_steal suggests the hypervisor host is overcommitted. Rising cpu_temp may point to cooling problems and, if it keeps climbing, can lead to thermal throttling.
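The percentages in the table come from the tick counters on the aggregate `cpu` line of /proc/stat. These counters only make sense as differences between two readings. The sketch below shows one common way to turn two snapshots into per-state percentages. The function names are illustrative. Treating iowait as idle time when computing overall utilisation is a common convention, but it is an assumption about how the agent derives cpu_percent.

```python
# Hypothetical helper names; a sketch, not Server Scout's implementation.
# Order of the counters on the aggregate "cpu" line of /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]


def parse_cpu_line(stat_text: str) -> dict:
    """Return the counters (in clock ticks) from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, (int(v) for v in parts[1:])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before: dict, after: dict) -> dict:
    """Turn two counter snapshots into per-state percentages of elapsed CPU time."""
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight states make up the total.
    states = [f for f in FIELDS[:8] if f in before and f in after]
    deltas = {f: after[f] - before[f] for f in states}
    total = sum(deltas.values())
    if total <= 0:
        # No time elapsed between snapshots; report zeros rather than divide by zero.
        return {**{f: 0.0 for f in states}, "busy": 0.0}
    pct = {f: 100.0 * d / total for f, d in deltas.items()}
    # Assumed definition of overall utilisation: everything except idle and iowait.
    pct["busy"] = 100.0 - pct.get("idle", 0.0) - pct.get("iowait", 0.0)
    return pct
```

On a live system, you would read /proc/stat twice a few seconds apart and pass both texts through `parse_cpu_line`. For example, if user rises by 60 ticks, system by 20, idle by 100 and iowait by 20, the result is 30% user, 10% iowait and 40% busy.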
Memory Metrics
Memory statistics are gathered every 5 seconds from /proc/meminfo, providing detailed insight into RAM usage patterns.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
mem_percent | Percentage of total RAM in use | <85% (warn above) | % |
mem_used_gb | Total RAM currently in use | Depends on capacity | GB |
mem_total_gb | Total installed RAM capacity | Hardware limit | GB |
mem_available_mb | RAM available for new allocations | >20% of total | MB |
mem_cached_mb | File system cache (page cache) | High is normal | MB |
mem_buffers_mb | Kernel buffers for block devices | Usually small | MB |
mem_swap_used_mb | Swap space currently in use | <500MB ideally | MB |
mem_swap_total_mb | Total configured swap space | System dependent | MB |
mem_dirty_mb | Pages waiting to be written to disk | <100MB normally | MB |
mem_shmem_mb | Shared memory and tmpfs usage | Application dependent | MB |
mem_slab_reclaimable_mb | Reclaimable kernel data structures | Normal variation | MB |
mem_slab_unreclaimable_mb | Non-reclaimable kernel structures | Watch for growth | MB |
mem_anon_pages_mb | Anonymous pages (application memory) | Application usage | MB |
mem_page_tables_mb | Memory management overhead | Usually small | MB |
mem_committed_mb | Total committed virtual memory | Can exceed RAM | MB |
mem_hugepages_total | Total huge pages allocated | Application specific | count |
mem_hugepages_free | Unused huge pages | Depends on allocation | count |
Linux uses available RAM for caching, so high mem_cached_mb is normal and healthy. The mem_available_mb metric accounts for reclaimable cache when determining actual available memory.
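As a sketch of that calculation, the code below parses /proc/meminfo (sizes are reported in kB) and derives a used percentage from MemTotal and MemAvailable. Using `(MemTotal - MemAvailable) / MemTotal` for mem_percent is an assumption about the agent's formula. It does, however, match the behaviour described above, where reclaimable cache does not count as used. The helper names are hypothetical.

```python
# Hypothetical helper names; a sketch, not Server Scout's implementation.


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo; sizes become MB (from kB), plain counts stay as-is."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Size fields carry a "kB" suffix; counters such as HugePages_Total do not.
        if len(parts) > 1 and parts[1] == "kB":
            info[key.strip()] = value / 1024
        else:
            info[key.strip()] = value
    return info


def memory_percent(info: dict) -> float:
    """Percentage of RAM in use, treating MemAvailable (which includes reclaimable cache) as free."""
    total = info["MemTotal"]
    return 100.0 * (total - info["MemAvailable"]) / total
```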
Disk Metrics
Disk usage and I/O statistics are collected every 5 minutes for capacity and every 30 seconds for throughput.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
disk_percent | Usage of the primary mount point | <80% (warn above) | % |
disk_used_gb | Used space on primary mount | Monitor growth | GB |
disk_total_gb | Total capacity of primary mount | Hardware limit | GB |
disk_io_read_bytes | Cumulative bytes read from all disks | Counter, rates shown | bytes |
disk_io_write_bytes | Cumulative bytes written to all disks | Counter, rates shown | bytes |
The disk_mounts array provides per-mount details including device, filesystem type, usage percentage, inode usage, and read-only status. Watch for high inode usage on filesystems with many small files.
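For a single mount point, Python's `os.statvfs` exposes the same block and inode counts that `df` and `df -i` report. The sketch below computes space usage the way `df` does, comparing used blocks against the blocks available to unprivileged users. It also reports inode usage and the read-only flag. The function and its return keys are illustrative and are not the agent's actual field names.

```python
import os

# Hypothetical helper; a sketch of per-mount checks, not Server Scout's code.


def mount_usage(path: str) -> dict:
    """Space and inode usage for the filesystem that contains `path`."""
    st = os.statvfs(path)
    # Same arithmetic as df: used blocks versus blocks available to non-root users.
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    visible = used + avail
    # Some filesystems report zero inodes; treat inode usage as 0% in that case.
    inodes_used = st.f_files - st.f_ffree
    return {
        "total_gb": st.f_blocks * st.f_frsize / 1024**3,
        "used_gb": used / 1024**3,
        "disk_percent": 100.0 * used / visible if visible else 0.0,
        "inode_percent": 100.0 * inodes_used / st.f_files if st.f_files else 0.0,
        "read_only": bool(st.f_flag & os.ST_RDONLY),
    }
```

Calling `mount_usage("/")` on a Linux host returns a dictionary that can be checked against the same thresholds used for disk_percent.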
Network Metrics
Network interface statistics are collected every 30 seconds from /proc/net/dev and interface configuration.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
net_rx_bytes | Cumulative bytes received | Counter, rates shown | bytes |
net_tx_bytes | Cumulative bytes transmitted | Counter, rates shown | bytes |
net_rx_errors | Receive errors on interface | Should be 0 | count |
net_tx_errors | Transmit errors on interface | Should be 0 | count |
net_rx_dropped | Dropped incoming packets | Should be 0 | count |
net_tx_dropped | Dropped outgoing packets | Should be 0 | count |
net_interface | Primary network interface name | Interface identifier | text |
net_ip | Primary IP address | Network configuration | IP |
net_mac | MAC address of primary interface | Hardware identifier | MAC |
Network errors or drops often indicate hardware issues, driver problems, or network congestion. The dashboard shows network throughput as rates calculated from the cumulative byte counters.
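Each interface line in /proc/net/dev holds 16 counters: eight for receive and eight for transmit. The sketch below parses the contents of that file and flags any interface whose error or drop counters are non-zero. The helper names are hypothetical. Because the counters are cumulative from the moment the interface came up, a real check would compare successive readings rather than absolute values.

```python
# Hypothetical helper names; a sketch, not Server Scout's implementation.
# Column order after the "iface:" prefix in /proc/net/dev.
RX_TX_FIELDS = [
    "rx_bytes", "rx_packets", "rx_errors", "rx_dropped",
    "rx_fifo", "rx_frame", "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errors", "tx_dropped",
    "tx_fifo", "tx_colls", "tx_carrier", "tx_compressed",
]


def parse_net_dev(text: str) -> dict:
    """Return {interface: {counter: value}} from /proc/net/dev contents."""
    interfaces = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        interfaces[name.strip()] = dict(zip(RX_TX_FIELDS, values))
    return interfaces


def has_errors(counters: dict) -> bool:
    """True if any error or drop counter is non-zero."""
    return any(counters[k] for k in ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped"))
```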
Read more: Network Throughput Metrics Explained
Read more: Network Error Metrics Explained
Load & Uptime
System load averages are collected every 5 minutes from /proc/loadavg.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
load_1m | 1-minute load average | <CPU cores | load |
load_5m | 5-minute load average | <CPU cores | load |
load_15m | 15-minute load average | <CPU cores | load |
uptime | System uptime since last boot | Stability indicator | seconds |
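Load average is an exponentially damped average of the number of processes that are either runnable (running or waiting for a CPU) or in uninterruptible sleep, which usually means waiting on disk I/O. Values consistently above the number of CPU cores indicate the system is overloaded. Because Linux counts processes blocked on I/O towards the load, a high load alongside low CPU usage usually points to a storage bottleneck rather than a CPU shortage.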
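A quick way to apply the "below the number of CPU cores" guideline is to divide the load average by the core count. The sketch below reads the first three fields of /proc/loadavg and normalises them, so a result consistently above 1.0 corresponds to load above the core count. The helper names are illustrative. Using `os.cpu_count()`, which reports logical CPUs, is an assumption about what "cores" means here.

```python
import os
from typing import Optional, Tuple

# Hypothetical helper names; a sketch, not Server Scout's implementation.


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg."""
    one, five, fifteen = (float(v) for v in text.split()[:3])
    return one, five, fifteen


def load_per_core(load: float, cores: Optional[int] = None) -> float:
    """Load divided by core count; above 1.0 means more demand than cores."""
    cores = cores or os.cpu_count() or 1  # cpu_count() can return None
    return load / cores
```

For example, a 5-minute load of 6.0 on a 4-core machine normalises to 1.5, meaning 50% more runnable or blocked work than there are cores.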
Read more: Load Average Metrics Explained
Process & System
Process and system statistics are gathered from various /proc sources every 30 seconds to 5 minutes.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
processes_running | Currently running processes | <cores × 2 | count |
processes_blocked | Processes blocked on I/O | 0-2 typically | count |
processes_zombie | Zombie processes awaiting cleanup | Should be 0 | count |
processes_total | Total processes on system | System dependent | count |
context_switches | Cumulative context switches | Counter, rates shown | count |
open_fds | Open file descriptors | <80% of ulimit | count |
A high number of running processes may indicate system overload. Persistent zombie processes usually mean a parent process is not reaping its exited children, which is an application bug. Excessive context switching adds scheduler overhead and can reduce performance.
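The kernel exposes the running and blocked counts and the context-switch total directly as the `procs_running`, `procs_blocked` and `ctxt` lines in /proc/stat. Zombies, by contrast, have to be counted per process from the state field of /proc/<pid>/stat. The sketch below parses both formats. The helper names are hypothetical, and which sources the agent actually uses for each metric is an assumption. The command name in /proc/<pid>/stat is wrapped in parentheses and may itself contain spaces, so the state letter is located after the last closing parenthesis.

```python
import os

# Hypothetical helper names; a sketch, not Server Scout's implementation.


def parse_proc_stat_counters(text: str) -> dict:
    """Pick the process and context-switch counters out of /proc/stat."""
    wanted = {"procs_running", "procs_blocked", "ctxt"}
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters


def process_state(stat_line: str) -> str:
    """State letter ('R', 'S', 'D', 'Z', ...) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces or ')',
    # so the state is the character two places after the last ')'.
    return stat_line[stat_line.rindex(")") + 2]


def count_zombies(proc_root: str = "/proc") -> int:
    """Count processes whose state is 'Z' by scanning /proc/<pid>/stat."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if process_state(f.read()) == "Z":
                    zombies += 1
        except OSError:
            continue  # the process exited while we were scanning
    return zombies
```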
Read more: Process and System Metrics Explained
TCP Connections
TCP connection states are parsed from /proc/net/tcp and /proc/net/tcp6 every 30 seconds.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
tcp_connections | Total TCP connections | Application dependent | count |
tcp_established | Active established connections | Normal variation | count |
tcp_time_wait | Connections in TIME_WAIT state | <5000 typically | count |
tcp_close_wait | Connections waiting for close | Should be near 0 | count |
tcp_listen | Listening sockets | Service dependent | count |
High tcp_close_wait counts mean the remote side has closed connections that the local application has not yet closed, which usually points to a connection leak in the application. Excessive tcp_time_wait may require kernel tuning on high-traffic servers.
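In /proc/net/tcp and /proc/net/tcp6, the fourth column (`st`) holds each socket's state as a hexadecimal code. The sketch below maps those codes to names using the values from the kernel's TCP state enumeration, then counts sockets per state across both files. The function name is hypothetical. The text above does not say whether tcp_connections includes listening sockets, so the sketch leaves the total to the caller.

```python
from collections import Counter

# Hypothetical helper; a sketch, not Server Scout's implementation.
# Hex codes in the "st" column (values from the kernel's TCP state enumeration).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(*tables: str) -> Counter:
    """Count sockets by state across the contents of /proc/net/tcp and tcp6."""
    counts = Counter()
    for text in tables:
        for line in text.splitlines()[1:]:  # first line is the column header
            fields = line.split()
            if len(fields) > 3:
                counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts
```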
Virtual Memory
Virtual memory statistics are collected every 30 seconds from /proc/vmstat.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
page_faults | Cumulative page faults (minor + major) | Counter, rates shown | count |
page_faults_major | Cumulative major page faults | Rate should stay low | count |
swap_in_pages | Cumulative pages swapped in | Rate near 0 when healthy | count |
swap_out_pages | Cumulative pages swapped out | Rate near 0 when healthy | count |
Major page faults require disk I/O and impact performance. Swap activity indicates memory pressure and can severely degrade performance.
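The kernel exposes these counters in /proc/vmstat as `pgfault`, `pgmajfault`, `pswpin` and `pswpout`. Mapping them to the metric names in the table is an assumption. The sketch below, with hypothetical helper names, parses two snapshots and reports how far each counter moved between them. Any non-zero swap movement over a short window is a sign of memory pressure.

```python
# Hypothetical helper names; a sketch, not Server Scout's implementation.


def parse_vmstat(text: str) -> dict:
    """Parse the 'name value' lines of /proc/vmstat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


# Assumed mapping from the metric names above to /proc/vmstat counters.
VMSTAT_KEYS = {
    "page_faults": "pgfault",
    "page_faults_major": "pgmajfault",
    "swap_in_pages": "pswpin",
    "swap_out_pages": "pswpout",
}


def vm_activity(before: dict, after: dict) -> dict:
    """How far each counter moved between two parsed snapshots."""
    return {name: after.get(key, 0) - before.get(key, 0)
            for name, key in VMSTAT_KEYS.items()}
```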
Read more: Virtual Memory (VMstat) Metrics Explained
Services
Service status is collected hourly using systemd state information.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
services_running | Currently running services | Depends on system role | count |
services_total | Total configured services | System configuration | count |
failed_units | Failed systemd units | Should be 0 | count |
The services array provides detailed per-service information including name, state, sub-state, and enabled status for monitoring critical services.
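To reproduce similar counts by hand, `systemctl list-units --all --plain --no-legend` prints one unit per line, with UNIT, LOAD, ACTIVE, SUB and DESCRIPTION columns. The sketch below summarises that output. This command only lists units currently loaded by systemd, so the result may show fewer services than the agent's services_total. The summary keys and helper names are illustrative rather than the agent's own logic.

```python
import subprocess

# Hypothetical helper names; a sketch, not Server Scout's implementation.


def summarise_units(listing: str) -> dict:
    """Summarise `systemctl list-units --all --plain --no-legend` output."""
    summary = {"services_total": 0, "services_running": 0, "failed_units": 0}
    for line in listing.splitlines():
        fields = line.split(None, 4)  # UNIT LOAD ACTIVE SUB DESCRIPTION...
        if len(fields) < 4:
            continue
        unit, _load, active, sub = fields[:4]
        if active == "failed":
            summary["failed_units"] += 1  # counts every unit type, not just services
        if unit.endswith(".service"):
            summary["services_total"] += 1
            if sub == "running":
                summary["services_running"] += 1
    return summary


def read_units() -> str:
    """Fetch the unit listing from systemd (requires systemctl on PATH)."""
    return subprocess.run(
        ["systemctl", "list-units", "--all", "--plain", "--no-legend"],
        capture_output=True, text=True, check=True,
    ).stdout
```

On a systemd host, `summarise_units(read_units())` returns the three counts.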
Health & Security
These system health and security indicators are collected at varying intervals.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
entropy | Available entropy for cryptography | >256 minimum | bits |
oom_kills | Out-of-memory killer activations | Should be 0 | count |
ntp_synced | NTP synchronisation status | Should be true | boolean |
reboot_required | System requires reboot | Monitor after updates | boolean |
package_updates | Available package updates | Regular maintenance | count |
selinux_status | SELinux enforcement status | Security policy | text |
firewall_status | Firewall status | Security requirement | text |
integrity | Agent integrity verification | Tamper detection | text |
Low entropy can impact cryptographic operations. OOM kills indicate severe memory pressure. Regular package updates are essential for security.
Read more: System Health and Security Metrics Explained
Identity
Static system identification details are collected daily.
| Metric | Description | Healthy Range | Unit |
|---|---|---|---|
os | Operating system distribution | System identifier | text |
kernel | Linux kernel version | Version tracking | text |
arch | System architecture | Hardware platform | text |
virtualization | Virtualisation platform | Deployment info | text |
hostname | System hostname | Network identity | text |
device_type | Device classification | System categorisation | text |
agent_version | Server Scout agent version | Update tracking | text |
These metrics help identify systems and track software versions for maintenance and security purposes.
Read more: System Identity Metrics Explained
Understanding Cumulative Counters
Several metrics are cumulative counters that keep increasing over time, including net_rx_bytes, net_tx_bytes, disk_io_read_bytes, disk_io_write_bytes, page_faults and context_switches. The Server Scout dashboard automatically converts these to per-second rates by taking the difference between consecutive data points and dividing it by the time between them.
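The conversion amounts to dividing the change in a counter by the time between two samples. The sketch below does this for a list of (timestamp, value) pairs. The function name is hypothetical. It treats a decrease as a counter reset, for example after a reboot, and reports no rate for that interval. This handling is an assumption, not documented dashboard behaviour.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical helper; a sketch, not the dashboard's implementation.


def counter_rates(samples: Sequence[Tuple[float, int]]) -> List[Optional[float]]:
    """Per-second rates between consecutive (timestamp_seconds, counter) samples.

    A counter that goes down is treated as reset (e.g. after a reboot), and that
    interval yields None instead of a misleading negative rate.
    """
    rates: List[Optional[float]] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            rates.append(None)
        else:
            rates.append((v1 - v0) / elapsed)
    return rates
```

For example, `counter_rates([(0, 1000), (30, 4000), (60, 4000), (90, 100)])` returns `[100.0, 0.0, None]`: 3000 units over 30 seconds, then no change, then a reset.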
Healthy Range Guidelines
The "healthy range" values provided are general guidelines that depend heavily on your specific workload and system role. A database server will have different normal ranges compared to a web server or development machine. Use these values as starting points for alerting thresholds, then adjust based on your system's baseline behaviour and performance requirements.
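One way to turn a guideline into a workload-specific threshold is to look at the metric's own history. The sketch below takes the 95th percentile of recent samples and adds a 10% margin, but never goes below the generic guideline. A quiet server therefore keeps the standard alert level, while a busier one gets a higher threshold. The percentile, margin, floor and function name are illustrative choices, not Server Scout settings.

```python
import statistics
from typing import Sequence

# Hypothetical helper; illustrative thresholds, not Server Scout settings.


def baseline_threshold(history: Sequence[float], guideline: float,
                       margin: float = 1.10) -> float:
    """Alert threshold from a metric's own history, never below the guideline.

    Uses the 95th percentile of the samples (needs at least two) times a margin.
    """
    # "inclusive" keeps the percentile within the range of observed values.
    p95 = statistics.quantiles(history, n=20, method="inclusive")[-1]
    return max(guideline, p95 * margin)
```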
Monitor trends over time rather than focusing on instantaneous values. Gradual increases in memory usage, consistent high CPU utilisation, or growing numbers of failed services often indicate issues requiring attention before they become critical problems.
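A simple way to watch for gradual increases is to fit a straight line to recent samples and estimate when that line will cross a limit. The sketch below uses an ordinary least-squares fit. The helper names are hypothetical. A linear projection is only a rough guide for metrics that grow unevenly.

```python
from typing import List, Optional, Tuple

# Hypothetical helper names; a sketch, not part of Server Scout.
Point = Tuple[float, float]  # (time, value), e.g. (days, disk_percent)


def linear_trend(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of value = slope * time + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def time_until(points: List[Point], limit: float) -> Optional[float]:
    """Time after the last sample until the fitted line reaches `limit`.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(points)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - points[-1][0])
```

For example, disk usage readings of 50, 52, 54 and 56% on four consecutive days give an estimate of 12 days until usage reaches 80%.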