Server Scout Metrics — Complete Reference Index

This reference index provides complete details for all metrics collected by Server Scout's agent. Use this as a quick-reference cheat sheet to understand what each metric means and identify potential issues.


CPU Metrics

Server Scout collects comprehensive CPU statistics every 5 seconds from /proc/stat and thermal sensors.

Metric | Description | Healthy Range | Unit
cpu_percent | Overall CPU utilisation across all cores | <70% sustained | %
cpu_user | Time spent in user-space processes | Varies by workload | %
cpu_system | Time spent in kernel/system calls | <20% typically | %
cpu_iowait | Time spent waiting for I/O operations | <10% ideally | %
cpu_steal | Time stolen by hypervisor (VMs only) | <5% normally | %
cpu_nice | Time spent on low-priority processes | <5% typically | %
cpu_irq | Time handling hardware interrupts | <2% usually | %
cpu_softirq | Time handling software interrupts | <5% normally | %
cpu_temp | Processor temperature | <80°C safe | °C
cpu_cores | Number of CPU cores | Static value | count
cpu_model | Processor model string | Static identifier | text

High cpu_iowait often indicates storage bottlenecks. Elevated cpu_steal suggests the hypervisor is overcommitted. Rising cpu_temp may indicate cooling issues or thermal throttling.
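The percentages above are derived from cumulative counters in /proc/stat, so each value comes from the difference between two samples. The following is a minimal sketch of that calculation, not the agent's actual implementation. The function names, the sample values, and the definition of "busy" as everything except idle and iowait are assumptions made for illustration.

```python
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(CPU_FIELDS, (int(v) for v in parts[1:9])))
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(prev, curr):
    """Turn two counter snapshots into percentages of the elapsed CPU time."""
    deltas = {name: curr[name] - prev[name] for name in CPU_FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return None  # no CPU time elapsed between the two samples
    pct = {name: 100.0 * value / total for name, value in deltas.items()}
    # Assumption: "busy" is everything except idle and iowait.
    pct["busy"] = 100.0 - pct["idle"] - pct["iowait"]
    return pct


# Hypothetical samples taken a few seconds apart.
before = "cpu  1000 50 300 8000 100 10 20 5 0 0"
after = "cpu  1400 50 400 8400 180 10 40 5 0 0"
cpu = cpu_percentages(parse_cpu_line(before), parse_cpu_line(after))
print(f"iowait={cpu['iowait']:.1f}% steal={cpu['steal']:.1f}% busy={cpu['busy']:.1f}%")
```

On a live system the two inputs would be the contents of /proc/stat read one collection interval apart.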

Read more: CPU Metrics Explained

Memory Metrics

Memory statistics are gathered every 5 seconds from /proc/meminfo, providing detailed insight into RAM usage patterns.

Metric | Description | Healthy Range | Unit
mem_percent | Percentage of total RAM in use | <85% warning | %
mem_used_gb | Total RAM currently in use | Depends on capacity | GB
mem_total_gb | Total installed RAM capacity | Hardware limit | GB
mem_available_mb | RAM available for new allocations | >20% of total | MB
mem_cached_mb | File system cache (page cache) | High is normal | MB
mem_buffers_mb | Kernel buffers for block devices | Usually small | MB
mem_swap_used_mb | Swap space currently in use | <500MB ideally | MB
mem_swap_total_mb | Total configured swap space | System dependent | MB
mem_dirty_mb | Pages waiting to be written to disk | <100MB normally | MB
mem_shmem_mb | Shared memory and tmpfs usage | Application dependent | MB
mem_slab_reclaimable_mb | Reclaimable kernel data structures | Normal variation | MB
mem_slab_unreclaimable_mb | Non-reclaimable kernel structures | Watch for growth | MB
mem_anon_pages_mb | Anonymous pages (application memory) | Application usage | MB
mem_page_tables_mb | Memory management overhead | Usually small | MB
mem_committed_mb | Total committed virtual memory | Can exceed RAM | MB
mem_hugepages_total | Total huge pages allocated | Application specific | count
mem_hugepages_free | Unused huge pages | Depends on allocation | count

Linux uses available RAM for caching, so high mem_cached_mb is normal and healthy. The mem_available_mb metric accounts for reclaimable cache when determining actual available memory.
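To make the difference between free and available memory concrete, here is a small sketch that parses /proc/meminfo text and derives a usage percentage from MemAvailable. It is an illustration only: the function names are hypothetical, the sample figures are invented, and treating "used" as total minus available is an assumption about how mem_percent might be computed.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integers (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_summary(info):
    """Summarise memory use, counting reclaimable cache as available."""
    total_kb = info["MemTotal"]
    available_kb = info["MemAvailable"]
    used_kb = total_kb - available_kb  # assumption: "used" excludes reclaimable cache
    return {
        "mem_percent": 100.0 * used_kb / total_kb,
        "mem_available_mb": available_kb / 1024,
        "mem_cached_mb": info.get("Cached", 0) / 1024,
        "mem_free_mb": info.get("MemFree", 0) / 1024,
    }


# Invented sample: very little free memory, but plenty available thanks to cache.
sample = """MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    8192000 kB
Buffers:          204800 kB
Cached:          7168000 kB
"""
mem = memory_summary(parse_meminfo(sample))
print(mem)
```

In this sample only 500 MB is strictly free, yet 8000 MB is available because most of the page cache can be reclaimed on demand.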

Read more: Memory Metrics Explained

Disk Metrics

Disk usage and I/O statistics are collected every 5 minutes for capacity and every 30 seconds for throughput.

Metric | Description | Healthy Range | Unit
disk_percent | Primary mount point usage | <80% warning | %
disk_used_gb | Used space on primary mount | Monitor growth | GB
disk_total_gb | Total capacity of primary mount | Hardware limit | GB
disk_io_read_bytes | Cumulative bytes read from all disks | Counter, rates shown | bytes
disk_io_write_bytes | Cumulative bytes written to all disks | Counter, rates shown | bytes

The disk_mounts array provides per-mount details including device, filesystem type, usage percentage, inode usage, and read-only status. Watch for high inode usage on filesystems with many small files.
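A filesystem can run out of inodes long before it runs out of space. The sketch below shows one way to compute both percentages from a statvfs result. How the agent itself gathers per-mount data is not described here, so the function names and the sample numbers are illustrative assumptions; os.statvfs is the standard-library call available on Unix systems.

```python
import os
from types import SimpleNamespace


def usage_from_statvfs(st):
    """Compute space and inode usage percentages from a statvfs-style result."""
    used_blocks = st.f_blocks - st.f_bfree
    # f_bavail excludes blocks reserved for root, so usage is measured against
    # what an unprivileged user could actually fill.
    usable_blocks = used_blocks + st.f_bavail
    disk_percent = 100.0 * used_blocks / usable_blocks if usable_blocks else 0.0
    # Some filesystems report zero total inodes; treat that as "not applicable".
    inode_percent = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return {"disk_percent": disk_percent, "inode_percent": inode_percent}


def mount_usage(path):
    """Usage for a real mount point (Unix only)."""
    return usage_from_statvfs(os.statvfs(path))


# Invented figures for a mount holding many small files.
fake = SimpleNamespace(f_blocks=1000, f_bfree=300, f_bavail=250, f_files=2000, f_ffree=100)
usage = usage_from_statvfs(fake)
print(f"space {usage['disk_percent']:.1f}%, inodes {usage['inode_percent']:.1f}%")
```

Here space usage is about 74% while inode usage is already 95%, which is the situation the inode warning above is meant to catch.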

Read more: Disk Metrics Explained

Network Metrics

Network interface statistics are collected every 30 seconds from /proc/net/dev and interface configuration.

Metric | Description | Healthy Range | Unit
net_rx_bytes | Cumulative bytes received | Counter, rates shown | bytes
net_tx_bytes | Cumulative bytes transmitted | Counter, rates shown | bytes
net_rx_errors | Receive errors on interface | Should be 0 | count
net_tx_errors | Transmit errors on interface | Should be 0 | count
net_rx_dropped | Dropped incoming packets | Should be 0 | count
net_tx_dropped | Dropped outgoing packets | Should be 0 | count
net_interface | Primary network interface name | Interface identifier | text
net_ip | Primary IP address | Network configuration | IP
net_mac | MAC address of primary interface | Hardware identifier | MAC

Network errors or drops often indicate hardware issues, driver problems, or network congestion. The dashboard shows network throughput as rates calculated from the cumulative byte counters.
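For readers who want to see where these counters come from, the following sketch parses /proc/net/dev text and flags interfaces with non-zero error or drop counters. The column positions follow the standard /proc/net/dev layout. The function names and the sample data are hypothetical, and the agent's own parsing may differ.

```python
def parse_net_dev(text):
    """Map interface name to selected counters from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        nums = [int(f) for f in fields]
        stats[name.strip()] = {
            "net_rx_bytes": nums[0],
            "net_rx_errors": nums[2],
            "net_rx_dropped": nums[3],
            "net_tx_bytes": nums[8],
            "net_tx_errors": nums[10],
            "net_tx_dropped": nums[11],
        }
    return stats


def interfaces_with_problems(stats):
    """Names of interfaces whose error or drop counters are non-zero."""
    watched = ("net_rx_errors", "net_tx_errors", "net_rx_dropped", "net_tx_dropped")
    return [name for name, s in stats.items() if any(s[k] > 0 for k in watched)]


# Invented sample in the /proc/net/dev layout.
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  500000    4000    3   12    0     0          0         0   250000    3000    0    1    0     0       0          0
"""
net = parse_net_dev(sample)
print(interfaces_with_problems(net))
```

Because these are cumulative counters, a non-zero value may reflect an old incident. Whether the count is still increasing matters more than its absolute size.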

Read more: Network Throughput Metrics Explained

Read more: Network Error Metrics Explained

Load & Uptime

System load averages are collected every 5 minutes from /proc/loadavg.

Metric | Description | Healthy Range | Unit
load_1m | 1-minute load average | <CPU cores | load
load_5m | 5-minute load average | <CPU cores | load
load_15m | 15-minute load average | <CPU cores | load
uptime | System uptime since last boot | Stability indicator | seconds

Load average represents the average number of processes that are running, waiting for CPU time, or blocked in uninterruptible sleep (usually waiting on disk I/O). Values consistently above the number of CPU cores indicate the system is overloaded.
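Comparing load against the core count is easier when the load is expressed per core. The sketch below divides the three values from /proc/loadavg by the number of cores and applies one possible reading of "consistently above": both the 5- and 15-minute averages exceed 1.0 per core. That rule, the function names, and the sample strings are assumptions for illustration.

```python
def load_per_core(loadavg_text, cores):
    """Divide the three load averages from /proc/loadavg text by the core count."""
    one, five, fifteen = (float(v) for v in loadavg_text.split()[:3])
    return {
        "load_1m": one / cores,
        "load_5m": five / cores,
        "load_15m": fifteen / cores,
    }


def sustained_overload(per_core):
    """True when both the 5- and 15-minute averages exceed one per core."""
    return per_core["load_5m"] > 1.0 and per_core["load_15m"] > 1.0


# Invented samples for a 4-core machine.
spike = load_per_core("6.00 3.00 1.50 2/345 12345", cores=4)
busy = load_per_core("9.20 8.80 8.10 5/410 22871", cores=4)
print(sustained_overload(spike), sustained_overload(busy))
```

The first sample is a short spike: the 1-minute value is high but the longer averages are still below the core count. The second stays above it across all three windows.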

Read more: Load Average Metrics Explained

Process & System

Process and system statistics are gathered from various /proc sources at intervals ranging from 30 seconds to 5 minutes, depending on the metric.

Metric | Description | Healthy Range | Unit
processes_running | Currently running processes | <cores × 2 | count
processes_blocked | Processes blocked on I/O | 0-2 typically | count
processes_zombie | Zombie processes awaiting cleanup | Should be 0 | count
processes_total | Total processes on system | System dependent | count
context_switches | Cumulative context switches | Counter, rates shown | count
open_fds | Open file descriptors | <80% of ulimit | count

High numbers of running processes may indicate system overload. Persistent zombie processes suggest application bugs. Excessive context switching can impact performance.

Read more: Process and System Metrics Explained

TCP Connections

TCP connection states are parsed from /proc/net/tcp and /proc/net/tcp6 every 30 seconds.

Metric | Description | Healthy Range | Unit
tcp_connections | Total TCP connections | Application dependent | count
tcp_established | Active established connections | Normal variation | count
tcp_time_wait | Connections in TIME_WAIT state | <5000 typically | count
tcp_close_wait | Connections waiting for close | Should be near 0 | count
tcp_listen | Listening sockets | Service dependent | count

High tcp_close_wait counts often indicate applications not properly closing connections. Excessive tcp_time_wait may require kernel tuning for high-traffic servers.
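The state of each socket appears in /proc/net/tcp as a two-digit hexadecimal code in the fourth column. The sketch below counts sockets per state using the kernel's standard code table. The function name and the sample lines are hypothetical, and this is not presented as the agent's own parser.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per state from /proc/net/tcp (or tcp6) text."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


# Invented sample rows, truncated to the columns this sketch needs.
sample = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 00000000:0016 00000000:0000 0A 00000000:00000000\n"
    "   1: 0100007F:1F90 0100007F:C350 01 00000000:00000000\n"
    "   2: 0100007F:1F90 0100007F:C351 01 00000000:00000000\n"
    "   3: 0100007F:1F90 0100007F:C352 08 00000000:00000000\n"
    "   4: 0100007F:1F90 0100007F:C353 06 00000000:00000000\n"
)
states = count_tcp_states(sample)
print(dict(states))
```

Running the same function over both /proc/net/tcp and /proc/net/tcp6 and adding the two counters would cover IPv4 and IPv6 sockets.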

Read more: TCP Connection Metrics Explained

Virtual Memory

Virtual memory statistics are collected every 30 seconds from /proc/vmstat.

Metric | Description | Healthy Range | Unit
page_faults | Cumulative page faults (minor + major) | Counter, rates shown | count
page_faults_major | Cumulative major page faults | Should be low rate | count
swap_in_pages | Cumulative pages swapped in | Near 0 on healthy systems | count
swap_out_pages | Cumulative pages swapped out | Near 0 on healthy systems | count

Major page faults require disk I/O and impact performance. Swap activity indicates memory pressure and can severely degrade performance.
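As a rough illustration, the four metrics above plausibly correspond to the pgfault, pgmajfault, pswpin, and pswpout counters in /proc/vmstat. That mapping is an assumption, as are the function name and the sample values in this sketch.

```python
# Assumed mapping from /proc/vmstat keys to the metric names in this index.
VMSTAT_KEYS = {
    "pgfault": "page_faults",
    "pgmajfault": "page_faults_major",
    "pswpin": "swap_in_pages",
    "pswpout": "swap_out_pages",
}


def parse_vmstat(text):
    """Extract the paging and swap counters from /proc/vmstat text."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in VMSTAT_KEYS:
            metrics[VMSTAT_KEYS[parts[0]]] = int(parts[1])
    return metrics


# Invented excerpt; a real /proc/vmstat has many more lines.
sample = """nr_free_pages 123456
pgfault 9876543
pgmajfault 321
pswpin 0
pswpout 0
"""
vm = parse_vmstat(sample)
print(vm)
```

All four values are counters since boot, so a large page_faults_major total on a long-running machine is not alarming by itself. The rate between samples is what indicates memory pressure.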

Read more: Virtual Memory (VMstat) Metrics Explained

Services

Service status is collected hourly using systemd state information.

Metric | Description | Healthy Range | Unit
services_running | Currently running services | Depends on system role | count
services_total | Total configured services | System configuration | count
failed_units | Failed systemd units | Should be 0 | count

The services array provides detailed per-service information including name, state, sub-state, and enabled status for monitoring critical services.

Read more: Service Monitoring Metrics Explained

Health & Security

Various system health and security indicators are collected at different intervals.

Metric | Description | Healthy Range | Unit
entropy | Available entropy for cryptography | >256 minimum | bits
oom_kills | Out-of-memory killer activations | Should be 0 | count
ntp_synced | NTP synchronisation status | Should be true | boolean
reboot_required | System requires reboot | Monitor after updates | boolean
package_updates | Available package updates | Regular maintenance | count
selinux_status | SELinux enforcement status | Security policy | text
firewall_status | Firewall status | Security requirement | text
integrity | Agent integrity verification | Tamper detection | text

Low entropy can impact cryptographic operations. OOM kills indicate severe memory pressure. Regular package updates are essential for security.

Read more: System Health and Security Metrics Explained

Identity

Static system identification details are collected daily.

Metric | Description | Healthy Range | Unit
os | Operating system distribution | System identifier | text
kernel | Linux kernel version | Version tracking | text
arch | System architecture | Hardware platform | text
virtualization | Virtualisation platform | Deployment info | text
hostname | System hostname | Network identity | text
device_type | Device classification | System categorisation | text
agent_version | Server Scout agent version | Update tracking | text

These metrics help identify systems and track software versions for maintenance and security purposes.

Read more: System Identity Metrics Explained

Understanding Cumulative Counters

Several metrics are cumulative counters that continuously increase: net_rx_bytes, net_tx_bytes, disk_io_read_bytes, disk_io_write_bytes, page_faults, context_switches, and others. The Server Scout dashboard automatically converts these to meaningful rates (per-second) by calculating the difference between consecutive data points.
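The conversion described above can be sketched in a few lines. This is an illustration rather than the dashboard's actual code: the function names are hypothetical, and returning no rate when a counter goes backwards (for example after a reboot) is an assumed policy.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second rate between two counter samples, or None if it cannot be computed."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        return None  # duplicate or out-of-order sample
    delta = curr_value - prev_value
    if delta < 0:
        return None  # counter went backwards, e.g. reset at reboot (assumed policy)
    return delta / elapsed


def rates(samples):
    """Rates for consecutive (timestamp_seconds, counter_value) pairs."""
    return [
        counter_rate(v0, t0, v1, t1)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]


# Invented net_rx_bytes samples 30 seconds apart; the last one follows a reboot.
samples = [(0, 1000), (30, 4000), (60, 10000), (90, 500)]
rx_rates = rates(samples)
print(rx_rates)
```

The first two intervals give 100 and 200 bytes per second. The third gives no rate because the counter dropped, which avoids plotting a large negative spike.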

Healthy Range Guidelines

The "healthy range" values provided are general guidelines that depend heavily on your specific workload and system role. A database server will have different normal ranges compared to a web server or development machine. Use these values as starting points for alerting thresholds, then adjust based on your system's baseline behaviour and performance requirements.

Monitor trends over time rather than focusing on instantaneous values. Gradual increases in memory usage, consistent high CPU utilisation, or growing numbers of failed services often indicate issues requiring attention before they become critical problems.

Frequently Asked Questions

What metrics does the Server Scout agent collect?

The Server Scout agent collects over 80 metrics across CPU, memory, disk, network, load, processes, TCP connections, virtual memory, services, health/security, and system identity. All metrics are gathered from the /proc and /sys virtual filesystems with near-zero overhead. This reference index lists every metric with descriptions, units, and healthy range guidelines.

How often are Server Scout metrics collected?

Metrics are collected on a 5-tier schedule: Fast (every 5 seconds) for CPU and memory, Medium (every 30 seconds) for network and TCP, Slow (every 5 minutes) for load and disk, Glacial (every hour) for services, and Daily (every 24 hours) for system identity. This tiered approach balances real-time visibility with minimal resource usage.
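The tier intervals can be written down as a small table to show how they line up. The sketch below is only a model of the schedule described above: the tier keys, the function name, and the idea of checking tiers against elapsed seconds are assumptions, not a description of the agent's scheduler.

```python
# Tier intervals in seconds, taken from the schedule described above.
TIERS = {
    "fast": 5,         # CPU and memory
    "medium": 30,      # network and TCP
    "slow": 300,       # load and disk
    "glacial": 3600,   # services
    "daily": 86400,    # system identity
}


def tiers_due(elapsed_seconds):
    """Tiers whose interval divides the elapsed time since the agent started."""
    return [name for name, interval in TIERS.items() if elapsed_seconds % interval == 0]


def collections_per_day(tier):
    """How many times a tier runs in 24 hours."""
    return 86400 // TIERS[tier]


print(tiers_due(5), tiers_due(300), collections_per_day("fast"))
```

Under this model the fast tier runs 17280 times a day while the daily tier runs once, which is why keeping the fast tier cheap matters most for overall overhead.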

What do cumulative counter metrics mean in Server Scout?

Cumulative counters like net_rx_bytes, disk_io_read_bytes, and context_switches continuously increase from system boot. The Server Scout dashboard automatically converts these to per-second rates by calculating deltas between consecutive data points. You see meaningful throughput rates in the charts, not raw counter values.

How do I know if a metric value is healthy?

Each metric in this reference index includes a "Healthy Range" guideline. These are general starting points that depend on your workload and server role. A database server has different normal ranges than a web server. Monitor trends over time and adjust thresholds based on your system's baseline behaviour.

Where does Server Scout read metrics from?

The agent reads exclusively from Linux virtual filesystems: /proc/stat for CPU, /proc/meminfo for memory, /proc/net/dev for network, /proc/diskstats for disk I/O, /proc/loadavg for load, /proc/vmstat for virtual memory, and /proc/net/tcp and /proc/net/tcp6 for TCP states. No disk I/O is required in the fast collection tier, keeping the agent lightweight.
