Server Scout continuously monitors essential system health and security indicators that provide early warning of potential issues. These metrics complement performance monitoring by offering insights into your system's stability, security posture, and maintenance requirements. Understanding these indicators helps you maintain robust, secure servers and plan preventive maintenance effectively.
System Security Health Metrics
The following table outlines the core system health and security metrics collected by Server Scout:
| Metric | Description | Collection Tier | Ideal Value |
|---|---|---|---|
| entropy | Available entropy in kernel random pool | Medium (30s) | >256 |
| oom_kills | Cumulative Out-Of-Memory kill count | Medium (30s) | 0 |
| ntp_synced | System clock synchronisation status | Glacial (1hr) | true |
| reboot_required | Whether reboot needed for updates | Glacial (1hr) | false |
| package_updates | Number of pending package updates | Glacial (1hr) | Informational |
| selinux_status | SELinux enforcement mode | Daily (24hr) | enforcing |
| firewall_status | Host firewall active state | Daily (24hr) | active |
| integrity | Agent script integrity verification | Every payload | ok |
Critical Security Indicators
Entropy Pool Health
The entropy metric tracks the randomness available in the kernel's entropy pool, as reported by /proc/sys/kernel/random/entropy_avail. This pool feeds cryptographic operations including TLS handshakes, SSH key generation, and certificate creation, so a shortage affects any service that depends on them.
Values consistently below 256 indicate entropy starvation, which can cause applications to block whilst waiting for sufficient randomness. This manifests as:
- Slow SSH connections during key exchange
- Web server delays during TLS handshakes
- Database connection timeouts for encrypted connections
- Application hangs during certificate generation
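Modern kernels with hardware random number generators (RDRAND instruction on Intel/AMD processors) rarely experience entropy depletion. On Linux 5.18 and later the pool is 256 bits wide, so entropy_avail reads at most 256 and a steady 256 is the healthy value there. Virtualised environments, embedded systems, and heavily loaded cryptographic services on older kernels may still encounter issues.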
If entropy consistently runs low, investigate the root cause first, then consider installing rng-tools or haveged to supplement the entropy pool.
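The threshold check described above can be sketched in a few lines of shell. This is an illustrative sketch, not the agent's actual code: the function names `read_entropy` and `entropy_state` are hypothetical, and the default threshold of 256 is taken from the guidance on this page.

```shell
# Hypothetical helper: print the current entropy estimate.
# The path is the one named in the text; an alternative file can be passed for testing.
read_entropy() {
    cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

# Hypothetical helper: classify a reading against a threshold (default 256).
# Readings below the threshold are "low"; anything else is "ok".
entropy_state() {
    value=$1
    threshold=${2:-256}
    if [ "$value" -lt "$threshold" ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# Example invocation on a Linux host:
#   entropy_state "$(read_entropy)"
```

A single low reading is not necessarily a problem; the text above refers to values that stay low, so a real check would likely sample several times before alerting.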
Out-Of-Memory Kill Events
The oom_kills counter from /proc/vmstat tracks how many times the kernel's Out-Of-Memory killer has terminated processes. The counter is cumulative since boot and should remain at zero on healthy systems.
Any OOM kill represents a serious memory exhaustion event where the system forcibly terminated processes to prevent complete system failure. The OOM killer selects victims based on memory usage, process importance, and OOM score adjustments.
When oom_kills increases, immediately investigate:
# Check recent OOM events
dmesg | grep -i "killed process"
# Review system memory pressure
cat /proc/pressure/memory
# Identify memory-heavy processes
ps aux --sort=-%mem | head -20
Cross-reference OOM kills with Server Scout's memory metrics. Look for periods where mem_available_mb approached zero and mem_swap_total_mb was fully utilised. The timing correlation helps identify which applications triggered memory exhaustion.
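Because the counter is cumulative, what matters is whether it has grown since the last sample. The sketch below assumes the value comes from the `oom_kill` field of /proc/vmstat (present on kernels that expose it) and uses hypothetical function names; how Server Scout itself stores the previous sample is not documented here.

```shell
# Hypothetical helper: extract the cumulative oom_kill counter
# from a vmstat-format file (defaults to /proc/vmstat).
read_oom_kills() {
    awk '$1 == "oom_kill" { print $2 }' "${1:-/proc/vmstat}"
}

# Hypothetical helper: number of new OOM kills between two samples.
# The counter restarts from zero after a reboot, so a lower current
# value is treated as a fresh count rather than a negative delta.
oom_kills_delta() {
    previous=$1
    current=$2
    if [ "$current" -lt "$previous" ]; then
        echo "$current"
    else
        echo $((current - previous))
    fi
}

# Example invocation, with 3 as the previously recorded value:
#   oom_kills_delta 3 "$(read_oom_kills)"
```

Any non-zero delta is a cue to run the diagnostic commands listed earlier in this section.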
Firewall Protection Status
The firewall_status metric indicates whether your host firewall is active. Server Scout checks multiple firewall implementations:
- firewalld (RHEL/CentOS/Fedora default)
- nftables (modern netfilter frontend)
- iptables (traditional netfilter interface)
An "inactive" firewall status on internet-facing servers represents a significant security exposure. Without host-level filtering, your server relies entirely on network firewalls and application-level security.
However, firewall status should be evaluated contextually. Servers behind well-configured network firewalls or in isolated network segments may intentionally disable host firewalls to reduce complexity. The key is ensuring appropriate network-level protection exists.
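To check a host by hand, the three implementations can be probed in turn. The sketch below is an assumption about how such a check might look, not the agent's documented logic: the function name `firewall_status`, the probe order, and the rule that "any non-default rule means active" are all illustrative choices. The nft and iptables probes normally need root.

```shell
# Hypothetical helper: report "active" if any supported firewall appears to be in use.
firewall_status() {
    # firewalld: ask systemd whether the unit is running.
    if command -v systemctl >/dev/null 2>&1 &&
       systemctl is-active --quiet firewalld 2>/dev/null; then
        echo "active"
        return
    fi
    # nftables: treat any non-empty ruleset as active.
    if command -v nft >/dev/null 2>&1 &&
       [ -n "$(nft list ruleset 2>/dev/null)" ]; then
        echo "active"
        return
    fi
    # iptables: treat any line other than a default policy (-P) as a rule.
    if command -v iptables >/dev/null 2>&1 &&
       iptables -S 2>/dev/null | grep -qv '^-P'; then
        echo "active"
        return
    fi
    echo "inactive"
}

# Example invocation:
#   firewall_status
```

Run without root, the nft and iptables probes usually return nothing, so the function can report "inactive" on a protected host; treat the result as a hint rather than proof.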
SELinux Enforcement
On RHEL-family systems (Red Hat, CentOS, Fedora), selinux_status reports the current enforcement mode from getenforce:
- enforcing: SELinux actively blocks policy violations (recommended)
- permissive: SELinux logs violations but allows them (debugging mode)
- disabled: SELinux completely inactive
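The modes listed above can be read and mapped to a severity with a short script. This is a sketch under stated assumptions: the function names and the severity labels ("ok", "warning", "critical") are hypothetical, and treating a host without getenforce as "disabled" is a simplification.

```shell
# Hypothetical helper: print the SELinux mode in lower case.
# getenforce prints Enforcing, Permissive, or Disabled.
selinux_status() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce | tr '[:upper:]' '[:lower:]'
    else
        # Assumption: no SELinux tooling installed means SELinux is not in use.
        echo "disabled"
    fi
}

# Hypothetical helper: map a mode to an illustrative severity label.
selinux_severity() {
    case "$1" in
        enforcing)  echo "ok" ;;
        permissive) echo "warning" ;;
        *)          echo "critical" ;;
    esac
}

# Example invocation:
#   selinux_severity "$(selinux_status)"
```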
"Enforcing" mode provides mandatory access controls that significantly limit attack impact even if applications are compromised. Many compliance frameworks require SELinux enforcement on production systems.
"Permissive" mode indicates a temporary debugging state—acceptable during troubleshooting but inappropriate for production. "Disabled" SELinux removes an important security layer and may violate organisational policies.
System Maintenance Indicators
Time Synchronisation
The ntp_synced metric indicates whether your system clock synchronises with authoritative time sources. Server Scout checks this via timedatectl on systemd systems or by examining chrony/ntpd status on older distributions.
Accurate time synchronisation is critical for:
- Certificate validation: SSL/TLS certificates have validity periods checked against system time
- Authentication protocols: Kerberos, OAuth, and multi-factor authentication rely on time accuracy
- Log correlation: Distributed systems require synchronised timestamps for troubleshooting
- Database replication: Many database clusters require time synchronisation
- Compliance: Audit logs must have accurate timestamps
Servers with ntp_synced: false may experience authentication failures, certificate errors, or difficulties correlating events across systems.
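On systemd hosts, the synchronisation state can be queried directly from timedatectl. The sketch below assumes the `NTPSynchronized` property is the value of interest and uses hypothetical function names; the exact query Server Scout runs is not specified on this page.

```shell
# Hypothetical helper: convert timedatectl's yes/no answer into the
# true/false form used by the ntp_synced metric. Anything other than
# "yes" (including an empty answer) is reported as false.
ntp_synced_from() {
    case "$1" in
        yes) echo "true" ;;
        *)   echo "false" ;;
    esac
}

# Hypothetical helper: query the systemd property and normalise it.
ntp_synced() {
    ntp_synced_from "$(timedatectl show -p NTPSynchronized --value 2>/dev/null)"
}

# Example invocation on a systemd host:
#   ntp_synced
```

On hosts without timedatectl the query returns nothing and the sketch reports false, so older distributions would need a chrony or ntpd check instead.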
Reboot Requirements
The reboot_required indicator signals when installed updates require a system restart to take effect. Detection methods vary by distribution:
- Debian/Ubuntu: Presence of the /var/run/reboot-required file
- RHEL/CentOS: Output from the needs-restarting utility
- Other distributions: Similar mechanisms checking for kernel updates or core system library changes
Kernel updates, core library updates (glibc, systemd), and some security patches require reboots. Whilst not immediately urgent, plan maintenance windows to apply these restarts and fully activate security updates.
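The two main detection methods can be combined into one check. This is a rough sketch with a hypothetical function name; it relies on `needs-restarting -r` exiting with status 1 when a reboot is needed, and it cannot tell that apart from other failures of the utility.

```shell
# Hypothetical helper: print "true" if the host appears to need a reboot.
# An alternative flag-file path can be passed as the first argument.
reboot_required() {
    flag_file=${1:-/var/run/reboot-required}
    if [ -f "$flag_file" ]; then
        # Debian/Ubuntu: the flag file exists when a reboot is pending.
        echo "true"
    elif command -v needs-restarting >/dev/null 2>&1; then
        # RHEL family: exit status 0 means no reboot is needed.
        if needs-restarting -r >/dev/null 2>&1; then
            echo "false"
        else
            echo "true"
        fi
    else
        # Assumption: with neither mechanism available, report false.
        echo "false"
    fi
}

# Example invocation:
#   reboot_required
```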
Package Updates
The package_updates count shows pending updates from your distribution's package manager (apt, dnf, yum). This provides visibility into your patch management status without requiring separate tools.
Large numbers of pending updates may indicate:
- Infrequent maintenance schedules
- Failed automatic update mechanisms
- Manual update policies requiring review
Regular patching reduces security exposure and ensures access to bug fixes. However, production systems typically require change control processes rather than immediate automatic updates.
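On Debian-family hosts, one way to obtain a comparable count is to simulate an upgrade and count the packages that would be installed. The sketch below is an assumption about how such a count could be produced, with hypothetical function names; it may differ from the figure Server Scout reports, for example when packages are held back.

```shell
# Hypothetical helper: count "Inst" lines from simulated apt-get output on stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_apt_updates() {
    grep -c '^Inst ' || true
}

# Hypothetical helper: pending update count on apt-based hosts.
# Other package managers are left out of this sketch and reported as "unknown".
package_updates() {
    if command -v apt-get >/dev/null 2>&1; then
        apt-get -s upgrade 2>/dev/null | count_apt_updates
    else
        echo "unknown"
    fi
}

# Example invocation:
#   package_updates
```

On dnf-based hosts, `dnf check-update` exits with status 100 when updates are available, which is enough for a yes/no check even without a count.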
Agent Integrity Verification
The integrity metric provides tamper detection for the Server Scout agent itself. Each data payload includes a SHA-256 checksum of the agent script, which the dashboard compares against known-good signatures.
"ok" status confirms the agent hasn't been modified, whilst "modified" indicates potential tampering or corruption. This helps detect:
- Unauthorised agent modifications
- File system corruption affecting the agent
- Compromise attempts targeting monitoring infrastructure
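The comparison itself is simple to reproduce by hand when investigating a "modified" status. The sketch below assumes you already have a known-good SHA-256 value for your agent version; the function name `agent_integrity` and the file path in the example are hypothetical.

```shell
# Hypothetical helper: compare a file's SHA-256 checksum with an expected value.
# Prints "ok" on a match and "modified" otherwise.
agent_integrity() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        echo "modified"
    fi
}

# Example invocation (path and checksum are placeholders):
#   agent_integrity /path/to/agent-script "<known-good sha256>"
```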
Security Posture Assessment
These metrics collectively provide a basic security health check without requiring dedicated security tools. They complement performance monitoring by highlighting security-relevant system states.
Immediate Action Required:
- oom_kills > 0: Investigate memory exhaustion causes
- selinux_status: "disabled" on production RHEL systems
- firewall_status: "inactive" on internet-facing servers
- integrity: "modified" indicates potential tampering
- entropy consistently < 100: Risk of cryptographic delays
Maintenance Window Actions:
- reboot_required: true, plan restart during maintenance
- package_updates > 0: schedule update installation
- ntp_synced: false, configure time synchronisation
Monitoring Trends:
Track these metrics over time to identify patterns. A gradually increasing oom_kills count suggests growing memory pressure. Frequent reboot_required states may indicate that an aggressive update policy needs refinement.
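The triage rules in this section can be expressed as a small lookup. This is a simplified sketch with a hypothetical function name and labels: it works on single readings and ignores the context the rules mention (production RHEL systems, internet-facing servers, "consistently" low entropy), which still needs human judgement.

```shell
# Hypothetical helper: map one metric reading to "immediate", "maintenance", or "ok".
triage() {
    metric=$1
    value=$2
    case "$metric:$value" in
        selinux_status:disabled|firewall_status:inactive|integrity:modified)
            echo "immediate" ;;
        reboot_required:true|ntp_synced:false)
            echo "maintenance" ;;
        oom_kills:*)
            if [ "$value" -gt 0 ]; then echo "immediate"; else echo "ok"; fi ;;
        entropy:*)
            if [ "$value" -lt 100 ]; then echo "immediate"; else echo "ok"; fi ;;
        package_updates:*)
            if [ "$value" -gt 0 ]; then echo "maintenance"; else echo "ok"; fi ;;
        *)
            echo "ok" ;;
    esac
}

# Example invocation:
#   triage oom_kills 2
```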
Correlation with Performance Metrics
System health metrics gain context when correlated with performance indicators:
- OOM kills vs. memory usage: Compare oom_kills timing with mem_available_mb and swap utilisation to identify memory pressure patterns
- Entropy depletion vs. network activity: Low entropy during high net_tx_bytes periods may indicate TLS-heavy workloads
- Update requirements vs. system stability: Correlate pending updates with system error rates or performance degradation
Best Practices
Establish baseline values for your environment and set appropriate alerting thresholds. Security metrics often have binary good/bad states, making them suitable for immediate notifications rather than trend analysis.
Document your organisation's acceptable states for each metric. Some environments intentionally disable certain security features for performance or compatibility reasons—the key is making these decisions consciously rather than by oversight.
Regular review of these metrics during maintenance windows ensures your servers maintain good security hygiene alongside optimal performance.