The Server Health Summary gives you an at-a-glance view of your server's overall status on the Server Scout detail page. It automatically evaluates key system metrics and flags potential issues that need attention, helping you keep the server performing well.
How the Health Summary Works
The health summary operates on a simple principle: when everything is running smoothly, you'll see the reassuring "All systems normal" message. However, when Server Scout detects issues that could impact your server's performance or reliability, it will display specific alerts with clear descriptions of what needs attention.
This summary is generated from the latest metrics snapshot and updates in real time, so you always see current information about your server's health.
Understanding Health Issues
Reboot Required
When you see a "reboot required" alert, it indicates that your system has a pending OS reboot, typically after installing security updates or kernel patches. Whilst your server continues to run normally, the reboot ensures all updates take effect properly.
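# Ubuntu/Debian: this file exists only while a reboot is pending
ls /var/run/reboot-required
# CentOS/RHEL: needs-restarting ships in the yum-utils or dnf-utils package, depending on the release;
# an exit status of 1 means a reboot is required
needs-restarting -r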
High CPU Temperature
Server Scout monitors your CPU temperature and raises an alert when it exceeds 85 degrees Celsius. Elevated temperatures can lead to thermal throttling, reduced performance, and potential hardware damage.
High CPU temperatures often indicate:
- Inadequate cooling or ventilation
- Dust accumulation in cooling systems
- Failing thermal paste or cooling components
- Excessive CPU load over extended periods
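If you want to cross-check a temperature alert from the server itself, the following is a minimal sketch that applies the same 85 degree threshold to the Linux thermal sysfs interface. It assumes the kernel exposes `/sys/class/thermal/thermal_zone*/temp` (values in millidegrees Celsius); which zone maps to the CPU varies by hardware, and this is not necessarily how the Server Scout agent reads temperature. The helper name `is_too_hot` is illustrative.

```shell
#!/bin/sh
# Threshold taken from the article: alert above 85 degrees Celsius.
THRESHOLD_C=85

# is_too_hot MILLIDEGREES
# Succeeds (exit 0) when the reading is strictly above the threshold.
is_too_hot() {
    [ "$1" -gt $((THRESHOLD_C * 1000)) ]
}

# Check every thermal zone; not all of them are CPU sensors.
for zone in /sys/class/thermal/thermal_zone*/temp; do
    [ -r "$zone" ] || continue
    reading=$(cat "$zone" 2>/dev/null) || continue
    if is_too_hot "$reading"; then
        echo "HOT: $zone reports $((reading / 1000)) C"
    fi
done
```

No output means no zone is currently above the threshold, or that the system does not expose thermal zones at all.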
Failed Systemd Units
When more than 10 systemd units are in a failed state, the health summary will flag this as a concern. Failed units can indicate service crashes, configuration issues, or dependency problems that may affect system functionality.
# View failed systemd units
systemctl --failed
# Check specific unit status
systemctl status unit-name
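To see how close a server is to the threshold described above, you can count the failed units yourself. This is a rough sketch rather than the agent's own logic: the `count_failed` helper and the `MAX_FAILED` variable are names introduced here for illustration.

```shell
#!/bin/sh
# Threshold taken from the article: flag when more than 10 units have failed.
MAX_FAILED=10

# count_failed reads a unit listing on stdin and prints the number of non-empty lines.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the function from failing.
count_failed() {
    grep -c . || true
}

# --no-legend drops the header and footer; --plain drops the status glyphs.
failed=$(systemctl --failed --no-legend --plain 2>/dev/null | count_failed)

if [ "$failed" -gt "$MAX_FAILED" ]; then
    echo "WARN: $failed failed systemd units"
else
    echo "OK: $failed failed systemd units"
fi
```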
Agent Integrity Status
The agent integrity check ensures your Server Scout monitoring agent hasn't been compromised. You'll see one of three states:
- Verified: Checksums match expected values - your agent is authentic and unmodified
- Unverified: Indicates an older agent version that may need updating
- Tampered: Checksums don't match, suggesting the agent files have been modified
If you see "tampered" status, investigate immediately as this could indicate a security issue.
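The general idea behind a checksum-based integrity check can be sketched as below. This is an illustration of the technique only, not Server Scout's actual verification procedure: the function name `classify_integrity`, the use of SHA-256, and the choice to report "unverified" when no expected checksum is available are all assumptions made for the example.

```shell
#!/bin/sh
# classify_integrity FILE EXPECTED_SHA256
# Prints one of: verified, tampered, unverified.
# Assumption: "unverified" here stands in for the case where there is
# no expected checksum to compare against.
classify_integrity() {
    file=$1
    expected=$2
    if [ -z "$expected" ]; then
        echo "unverified"
        return
    fi
    # sha256sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "verified"
    else
        echo "tampered"
    fi
}

# Example call (the path and variable are hypothetical placeholders):
# classify_integrity /path/to/agent-binary "$EXPECTED_SHA256"
```

Any change to the file, even a single byte, produces a different hash, which is why a mismatch is treated as a sign of modification.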
High CPU Steal Percentage
CPU steal time becomes a concern when it remains consistently high. This metric is particularly relevant for virtual machines: it measures the share of time your VM had work ready to run but was waiting for the hypervisor to schedule it on a physical CPU. High steal percentages suggest:
- VM resource contention on the physical host
- Oversubscribed virtualisation environment
- Need for resource allocation review
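To measure steal time directly on a Linux guest, you can sample the aggregate `cpu` line of `/proc/stat` twice and compare the counters. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the helper name `steal_pct` is illustrative, and the one-second interval is an arbitrary choice.

```shell
#!/bin/sh
# steal_pct BEFORE_LINE AFTER_LINE
# Takes two samples of the "cpu" line from /proc/stat and prints the
# percentage of CPU time spent in steal between them.
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            # guest and guest_nice are already counted inside user and nice.
            total[NR] = 0
            for (i = 2; i <= 9; i++) total[NR] += $i
            steal[NR] = $9
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (steal[2] - steal[1]) / dt
        }'
}

if [ -r /proc/stat ]; then
    before=$(grep '^cpu ' /proc/stat)
    sleep 1
    after=$(grep '^cpu ' /proc/stat)
    echo "steal: $(steal_pct "$before" "$after")%"
fi
```

The same two-sample approach works for IO wait by reading field 6 instead of field 9. A single reading says little; it is the sustained level over several samples that matters.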
High IO Wait Percentage
Elevated IO wait percentages signal that your CPU is frequently waiting for disk operations to complete. This typically indicates a disk bottleneck that can significantly impact system performance.
Common causes include:
- Slow or failing storage devices
- Insufficient disk IOPS for current workload
- Poorly optimised database queries
- Inadequate storage configuration
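# Monitor IO wait and per-device statistics, refreshing every second (iostat is part of the sysstat package)
iostat -x 1
# See which processes are generating disk IO (usually requires root)
iotop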
Taking Action on Health Alerts
When health issues appear in your summary, prioritise them based on severity and potential impact. Critical issues like high temperatures or agent tampering require immediate attention, whilst others like pending reboots can often be scheduled during maintenance windows.
The real-time nature of the health summary means that as you resolve issues, the alerts will disappear and you'll return to the "All systems normal" status, providing immediate feedback on your remediation efforts.
Regular monitoring of the Server Health Summary helps maintain proactive server management, allowing you to address potential problems before they impact your services or users.