Understanding Server Scout's Default Alert Conditions
When you add a new server to Server Scout, default alert conditions are automatically created to get you started:
- CPU usage: Warning at 80%, Critical at 90%
- Memory usage: Warning at 80%, Critical at 90%
- Disk usage: Warning at 80%, Critical at 90%
- Server offline detection: Immediate alerts when the agent stops reporting
These defaults provide a solid foundation, but every server environment is unique. Fine-tuning these thresholds for your specific infrastructure reduces false positives and helps ensure you're alerted to genuine issues.
Establish Baselines Before Adjusting
Before modifying any thresholds, observe your servers' normal operating patterns for at least a week. This baseline period reveals crucial insights about your infrastructure's behaviour.
A database server that consistently operates at 70% memory utilisation sits only ten points below the default 80% warning level, so routine fluctuations can trigger alerts; it needs higher thresholds. Conversely, a lightly loaded web server that normally runs at 20% CPU might benefit from lower thresholds to catch unusual activity earlier.
Use Server Scout's historical charts to identify:
- Peak usage periods (backup windows, batch processing times)
- Normal operational ranges for each metric
- Regular patterns that might trigger false alerts
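As a rough illustration of turning a baseline into a threshold, the sketch below takes a list of observed readings and suggests a warning level a fixed margin above the 95th percentile. The function names, the 10-point headroom, the 95% ceiling and the sample data are all assumptions made for this example; none of it is part of Server Scout.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def suggest_warning_threshold(samples, headroom=10.0, ceiling=95.0):
    """Suggest a warning level: 95th percentile plus headroom, capped at ceiling.

    headroom and ceiling are illustrative defaults, not product settings.
    """
    return min(ceiling, percentile(samples, 95) + headroom)

# Hypothetical memory readings (percent) from a database server's baseline period
memory_samples = [68, 70, 71, 69, 72, 70, 73, 71, 70, 74]
print(suggest_warning_threshold(memory_samples))
```

With these sample readings the suggestion lands a little above the default 80% warning, which matches the database example above. Treat the output as a starting point to sanity-check against the historical charts rather than a value to apply blindly.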
Leverage Sustain Periods to Prevent False Alarms
Sustain periods are your first line of defence against alert fatigue. A sustain period is how long a metric must stay past its threshold before an alert fires, so setting it to 60-300 seconds ensures that brief, normal spikes don't trigger unnecessary notifications.
Consider these common scenarios:
- Cron jobs: Scheduled tasks often cause temporary CPU or I/O spikes
- Application deployments: Brief periods of high resource usage during updates
- Backup operations: Temporary increases in disk I/O and CPU usage
A 5-minute sustain period for CPU alerts typically strikes the right balance between catching genuine issues and avoiding noise from routine operations.
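To make the idea concrete, here is a minimal sketch of how a sustain check could work in principle. It is not Server Scout's implementation; the function name `sustained_breach` and the `(timestamp, value)` sample format are assumptions for illustration.

```python
def sustained_breach(samples, threshold, sustain_seconds):
    """Return True if the metric stayed at or above threshold for at least
    sustain_seconds without dropping below it.

    samples: list of (timestamp_seconds, value) tuples ordered by time.
    """
    breach_start = None
    for timestamp, value in samples:
        if value >= threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of this breach
            if timestamp - breach_start >= sustain_seconds:
                return True
        else:
            breach_start = None  # any dip below the threshold resets the clock
    return False

# A one-minute cron spike: above 85% only briefly
cron_spike = [(0, 50), (60, 95), (120, 40), (180, 50)]
# Five minutes of continuously high CPU
runaway = [(0, 90), (60, 91), (120, 92), (180, 95), (240, 93), (300, 90)]

print(sustained_breach(cron_spike, threshold=85, sustain_seconds=300))
print(sustained_breach(runaway, threshold=85, sustain_seconds=300))
```

The short spike never accumulates five minutes above the threshold, so it would be ignored, whereas the continuous breach would raise an alert.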
Choose Severity Levels Strategically
Server Scout's two severity levels serve distinct purposes:
Warning alerts are for "investigate when convenient" situations. These might include:
- Disk usage reaching 80% (you have time to clean up or expand storage)
- CPU consistently above 85% (performance may be degraded but service continues)
Critical alerts demand immediate attention:
- Disk usage at 90% (risk of service failure)
- Memory usage at 95% (potential for application crashes)
- Server offline (service unavailable)
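The relationship between the two levels can be sketched as a simple classification: a reading is critical if it reaches the critical threshold, a warning if it only reaches the warning threshold, and fine otherwise. The `Severity` enum and `classify` function below are hypothetical names used only for this example.

```python
from enum import Enum

class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"

def classify(value, warning, critical):
    """Map a metric reading (percent) to a severity level."""
    if critical <= warning:
        raise ValueError("critical threshold must be above warning threshold")
    if value >= critical:
        return Severity.CRITICAL
    if value >= warning:
        return Severity.WARNING
    return Severity.OK

# Disk at 82% with the 80/90 thresholds: investigate when convenient
print(classify(82, warning=80, critical=90))
# Disk at 91%: needs immediate attention
print(classify(91, warning=80, critical=90))
```

Checking the critical threshold first means a single reading never produces both levels at once.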
Implement Per-Server Overrides for Special Cases
Global defaults work well for most servers, but some systems require special consideration. Server Scout's per-server condition overrides handle these exceptions elegantly.
Build servers that regularly hit 95% CPU during compilation need higher CPU thresholds than standard web servers. Database servers with large buffer pools may normally operate at 90% memory usage. File servers might need different disk usage thresholds for different mount points.
Create per-server conditions for these outliers whilst maintaining sensible global defaults for your standard infrastructure.
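Conceptually, an override replaces the global value for one metric on one server and leaves everything else untouched. The sketch below models that with plain dictionaries; the server names (`build-01`, `db-01`), the override values and the data layout are assumptions for illustration and do not reflect how Server Scout stores its configuration.

```python
# Global defaults, matching the defaults listed at the top of this article
GLOBAL_DEFAULTS = {
    "cpu": {"warning": 80, "critical": 90},
    "memory": {"warning": 80, "critical": 90},
    "disk": {"warning": 80, "critical": 90},
}

# Hypothetical per-server overrides for the outliers described above
SERVER_OVERRIDES = {
    "build-01": {"cpu": {"warning": 97, "critical": 99}},
    "db-01": {"memory": {"warning": 93, "critical": 97}},
}

def effective_thresholds(server):
    """Return the thresholds that apply to a server: defaults plus overrides."""
    # Copy the defaults so that overrides never mutate the shared values
    merged = {metric: dict(levels) for metric, levels in GLOBAL_DEFAULTS.items()}
    for metric, levels in SERVER_OVERRIDES.get(server, {}).items():
        merged.setdefault(metric, {}).update(levels)
    return merged

print(effective_thresholds("build-01")["cpu"])   # overridden
print(effective_thresholds("build-01")["disk"])  # inherited from defaults
print(effective_thresholds("web-01") == GLOBAL_DEFAULTS)
```

Keeping the overrides small and explicit makes it easy to see at a glance which servers deviate from the standard policy and why.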
Configure Cooldown Periods Appropriately
Cooldown periods prevent notification spam for ongoing issues. Once an alert triggers, the cooldown period determines how long Server Scout waits before sending another notification for the same condition.
A 30-60 minute cooldown works well for most metrics. This gives you time to investigate and address the issue without being bombarded with repeated alerts, whilst ensuring you're reminded if the problem persists.
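The behaviour described here can be sketched as a small gate that remembers when each condition last sent a notification. This is an illustrative model only; the `CooldownGate` class and the `server:metric` condition identifiers are hypothetical and not part of Server Scout.

```python
class CooldownGate:
    """Suppress repeat notifications for a condition during its cooldown."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = {}  # condition id -> time of last notification

    def should_notify(self, condition_id, now):
        """Return True and record the send time if the cooldown has elapsed."""
        last = self._last_sent.get(condition_id)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[condition_id] = now
        return True

gate = CooldownGate(cooldown_seconds=30 * 60)  # 30-minute cooldown
print(gate.should_notify("web-01:cpu", now=0))        # first alert goes out
print(gate.should_notify("web-01:cpu", now=10 * 60))  # suppressed, still cooling down
print(gate.should_notify("web-01:cpu", now=30 * 60))  # reminder once cooldown ends
```

Tracking the cooldown per condition means an ongoing CPU alert on one server does not silence an unrelated alert elsewhere.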
Recommended Threshold Guidelines
Based on common server behaviours, these thresholds work well for most environments:
Disk usage: Warning at 80%, Critical at 90%, with a 5-minute sustain
- Disk space fills gradually, providing time for cleanup
CPU usage: Warning at 85%, Critical at 95%, with a 5-minute sustain
- Accommodates normal spikes whilst catching sustained high usage
Memory usage: Warning at 85%, Critical at 95%, with a 2-minute sustain
- Linux systems naturally use available RAM for caching, so high usage is often normal
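To see why cache makes raw memory figures look alarming, the sketch below computes usage from the `MemAvailable` field of `/proc/meminfo`, which the Linux kernel provides as an estimate of memory available to new workloads without swapping. Whether Server Scout's agent calculates memory usage this way is not stated in this article; the function name and sample figures are assumptions for illustration.

```python
def memory_usage_percent(meminfo_text):
    """Percent of RAM in use, treating reclaimable cache as available.

    meminfo_text: contents of /proc/meminfo (values are in kB).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total

# Hypothetical snapshot: very little free memory, but much of it is cache
SAMPLE = """MemTotal:       16000000 kB
MemFree:          800000 kB
MemAvailable:    6400000 kB
Cached:          5200000 kB
"""

print(memory_usage_percent(SAMPLE))
```

In this snapshot, counting only `MemFree` would suggest 95% usage, while the `MemAvailable`-based figure is far lower. On a real system you could pass in the result of `open("/proc/meminfo").read()`.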
Test Your Alert Configuration
Use Server Scout's test notification feature to verify your alerts reach the intended recipients. Test both email and webhook notifications to ensure your escalation procedures work correctly.
Regular testing ensures that when a genuine issue occurs, your team receives notifications promptly through the expected channels.
Well-configured alerts transform Server Scout from a monitoring tool into a proactive guardian of your infrastructure, providing early warning of issues whilst respecting your team's time and attention.
Frequently Asked Questions
What are Server Scout's default alert thresholds for new servers?
How long should I observe servers before adjusting alert thresholds?
What is a sustain period and how does it prevent false alarms?
When should I use warning vs critical alert levels?
How do I set different thresholds for special server types?
What are the recommended alert threshold settings for most servers?
How do cooldown periods work in Server Scout alerts?