Server Scout's alert system uses a sophisticated state machine to manage alert transitions, ensuring that notifications are both timely and reliable whilst minimising false positives. Understanding how alerts move between states will help you configure monitoring that responds appropriately to genuine issues.
The Three Alert States
Every alert in Server Scout exists in one of three states:
OK State: The monitored metric is within its normal operating range. No action is required, and the system is functioning as expected.
Pending State: A threshold has been breached, but the sustain period hasn't elapsed yet. Server Scout is monitoring the situation to determine if this represents a genuine issue or a temporary spike.
Firing State: The threshold has been breached for the required sustain duration, and a notification has been sent to alert administrators.
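For illustration only, the three states and the direct moves between them (described under State Transition Rules below) can be modelled as a small Python enum plus a transition table. The names `AlertState`, `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical and are not part of Server Scout.

```python
from enum import Enum


class AlertState(Enum):
    OK = "ok"
    PENDING = "pending"
    FIRING = "firing"


# Direct transitions named in this article; any other pair is treated as invalid.
# OK -> FIRING only happens when no sustain period is configured.
ALLOWED_TRANSITIONS: dict[AlertState, set[AlertState]] = {
    AlertState.OK: {AlertState.PENDING, AlertState.FIRING},
    AlertState.PENDING: {AlertState.FIRING, AlertState.OK},
    AlertState.FIRING: {AlertState.OK},
}


def can_transition(src: AlertState, dst: AlertState) -> bool:
    """Return True if the article describes a direct move from src to dst."""
    return dst in ALLOWED_TRANSITIONS[src]
```

Note that there is no route from Firing back to Pending: a firing alert either keeps firing or recovers to OK.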
State Transition Rules
OK to Pending
When a metric first breaches its configured threshold, the alert transitions from OK to Pending—but only if you've configured a sustain period greater than zero. If no sustain period is set, the alert moves directly from OK to Firing.
The sustain period acts as a buffer against temporary spikes or brief anomalies that don't represent genuine problems. For example, CPU usage might briefly spike to 95% during a legitimate process, but this doesn't necessarily warrant an alert.
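The decision an OK alert makes on each new reading can be sketched as follows. `state_after_reading_in_ok` and its `sustain_seconds` parameter are hypothetical names, and the sketch assumes a reading breaches only when it is strictly above the threshold.

```python
from enum import Enum


class AlertState(Enum):
    OK = "ok"
    PENDING = "pending"
    FIRING = "firing"


def state_after_reading_in_ok(value: float, threshold: float, sustain_seconds: int) -> AlertState:
    """Next state for an alert that is currently OK (hypothetical helper)."""
    if value <= threshold:
        return AlertState.OK  # no breach: nothing changes
    if sustain_seconds > 0:
        return AlertState.PENDING  # breach, but wait out the sustain period first
    return AlertState.FIRING  # no sustain period configured: fire immediately


# A CPU spike to 95% against a 90% threshold with a 5-minute sustain period.
print(state_after_reading_in_ok(95, threshold=90, sustain_seconds=300))
```

With `sustain_seconds=0` the same reading would go straight to Firing, which is the skip-Pending behaviour described above.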
Pending to Firing
The transition from Pending to Firing occurs when 80% or more of the readings during the sustain window exceed the configured threshold. This 80% rule is crucial for preventing false negatives—it ensures that a single good reading amongst predominantly bad ones doesn't cancel a legitimate alert.
Consider this scenario: your disk usage alert has a 90% threshold with a 10-minute sustain period. If 8 out of 10 readings during that window show usage above 90%, the alert will fire, even if 2 readings were below the threshold.
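The disk-usage scenario can be checked with a short sketch of the 80% rule. It assumes one reading per minute (so a 10-minute window holds 10 readings) and treats "exceeds" as strictly greater than the threshold; `meets_80_percent_rule` and the sample values are hypothetical.

```python
from fractions import Fraction


def meets_80_percent_rule(readings: list[float], threshold: float) -> bool:
    """True if at least 80% of the readings in the sustain window exceed threshold."""
    if not readings:
        return False
    breaching = sum(1 for r in readings if r > threshold)
    # Exact rational comparison avoids any floating-point rounding at the 80% boundary.
    return Fraction(breaching, len(readings)) >= Fraction(4, 5)


# Ten one-minute disk-usage samples (illustrative values): 8 above 90%, 2 below.
disk_usage = [93, 94, 88, 95, 96, 91, 89, 92, 97, 95]
print(meets_80_percent_rule(disk_usage, threshold=90))  # True
```

If a third reading dropped below 90%, only 7 of 10 (70%) would breach and the alert would not fire.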
Pending to OK
If the metric recovers during the sustain period, so that fewer than 80% of the readings in the window exceed the threshold, the alert returns to the OK state without ever firing. This prevents temporary issues from generating unnecessary notifications.
Firing to OK
When a firing alert's metric returns to normal values, the alert transitions back to OK and triggers a recovery notification. This lets you know that the issue has resolved without requiring manual intervention to clear the alert.
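Putting the transition rules together, here is a minimal Python sketch of the state machine. It is not Server Scout's implementation and rests on several assumptions: readings arrive at a fixed interval, so the sustain period is expressed as a number of readings (`sustain_readings`, a hypothetical parameter); a reading breaches only when it is strictly above the threshold; the Pending window starts with the first breaching reading; the alert leaves Pending early once 80% can no longer be reached; and a single reading at or below the threshold clears a firing alert. Cooldowns and reminders are left out.

```python
from collections import deque
from enum import Enum


class AlertState(Enum):
    OK = "ok"
    PENDING = "pending"
    FIRING = "firing"


class AlertStateMachine:
    """Hypothetical sketch of the OK / Pending / Firing transitions."""

    def __init__(self, threshold: float, sustain_readings: int) -> None:
        self.threshold = threshold
        self.sustain_readings = sustain_readings
        # Smallest whole number of breaching readings that is >= 80% of the window.
        self.required = (4 * sustain_readings + 4) // 5
        self.state = AlertState.OK
        self.window: deque[float] = deque()
        self.notifications: list[str] = []

    def observe(self, value: float) -> AlertState:
        breached = value > self.threshold

        if self.state is AlertState.OK and breached:
            if self.sustain_readings == 0:
                # No sustain period: skip Pending and fire at once.
                self.state = AlertState.FIRING
                self.notifications.append("firing")
                return self.state
            self.state = AlertState.PENDING
            self.window.clear()

        if self.state is AlertState.PENDING:
            self.window.append(value)
            good = sum(1 for v in self.window if v <= self.threshold)
            if good > self.sustain_readings - self.required:
                # 80% can no longer be reached: clear silently, the alert never fired.
                self.state = AlertState.OK
            elif len(self.window) == self.sustain_readings:
                # Window complete with at least 80% breaching readings.
                self.state = AlertState.FIRING
                self.notifications.append("firing")
        elif self.state is AlertState.FIRING and not breached:
            # Metric back to normal: clear and send a recovery notification.
            self.state = AlertState.OK
            self.notifications.append("recovered")

        return self.state
```

Feeding it `[50, 95, 92, 85, 97, 96, 70]` with a threshold of 90 and a five-reading window moves it from OK through four Pending readings to Firing and back to OK, recording a `firing` and a `recovered` notification; the single 85 reading does not stop it from firing.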
The 80% Threshold Logic
The 80% rule applied during the sustain period is designed to balance sensitivity with reliability. Without it, a single anomalous "good" reading could prevent a genuine alert from firing, even when the system is clearly experiencing problems.
For instance, if your server's memory usage is consistently at 95% but one reading shows 85% due to a brief garbage collection event, you'd still want the alert to fire. The 80% rule ensures this happens whilst still allowing alerts to clear naturally when conditions genuinely improve.
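To show why the rule matters, the sketch below compares a hypothetical strict rule (every reading in the window must breach) with the 80% rule on the memory example. The example gives no threshold, so 90% is an assumed value; "exceed" is taken to mean strictly greater, and ten readings per window is also an assumption.

```python
from fractions import Fraction


def fires_if_all_breach(readings: list[float], threshold: float) -> bool:
    """Hypothetical strict rule: every reading in the window must exceed the threshold."""
    return bool(readings) and all(r > threshold for r in readings)


def fires_with_80_percent_rule(readings: list[float], threshold: float) -> bool:
    """Rule described in the article: at least 80% of readings must exceed the threshold."""
    if not readings:
        return False
    breaching = sum(1 for r in readings if r > threshold)
    return Fraction(breaching, len(readings)) >= Fraction(4, 5)


# Memory usage pinned at 95% with one 85% dip from garbage collection (illustrative values).
memory = [95, 95, 95, 85, 95, 95, 95, 95, 95, 95]
print(fires_if_all_breach(memory, threshold=90))         # False
print(fires_with_80_percent_rule(memory, threshold=90))  # True
```

When usage genuinely falls, for example to 70% for most of the window, the 80% rule no longer fires, so alerts still clear once conditions improve.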
Cooldown Behaviour
Once an alert reaches the Firing state, Server Scout implements a cooldown mechanism to prevent notification spam. The alert will not send additional notifications until the cooldown period expires, even if the metric continues to breach thresholds.
However, if the alert is still firing when the cooldown period ends, Server Scout sends a "reminder" notification. This ensures that persistent issues aren't forgotten whilst preventing your inbox from being flooded with repeated alerts for the same problem.
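The cooldown and reminder behaviour can be sketched as a single decision function. `notification_to_send`, its parameters and the `"firing"` / `"reminder"` labels are hypothetical; the sketch assumes a reminder is due once the time since the last notification reaches the cooldown, and it leaves recovery notifications to the Firing-to-OK transition.

```python
from datetime import datetime, timedelta
from typing import Optional


def notification_to_send(
    still_firing: bool,
    last_notified: Optional[datetime],
    now: datetime,
    cooldown: timedelta,
) -> Optional[str]:
    """Decide which notification, if any, a firing alert should send at `now`.

    Hypothetical helper: Server Scout's real scheduling may differ.
    Recovery notifications are not handled here.
    """
    if not still_firing:
        return None
    if last_notified is None:
        return "firing"  # first notification when the alert starts firing
    if now - last_notified >= cooldown:
        return "reminder"  # cooldown expired and the issue persists
    return None  # still inside the cooldown window: stay quiet


start = datetime(2024, 1, 1, 12, 0)
cooldown = timedelta(minutes=30)
print(notification_to_send(True, None, start, cooldown))
print(notification_to_send(True, start, start + timedelta(minutes=10), cooldown))
print(notification_to_send(True, start, start + timedelta(minutes=30), cooldown))
```

The caller would record `now` as the new `last_notified` each time a notification is actually sent, so reminders repeat once per cooldown period for as long as the alert keeps firing.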
Practical Implications
Understanding these state transitions helps you configure more effective monitoring:
- Set appropriate sustain periods for metrics prone to temporary spikes
- Configure reasonable cooldown periods that balance awareness with notification fatigue
- Trust the 80% rule to handle occasional anomalous readings appropriately
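When choosing a sustain period, it can help to know how many below-threshold readings a window can absorb and still fire. The sketch below assumes readings arrive at a fixed interval (`interval_minutes`, a hypothetical parameter defaulting to one minute); both function names are illustrative.

```python
def readings_needed_to_fire(window_readings: int) -> int:
    """Smallest count of breaching readings that is at least 80% of the window."""
    # ceil(0.8 * n) with integer arithmetic: ceil(4n / 5) == (4n + 4) // 5
    return (4 * window_readings + 4) // 5


def tolerated_good_readings(sustain_minutes: int, interval_minutes: int = 1) -> int:
    """How many below-threshold readings a sustain window can absorb and still fire.

    Assumes one reading every `interval_minutes` (hypothetical parameter).
    """
    n = sustain_minutes // interval_minutes
    return n - readings_needed_to_fire(n)


for minutes in (2, 5, 10, 15):
    print(f"{minutes:>2}-minute sustain: tolerates {tolerated_good_readings(minutes)} good reading(s)")
```

With one-minute readings, a 2-minute sustain period tolerates no good readings, 5 minutes tolerates 1, 10 minutes tolerates 2 (matching the 8-of-10 disk example) and 15 minutes tolerates 3. Very short sustain periods therefore behave almost like a strict "every reading must breach" rule.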
The state machine approach ensures that Server Scout's alerting system remains both responsive to genuine issues and resistant to false positives, providing reliable monitoring without overwhelming administrators with unnecessary notifications.
Frequently Asked Questions
How do I set up sustain periods for ServerScout alerts?
What are the three alert states in ServerScout?
Why is my ServerScout alert not firing even though the threshold was breached?
How does the 80% threshold rule work in ServerScout alerts?
What happens when a ServerScout alert recovers?
How does ServerScout prevent alert notification spam?
When does a ServerScout alert skip the Pending state?