Understanding the Alert State Machine

Server Scout's alert system uses a state machine to decide when an alert moves between states, so that notifications are timely and reliable whilst false positives are kept to a minimum. Understanding how alerts move between states will help you configure monitoring that responds to genuine issues rather than to brief spikes.

The Three Alert States

Every alert in Server Scout exists in one of three states:

OK State: The monitored metric is within its normal operating range. No action is required, and the system is functioning as expected.

Pending State: A threshold has been breached, but the sustain period hasn't elapsed yet. Server Scout is monitoring the situation to determine if this represents a genuine issue or a temporary spike.

Firing State: The threshold has been breached for the required sustain duration, and a notification has been sent to alert administrators.

State Transition Rules

OK to Pending

When a metric first breaches its configured threshold, the alert transitions from OK to Pending, but only if you've configured a sustain period greater than zero. If no sustain period is set (or it is set to zero), the alert moves directly from OK to Firing.

The sustain period acts as a buffer against temporary spikes or brief anomalies that don't represent genuine problems. For example, CPU usage might briefly spike to 95% during a legitimate process, but this doesn't necessarily warrant an alert.

Pending to Firing

The transition from Pending to Firing occurs when 80% or more of the readings during the sustain window exceed the configured threshold. This 80% rule guards against false negatives: it ensures that a single good reading amongst predominantly bad ones doesn't cancel a legitimate alert.

Consider this scenario: your disk usage alert has a 90% threshold with a 10-minute sustain period, and readings are taken once a minute. If 8 of the 10 readings in that window show usage above 90%, the alert will fire, even though 2 readings were below the threshold.
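The 80% rule from the disk usage scenario can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not Server Scout's code. `should_fire` is a hypothetical helper, and the readings list stands in for whatever samples fall inside the sustain window. A reading only counts as exceeding the threshold if it is strictly greater than it.

```python
from typing import Sequence


def should_fire(readings: Sequence[float], threshold: float) -> bool:
    """Hypothetical helper: True if at least 80% of the sustain-window readings exceed the threshold."""
    if not readings:
        return False
    # Assumed: "exceed" means strictly greater than the threshold.
    above = sum(1 for r in readings if r > threshold)
    # 80% check in integer arithmetic (above / total >= 4 / 5) to avoid float rounding.
    return 5 * above >= 4 * len(readings)


# Illustrative disk usage readings, one per minute over a 10-minute sustain window.
eight_of_ten = [92, 93, 88, 95, 94, 91, 89, 96, 97, 93]
seven_of_ten = [92, 93, 88, 95, 94, 85, 89, 96, 97, 93]
print(should_fire(eight_of_ten, 90))  # True: 8 of 10 readings are above 90
print(should_fire(seven_of_ten, 90))  # False: only 7 of 10 are above 90
```

Comparing `5 * above` with `4 * len(readings)` expresses "at least 80%" exactly. A reading count that lands precisely on the boundary, such as 8 of 10, therefore fires as described above.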

Pending to OK

If the metric recovers before the sustain period completes, so that fewer than 80% of the readings in the window exceed the threshold, the alert returns to the OK state without ever firing. This prevents temporary issues from generating unnecessary notifications.

Firing to OK

When a firing alert's metric returns to normal values, the alert transitions back to OK and triggers a recovery notification. This lets you know that the issue has resolved without requiring manual intervention to clear the alert.
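The four transitions above can be tied together in a small simulation. This is a minimal sketch, not Server Scout's implementation, and it rests on several assumptions:

- Readings arrive at a fixed interval, and the sustain period is expressed as a number of readings (`sustain_readings`).
- The window starts with the first breaching reading and is evaluated once it is full.
- "Exceed" means strictly greater than the threshold.
- A single reading at or below the threshold counts as "returning to normal".

`AlertState`, `AlertStateMachine` and `observe` are hypothetical names.

```python
from enum import Enum
from typing import List, Optional


class AlertState(Enum):
    OK = "ok"
    PENDING = "pending"
    FIRING = "firing"


class AlertStateMachine:
    """Hypothetical sketch of the OK -> Pending -> Firing -> OK cycle."""

    def __init__(self, threshold: float, sustain_readings: int) -> None:
        self.threshold = threshold
        # Assumption: the sustain period is expressed as a number of readings.
        self.sustain_readings = sustain_readings
        self.state = AlertState.OK
        self.window: List[float] = []  # readings collected while Pending

    def observe(self, value: float) -> Optional[str]:
        """Feed one reading and return "alert", "recovery", or None."""
        breached = value > self.threshold  # assumed: "exceed" means strictly greater

        if self.state is AlertState.FIRING:
            if breached:
                return None  # still firing; repeat notifications are left to the cooldown
            self.state = AlertState.OK  # assumed: one normal reading clears the alert
            return "recovery"

        if self.state is AlertState.OK:
            if not breached:
                return None
            if self.sustain_readings <= 0:
                self.state = AlertState.FIRING  # no sustain period: skip Pending
                return "alert"
            self.state = AlertState.PENDING
            self.window = []

        # Pending: collect readings (starting with the first breach) until the window is full.
        self.window.append(value)
        if len(self.window) < self.sustain_readings:
            return None
        above = sum(1 for v in self.window if v > self.threshold)
        self.window = []
        if 5 * above >= 4 * self.sustain_readings:  # at least 80%, in integer arithmetic
            self.state = AlertState.FIRING
            return "alert"
        self.state = AlertState.OK  # recovered before firing: no notification
        return None


# Disk usage scenario: 90% threshold, 10 one-minute readings, 8 of them above 90.
machine = AlertStateMachine(threshold=90, sustain_readings=10)
for reading in [95, 94, 93, 85, 92, 91, 88, 95, 96, 97]:
    event = machine.observe(reading)
print(event, machine.state.name)  # alert FIRING
print(machine.observe(50), machine.state.name)  # recovery OK
```

This sketch only evaluates the Pending window once it is full. Server Scout may return an alert to OK earlier, but the article does not say so.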

The 80% Threshold Logic

The 80% threshold during the sustain period is designed to balance sensitivity with reliability. Without this mechanism, a single anomalous "good" reading could prevent a genuine alert from firing, even when the system is clearly experiencing problems.

For instance, if your server's memory usage is consistently at 95% but one reading shows 85% due to a brief garbage collection event, you'd still want the alert to fire. The 80% rule ensures this happens whilst still allowing alerts to clear naturally when conditions genuinely improve.

Cooldown Behaviour

Once an alert reaches the Firing state, Server Scout implements a cooldown mechanism to prevent notification spam. The alert will not send additional notifications until the cooldown period expires, even if the metric continues to breach thresholds.

However, if the alert is still firing when the cooldown period ends, Server Scout sends a "reminder" notification. This ensures that persistent issues aren't forgotten whilst preventing your inbox from being flooded with repeated alerts for the same problem.
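The cooldown and reminder behaviour can be modelled as a small tracker that is consulted each time a firing alert is evaluated. This is a hypothetical sketch. `CooldownTracker`, `on_firing` and `reset` are illustrative names, and timestamps are plain seconds. It assumes the cooldown is measured from the most recent notification, so each reminder starts a fresh cooldown.

```python
from typing import Optional


class CooldownTracker:
    """Hypothetical cooldown/reminder logic for a single alert (not Server Scout's API)."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self.last_sent: Optional[float] = None  # time of the most recent notification

    def on_firing(self, now: float) -> Optional[str]:
        """Call on every evaluation while the alert is Firing; returns the notification to send, if any."""
        if self.last_sent is None:
            self.last_sent = now
            return "alert"  # first notification for this incident
        if now - self.last_sent < self.cooldown_seconds:
            return None  # still inside the cooldown: suppress repeats
        self.last_sent = now  # assumed: a reminder restarts the cooldown
        return "reminder"

    def reset(self) -> None:
        """Call when the alert returns to OK so the next incident starts fresh."""
        self.last_sent = None


tracker = CooldownTracker(cooldown_seconds=3600)  # one-hour cooldown, illustrative value
print(tracker.on_firing(0))     # alert
print(tracker.on_firing(1800))  # None
print(tracker.on_firing(3600))  # reminder
```

Calling `reset` when the alert recovers means the next breach produces a fresh alert rather than a reminder.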

Practical Implications

Understanding these state transitions helps you configure more effective monitoring:

  1. Set appropriate sustain periods for metrics prone to temporary spikes
  2. Configure reasonable cooldown periods that balance awareness with notification fatigue
  3. Trust the 80% rule to handle occasional anomalous readings appropriately

The state machine approach ensures that Server Scout's alerting system remains both responsive to genuine issues and resistant to false positives, providing reliable monitoring without overwhelming administrators with unnecessary notifications.

Frequently Asked Questions

How do I set up sustain periods for ServerScout alerts?

Configure a sustain period greater than zero in your alert settings. If no sustain period is set, alerts move directly from OK to Firing when thresholds are breached. Sustain periods act as buffers against temporary spikes: an alert only fires if 80% or more of the readings taken during the specified duration exceed the threshold.

What are the three alert states in ServerScout?

ServerScout alerts exist in three states: OK (metric within normal range), Pending (threshold breached but sustain period not elapsed), and Firing (threshold breached for required duration with notification sent). Alerts transition between these states based on metric readings and configured thresholds.

Why is my ServerScout alert not firing even though the threshold was breached?

Your alert may be in the Pending state, waiting for the sustain period to complete. For an alert to fire, 80% or more of the readings during the sustain window must exceed the threshold. If fewer than 80% exceed it, the alert returns to the OK state without firing.

How does the 80% threshold rule work in ServerScout alerts?

During the sustain period, 80% or more of the readings must exceed the configured threshold for an alert to transition from Pending to Firing. This prevents single anomalous good readings from cancelling legitimate alerts while still allowing natural recovery when conditions genuinely improve.

What happens when a ServerScout alert recovers?

When a firing alert's metric returns to normal values, it transitions back to OK state and triggers a recovery notification. This automatic recovery lets you know the issue has resolved without requiring manual intervention to clear the alert.

How does ServerScout prevent alert notification spam?

ServerScout implements a cooldown mechanism once alerts reach Firing state. No additional notifications are sent until the cooldown period expires. If the alert is still firing when cooldown ends, a reminder notification is sent to prevent persistent issues from being forgotten.

When does a ServerScout alert skip the Pending state?

Alerts skip the Pending state and move directly from OK to Firing when no sustain period is configured (set to zero). With a sustain period greater than zero, alerts must remain in the Pending state until the sustain duration completes and at least 80% of the readings exceed the threshold.
