Device Alerts and Notifications

Understanding Device Alerts in Server Scout

Server Scout's alerting system extends beyond traditional server monitoring to include comprehensive device monitoring capabilities. Whether you're managing network switches, DRAC/IPMI controllers, or UPS units, you can configure intelligent alerts that notify you of critical issues before they impact your infrastructure.

Alert Conditions for Devices

Device alerts work similarly to server alerts, but focus on device-specific metrics rather than traditional CPU or memory usage. Each monitored device can have custom alert conditions tailored to its type and role in your infrastructure.

To access device alert settings:

  1. Navigate to the Notifications settings page
  2. Select your target device from the device dropdown
  3. Configure alert conditions specific to that device type

Common Device Alert Scenarios

Network Switch Monitoring

Network switches require monitoring of port status and performance metrics:

Port Status Alerts:

  • Port Down: Alert when critical uplink ports or server connections go offline
  • Port Error Rate: Monitor for excessive packet errors or drops that indicate hardware issues

Configure these alerts with appropriate thresholds based on your network's normal operating patterns. A single packet drop isn't concerning, but sustained error rates above 0.1% typically indicate problems.
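
The "sustained error rate" idea can be sketched as follows. This is a minimal illustration, not Server Scout's own implementation: the `PortCounters` shape and the function names are hypothetical, and it assumes the switch exposes cumulative packet and error counters that are sampled at regular intervals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PortCounters:
    """Cumulative counters for one switch port (hypothetical shape)."""
    packets: int  # total packets seen on the port
    errors: int   # total errored or dropped packets


def error_rate(prev: PortCounters, curr: PortCounters) -> float:
    """Return the error percentage for the interval between two samples."""
    delta_packets = curr.packets - prev.packets
    delta_errors = max(curr.errors - prev.errors, 0)
    if delta_packets <= 0:
        return 0.0  # no traffic, or a counter reset: nothing to judge
    return 100.0 * delta_errors / delta_packets


def sustained_errors(samples: Sequence[PortCounters],
                     threshold_pct: float = 0.1) -> bool:
    """True only if every consecutive interval exceeds the threshold."""
    if len(samples) < 2:
        return False
    return all(error_rate(a, b) > threshold_pct
               for a, b in zip(samples, samples[1:]))
```

Requiring every interval to exceed the threshold means a single burst of drops does not trigger the alert, while a port that keeps producing errors does.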

Switch Performance Metrics:

  • CPU Utilisation: Alert when switch CPU exceeds 80% for extended periods
  • Memory Usage: Monitor for memory exhaustion that could affect switching performance

DRAC/IPMI Controller Alerts

Hardware management controllers provide detailed system health data:

Temperature Monitoring:

  • CPU Temperature: Set alerts for temperatures exceeding manufacturer specifications (typically 70-80°C)
  • Chassis Temperature: Monitor ambient temperature within the server chassis
  • Hard Drive Temperature: Alert on drive temperatures that could indicate cooling issues

Power Supply Monitoring:

  • PSU Failure: Immediate alerts when power supplies report fault conditions
  • Power Consumption: Monitor for unusual power draw that might indicate hardware issues

Physical Security and Cooling:

  • Chassis Intrusion: Alert when server cases are opened unexpectedly
  • Fan Failure: Monitor cooling system status to prevent overheating

UPS System Alerts

Uninterruptible Power Supplies require careful monitoring to ensure power continuity:

Battery Management:

  • Battery Charge Level: Alert when charge drops below 80% during normal operation
  • Battery Runtime: Monitor estimated runtime remaining during power events
  • Battery Age: Track battery health and replacement requirements

Power Event Monitoring:

  • Switching to Battery: Alert when UPS switches to battery power
  • Utility Power Restored: Notification when mains power returns
  • Input Voltage Fluctuations: Monitor for power quality issues
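
Power events are state changes rather than threshold breaches, so they are detected by comparing successive readings. The sketch below assumes the UPS reports a simple status string; the values "online" and "on_battery" and the event names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def power_events(states: Sequence[str]) -> List[str]:
    """Turn successive UPS status readings into power events.

    Each reading is assumed to be "online" or "on_battery".
    """
    events = []
    for prev, curr in zip(states, states[1:]):
        if prev == "online" and curr == "on_battery":
            events.append("switched_to_battery")
        elif prev == "on_battery" and curr == "online":
            events.append("utility_power_restored")
    return events
```

Repeated identical readings produce no events, so a UPS that stays on battery for several polls raises one alert rather than one per poll.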

Configuring Device Alert Thresholds

When setting up device alerts, access the Notifications settings with your specific device selected. This ensures alert conditions apply only to that device rather than globally.

Setting Threshold Values

  1. Identify Critical Metrics: Focus on metrics that indicate genuine problems rather than normal operational variations
  2. Set Appropriate Thresholds: Use manufacturer specifications and historical data to set meaningful alert levels
  3. Configure Alert Timing: Set appropriate delays to avoid false alarms from temporary fluctuations
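
One way to combine manufacturer specifications with historical data is to take a statistical baseline from past readings and cap it at the manufacturer's limit. The function below is a rough sketch under that assumption; the name `suggest_threshold` and the three-standard-deviation margin are illustrative choices, not Server Scout settings.

```python
from statistics import mean, stdev
from typing import Sequence


def suggest_threshold(history: Sequence[float],
                      manufacturer_max: float,
                      sigmas: float = 3.0) -> float:
    """Suggest an alert level for a metric where higher is worse.

    Uses mean + sigmas * standard deviation of past readings,
    but never exceeds the manufacturer's stated maximum.
    Needs at least two historical readings.
    """
    baseline = mean(history) + sigmas * stdev(history)
    return min(baseline, manufacturer_max)
```

A device that normally runs cool gets a threshold close to its own baseline, so unusual behaviour is flagged well before the manufacturer's limit is reached.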

Example configuration for a UPS battery alert:

Condition: Battery Charge < 75%
Duration: 5 minutes
Severity: Warning
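
To show how the Duration setting avoids false alarms, here is a minimal sketch of a condition that fires only once a reading has stayed below its threshold for the full period. The `DurationCondition` class is hypothetical and is not part of Server Scout; it simply mirrors the example values above.

```python
from typing import Optional


class DurationCondition:
    """Fires only after a reading stays below `threshold` for `duration_s` seconds."""

    def __init__(self, threshold: float, duration_s: float) -> None:
        self.threshold = threshold
        self.duration_s = duration_s
        self._breach_started: Optional[float] = None

    def update(self, value: float, now: float) -> bool:
        """Feed one reading taken at time `now` (in seconds).

        Returns True when the alert should be active.
        """
        if value >= self.threshold:
            self._breach_started = None  # recovered: reset the timer
            return False
        if self._breach_started is None:
            self._breach_started = now   # first reading below the threshold
        return now - self._breach_started >= self.duration_s


# Mirrors the example: Battery Charge < 75% for 5 minutes
battery_warning = DurationCondition(threshold=75.0, duration_s=5 * 60)
```

A brief dip that recovers within five minutes resets the timer and never raises the warning.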

Global vs Per-Device Alert Conditions

Global alerts apply to all monitored systems, but many global conditions don't translate meaningfully to devices. For instance, CPU usage thresholds appropriate for servers may not suit network switches with different processing patterns.

Per-device conditions offer more precise monitoring:

  • Device-Specific Thresholds: Tailor alert levels to each device's normal operating parameters
  • Relevant Metrics Only: Focus on metrics that matter for each device type
  • Contextual Alerting: Consider the device's role in your infrastructure when setting criticality levels
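
The relationship between global and per-device conditions can be pictured as a set of defaults with device-level overrides. The sketch below is an assumption about how such a lookup might work, not a description of Server Scout's internals; the device names, metric names, and global values are all hypothetical.

```python
from typing import Dict

# Hypothetical global defaults, written with servers in mind
GLOBAL_CONDITIONS: Dict[str, float] = {
    "cpu_percent": 90,
    "memory_percent": 90,
}

# Hypothetical per-device conditions, keyed by device name
DEVICE_CONDITIONS: Dict[str, Dict[str, float]] = {
    "core-switch-01": {"cpu_percent": 80, "port_error_pct": 0.1},
    "ups-rack-a": {"battery_charge_pct": 80},
}


def effective_conditions(device: str) -> Dict[str, float]:
    """Return global defaults with any per-device values layered on top."""
    merged = dict(GLOBAL_CONDITIONS)          # copy, so defaults stay untouched
    merged.update(DEVICE_CONDITIONS.get(device, {}))
    return merged
```

Here the switch gets a stricter CPU threshold and an extra port metric, while a device with no entry of its own simply inherits the global values.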

Practical Device Alerting Advice

Focus on Meaningful Metrics: Not every measurable parameter requires an alert. Concentrate on metrics that indicate actual problems requiring intervention.

Avoid Expected Event Alerts: Don't alert on normal operational events like UPS battery tests or planned maintenance modes. Configure your monitoring to recognise these expected state changes.
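
Suppressing alerts for expected events usually comes down to checking whether the alert falls inside a known maintenance window. The following is a small sketch of that check; the window list, device name, and `is_suppressed` function are hypothetical examples rather than Server Scout features.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical maintenance windows: (device, start, end)
Window = Tuple[str, datetime, datetime]

MAINTENANCE_WINDOWS: List[Window] = [
    # Scheduled UPS battery self-test
    ("ups-rack-a", datetime(2024, 6, 1, 2, 0), datetime(2024, 6, 1, 3, 0)),
]


def is_suppressed(device: str, when: datetime,
                  windows: List[Window] = MAINTENANCE_WINDOWS) -> bool:
    """True if `when` falls inside a maintenance window for `device`."""
    return any(name == device and start <= when < end
               for name, start, end in windows)
```

An alert raised during the battery self-test would be dropped, while the same alert an hour later, or on a different device, would still be delivered.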

Implement Tiered Alerting: Use warning levels for developing issues and critical alerts for immediate problems. This helps prioritise response efforts effectively.
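
Tiered alerting can be expressed as two thresholds on the same metric. The sketch below assumes a metric where higher values are worse, such as CPU temperature; the `classify` function and the sample values of 70 and 80 are illustrative only.

```python
def classify(value: float, warning: float, critical: float) -> str:
    """Classify a reading for a metric where higher values are worse."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


# Illustrative CPU temperature tiers in degrees Celsius
cpu_temp_severity = classify(75.0, warning=70.0, critical=80.0)
```

For metrics where lower is worse, such as battery charge, the comparisons would be reversed.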

Regular Review: Periodically review alert thresholds and conditions to ensure they remain relevant as your infrastructure evolves.

By implementing thoughtful device alerting strategies, you'll maintain better visibility into your infrastructure's health whilst avoiding alert fatigue from unnecessary notifications.

Frequently Asked Questions

How do I set up device alerts in Server Scout?

To set up device alerts, navigate to the Notifications settings page, select your target device from the device dropdown, and configure alert conditions specific to that device type. This ensures alerts apply only to the selected device rather than globally.

What device types can Server Scout monitor with alerts?

Server Scout can monitor network switches, DRAC/IPMI controllers, and UPS systems with custom alerts. Each device type has its own metrics, such as port status for switches, temperature monitoring for IPMI controllers, and battery levels for UPS units.

How do device alerts differ from server alerts?

Device alerts focus on device-specific metrics rather than traditional CPU or memory usage. They monitor parameters like network port status, hardware temperatures, power supply health, and UPS battery levels that are relevant to each device type.

What temperature thresholds should I set for server monitoring?

Set CPU temperature alerts for temperatures exceeding manufacturer specifications, typically 70-80°C. Also monitor chassis ambient temperature and hard drive temperatures, as elevated temperatures often indicate cooling system issues that require immediate attention.

Why aren't my device alerts working properly?

Check that you've selected the specific device in the Notifications settings rather than using global conditions. Ensure thresholds are appropriate for the device type and set adequate duration delays to avoid false alarms from temporary fluctuations.

What UPS metrics should I monitor with alerts?

Monitor battery charge levels (alert below 80%), battery runtime estimates, power switching events, and input voltage fluctuations. Set up alerts for when the UPS switches to battery power and when utility power is restored to track power events.

Should I use global or per-device alert conditions?

Use per-device conditions for more precise monitoring. Global alerts often don't translate meaningfully to devices since different device types have varying normal operating parameters. Per-device conditions allow device-specific thresholds and relevant metrics only.

What network switch metrics need monitoring?

Monitor port status for critical uplinks and server connections, port error rates (alert above 0.1% sustained errors), switch CPU utilisation (alert above 80% for extended periods), and memory usage to detect performance issues before they affect network operations.
