Understanding Device Alerts in Server Scout
Server Scout's alerting system extends beyond traditional server monitoring to include comprehensive device monitoring capabilities. Whether you're managing network switches, DRAC/IPMI controllers, or UPS units, you can configure intelligent alerts that notify you of critical issues before they impact your infrastructure.
Alert Conditions for Devices
Device alerts work similarly to server alerts, but focus on device-specific metrics rather than traditional CPU or memory usage. Each monitored device can have custom alert conditions tailored to its type and role in your infrastructure.
To access device alert settings:
- Navigate to the Notifications settings page
- Select your target device from the device dropdown
- Configure alert conditions specific to that device type
Common Device Alert Scenarios
Network Switch Monitoring
Network switches require monitoring of port status and performance metrics:
Port Status Alerts:
- Port Down: Alert when critical uplink ports or server connections go offline
- Port Error Rate: Monitor for excessive packet errors or drops that indicate hardware issues
Configure these alerts with appropriate thresholds based on your network's normal operating patterns. A single packet drop isn't concerning, but sustained error rates above 0.1% typically indicate problems.
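As a rough illustration of the 0.1% guideline, the sketch below derives an error rate from two successive readings of a port's cumulative counters and only flags the port when several consecutive intervals exceed the threshold. The counter names (`packets`, `errors`), the sample layout, and the three-interval rule are assumptions made for this example, not Server Scout settings.

```python
def port_error_rate(prev, curr):
    """Return the fraction of packets with errors between two counter samples.

    `prev` and `curr` are hypothetical dicts of cumulative counters
    ("packets" and "errors") read from the same switch port.
    """
    packets = curr["packets"] - prev["packets"]
    errors = curr["errors"] - prev["errors"]
    if packets <= 0:
        # No traffic, or the counters were reset: no meaningful rate.
        return 0.0
    return max(errors, 0) / packets


def sustained_error_alert(samples, threshold=0.001, min_intervals=3):
    """True if each of the last `min_intervals` intervals exceeds `threshold`.

    The default threshold of 0.001 corresponds to a 0.1% error rate.
    """
    rates = [port_error_rate(a, b) for a, b in zip(samples, samples[1:])]
    recent = rates[-min_intervals:]
    return len(recent) == min_intervals and all(r > threshold for r in recent)
```

Requiring several consecutive intervals is one way to keep an isolated packet drop from raising an alert.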
Switch Performance Metrics:
- CPU Utilisation: Alert when switch CPU exceeds 80% for extended periods
- Memory Usage: Monitor for memory exhaustion that could affect switching performance
DRAC/IPMI Controller Alerts
Hardware management controllers provide detailed system health data:
Temperature Monitoring:
- CPU Temperature: Set alerts for temperatures exceeding manufacturer specifications (typically 70-80°C)
- Chassis Temperature: Monitor ambient temperature within the server chassis
- Hard Drive Temperature: Alert on drive temperatures that could indicate cooling issues
Power Supply Monitoring:
- PSU Failure: Immediate alerts when power supplies report fault conditions
- Power Consumption: Monitor for unusual power draw that might indicate hardware issues
Physical Security and Cooling:
- Chassis Intrusion: Alert when server cases are opened unexpectedly
- Fan Failure: Monitor cooling system status to prevent overheating
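To make the temperature guidance concrete, here is a minimal sketch that maps a CPU temperature reading to a severity level. The 70°C warning and 80°C critical defaults are taken from the typical range quoted above purely for illustration; the function name and its parameters are hypothetical, and real limits should come from the manufacturer's specifications.

```python
def temperature_severity(celsius, warning_at=70.0, critical_at=80.0):
    """Map a temperature reading to "ok", "warning", or "critical".

    The default limits are illustrative assumptions, not vendor values.
    """
    if celsius >= critical_at:
        return "critical"
    if celsius >= warning_at:
        return "warning"
    return "ok"
```

For example, `temperature_severity(72.5)` falls in the warning band, while a reading of 85 would be treated as critical.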
UPS System Alerts
Uninterruptible Power Supplies require careful monitoring to ensure power continuity:
Battery Management:
- Battery Charge Level: Alert when charge drops below 80% during normal operation
- Battery Runtime: Monitor estimated runtime remaining during power events
- Battery Age: Track battery health and replacement requirements
Power Event Monitoring:
- Switching to Battery: Alert when UPS switches to battery power
- Utility Power Restored: Notification when mains power returns
- Input Voltage Fluctuations: Monitor for power quality issues
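Power events are state changes rather than threshold breaches, so they are detected by comparing consecutive status readings. The sketch below assumes a hypothetical status feed that reports either `"online"` or `"on_battery"`; the status strings and event names are invented for this example.

```python
def power_events(statuses):
    """Return the power events implied by a time-ordered list of statuses.

    Each status is assumed to be "online" (mains) or "on_battery".
    """
    events = []
    for prev, curr in zip(statuses, statuses[1:]):
        if prev == "online" and curr == "on_battery":
            events.append("switched_to_battery")
        elif prev == "on_battery" and curr == "online":
            events.append("utility_power_restored")
    return events
```

Because only transitions produce events, a UPS that stays on battery for several polling cycles yields a single alert instead of one per poll.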
Configuring Device Alert Thresholds
When setting up device alerts, access the Notifications settings with your specific device selected. This ensures alert conditions apply only to that device rather than globally.
Setting Threshold Values
- Identify Critical Metrics: Focus on metrics that indicate genuine problems rather than normal operational variations
- Set Appropriate Thresholds: Use manufacturer specifications and historical data to set meaningful alert levels
- Configure Alert Timing: Set appropriate delays to avoid false alarms from temporary fluctuations
Example configuration for a UPS battery warning that fires once the charge has stayed below 75% for five minutes:
Condition: Battery Charge < 75%
Duration: 5 minutes
Severity: Warning
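The effect of the Duration setting can be sketched as follows: the rule fires only when every reading since the breach began is below the threshold and the breach has lasted at least the configured time. The `AlertRule` class, its field names, and the `(timestamp, value)` reading format are assumptions for illustration and do not describe how Server Scout evaluates conditions internally.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str
    threshold: float   # fire when the value is below this
    duration_s: int    # how long the breach must persist, in seconds
    severity: str


def rule_fires(rule, readings):
    """Check a rule against time-ordered (timestamp_seconds, value) readings."""
    breach_start = None
    for ts, value in readings:
        if value < rule.threshold:
            if breach_start is None:
                breach_start = ts      # breach begins here
        else:
            breach_start = None        # a recovery resets the timer
    if breach_start is None:
        return False
    return readings[-1][0] - breach_start >= rule.duration_s


battery_rule = AlertRule("battery_charge", 75.0, 300, "warning")
```

Resetting the timer on any recovered reading is what filters out brief dips, which is the purpose of the alert timing advice above.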
Global vs Per-Device Alert Conditions
Global alerts apply to all monitored systems, but many global conditions don't translate meaningfully to devices. For instance, CPU usage thresholds appropriate for servers may not suit network switches with different processing patterns.
Per-device conditions offer more precise monitoring:
- Device-Specific Thresholds: Tailor alert levels to each device's normal operating parameters
- Relevant Metrics Only: Focus on metrics that matter for each device type
- Contextual Alerting: Consider the device's role in your infrastructure when setting criticality levels
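One way to picture the relationship between the two levels is a per-device override layered on top of global defaults, with metrics that are irrelevant to the device type filtered out. This is a conceptual sketch only: the override behaviour, the device name `core-switch-1`, and the metric names are assumptions, since this article does not describe how Server Scout combines global and per-device conditions.

```python
# Hypothetical global defaults and per-device overrides.
GLOBAL_CONDITIONS = {"cpu_percent": 90, "memory_percent": 90, "disk_percent": 85}
DEVICE_CONDITIONS = {
    "core-switch-1": {"cpu_percent": 80, "port_error_rate": 0.001},
}


def effective_conditions(device, global_conditions, device_conditions, relevant):
    """Merge global defaults with a device's overrides, keeping relevant metrics.

    Per-device values win over global ones; metrics not listed in
    `relevant` for this device type are dropped.
    """
    merged = {**global_conditions, **device_conditions.get(device, {})}
    return {metric: limit for metric, limit in merged.items() if metric in relevant}
```

Calling it with `relevant={"cpu_percent", "memory_percent", "port_error_rate"}` for the switch keeps the device-specific 80% CPU limit and drops the disk threshold, which has no meaning for that device.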
Practical Device Alerting Advice
Focus on Meaningful Metrics: Not every measurable parameter requires an alert. Concentrate on metrics that indicate actual problems requiring intervention.
Avoid Expected Event Alerts: Don't alert on normal operational events like UPS battery tests or planned maintenance modes. Configure your monitoring to recognise these expected state changes.
Implement Tiered Alerting: Use warning levels for developing issues and critical alerts for immediate problems. This helps prioritise response efforts effectively.
Regular Review: Periodically review alert thresholds and conditions to ensure they remain relevant as your infrastructure evolves.
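The advice on expected events can be illustrated with a simple suppression check: before notifying, test whether the event falls inside a known maintenance or battery-test window. The window list and function name below are hypothetical; how expected events are actually declared depends on your monitoring configuration.

```python
from datetime import datetime


def is_suppressed(event_time, windows):
    """True if `event_time` falls inside any (start, end) window.

    Windows are half-open: the start is included, the end is not.
    """
    return any(start <= event_time < end for start, end in windows)


# Hypothetical scheduled UPS battery test.
maintenance_windows = [
    (datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 1, 3, 0)),
]
```

A switch-to-battery event at 02:15 on that date would be suppressed, while the same event at 04:00 would still raise an alert.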
By implementing thoughtful device alerting strategies, you'll maintain better visibility into your infrastructure's health whilst avoiding alert fatigue from unnecessary notifications.
Frequently Asked Questions
How do I set up device alerts in Server Scout?
What device types can Server Scout monitor with alerts?
How do device alerts differ from server alerts?
What temperature thresholds should I set for server monitoring?
Why aren't my device alerts working properly?
What UPS metrics should I monitor with alerts?
Should I use global or per-device alert conditions?
What network switch metrics need monitoring?