Systemd is the backbone of modern Linux distributions, managing services, processes, and system resources. When systemd units fail, they can indicate serious problems with your server's health. Server Scout provides comprehensive monitoring for failed systemd units, helping you catch and resolve issues before they impact your users.
What Are Failed Systemd Units?
A failed systemd unit is a unit, most often a service, that tried to start or run but crashed, timed out, or exited with an error code. Typical triggers are configuration problems, missing dependencies, resource constraints, or application bugs. Unlike stopped services, which are intentionally inactive, failed units indicate that something has gone wrong and needs attention.
Common causes of failed units include:
- Misconfigured service files
- Missing executable files or dependencies
- Permission issues
- Resource exhaustion (memory, disk space)
- Network connectivity problems
Enabling Systemd Monitoring in Server Scout
To monitor failed systemd units, enable the systemd_failed metric in your Server Scout configuration:
sudo nano /opt/serverscout/scout.conf
Add or uncomment the following line:
systemd_failed=1
Restart the Server Scout agent to apply the changes:
sudo systemctl restart serverscout
The agent now counts units in the "failed" state once an hour as part of the glacial monitoring tier. An hourly check is enough to catch failures because a unit stays in the failed state until it is restarted successfully or reset, but a new failure can take up to an hour to appear in Server Scout.
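For unattended installs, the configuration edit described above could be scripted instead of done in an editor. The function below is a minimal sketch, not part of Server Scout: the name enable_systemd_failed is hypothetical, and it assumes scout.conf uses plain key=value lines with "#" as the comment marker, which you should confirm against your own file. It relies on GNU sed for in-place editing.

```shell
# Sketch: make sure "systemd_failed=1" is active in a key=value config file.
# Assumption: '#' marks a commented-out line in the config.
enable_systemd_failed() {
    local conf="$1"
    if grep -Eq '^[[:space:]]*systemd_failed=' "$conf"; then
        # An active line already exists: force its value to 1.
        sed -i -E 's/^[[:space:]]*systemd_failed=.*/systemd_failed=1/' "$conf"
    elif grep -Eq '^[[:space:]]*#[[:space:]]*systemd_failed=' "$conf"; then
        # Only a commented-out line exists: uncomment it and set it to 1.
        sed -i -E 's/^[[:space:]]*#[[:space:]]*systemd_failed=.*/systemd_failed=1/' "$conf"
    else
        # No such line at all: append one.
        printf 'systemd_failed=1\n' >> "$conf"
    fi
}
```

Run as root, a call such as enable_systemd_failed /opt/serverscout/scout.conf followed by systemctl restart serverscout would cover both steps. Calling the function a second time leaves the file unchanged.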
Viewing Failed Units in Server Scout
Once enabled, you can monitor failed systemd units through the Server Scout dashboard:
Server Detail Page
Navigate to your server's detail page and locate the System panel. Here you'll find the failed unit count alongside other system metrics. This gives you an at-a-glance view of systemd health across all your monitored servers.
Services List
The services section provides detailed information about individual systemd units, showing:
- Status: Active, failed, inactive, or other states
- Enabled/Disabled: Whether the service starts automatically at boot
- Unit type: Service, socket, timer, etc.
This granular view helps you quickly identify which specific services are experiencing problems.
Health Summary Alerts
Server Scout's health summary automatically flags servers when more than 10 systemd units are in a failed state. This threshold indicates a potentially serious system-wide issue that requires immediate investigation.
Investigating Failed Units
When Server Scout identifies failed systemd units, use these commands to investigate:
List All Failed Units
systemctl list-units --failed
This command shows all currently failed units with their load state and active status.
Examine Specific Unit Status
systemctl status <unit-name>
Replace <unit-name> with the specific service name to see detailed status information, including recent log entries and the reason for failure.
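When you need the failure reason in a form a script can read, systemctl show prints unit properties as Key=Value lines. The helper below is a small sketch that condenses three of those properties (ActiveState, Result, ExecMainStatus) into one line. The function name summarize_unit_result is hypothetical, and nginx.service in the usage note is only a placeholder unit.

```shell
# Sketch: condense `systemctl show` Key=Value output (read from stdin)
# into a single summary line.
summarize_unit_result() {
    awk -F= '
        $1 == "ActiveState"    { state  = $2 }
        $1 == "Result"         { result = $2 }
        $1 == "ExecMainStatus" { code   = $2 }
        END { printf "state=%s result=%s exit=%s\n", state, result, code }
    '
}
```

A typical call would be systemctl show -p ActiveState -p Result -p ExecMainStatus nginx.service | summarize_unit_result. Properties that the unit does not report are left empty in the summary.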
Check Service Logs
journalctl -u <unit-name>
Use journalctl to examine the full log history for a specific unit. Add -f to follow logs in real time or --since "1 hour ago" to limit the time frame.
View Recent System Logs
journalctl --since "1 hour ago" --priority=err
This shows messages logged at error priority or higher during the last hour across the whole journal, which helps identify patterns or related failures.
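The commands in this section can be chained so that every failed unit is inspected in one pass. The following is a rough sketch under the assumption that systemctl and journalctl are available on the host. The function names failed_unit_names and triage_failed_units are hypothetical. The --plain and --no-legend flags are used so that the unit name is always the first column.

```shell
# Sketch: extract unit names (first column) from the output of
# `systemctl list-units --failed --no-legend --plain` read on stdin.
failed_unit_names() {
    awk 'NF { print $1 }'
}

# Print the last few journal lines for every currently failed unit.
# The optional first argument is the number of lines per unit (default 20).
triage_failed_units() {
    local lines="${1:-20}" unit
    systemctl list-units --failed --no-legend --plain | failed_unit_names |
    while read -r unit; do
        printf '=== %s ===\n' "$unit"
        journalctl -u "$unit" -n "$lines" --no-pager
    done
}
```

Calling triage_failed_units 50 would print up to 50 recent log lines per failed unit. The parsing step is kept separate so it can be checked against captured output without touching a live system.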
Setting Up Alerts
Configure alerts in Server Scout to notify you when failed systemd units exceed your defined threshold:
- Navigate to the Alerts section in your Server Scout dashboard
- Create a new alert rule for the "Systemd Failed Units" metric
- Set your desired threshold (consider starting with 1-3 failed units for critical servers)
- Configure notification channels (email, Slack, webhooks)
- Define alert frequency to avoid spam during extended outages
For production servers, consider setting a low threshold (1-2 failed units) with immediate notifications. Development or staging environments might tolerate higher thresholds.
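To see how a given threshold would behave before configuring it, the comparison can be reproduced locally. This is only an illustrative sketch and not how Server Scout evaluates alerts: the function names are hypothetical, and it assumes an alert fires when the count strictly exceeds the threshold (change -gt to -ge if you want it to fire on reaching the threshold).

```shell
# Sketch: count non-empty lines of
# `systemctl list-units --failed --no-legend --plain` read on stdin.
count_failed_units() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Compare a failed-unit count with a threshold.
# Returns 0 when the count is at or below the threshold, 2 when above it.
check_failed_threshold() {
    local count="$1" threshold="$2"
    if [ "$count" -gt "$threshold" ]; then
        printf 'CRITICAL: %d failed units (threshold %d)\n' "$count" "$threshold"
        return 2
    fi
    printf 'OK: %d failed units (threshold %d)\n' "$count" "$threshold"
}
```

On a live host the two pieces combine as check_failed_threshold "$(systemctl list-units --failed --no-legend --plain | count_failed_units)" 2, where 2 stands in for whatever threshold you are evaluating.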
Best Practices
- Review failed units promptly, since they often indicate underlying system problems
- Investigate patterns in failures across multiple servers
- Document solutions for recurring issues to speed future resolution
- Consider setting different alert thresholds based on server criticality
- Regularly audit your systemd services to remove unnecessary units
By monitoring failed systemd units with Server Scout, you'll maintain better visibility into your server health and catch problems before they escalate into service disruptions.