The 90-Second Mystery
You restart a critical service and watch the terminal. Thirty seconds pass. A minute. At the 90-second mark, suddenly everything springs to life and works perfectly. But why did it take so long when systemd-analyze blame shows your service starting in under 200ms?
This isn't about slow application startup. It's about systemd waiting for something that may never happen, or that happens much later than expected.
Two Different Problems, Same Symptom
When a service takes ages to start, you're usually looking at one of two issues: a timeout that's too generous, or a dependency that's lying about when it's actually ready.
Timeout problems happen when systemd gives your service a generous window to start, but the service hangs or never signals readiness, and systemd waits out the full duration before giving up and trying something else. The default TimeoutStartSec is 90 seconds, which explains that magic number.
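You can check the effective timeout for any unit with systemctl show -p TimeoutStartUSec your-service.service. A small helper (a sketch; the parsing assumes systemd's usual "1min 30s"-style time spans) converts that value to plain seconds:

```shell
# Convert a systemd time span like "1min 30s" into total seconds.
# In real use, feed it the output of:
#   systemctl show -p TimeoutStartUSec your-service.service
timeout_seconds() {
  local total=0 tok
  # Strip the "TimeoutStartUSec=" prefix if present, then sum each token.
  for tok in ${1#TimeoutStartUSec=}; do
    case $tok in
      *min) total=$(( total + ${tok%min} * 60 )) ;;
      *ms)  ;;                                     # ignore sub-second parts
      *h)   total=$(( total + ${tok%h} * 3600 )) ;;
      *s)   total=$(( total + ${tok%s} )) ;;
    esac
  done
  echo "$total"
}
```

If this prints 90 for your unit, you're looking at the stock default.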
Dependency race conditions occur when Service A depends on Service B, but Service B reports itself as "started" before it's actually ready to accept connections. Service A then fails to connect, gets restarted by systemd, and eventually succeeds on a retry.
Spotting the Difference
Check your journal logs with journalctl -u your-service.service -f during startup. If you see multiple "Starting" entries for the same service, you've got a dependency race. The service is failing and getting restarted.
If you see a single "Starting" entry followed by a long pause, then either "Started" or "Failed", you're dealing with a timeout issue.
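One quick way to tell the two apart is to count how many times the unit entered its start phase during the current boot. This helper is a sketch (my-app.service is a placeholder name); it just counts "Starting" lines in a journal excerpt:

```shell
# Count start attempts for a unit in a journal excerpt.
# In real use: journalctl -u my-app.service -b | count_starts
# More than one "Starting" line in a single boot usually means the
# service failed and was restarted - a dependency race, not a timeout.
count_starts() {
  grep -c 'Starting '
}
```

Note that grep -c exits nonzero when it finds no matches, so guard accordingly in scripts that run with set -e.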
For dependency races, look at what your service connects to during startup. Database connections, API calls, file system mounts - anything that involves waiting for another service. Then check if those dependencies use Type=forking or Type=notify properly.
The ExecStartPre Trap
A common mistake is putting connectivity checks in ExecStartPre without realizing that they count against the same TimeoutStartSec budget as the main service. If your ExecStartPre script waits 60 seconds for a database connection, and your main service then spends another 30 seconds waiting, you've already burned the entire default 90-second timeout.
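Here's how the budgets stack up in a hypothetical unit (the paths, script names, and timings are illustrative):

```ini
[Service]
# Both lines below count against the SAME TimeoutStartSec budget.
# Up to 60 seconds spent here...
ExecStartPre=/usr/local/bin/wait-for-db.sh --timeout 60
# ...plus a service that itself blocks up to 30 seconds on first connect.
ExecStart=/usr/local/bin/my-app --connect-timeout 30
# 60 + 30 = 90: the entire default timeout is consumed before
# anything has even gone wrong.
```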
Instead of long waits in ExecStartPre, use proper dependencies with After= and Wants= directives. Let systemd handle the ordering, and make your service robust enough to handle temporary connection failures.
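A sketch of the dependency-based approach (unit names are placeholders for whatever your service actually depends on):

```ini
[Unit]
# Ordering: don't start until postgresql has reported ready.
After=postgresql.service
# Pull the dependency in, without hard-failing if it's absent.
Wants=postgresql.service

[Service]
ExecStart=/usr/local/bin/my-app
# Let systemd retry on transient connection failures instead of
# blocking inside an ExecStartPre script.
Restart=on-failure
RestartSec=2
```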
Fixing Timeout Issues
If your service genuinely needs more time to start, adjust TimeoutStartSec. But first, question whether it really does. Most services that think they need 5 minutes to start actually need better error handling and retry logic.
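If you've confirmed the service genuinely needs longer, raise the timeout in a drop-in rather than editing the packaged unit file. systemctl edit your-service.service creates one for you; the value below is only an example:

```ini
# /etc/systemd/system/your-service.service.d/override.conf
[Service]
TimeoutStartSec=180
```

If you create the file by hand instead of through systemctl edit, run systemctl daemon-reload afterwards so systemd picks it up.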
For services that make network calls during startup, consider making those calls asynchronous or moving them to a health check that runs after the service reports itself as started.
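One way to move the check out of the startup path is a separate oneshot unit ordered after the service. Everything here (unit names, port, the /healthz endpoint) is hypothetical:

```ini
# my-app-healthcheck.service - runs after my-app reports started,
# so a slow or failing check no longer blocks my-app's own startup.
[Unit]
After=my-app.service
Requisite=my-app.service

[Service]
Type=oneshot
# Retry the health endpoint for a while, then fail loudly.
ExecStart=/usr/bin/curl --retry 10 --retry-connrefused --fail http://127.0.0.1:8080/healthz
```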
Fixing Dependency Races
The nuclear option is adding artificial delays with ExecStartPre=/bin/sleep 10. Don't do this. Instead, fix the lying dependency.
If you control the dependency service, switch it to Type=notify and call sd_notify() (or run systemd-notify --ready from a script) to signal when it's truly ready. If you don't control it, you might need a wrapper script that polls for actual readiness.
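Such a wrapper can poll for real readiness before signaling systemd. This is a sketch: the host and port are placeholders, and it assumes the wrapped unit uses Type=notify with NotifyAccess=all so the signal from systemd-notify is accepted:

```shell
# Poll a TCP port until it accepts connections, using bash's /dev/tcp
# pseudo-device (no netcat needed). Returns 0 once connected, 1 on timeout.
poll_port() {
  local host=$1 port=$2 tries=$3
  local i
  for ((i = 0; i < tries; i++)); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example wrapper body (commented out so the function can be sourced alone):
#   poll_port 127.0.0.1 5432 30 || exit 1
#   systemd-notify --ready        # tell systemd we're *actually* ready
#   exec /usr/local/bin/my-app    # replace the wrapper with the real daemon
```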
Similar to how missing environment variables can make cron jobs silently fail, systemd's dependency system can mask the real problem behind successful status codes.
Monitoring Service Startup Times
Once you've fixed the immediate problem, you want to catch regressions. Server Scout's service monitoring tracks not just whether services are running but also how they start, alerting you when startup patterns change - like when a service that normally starts in seconds suddenly takes minutes.
For production environments, this kind of early warning is often more valuable than post-failure alerts.
If you're dealing with complex service dependencies and want to monitor startup performance across your infrastructure, Server Scout's free trial includes service monitoring without the overhead of heavyweight agents that might themselves contribute to startup delays.