Application Health Checks for Production

The Problem with Process-Only Monitoring

Your web application is running. The process is active, consuming memory, responding to signals. But users are getting 500 errors because the database connection pool is exhausted, or the Redis cache has gone stale, or the SSL certificate expired yesterday.

Process monitoring tells you if something is running. Application health checks tell you if it's actually working.

Most monitoring setups stop at "is the process alive?" But in production, applications fail in more creative ways than simple crashes. They hang on external API calls. They exhaust connection pools. They run out of file descriptors. They keep running, but they stop working.

Building Real Health Checks

A proper health check exercises the critical path your users depend on. For a web application, that usually means:

Database connectivity (not just "can I connect" but "can I execute a simple query")
Cache availability and responsiveness
External service dependencies
File system write permissions
SSL certificate validity

Here's a bash script that checks a typical LAMP application:

#!/bin/bash

# Test database connection with an actual query
if ! mysql -u healthcheck -p"$DB_PASS" -h localhost -e "SELECT 1" > /dev/null 2>&1; then
    echo "Database connection failed"
    exit 1
fi

# Test web server response time. -f treats HTTP errors (4xx/5xx) as failures,
# and --max-time stops a hung server from stalling the whole check.
if ! response_time=$(curl -f -s -o /dev/null --max-time 10 -w "%{time_total}" http://localhost/health); then
    echo "Web server request failed"
    exit 1
fi
if (( $(echo "$response_time > 5.0" | bc -l) )); then
    echo "Web server response too slow: ${response_time}s"
    exit 1
fi

# Check SSL certificate expiry (date -d requires GNU date)
expiry_date=$(echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
if [ -z "$expiry_date" ]; then
    echo "Could not read SSL certificate"
    exit 1
fi
expiry_time=$(date -d "$expiry_date" +%s)
current_time=$(date +%s)
days_until_expiry=$(( (expiry_time - current_time) / 86400 ))

if [ "$days_until_expiry" -lt 30 ]; then
    echo "SSL certificate expires in $days_until_expiry days"
    exit 1
fi

echo "Application healthy"
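
The script above leaves two items from the list unchecked: file system writes and cache responsiveness. A minimal sketch of both follows. The function names are hypothetical, and it assumes the cache is Redis, reachable through redis-cli, with GNU timeout installed.

```shell
#!/bin/bash

# check_writable DIR: succeed only if a file can actually be created,
# written, and removed in DIR. Permission bits alone can be misleading
# on a full or read-only file system.
check_writable() {
    local dir="$1" probe
    probe=$(mktemp "${dir}/.healthcheck.XXXXXX" 2>/dev/null) || return 1
    if ! echo "ok" > "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 1
    fi
    rm -f "$probe"
}

# check_redis HOST PORT: succeed only if Redis answers PING with PONG
# within 2 seconds. Assumes redis-cli and GNU timeout are installed.
check_redis() {
    local host="${1:-localhost}" port="${2:-6379}" reply
    reply=$(timeout 2 redis-cli -h "$host" -p "$port" ping 2>/dev/null) || return 1
    [ "$reply" = "PONG" ]
}
```

Either function can be slotted into the script in the same way as the other tests, for example: check_writable /var/www/uploads || { echo "Upload directory not writable"; exit 1; } (the path is only an example).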

Making Health Checks Meaningful

The best health checks mirror real user behaviour. If your application serves API requests, your health check should make an API request. If it processes files from a queue, test that the queue is reachable and processable.

Avoid the temptation to make health checks too comprehensive. A health check that takes 30 seconds to run isn't useful for real-time monitoring. Focus on the components that, when they fail, make your application unusable.
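
One way to keep the whole check fast is to give every probe a hard time limit, so a hung dependency produces a quick failure instead of a stalled check. This is a minimal sketch, assuming GNU coreutils timeout is available; run_check and its arguments are hypothetical names.

```shell
#!/bin/bash

# run_check NAME SECONDS COMMAND...: run COMMAND with a hard time limit.
# Prints one line per check and returns non-zero on failure or timeout.
run_check() {
    local name="$1" limit="$2" status=0
    shift 2
    timeout "$limit" "$@" > /dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit is hit
        echo "$name: timed out after ${limit}s"
        return 1
    elif [ "$status" -ne 0 ]; then
        echo "$name: failed (exit $status)"
        return 1
    fi
    echo "$name: ok"
}
```

A call such as run_check database 3 mysql -u healthcheck -e "SELECT 1" then caps the database probe at three seconds, and the sum of the limits gives an upper bound on how long the full health check can take.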

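For applications with external dependencies, consider implementing circuit breaker patterns in your health checks, so that a failing third-party service is reported as a degraded state rather than stalling or failing the whole check. If the external payment API is down, you probably want different alerting behaviour than if your core database is unreachable.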
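
The difference between a core failure and a degraded dependency can be carried in the exit code. The sketch below borrows the Nagios-style convention (0 for OK, 1 for warning, 2 for critical). The function name and the "severity:result" argument format are hypothetical, and your alerting tool may expect a different scheme.

```shell
#!/bin/bash

# overall_status takes "severity:result" pairs, where severity is
# "critical" or "optional" and result is "ok" or "fail", and prints the
# exit code the health check should use:
#   0 = healthy, 1 = degraded (optional dependency down), 2 = critical.
overall_status() {
    local worst=0 pair severity result
    for pair in "$@"; do
        severity="${pair%%:*}"
        result="${pair##*:}"
        [ "$result" = "ok" ] && continue
        if [ "$severity" = "critical" ]; then
            worst=2
        elif [ "$worst" -lt 1 ]; then
            worst=1
        fi
    done
    echo "$worst"
}
```

A health check script could finish with exit "$(overall_status "critical:$db_result" "optional:$payments_result")", letting the monitoring side page someone for a 2 and only raise a ticket for a 1.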

Integration with System Monitoring

Once you've built meaningful health checks, integrate them with your monitoring system. Server Scout's plugin system makes this straightforward with bash-based custom metrics.

Create a plugin that runs your health check and reports both success/failure status and response times. This gives you both binary "is it working" alerts and trending data to spot performance degradation before it becomes an outage.
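
A wrapper along the following lines produces both values from a single run. The key=value output and the metric names are assumptions made for illustration, not Server Scout's documented plugin format, so check the plugin documentation for the format it actually expects. The timing relies on GNU date.

```shell
#!/bin/bash

# report_health COMMAND...: run the health check command, then print a
# binary status (1 = healthy, 0 = failing) and the elapsed time in ms.
# The key=value output format is an assumption, not a documented
# Server Scout plugin format.
report_health() {
    local start end status=1
    start=$(date +%s%N)            # nanoseconds since epoch; GNU date only
    "$@" > /dev/null 2>&1 || status=0
    end=$(date +%s%N)
    echo "app_health_ok=${status}"
    echo "app_health_ms=$(( (end - start) / 1000000 ))"
}
```

Calling report_health /usr/local/bin/healthcheck.sh (a hypothetical path) would then emit one line for alerting and one for trending.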

The kernel documentation covers many system-level metrics, but application health is something only you can define based on your specific stack and user requirements.

Beyond Basic Checks

As your monitoring matures, consider checks that predict problems rather than just detecting them. Monitor connection pool utilisation, queue depths, cache hit ratios, and disk space growth rates.

These leading indicators often give you hours or days of warning before a system fails completely.
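
Two of these indicators reduce to simple arithmetic once the raw numbers are collected. The sketch below is illustrative only: the function names and the 70% and 90% thresholds are assumptions, and the disk estimate is a straight-line projection from two samples taken a day apart.

```shell
#!/bin/bash

# utilisation_level USED MAX: print "ok", "warn" (>= 70%) or "crit" (>= 90%).
# The thresholds are illustrative; tune them to your own workload.
utilisation_level() {
    local used="$1" max="$2" pct
    pct=$(( used * 100 / max ))
    if [ "$pct" -ge 90 ]; then
        echo "crit"
    elif [ "$pct" -ge 70 ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

# days_until_full USED_YESTERDAY USED_TODAY TOTAL (all in the same unit):
# linear estimate of the days left before the disk fills.
# Prints "stable" when usage is not growing.
days_until_full() {
    local before="$1" now="$2" total="$3" growth
    growth=$(( now - before ))
    if [ "$growth" -le 0 ]; then
        echo "stable"
    else
        echo $(( (total - now) / growth ))
    fi
}
```

For a MySQL connection pool, the inputs could come from SHOW STATUS LIKE 'Threads_connected' and SHOW VARIABLES LIKE 'max_connections'; for disk usage, from df -k sampled once a day.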

If you want to implement comprehensive application health monitoring without the overhead of heavyweight agents, Server Scout's approach lets you build custom checks that integrate seamlessly with system metrics.
