When Backup Scripts Exit Zero but Nothing Got Backed Up: Why rsync's Partial Success Codes Break Your Recovery Plans

Server Scout

The 3 AM Discovery

You're restoring from what should be yesterday's backup when you realise half the files are missing. The backup script shows successful completion in the logs. No alerts fired. systemd reports the service ran without errors. But somehow, your most critical application data never made it to the backup destination.

This isn't a rare edge case. rsync's exit codes can mask serious backup failures that leave you with incomplete data when you need it most.

When Success Isn't Really Success

rsync reports its result as one of a list of distinct exit codes (not a bitmask, and not a simple pass/fail flag), and this is where things get dangerous. Exit code 0 means the transfer completed without errors, but codes 23 and 24 indicate partial transfers that many setups treat as "successful enough". Here's the problem: these partial-success codes often hide critical failures.

Exit code 23 means "some files/attrs were not transferred". This could be permission issues on a few unimportant files, or it could mean your entire database dump directory was skipped due to a mount point issue. rsync doesn't distinguish between the two.

Exit code 24 indicates "some files vanished before they could be transferred". Again, this might be temporary files that don't matter, or it could signal that your application's data directory was unmounted mid-backup.
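To make the distinction concrete, here is a minimal sketch of a helper that maps an rsync exit status to a coarse label. The function name and the labels are hypothetical, chosen for illustration; only the meanings of codes 0, 23 and 24 come from rsync itself.

```shell
#!/bin/bash
# Map an rsync exit status to a coarse severity label.
# classify_rsync_exit and its labels are illustrative names, not rsync terminology.
classify_rsync_exit() {
    case "$1" in
        0)  echo "ok" ;;        # transfer completed without errors
        23) echo "partial" ;;   # partial transfer due to error
        24) echo "vanished" ;;  # source files vanished during the run
        *)  echo "failed" ;;    # anything else: treat as a hard failure
    esac
}

# Example use after a backup run:
#   rsync -avz /var/www/ backup@remote:/backups/
#   echo "rsync result: $(classify_rsync_exit $?)"
```

A label like "partial" still says nothing about which files were skipped, so it is a starting point for alerting rather than proof that the backup is usable.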

The real issue emerges when backup scripts use simple success/failure logic. A common pattern looks like this:

#!/bin/bash
rsync -avz /var/www/ backup@remote:/backups/
if [ $? -eq 0 ]; then
    echo "Backup completed successfully"
    exit 0
else
    echo "Backup failed"
    exit 1
fi

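This script treats exit codes 23 and 24 as failures, which sounds safe. But it also collapses every non-zero code into exit 1, so anything downstream that only sees the final status cannot tell a vanished temporary file from an unreachable host. When those alerts become noisy, the usual response is to accept the "partial success" codes somewhere in the chain: in the script, in the monitoring setup, or in the systemd service. From then on, a partial transfer that skipped critical data looks exactly like one that skipped a cache file, creating a false sense of security.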
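One way to avoid losing that information is to pass the command's real exit status through instead of reducing it to 0 or 1. The sketch below assumes a hypothetical wrapper named run_and_report; it runs whatever command it is given, logs the status, and returns it unchanged.

```shell
#!/bin/bash
# Run a command, log its exit status, and return that status unchanged.
# run_and_report is a hypothetical helper name used for illustration.
run_and_report() {
    "$@"
    local status=$?
    if [ "$status" -eq 0 ]; then
        echo "Backup command exited 0"
    else
        echo "Backup command exited ${status}" >&2
    fi
    return "$status"
}

# Example use:
#   run_and_report rsync -avz /var/www/ backup@remote:/backups/
#   exit $?
```

With the original code preserved, the decision about whether 23 or 24 is acceptable stays visible in one place rather than being baked into a blanket exit 1.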

The systemd Service Masking Problem

When backup scripts run as systemd services, another layer of complexity appears. systemd's SuccessExitStatus directive lets you define which exit codes should be considered successful. Many backup service files include settings like:

[Service]
SuccessExitStatus=0 23 24

This tells systemd to treat partial transfers as successful, which prevents service failure alerts. The backup "completes" from systemd's perspective, even when critical data is missing.

Detecting systemd service failures that status checks miss becomes crucial here, because the standard service monitoring approaches won't catch these scenarios.
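A quick way to find out whether this applies to you is to search your unit files for the directive. The following is a rough sketch, assuming a hypothetical helper named unit_masks_partial_transfers that is pointed at a unit file on disk; it only inspects the file text and does not query systemd, so drop-in overrides elsewhere would need to be checked separately.

```shell
#!/bin/bash
# Print any SuccessExitStatus line in a unit file that lists rsync's
# partial-transfer codes (23 or 24) as whole numbers.
# Returns 0 if such a line is found, non-zero otherwise.
# unit_masks_partial_transfers is a hypothetical name used for illustration.
unit_masks_partial_transfers() {
    local unit_file=$1
    grep -E -- '^[[:space:]]*SuccessExitStatus=(.*[^0-9])?(23|24)([^0-9]|$)' "$unit_file"
}

# Example use (the unit path is an assumption):
#   if unit_masks_partial_transfers /etc/systemd/system/backup.service; then
#       echo "This unit reports partial rsync transfers as success"
#   fi
```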

Building Better Backup Validation

The solution requires moving beyond exit codes to actual backup validation. Instead of trusting rsync's return value, verify that your critical files actually made it to the destination.

Start by identifying your non-negotiable files: database dumps, application configurations, user data directories. For each backup run, explicitly check that these critical components were transferred successfully:

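#!/bin/bash
rsync -avz /var/www/ backup@remote:/backups/
RSYNC_EXIT=$?

# Check critical files exist at destination.
# Paths are relative to /var/www/: the trailing slash on the source makes
# rsync copy the directory's contents straight into /backups/.
critical_files=("config/database.php" "uploads/")
for file in "${critical_files[@]}"; do
    if ! ssh backup@remote "test -e '/backups/${file}'"; then
        echo "CRITICAL: ${file} missing from backup"
        exit 2
    fi
done

# Critical files are present, but still surface a non-zero rsync status
if [ "$RSYNC_EXIT" -ne 0 ]; then
    echo "WARNING: critical files present, but rsync exited with ${RSYNC_EXIT}"
    exit "$RSYNC_EXIT"
fi

echo "Backup validation completed"
exit 0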

This approach validates that your most important data actually reached the backup destination, regardless of what rsync's exit code suggests.

Monitoring Backup Health Beyond Process Success

Process-level monitoring often misses backup content validation entirely. Alerts built around service availability stay silent when the underlying issue is data integrity: the process ran and exited with an accepted status, so nothing fires.

Server Scout's plugin system handles this scenario well by letting you create bash-based validation scripts that check backup content, not just process completion. You can monitor backup file sizes, verify critical directories exist, and alert when backup validation fails even if the rsync process technically succeeded.

For production environments, consider implementing backup validation as a separate monitoring check that runs after your backup scripts complete. This gives you independent verification that your data protection strategy is actually protecting data.
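As a rough illustration of such an independent check, the sketch below tests that a backup file exists, meets a minimum size, and was modified recently. It is a standalone bash function, not Server Scout's plugin interface; the function name, the thresholds, and the return codes (0 ok, 1 stale, 2 missing or too small) are all assumptions for the example.

```shell
#!/bin/bash
# Check that a backup file exists, is at least min_bytes in size, and was
# modified within the last max_age_min minutes.
# check_backup_file and its return codes are hypothetical: 0 ok, 1 stale, 2 critical.
check_backup_file() {
    local path=$1 min_bytes=$2 max_age_min=$3
    if [ ! -f "$path" ]; then
        echo "CRITICAL: ${path} is missing"
        return 2
    fi
    local size
    size=$(( $(wc -c < "$path") ))   # arithmetic expansion strips any padding from wc
    if [ "$size" -lt "$min_bytes" ]; then
        echo "CRITICAL: ${path} is ${size} bytes, expected at least ${min_bytes}"
        return 2
    fi
    # find prints the path only if it was modified less than max_age_min minutes ago
    if [ -z "$(find "$path" -mmin "-${max_age_min}")" ]; then
        echo "WARNING: ${path} has not been modified in the last ${max_age_min} minutes"
        return 1
    fi
    echo "OK: ${path}"
    return 0
}

# Example use on the backup host (path and thresholds are assumptions):
#   check_backup_file /backups/db/dump.sql.gz 1048576 1440
```

Running this on a schedule of its own, separate from the backup job, means a masked rsync exit code cannot also mask the check.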

Don't let rsync's nuanced exit codes create gaps in your disaster recovery planning. When your business depends on that backup working, "good enough" isn't good enough.

If you want to start monitoring backup validation properly, Server Scout's free trial includes plugin support for custom backup verification scripts.
