The Discovery: When Normal Alerts Hide Coordinated Attacks
The incident started on a Tuesday morning with what looked like routine noise. Each server in the hosting company's fleet was reporting the usual background SSH attempts - nothing that triggered alerts, nothing that looked suspicious in isolation. Authentication logs showed scattered failed login attempts, all below the thresholds that would normally warrant attention.
That changed when one sysadmin noticed something odd whilst reviewing logs from multiple servers simultaneously. The timestamps didn't match what you'd expect from random attacks.
Individual Server Logs Showed Nothing Suspicious
Viewed from any single server, the attack pattern was invisible. /var/log/auth.log entries showed typical failed SSH attempts:
- 3-4 failed attempts per hour
- Different usernames: admin, root, test, user
- Source IPs from various countries
- No sustained brute force from any single IP
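Extracting those data points starts with parsing the failure lines themselves. The sketch below is a minimal parser for one common OpenSSH failure-line format; the exact layout varies by distribution and syslog configuration, so treat the regex and the example hostname as assumptions to adapt, not a universal format. Syslog timestamps also omit the year, which the parser takes as a parameter.

```python
import re
from datetime import datetime

# Assumed OpenSSH failure line (adjust the pattern to your distribution):
# "Jan 14 03:12:45 web-03 sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 2222 ssh2"
FAILED_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8}) (?P<host>\S+) sshd\[\d+\]: "
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>[\d.]+) port \d+"
)

def parse_failed_attempt(line, year=2024):
    """Extract (timestamp, host, username, source IP) from one auth.log line.

    Returns None for lines that are not failed-password entries.
    """
    m = FAILED_RE.match(line)
    if not m:
        return None
    # Syslog lines carry no year, so we supply one.
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return ts, m.group("host"), m.group("user"), m.group("ip")
```

With tuples like these collected from every server's auth.log, all of the correlation steps below become straightforward set and window operations.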
Each server's monitoring showed green. No rate-limiting triggers. No individual IP addresses crossed the fail2ban thresholds. The attack was specifically designed to fly under traditional single-server detection.
The Geographic Pattern That Changed Everything
The breakthrough came from examining auth logs across 40 servers simultaneously. What emerged was a coordinated campaign spanning 72 hours, with attack waves moving systematically across geographic regions.
The botnet was using a technique called "distributed credential testing" - each compromised machine would attempt 2-3 login combinations against multiple targets, then hand off to the next bot in a different IP range. The timing wasn't random: attacks followed business hours across time zones, hitting European servers during European night hours, then shifting to target American infrastructure during US overnight periods.
IP geolocation analysis revealed the coordination. Instead of scattered global attacks, there were distinct clusters moving in patterns that suggested centralised control. The same username/password combinations appeared across servers within 10-minute windows, but from different source countries.
Building Cross-Server Log Correlation
Once the pattern became clear, building detection required correlating data that individual servers couldn't see. The investigation settled on three essential components for catching distributed attacks: extracting the right log fields, analysing timing across synchronised systems, and automating the correlation with sensible thresholds.
Essential Log Fields for Attack Pattern Recognition
Effective correlation depends on extracting the right data points from standard SSH logs. Beyond the obvious failed authentication entries, the analysis needed to track:
- Authentication attempt timing across multiple servers
- Username enumeration patterns that span infrastructure
- Source IP geographic clustering by time window
- Credential combination sequences that repeat across hosts
The correlation scripts focused on /var/log/auth.log entries, but the key insight was analysing frequency patterns rather than absolute counts. A username that appears once per server but across 20 servers within an hour indicates coordination that per-server monitoring would miss entirely.
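That frequency insight can be sketched as a sliding-window check: for each username, count the distinct hosts it was attempted against within any one-hour window. The function below is a simplified illustration, not the scripts from the incident; the event-tuple shape and threshold parameters are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def usernames_spread_across_hosts(events, min_hosts=20, window=timedelta(hours=1)):
    """Flag usernames attempted on many distinct hosts within one time window.

    events: iterable of (timestamp, host, username) tuples, any order.
    A username appearing once per server is invisible to per-server
    monitoring; the cross-server spread is what reveals coordination.
    """
    by_user = defaultdict(list)
    for ts, host, user in events:
        by_user[user].append((ts, host))
    flagged = {}
    for user, hits in by_user.items():
        hits.sort()
        start = 0
        # Slide a time window over this username's attempts.
        for end in range(len(hits)):
            while hits[end][0] - hits[start][0] > window:
                start += 1
            hosts = {h for _, h in hits[start:end + 1]}
            if len(hosts) >= min_hosts:
                flagged[user] = len(hosts)
                break
    return flagged
```

The same routine, with a tighter window and lower host count, also drives the 5-servers-in-30-minutes alert rule described later.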
Timing Analysis Across Multiple Systems
The breakthrough analysis came from plotting authentication attempts against time windows rather than individual events. Using 15-minute intervals, the coordinated nature became obvious.
Normal random attacks show consistent background noise. Coordinated attacks show distinct peaks that move across infrastructure in waves. The botnet was hitting 8-12 servers simultaneously, then moving to the next cluster after a 20-minute pause.
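The bucketing behind that analysis is simple: truncate every timestamp to the start of its 15-minute interval and count fleet-wide attempts per bucket. A minimal sketch, assuming events have already been parsed into (timestamp, host) tuples:

```python
from collections import Counter
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=15)

def bucket_attempts(events, bucket=BUCKET):
    """Count authentication attempts per 15-minute interval, fleet-wide.

    events: iterable of (timestamp, host) tuples. Coordinated waves show
    up as sharp peaks in the per-bucket totals; random background noise
    stays roughly flat.
    """
    counts = Counter()
    for ts, _host in events:
        # Truncate the timestamp to the start of its bucket.
        offset = (ts - datetime.min) // bucket * bucket
        counts[datetime.min + offset] += 1
    return counts
```

Plotting these per-bucket counts over the 72-hour window is what made the wave pattern visible.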
Time synchronisation across servers becomes critical for this analysis. Even small clock drift can mask attack patterns when you're looking for coordination across minutes rather than hours.
Detection Methods That Actually Work
Building automated detection required moving beyond traditional threshold-based alerting to pattern recognition across multiple log sources.
Automated Correlation Scripts and Queries
The detection system used simple bash scripts to aggregate auth logs across servers and identify suspicious patterns. Key detection rules included:
- Same username attempted across 5+ servers within 30 minutes
- Geographic IP clustering: 3+ attempts from same /16 network across different servers
- Temporal patterns: authentication bursts that move systematically through server groups
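The geographic-clustering rule can be sketched with the standard library's ipaddress module: collapse each source address to its /16 network and flag networks whose attempts span several different servers. This is an illustrative simplification, not the production scripts; time-windowing is omitted here, so in practice you would feed it only the events from one correlation interval.

```python
import ipaddress
from collections import defaultdict

def cluster_by_slash16(events, min_hosts=3):
    """Flag source /16 networks whose attempts span multiple servers.

    events: iterable of (host, source_ip) pairs from aggregated auth logs.
    Returns {network: sorted list of targeted hosts} for flagged networks.
    """
    by_net = defaultdict(set)
    for host, ip in events:
        # strict=False lets us derive the /16 from a host address.
        net = ipaddress.ip_network(f"{ip}/16", strict=False)
        by_net[net].add(host)
    return {str(net): sorted(hosts)
            for net, hosts in by_net.items()
            if len(hosts) >= min_hosts}
```

The same-username rule reuses the sliding-window frequency check shown earlier, with min_hosts=5 and a 30-minute window.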
Centralised logging systems like rsyslog make this analysis much simpler, but even distributed analysis works with SSH-based log collection scripts that run every 15 minutes.
Setting Thresholds for Geographic Clustering
The challenge was distinguishing coordinated attacks from legitimate global traffic. The system used geographic velocity analysis - tracking how quickly authentication attempts moved between IP ranges and geographic regions.
Legitimate traffic shows consistent geographic distribution. Coordinated attacks show impossible geographic velocity: attempts from Dublin, then New York, then Tokyo within minutes, but using the same credentials against the same services.
Alert thresholds that worked in practice:
- 3+ servers targeted with identical credentials within 15 minutes
- Geographic velocity exceeding 1000km/hour for related authentication attempts
- Username enumeration affecting 20% of monitored infrastructure within 60 minutes
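The geographic-velocity threshold reduces to a great-circle calculation between consecutive attempts for the same credential set. The sketch below assumes source IPs have already been resolved to coordinates by a geolocation database (that lookup is not shown); the haversine formula and the 1000 km/h comparison are the whole trick.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

def geographic_velocity_kmh(attempts):
    """Maximum apparent travel speed between consecutive attempts (km/h).

    attempts: time-sorted list of (timestamp, lat, lon) for one credential
    set. Speeds a single travelling attacker could never achieve indicate
    coordinated sources sharing the same credentials.
    """
    max_kmh = 0.0
    for (t1, la1, lo1), (t2, la2, lo2) in zip(attempts, attempts[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        if hours <= 0:
            return float("inf")  # simultaneous attempts from two places
        max_kmh = max(max_kmh, haversine_km(la1, lo1, la2, lo2) / hours)
    return max_kmh
```

Dublin to New York is roughly 5,100 km; the same credentials tried from both within ten minutes implies an apparent velocity of about 30,000 km/h, far beyond the 1000 km/h threshold.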
Response Coordination When Attacks Span Regions
Once an attack is detected, responding to it requires different procedures than handling a single-server incident. The attack scope often spans multiple data centres, cloud regions, and administrative boundaries.
Documentation Templates for Multi-Server Incidents
Effective response required documenting the attack timeline, affected systems, and remediation steps across multiple teams and locations. Standard incident response templates designed for single-server problems don't capture the complexity of coordinated attacks.
The incident documentation needed to track:
- Attack progression timeline across server groups
- Source IP ranges and their geographic distribution
- Affected services and potential data exposure
- Remediation status across distributed infrastructure
This type of coordinated response is exactly where proper incident response playbooks become essential. The complexity of multi-server security incidents quickly overwhelms ad-hoc response procedures.
The attack ultimately attempted to compromise administrative credentials across 40 servers, with potential access to customer data worth significantly more than the €340K in immediate response costs. Early detection through log correlation prevented what could have been a much larger data breach.
Cross-server monitoring capabilities, like those provided by Server Scout's fleet dashboard, make this type of analysis much simpler by aggregating data across your entire infrastructure rather than forcing manual correlation across individual server logs.
FAQ
How often should I run cross-server log correlation analysis?
For production environments, run correlation analysis every 15-30 minutes. This catches coordinated attacks quickly while avoiding false positives from normal traffic patterns. Daily analysis misses fast-moving campaigns.
What's the minimum number of servers where this analysis becomes worthwhile?
Cross-server correlation becomes valuable with 5+ servers. Below that threshold, traditional per-server monitoring usually suffices. The analysis value increases significantly with infrastructure size - 20+ servers make sophisticated attack patterns much more visible.
Can this detection work without centralised logging?
Yes, but it requires more effort. Distributed analysis using SSH-based log collection scripts works, but centralised logging with tools like rsyslog or Server Scout's unified dashboard makes correlation analysis much simpler and more reliable.