The alerts started trickling in on a Tuesday morning. Nothing dramatic — just the usual SSH authentication failures that every Linux administrator sees dozens of times per day. Our fail2ban logs showed the expected pattern: a few failed attempts from 103.45.12.8, then silence as the IP got banned for 10 minutes.
What we didn't realise was that we were witnessing the opening moves of a coordinated campaign that would attempt over 15,000 login combinations across our infrastructure over the next 48 hours — all while staying completely invisible to our rate-limiting defences.
The Attack Pattern That fail2ban Couldn't See
The breakthrough came when Sarah, our junior sysadmin, noticed something odd in the authentication logs. She'd been tracking failed login attempts as part of a security audit and spotted a curious pattern: the same usernames were being tried across different servers, but always from different IP addresses.
"Look at this," she said, pulling up logs from three different web servers. "Someone tried 'admin' on all three machines within a five-minute window, but from completely different countries."
That observation changed everything. Instead of looking at individual server logs, we started correlating authentication attempts across our entire infrastructure.
Initial Signs: Scattered Failed Logins
The individual server logs looked perfectly normal. Each machine was seeing 2-3 failed SSH attempts per hour — well below any reasonable alert threshold. fail2ban was working exactly as designed, blocking IPs after 5 failed attempts within 10 minutes.
But when we aggregated the data across all 23 servers in our fleet, a disturbing picture emerged:
- Same 15 usernames being tested systematically
- Attempts coordinated within 3-4 minute windows
- Source IPs spread across 47 different countries
- Each IP never exceeded 2-3 attempts per target
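The aggregation behind that list can be sketched in a few lines of Python. This is a minimal illustration, not the tooling we actually ran: the `FailedLogin` record, the hostnames, the dates, and the IP addresses (taken from the reserved documentation ranges) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FailedLogin:
    """One failed SSH authentication collected from a server's auth log."""
    timestamp: datetime
    server: str      # hostname that logged the failure (hypothetical names)
    username: str
    source_ip: str


def summarise_by_username(events):
    """Count, per username, how many servers and source IPs were involved."""
    servers = defaultdict(set)
    ips = defaultdict(set)
    for e in events:
        servers[e.username].add(e.server)
        ips[e.username].add(e.source_ip)
    return {
        user: {"servers": len(servers[user]), "source_ips": len(ips[user])}
        for user in servers
    }


# Illustrative data only: made-up hosts, documentation-range IPs.
events = [
    FailedLogin(datetime(2024, 1, 9, 9, 0), "web1", "admin", "192.0.2.10"),
    FailedLogin(datetime(2024, 1, 9, 9, 2), "web2", "admin", "198.51.100.7"),
    FailedLogin(datetime(2024, 1, 9, 9, 4), "web3", "admin", "203.0.113.5"),
    FailedLogin(datetime(2024, 1, 9, 9, 30), "web1", "deploy", "192.0.2.99"),
]

summary = summarise_by_username(events)
for user, stats in summary.items():
    print(user, stats)
```

A username that shows up on many servers from many unrelated IPs is exactly the signal that stays invisible when each server is inspected in isolation.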
Log Correlation Reveals the Coordination
We pulled authentication logs from all servers and started building a timeline. The pattern was sophisticated: whoever was behind this understood exactly how traditional rate limiting works and had engineered their approach to stay below detection thresholds.
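Building such a timeline starts with turning raw log lines into comparable records. The sketch below assumes the common syslog-style sshd line ("Failed password for [invalid user] NAME from IP port N ssh2"); the exact format varies with OpenSSH version and syslog configuration, and the sample lines, hostnames, and the `year` parameter (syslog timestamps carry no year) are assumptions for illustration.

```python
import re
from datetime import datetime

# Matches the usual syslog-style sshd failure line; adjust for your log format.
FAILED_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\ssshd\[\d+\]:\s"
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)


def parse_failed_login(line, year):
    """Return (timestamp, host, username, source_ip), or None if no match."""
    m = FAILED_RE.match(line)
    if m is None:
        return None
    # %b expects English month abbreviations under the default C locale.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return ts, m["host"], m["user"], m["ip"]


def build_timeline(lines, year):
    """Parse lines from any number of servers into one time-ordered list."""
    parsed = (parse_failed_login(line, year) for line in lines)
    return sorted(p for p in parsed if p is not None)


# Hypothetical sample lines merged from two servers.
sample = [
    "Jan  9 09:02:10 web2 sshd[2201]: Failed password for invalid user admin from 198.51.100.7 port 40022 ssh2",
    "Jan  9 09:00:01 web1 sshd[1234]: Failed password for root from 192.0.2.10 port 51234 ssh2",
    "Jan  9 09:01:00 web1 sshd[1240]: Accepted publickey for deploy from 192.0.2.50 port 50000 ssh2",
]

timeline = build_timeline(sample, year=2024)
for entry in timeline:
    print(entry)
```

Successful logins and unrelated lines are simply skipped, so the same function can be fed whole log files.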
The attack followed a clear sequence:
- Test common usernames (admin, root, user) with obvious passwords
- Move to service accounts (mysql, postgres, www-data)
- Try variations of the server hostname as both username and password
- Attempt dictionary attacks with common corporate usernames
Each phase used completely different IP ranges, suggesting access to a substantial botnet infrastructure.
Detective Work: Mapping the Attack Timeline
Once we understood we were dealing with a coordinated campaign, the investigation became a matter of forensic log analysis. We needed to understand not just what was happening, but how sophisticated this operation really was.
Geographic Distribution Analysis
The source IP analysis revealed the true scale of the operation. Attack origins included:
- Compromised home routers across Eastern Europe
- Cloud instances from major providers (likely using stolen credentials)
- Mobile network ranges from Southeast Asia
- Residential broadband connections from South America
The geographic distribution wasn't random — it followed timezone patterns that suggested human coordination. Attacks intensified during European business hours, then shifted to Asian IP ranges as the day progressed.
Identifying Common Attack Signatures
Beyond the geographic patterns, we found several technical signatures that confirmed coordination:
- Identical SSH client version strings across different continents
- Consistent timing intervals (exactly 180 seconds between attempts)
- Shared password lists with identical typos and variations
- Sequential testing of accounts in the same order across all targets
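Two of these signatures, fixed timing intervals and identical account ordering, are easy to test for once attempts are grouped per target. The following is a rough sketch: the function names, the 2-second tolerance, and the sample data are assumptions, not values from our investigation.

```python
from datetime import datetime, timedelta
from statistics import median


def looks_scheduled(timestamps, tolerance_seconds=2.0, min_events=4):
    """Return (is_regular, median_gap_seconds) for a series of attempt times.

    A series counts as regular when every gap is within the tolerance
    of the median gap. Too few events returns (False, None).
    """
    if len(timestamps) < min_events:
        return False, None
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    typical = median(gaps)
    regular = all(abs(g - typical) <= tolerance_seconds for g in gaps)
    return regular, typical


def same_order(sequences):
    """True if every target saw the usernames in exactly the same order."""
    it = iter(sequences.values())
    first = next(it, None)
    return first is not None and all(seq == first for seq in it)


start = datetime(2024, 1, 9, 9, 0, 0)
scripted = [start + timedelta(seconds=180 * i) for i in range(5)]
noisy = [start + timedelta(seconds=s) for s in (0, 30, 430, 525, 537)]

print(looks_scheduled(scripted))
print(looks_scheduled(noisy))

# Hypothetical per-server username sequences.
order_by_server = {
    "web1": ["admin", "root", "mysql"],
    "web2": ["admin", "root", "mysql"],
    "db1": ["admin", "root", "mysql"],
}
print(same_order(order_by_server))
```

Random background probing rarely produces near-constant gaps, so a regular series spread across unrelated source IPs is a strong hint of a shared scheduler.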
This wasn't amateur hour. The campaign showed clear signs of professional planning and substantial infrastructure investment.
Why Traditional Rate Limiting Failed
The campaign stayed under the radar precisely because it understood and exploited the assumptions built into tools like fail2ban. Traditional SSH protection assumes that attacks come from concentrated sources that can be identified and blocked based on individual IP behaviour.
Single-IP Thresholds vs Distributed Patterns
fail2ban's default configuration triggers after 5 failed attempts from a single IP within 10 minutes. But this attack never exceeded 2-3 attempts per IP per server. Each attacking machine would test a few credentials, then disappear, leaving too few failures in any one server's logs to come close to that threshold.
From any single server's perspective, the traffic looked like normal internet background noise — the kind of random SSH probing that every publicly accessible Linux machine experiences.
The attackers had essentially weaponised the internet's baseline security noise level.
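The blind spot is easy to reproduce with a toy model of per-IP counting. The class below is a simplified stand-in for the idea of a single-IP threshold (5 failures within 600 seconds), not fail2ban's actual implementation, and the simulated traffic is invented for the example.

```python
from collections import defaultdict, deque


class PerIpThreshold:
    """Simplified per-IP sliding-window counter (not fail2ban's real code)."""

    def __init__(self, max_retry=5, find_time=600):
        self.max_retry = max_retry
        self.find_time = find_time          # window length in seconds
        self._failures = defaultdict(deque)  # ip -> recent failure times

    def record_failure(self, ip, now):
        """Record one failure; return True if this IP would now be banned."""
        window = self._failures[ip]
        window.append(now)
        # Drop failures that have aged out of the window.
        while now - window[0] > self.find_time:
            window.popleft()
        return len(window) >= self.max_retry


# Distributed campaign: 100 hypothetical IPs, 3 attempts each on one server.
detector = PerIpThreshold()
distributed_bans = 0
total_attempts = 0
for i in range(100):
    for attempt in range(3):
        total_attempts += 1
        if detector.record_failure(f"bot-{i}", now=i * 180 + attempt):
            distributed_bans += 1

# Classic brute force: one IP hammering the same server.
single = PerIpThreshold()
single_results = [single.record_failure("198.51.100.7", now=t) for t in range(5)]

print(total_attempts, "attempts,", distributed_bans, "bans")
print("single-IP ban triggered:", single_results[-1])
```

Three hundred failures pass without a single ban, while five failures from one address trip the threshold immediately. The counter is working as designed; it is simply keyed on the wrong dimension for this kind of attack.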
Building Better Detection Rules
Understanding Smart Alerts became crucial once we recognised the distributed nature of the threat. Single-server monitoring simply couldn't provide the visibility needed to detect this type of coordinated campaign.
Time-Window Correlation Techniques
We developed new detection rules that looked for patterns across our entire infrastructure within specific time windows:
- More than 10 authentication failures across the fleet within 5 minutes
- Identical usernames tested on multiple servers within 30 minutes
- Unusual geographic diversity in SSH connection sources
- Repeated password patterns across different source IPs
These rules caught the next wave of attacks within minutes rather than hours.
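As a rough sketch, the first two rules in the list can be expressed as sliding-window checks over fleet-wide events. The `Event` tuple, function names, and sample data are assumptions for illustration; the thresholds (more than 10 failures in 5 minutes, the same username on multiple servers within 30 minutes) come from the rules above.

```python
from collections import deque, namedtuple

# ts is seconds since an arbitrary start; all fields here are illustrative.
Event = namedtuple("Event", "ts server username source_ip")


def fleet_burst(events, limit=10, window=300):
    """True if more than `limit` failures occur fleet-wide within `window` s."""
    recent = deque()
    for e in sorted(events, key=lambda e: e.ts):
        recent.append(e.ts)
        while e.ts - recent[0] > window:
            recent.popleft()
        if len(recent) > limit:
            return True
    return False


def usernames_sprayed(events, min_servers=2, window=1800):
    """Usernames tried on at least `min_servers` servers within `window` s."""
    flagged = set()
    recent = {}  # username -> deque of (ts, server)
    for e in sorted(events, key=lambda e: e.ts):
        q = recent.setdefault(e.username, deque())
        q.append((e.ts, e.server))
        while e.ts - q[0][0] > window:
            q.popleft()
        if len({server for _, server in q}) >= min_servers:
            flagged.add(e.username)
    return flagged


names = ["admin", "root", "mysql"]
campaign = [
    Event(i * 20, f"srv{i}", names[i % 3], f"198.51.100.{i}") for i in range(12)
]
background = [
    Event(0, "srv1", "admin", "192.0.2.1"),
    Event(3600, "srv2", "admin", "192.0.2.2"),
    Event(7200, "srv3", "admin", "192.0.2.3"),
]

print(fleet_burst(campaign), sorted(usernames_sprayed(campaign)))
print(fleet_burst(background), sorted(usernames_sprayed(background)))
```

Neither check looks at the source IP at all, which is the point: the campaign rotates addresses freely, but it cannot hide the fleet-wide volume or the reuse of usernames.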
Implementing Multi-IP Rate Limiting
The traditional approach of blocking individual IPs needed enhancement. We implemented rate limiting based on:
- Total authentication failures per time window across all servers
- Geographic clustering of failed attempts
- Username pattern matching across the infrastructure
- Coordinated response that shared threat intelligence between servers
This broader perspective revealed attack patterns that individual server monitoring simply couldn't detect.
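One way to picture the combination of fleet-wide counting and shared response is a limiter keyed on username rather than on IP, feeding a blocklist that every server consults. The sketch below keeps everything in memory for brevity; in practice the state would live in some shared store. The class name, thresholds, and sample traffic are all assumptions.

```python
from collections import defaultdict, deque


class FleetLimiter:
    """Counts failures per username across all servers and source IPs."""

    def __init__(self, max_failures=6, window=1800):
        self.max_failures = max_failures
        self.window = window                 # seconds
        self._by_user = defaultdict(deque)   # username -> (ts, server, ip)
        self.blocklist = set()               # shared with every server

    def record_failure(self, ts, server, username, source_ip):
        """Record a failure reported by any server.

        Returns True if the source IP is on the shared blocklist afterwards.
        """
        q = self._by_user[username]
        q.append((ts, server, source_ip))
        while ts - q[0][0] > self.window:
            q.popleft()
        if len(q) >= self.max_failures:
            # Block every IP that took part in the recent burst.
            self.blocklist.update(ip for _, _, ip in q)
        return source_ip in self.blocklist

    def servers_targeted(self, username):
        """Distinct servers on which this username recently failed."""
        return {server for _, server, _ in self._by_user[username]}


limiter = FleetLimiter()
results = [
    limiter.record_failure(i * 180, f"srv{i}", "admin", f"203.0.113.{i}")
    for i in range(8)
]

print(results)
print(len(limiter.blocklist), "IPs blocked fleet-wide")
print(len(limiter.servers_targeted("admin")), "servers targeted")
```

Keying on the username means eight attempts from eight different addresses still add up. The trade-off is that a legitimate user mistyping a password on several machines could trip the same rule, so thresholds need tuning against real traffic.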
Prevention Strategies for Enterprise Teams
The experience taught us that modern SSH attacks require modern detection approaches. Building Monitoring System Redundancy: A Complete Multi-Region Alert Infrastructure Guide became essential reading as we rebuilt our security monitoring with distributed threats in mind.
Key changes included:
- Centralised logging with cross-server correlation
- Geographic IP analysis and reputation scoring
- Shared threat intelligence between all infrastructure components
- Alert rules based on fleet-wide patterns rather than individual server behaviour
- Integration with Server Scout's alerting system to provide real-time visibility across our entire infrastructure
The most important lesson: security monitoring can't remain server-centric when attacks are increasingly fleet-centric. Traditional tools like fail2ban remain valuable against obvious threats, but catching sophisticated attackers now requires monitoring that can correlate data across the entire infrastructure.
This experience highlighted why enterprises need monitoring platforms capable of cross-server pattern detection — because the next generation of security threats won't conveniently limit themselves to attacking one server at a time.
FAQ
How long did it take to identify the coordinated attack pattern?
About 18 hours from the first alerts. The individual server logs looked normal, so it wasn't until we started correlating data across our entire infrastructure that the coordination became obvious.
Could this attack have succeeded with weak passwords?
Absolutely. The attackers were testing thousands of common username/password combinations. Any server with default credentials or weak passwords would have been compromised quickly.
How can teams detect similar attacks proactively?
Implement fleet-wide monitoring that correlates SSH attempts across all servers within time windows. Look for patterns in usernames, timing, and geographic distribution rather than just individual IP behaviour.