You've got a web application that intermittently hangs during large file uploads. The connection stays active, netstat shows ESTABLISHED, and basic connectivity tests pass. Yet the upload progress bar sits motionless for minutes before timing out.
This isn't a routing problem or a firewall rule. It's TCP window scaling gone wrong, and it's one of the more subtle network issues you'll encounter in production.
What Window Scaling Actually Does
TCP's receive window tells the sender how much unacknowledged data the receiver is prepared to accept. The original TCP specification gives this field 16 bits, which limits the window to 65,535 bytes. That worked fine for dialup connections but becomes a bottleneck on modern high-bandwidth, high-latency links, where a sender can have at most one window of data in flight per round trip.
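Window scaling multiplies the advertised window size by a power of two. Each endpoint announces a shift count between 0 and 14 in its SYN or SYN-ACK, and the peer shifts every window value it later receives left by that amount. Scaling only takes effect when both endpoints include the option during the three-way handshake; once they do, windows of up to about 1 GiB become possible, with correspondingly better throughput.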
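As a rough illustration of the arithmetic, the effective window is the 16-bit advertised value shifted left by the negotiated scale factor. The function name effective_window is made up for this sketch; the ceiling of 14 on the shift comes from RFC 7323.

```shell
# effective_window ADVERTISED SHIFT
# Prints the window in bytes after applying a window scale shift.
#   ADVERTISED: the 16-bit window value from the TCP header (0-65535)
#   SHIFT:      the negotiated scale factor (0-14, per RFC 7323)
# (Hypothetical helper for illustration, not part of any tool.)
effective_window() {
    local advertised=$1 shift_count=$2
    if (( advertised < 0 || advertised > 65535 )); then
        echo "advertised window must be 0-65535" >&2
        return 1
    fi
    if (( shift_count < 0 || shift_count > 14 )); then
        echo "shift must be 0-14" >&2
        return 1
    fi
    echo $(( advertised << shift_count ))
}

effective_window 65535 0     # no scaling:    65535
effective_window 65535 7     # shift of 7:    8388480
effective_window 65535 14    # maximum shift: 1073725440
```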
The problem emerges when middleboxes interfere with this negotiation.
Spotting the Symptoms
The classic presentation is selective application failures. SSH works fine, small HTTP requests succeed, but large transfers hang partway through. You might see:
- File uploads that start normally then freeze
- Database replication that establishes connections but never syncs
- API calls that work for small payloads but time out on larger responses
Use ss -i to examine the window scaling parameters on active connections:
ss -i dst 192.168.1.100
State Local Address:Port Peer Address:Port
ESTAB 10.0.0.50:22 192.168.1.100:54321
cubic wscale:7,6 rto:204 rtt:3.8/2.1
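The wscale:7,6 field shows the two scale factors in use on the connection. According to the ss man page, the first number is the send scale factor (the shift the peer advertised) and the second is the receive scale factor (the shift this host advertised). ss prints the field only when the window scale option was negotiated, so if wscale is missing from the output altogether, scaling isn't active on that connection, even if both endpoints support it.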
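To check many connections at once, the ss output can be filtered with a small awk script. This is a sketch under two assumptions: that each connection line is followed by an indented detail line, as in the sample above, and that ss leaves the wscale field out when the option was not negotiated (the behaviour of current iproute2 releases as far as I know). The function name report_wscale is hypothetical.

```shell
# report_wscale: reads `ss -i` output on stdin and prints one line per
# connection: the peer address, then the wscale pair or "no-wscale".
# (Hypothetical helper; assumes the two-line-per-connection ss layout.)
report_wscale() {
    awk '
        # Connection lines start in column 1; remember the peer address,
        # which is the last field. Skip the column header.
        /^[^ \t]/ && $1 != "State" && $1 != "Netid" { peer = $NF; next }

        # Indented lines carry the TCP details for the connection above.
        /^[ \t]/ && peer != "" {
            if (match($0, /wscale:[0-9]+,[0-9]+/)) {
                # Skip the 7 characters of "wscale:" and keep the pair.
                print peer, substr($0, RSTART + 7, RLENGTH - 7)
            } else {
                print peer, "no-wscale"
            }
            peer = ""
        }
    '
}

# Example (needs a Linux host with ss installed):
#   ss -tni dst 192.168.1.100 | report_wscale
```

Any connection reported as no-wscale between two hosts that both have scaling enabled is a candidate for the packet capture described in the next section.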
The Firewall Problem
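Many firewalls and load balancers strip or rewrite TCP options they don't recognise, including the window scale option. When the option is removed cleanly from the handshake, the connection simply falls back to the original 64KB window limit. The hangs come from the messier case: a device removes or alters the option in one direction only, the two endpoints end up disagreeing about the scale factor, and one side believes the window is far smaller than the other intended.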
To test if a middlebox is interfering, capture packets during connection establishment:
tcpdump -i eth0 -s 0 'tcp[tcpflags] & tcp-syn != 0'
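Look for the window scale option (TCP option kind 3, which tcpdump prints as wscale N) in SYN and SYN-ACK packets. A server only includes the option in its SYN-ACK if it saw one in the SYN, so a SYN-ACK without it can mean either that something in the path stripped it or that the server never received it or doesn't support it. To tell these apart, capture on both the client and the server and compare: if the option leaves one host but never arrives at the other, something in the path is stripping it.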
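Reading handshakes by eye gets tedious on a busy interface, so the capture can be summarised with awk. The sketch below assumes tcpdump's default one-line text output, where the flags appear as Flags [S] or Flags [S.] and the option as wscale N; the function name syn_wscale and the capture file name in the usage comment are hypothetical.

```shell
# syn_wscale: reads tcpdump text output on stdin and prints, for every
# SYN and SYN-ACK, the packet type, the endpoints and the window scale
# shift ("none" when the option is absent).
# (Hypothetical helper; assumes the default tcpdump text format.)
syn_wscale() {
    awk '
        match($0, /Flags \[[^]]*\]/) {
            # Text between the brackets, e.g. "S" or "S."
            flags = substr($0, RSTART + 7, RLENGTH - 8)
            if (index(flags, "S") == 0) next          # not part of a handshake
            kind = index(flags, ".") ? "SYN-ACK" : "SYN"

            ws = "none"
            if (match($0, /wscale [0-9]+/))
                ws = substr($0, RSTART + 7, RLENGTH - 7)

            # Endpoints sit on either side of the ">" field.
            src = dst = "?"
            for (i = 2; i < NF; i++)
                if ($i == ">") { src = $(i - 1); dst = $(i + 1) }
            sub(/:$/, "", dst)

            print kind, src, "->", dst, "wscale=" ws
        }
    '
}

# Example (handshake.pcap is a placeholder file name):
#   tcpdump -nn -r handshake.pcap 'tcp[tcpflags] & tcp-syn != 0' | syn_wscale
```

Running the same summary over captures from both ends of the connection makes a stripped option stand out: the SYN shows a wscale value on the sending host and none on the receiving one.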
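Throughput is another tell. A connection held to a 64KB window cannot move more than one window of data per round trip, so its throughput stays flat no matter how much bandwidth is available. This is easiest to see when you have a small number of concurrent connections, each of which should be achieving high throughput, and they all plateau well below link capacity.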
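To put a number on what per-connection throughput should look like, recall that a single TCP connection can move at most one window of data per round trip. The sketch below turns that into bits per second using integer arithmetic; the function name max_throughput_bps and the 100 ms round-trip time are illustrative assumptions.

```shell
# max_throughput_bps WINDOW_BYTES RTT_MS
# Upper bound on one connection's throughput in bits per second:
# one window per round trip. Integer arithmetic, so results are
# rounded down. (Hypothetical helper for illustration.)
max_throughput_bps() {
    local window_bytes=$1 rtt_ms=$2
    if (( rtt_ms <= 0 )); then
        echo "RTT must be a positive number of milliseconds" >&2
        return 1
    fi
    echo $(( window_bytes * 8 * 1000 / rtt_ms ))
}

max_throughput_bps 65535 100      # unscaled window: 5242800   (about 5.2 Mbit/s)
max_throughput_bps 8388480 100    # shift of 7:      671078400 (about 671 Mbit/s)
```

If a long-haul transfer sits close to the first figure for its measured round-trip time, the connection is very likely running without window scaling.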
Working Around the Issue
If you can't fix the middlebox behaviour, you have several options:
Disable window scaling entirely on the affected systems:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
This isn't ideal for performance, but it eliminates the negotiation mismatch.
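Before and after changing the setting it helps to confirm what the kernel is actually using. The helper below is a minimal sketch; the function name wscale_status is hypothetical, and the optional file argument exists only so the function can be exercised without touching /proc.

```shell
# wscale_status [FILE]
# Reports whether TCP window scaling is enabled, reading the kernel
# setting from FILE (default: the standard Linux procfs path).
# (Hypothetical helper for illustration.)
wscale_status() {
    local file=${1:-/proc/sys/net/ipv4/tcp_window_scaling}
    local value
    value=$(cat "$file") || return 1
    case $value in
        1) echo "window scaling enabled" ;;
        0) echo "window scaling disabled" ;;
        *) echo "unexpected value: $value"; return 1 ;;
    esac
}

# Example (Linux only):
#   wscale_status
```

Two caveats apply to the change itself. Scaling is negotiated in the handshake, so the new value only affects connections opened afterwards. And a write to /proc does not survive a reboot; to make it permanent, add net.ipv4.tcp_window_scaling = 0 to a file under /etc/sysctl.d/.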
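Stop the congestion window from collapsing after idle periods, so that long-lived connections make fuller use of the 64KB limit:
echo 0 > /proc/sys/net/ipv4/tcp_slow_start_after_idle
Note that this sysctl is an on/off switch (0 or 1), not a window size. The initial congestion window itself is set per route, using the initcwnd option of ip route.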
Configure application-level chunking to work around the smaller effective window.
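What chunking looks like depends entirely on the application, so treat the following as one possible shape rather than a recipe: split the file into fixed-size pieces and send each piece as its own request, so that a stalled piece can be retried without restarting the whole transfer. The function name split_for_upload, the part. prefix and the upload URL in the usage comment are all hypothetical.

```shell
# split_for_upload FILE CHUNK_BYTES OUT_DIR
# Splits FILE into pieces of at most CHUNK_BYTES bytes inside OUT_DIR
# and prints the piece paths in upload order. OUT_DIR should be empty
# beforehand, otherwise older pieces are listed too.
# (Hypothetical helper for illustration.)
split_for_upload() {
    local file=$1 chunk_bytes=$2 out_dir=$3
    mkdir -p "$out_dir" || return 1
    # -b sets the piece size in bytes; -a 4 gives four-letter suffixes
    # (aaaa, aaab, ...) so the pieces sort in their original order.
    split -b "$chunk_bytes" -a 4 "$file" "$out_dir/part." || return 1
    printf '%s\n' "$out_dir"/part.*
}

# Example with a made-up endpoint; the server would have to support
# reassembling the pieces:
#   split_for_upload backup.tar 1048576 /tmp/chunks | while read -r part; do
#       curl --fail --data-binary @"$part" "https://upload.example.invalid/chunk"
#   done
```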
Long-term Monitoring
Window scaling issues tend to be intermittent, appearing only when traffic patterns hit the right combination of connection duration and data volume. A monitoring system that tracks both connection states and actual throughput can help you spot these problems before they affect users.
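As a small sketch of what correlating connection state with data flow can mean in practice, the helpers below sample the bytes_acked counter that recent versions of ss print with -i and flag a connection whose counter has not moved. Both function names are hypothetical, the check assumes your ss version prints bytes_acked, and it is only meaningful while a transfer is supposed to be in progress, since an idle connection looks the same.

```shell
# acked_bytes: reads `ss -ti` output for ONE connection on stdin and
# prints its bytes_acked counter (nothing if the field is missing).
# (Hypothetical helper for illustration.)
acked_bytes() {
    grep -o 'bytes_acked:[0-9]*' | head -n 1 | cut -d: -f2
}

# stalled BEFORE AFTER
# Exit status 0 when the counter did not advance between two samples,
# 1 when it did, 2 when either sample is not a number.
stalled() {
    local before=$1 after=$2
    [[ $before =~ ^[0-9]+$ && $after =~ ^[0-9]+$ ]] || return 2
    (( after <= before ))
}

# Example (Linux only; the 30 second interval is arbitrary):
#   before=$(ss -tni dst 192.168.1.100 | acked_bytes)
#   sleep 30
#   after=$(ss -tni dst 192.168.1.100 | acked_bytes)
#   stalled "$before" "$after" && echo "connection open but no data acknowledged"
```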
Server Scout's network monitoring capabilities include connection tracking alongside bandwidth metrics, giving you the visibility to correlate connection health with actual data flow. The lightweight bash agent won't interfere with your network tuning efforts while providing the data you need to diagnose these subtle TCP issues.
If you're dealing with mysterious connection hangs that don't fit the usual patterns, window scaling mismatches deserve a spot on your troubleshooting checklist.