TCP Connection Metrics Explained

TCP connection counts are among the most revealing metrics for understanding your server's network behaviour and application health. Server Scout tracks five key TCP connection states that together provide a complete picture of your server's networking activity and can quickly highlight application bugs or performance issues.

Understanding TCP Connection States

The Server Scout agent collects TCP connection metrics every 30 seconds as part of the medium tier, parsing connection state information from /proc/net/tcp and /proc/net/tcp6. These virtual files contain a real-time snapshot of all TCP sockets on your system, with state values encoded in hexadecimal format.

The five metrics tracked provide different insights into your server's networking behaviour:

Metric            | Description                                           | What It Reveals
tcp_connections   | Total connections across all states                   | Overall networking activity level
tcp_established   | Active data-transferring connections                  | Current load and user activity
tcp_time_wait     | Recently closed connections in cleanup                | Connection turnover rate
tcp_close_wait    | Connections closed by remote end, pending local close | Application socket handling bugs
tcp_listen        | Listening sockets accepting new connections           | Number of active services
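As a sketch of how such a collector might derive these metrics (illustrative only, not Server Scout's actual code), the fourth field of each /proc/net/tcp row holds the connection state as a two-digit hex code; the mapping below comes from the Linux kernel's include/net/tcp_states.h:

```python
# Decode /proc/net/tcp-style text into the five metrics. The "st" column
# (fourth field) is hex-encoded per the kernel's include/net/tcp_states.h.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}

def tcp_metrics(proc_lines):
    """Count connection states from /proc/net/tcp-style text (header included)."""
    counts = {"tcp_connections": 0, "tcp_established": 0,
              "tcp_time_wait": 0, "tcp_close_wait": 0, "tcp_listen": 0}
    for line in proc_lines[1:]:                      # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        state = TCP_STATES.get(int(fields[3], 16))   # "st" column, hex-encoded
        counts["tcp_connections"] += 1
        if state == "ESTABLISHED":
            counts["tcp_established"] += 1
        elif state == "TIME_WAIT":
            counts["tcp_time_wait"] += 1
        elif state == "CLOSE_WAIT":
            counts["tcp_close_wait"] += 1
        elif state == "LISTEN":
            counts["tcp_listen"] += 1
    return counts

# Two sample rows: a listening socket (state 0A) and an established one (01).
sample = [
    "  sl  local_address rem_address   st tx_queue rx_queue ...",
    "   0: 00000000:1F90 00000000:0000 0A 00000000:00000000 ...",
    "   1: 0100007F:1F90 0100007F:D2F0 01 00000000:00000000 ...",
]
print(tcp_metrics(sample))
```

On a live system you would feed it `open("/proc/net/tcp").read().splitlines()` (and the tcp6 equivalent) instead of the sample rows.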

The Five TCP Connection Metrics

tcp_connections: Total Network Activity

This metric represents the sum of all TCP connections in any state. It provides a high-level view of your server's networking activity and can help identify overall traffic patterns or sudden spikes in connection volume.

For most servers, this number will be dominated by tcp_established and tcp_time_wait connections. A sudden increase often indicates either a traffic surge or a potential issue with connection handling.

tcp_established: Active Connections

These are connections actively transferring data between your server and clients. The tcp_established count reflects real-time concurrent users or services connected to your server.

This is typically the most important metric for understanding current load. For web servers, it correlates directly with active user sessions. For database servers, it represents active client connections. A gradual increase might indicate growing usage, while sudden spikes could suggest traffic surges or potential attacks.

tcp_time_wait: The Cleanup State

Connections in TIME_WAIT state have been properly closed but are maintained by the kernel for 60 seconds (a fixed interval on Linux, not a tunable default) to prevent stale packets from being misinterpreted by future connections using the same port combination.

This is a normal and healthy part of the TCP protocol. Seeing hundreds or even low thousands of TIME_WAIT connections is expected behaviour for busy servers. The kernel handles these automatically—no application intervention is required.

However, very high counts (>5000) may indicate connection churn issues where your server is creating and destroying connections faster than the TIME_WAIT period can expire. This can eventually lead to port exhaustion.
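The exhaustion risk is easy to quantify with back-of-the-envelope arithmetic. The ephemeral port range below is the common Linux default, an assumption you should verify against /proc/sys/net/ipv4/ip_local_port_range on your own system:

```python
# Estimate the sustainable rate of new outbound connections to a single
# destination before TIME_WAIT ties up every ephemeral source port.
EPHEMERAL_PORTS = 60999 - 32768 + 1   # 28232 ports (common Linux default range)
TIME_WAIT_SECONDS = 60                # fixed TIME_WAIT interval on Linux

def max_sustainable_rate(ports=EPHEMERAL_PORTS, tw=TIME_WAIT_SECONDS):
    """New connections/sec to one destination before source ports run out."""
    return ports / tw

print(f"{max_sustainable_rate():.0f} connections/sec")
```

A server opening short-lived outbound connections to one backend faster than roughly this rate will eventually fail with address-in-use errors, which is why connection pooling or keep-alive is the preferred fix.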

tcp_close_wait: The Application Bug Indicator

CLOSE_WAIT connections are the most actionable metric in this group. These represent connections where the remote end has sent a FIN packet (indicating they want to close the connection), but your local application has not yet closed its side of the socket.

This almost always indicates an application bug. Well-behaved applications close sockets promptly when the remote end closes its connection. CLOSE_WAIT connections will not clear on their own: only the application can resolve them, by closing the socket or by being restarted.

Even a small sustained number of CLOSE_WAIT connections (>10) warrants immediate investigation, as they represent resource leaks that will accumulate over time.

tcp_listen: Available Services

This metric counts sockets in the LISTEN state—ports where your server is accepting incoming connections. The number typically correlates with how many services you're running: web servers, SSH, databases, monitoring agents, etc.

A sudden decrease might indicate a service has crashed or stopped listening. An unexpected increase could suggest a new service has started or potentially unwanted software is running.

Why CLOSE_WAIT Demands Immediate Attention

Of all TCP connection states, CLOSE_WAIT is the one that always requires human intervention. Unlike other states, which the kernel manages automatically, CLOSE_WAIT connections persist until the application explicitly closes them (or the owning process exits).

Common causes of CLOSE_WAIT accumulation include:

  • Resource leaks: Applications opening sockets but failing to close them in error conditions
  • Blocking operations: Applications stuck in long-running operations and not processing socket events
  • Poor error handling: Applications not properly handling connection failures or timeouts
  • Framework bugs: Issues in web frameworks, database connectors, or HTTP clients

When you see sustained CLOSE_WAIT connections, the problem is always in your application code or the libraries it uses. The operating system cannot resolve these automatically.
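As a hypothetical illustration of the first cause (function names invented for this example), the leak-prone version below loses its socket whenever recv raises, while the fixed version closes it on every exit path:

```python
import socket

# Bug pattern: if recv() raises, s.close() never runs, the socket object
# leaks, and the connection parks in CLOSE_WAIT once the peer sends its FIN.
def leaky_fetch(host, port):
    s = socket.create_connection((host, port))
    data = s.recv(4096)   # an exception here skips the close() below
    s.close()
    return data

# Fix: the context manager guarantees close() on every exit path, so the
# connection proceeds through LAST_ACK instead of lingering in CLOSE_WAIT.
def safe_fetch(host, port):
    with socket.create_connection((host, port)) as s:
        return s.recv(4096)
```

The same discipline applies in any language: put the close in a finally block, a context manager, or an RAII destructor so error paths cannot skip it.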

Investigating Connection Issues

Server Scout's real-time monitoring will alert you to unusual connection patterns, but you'll often need to dig deeper to identify the root cause.

Finding CLOSE_WAIT Culprits

Use the ss command to identify which processes have CLOSE_WAIT connections:

ss -tap state close-wait

This shows the process ID and name for each CLOSE_WAIT socket, helping you identify which application needs attention. Note that ss typically needs root privileges to report processes owned by other users.

Monitoring TIME_WAIT Levels

Count current TIME_WAIT connections to assess whether you have a connection churn issue (tail -n +2 skips ss's header row so it isn't counted):

ss -tan state time-wait | tail -n +2 | wc -l

You can also check the kernel's FIN-WAIT-2 timeout. Despite its name, tcp_fin_timeout does not control the TIME_WAIT interval, which is fixed at 60 seconds on Linux:

cat /proc/sys/net/ipv4/tcp_fin_timeout

Examining All Connection States

For a complete picture of your server's TCP state distribution (NR>1 excludes ss's header line from the counts):

ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c

Connection Patterns by Server Role

Different types of servers exhibit characteristic TCP connection patterns:

Web Servers

Web servers typically show:

  • High tcp_established: Correlates with concurrent users
  • Moderate to high tcp_time_wait: Normal result of HTTP request/response cycles
  • Low tcp_close_wait: Should be near zero for healthy applications
  • Moderate tcp_listen: HTTP, HTTPS, SSH, and monitoring ports

Peak traffic periods will show increased tcp_established followed by corresponding increases in tcp_time_wait as connections complete.
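That relationship can be approximated with Little's-law style arithmetic: the steady-state TIME_WAIT count is roughly the connection close rate multiplied by the 60-second TIME_WAIT interval. A rough illustration:

```python
# Steady-state model of the established -> time_wait pattern: every closed
# connection spends the fixed 60-second TIME_WAIT interval in cleanup, so
# count ~= close rate x time in state (Little's law).
TIME_WAIT_SECONDS = 60

def expected_time_wait(closes_per_second):
    """Approximate steady-state TIME_WAIT count for a given close rate."""
    return closes_per_second * TIME_WAIT_SECONDS

# A web server closing 50 connections/sec (no keep-alive) sits near
# 3000 TIME_WAIT sockets: busy but healthy, under the 5000 churn threshold.
print(expected_time_wait(50))
```

If your observed TIME_WAIT count is far above what this estimate predicts for your request rate, connections are probably being opened and closed more often than necessary, for example because keep-alive is disabled.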

Database Servers

Database servers usually exhibit:

  • Steady tcp_established: Represents persistent application connections
  • Low tcp_time_wait: Database connections are typically long-lived
  • Very low tcp_close_wait: Well-behaved database clients close connections properly
  • Low tcp_listen: Usually just the database port, SSH, and monitoring

Load Balancers and Proxies

Load balancers show the most complex patterns:

  • Very high tcp_established: Handling many simultaneous client and backend connections
  • High tcp_time_wait: Rapid connection turnover as requests are proxied
  • Variable tcp_close_wait: May indicate issues with backend applications
  • Moderate tcp_listen: Multiple ports for different services or protocols

Tuning TIME_WAIT Behaviour

If your server experiences legitimate TIME_WAIT socket exhaustion (typically >30,000 connections), you can adjust kernel parameters:

# Allow reuse of TIME_WAIT sockets for new outgoing connections
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

# Shorten the FIN-WAIT-2 timeout (despite the parameter's name, this does
# not change the fixed 60-second TIME_WAIT interval)
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

However, be cautious with these changes, as they can affect network reliability. Values written under /proc also do not survive a reboot; persist them via sysctl configuration if you decide to keep them. It's often better to address the root cause of excessive connection creation.

Using Server Scout's TCP Metrics

Server Scout displays TCP connection metrics with multiple time ranges, allowing you to identify both short-term spikes and long-term trends. The 1-hour view shows raw 30-second data points, perfect for investigating immediate issues, while the 7-day view with 15-minute averages helps identify patterns and growth trends.

Set up alerting rules for:

  • tcp_close_wait > 10 (any sustained level indicates problems)
  • tcp_time_wait > 5000 (potential connection churn issues)
  • Sudden drops in tcp_listen (service availability issues)

By understanding these five TCP connection metrics and their implications, you can quickly identify application bugs, assess server load, and maintain healthy networking behaviour across your infrastructure.


Frequently Asked Questions

What are TCP TIME_WAIT connections?

TIME_WAIT connections (tcp_time_wait) are TCP sockets waiting for potential late packets after the connection has been closed. This is a normal part of the TCP protocol and prevents data from a previous connection being misinterpreted by a new one. High TIME_WAIT counts (above 5000) on busy servers may warrant enabling net.ipv4.tcp_tw_reuse, which lets the kernel reuse TIME_WAIT sockets for new outgoing connections.

What does a high TCP CLOSE_WAIT count indicate?

CLOSE_WAIT connections (tcp_close_wait) mean the remote end has closed the connection but the local application has not yet closed its socket. This almost always indicates an application bug where connections are not being properly closed. CLOSE_WAIT connections should be near zero. Investigate the application holding these sockets using commands like ss or lsof.

What does a sudden increase in TCP connections mean?

A sudden spike in tcp_connections can indicate increased legitimate traffic, a DDoS attack, a connection leak in an application, or a service being overwhelmed. Check tcp_established for active connections and tcp_time_wait for recently closed ones. Cross-reference with network throughput metrics to distinguish between genuine traffic increases and connection floods without significant data transfer.

How many TCP listening sockets should my server have?

The tcp_listen count depends on how many services are running. Each network service (web server, database, SSH, etc.) opens at least one listening socket. A typical web server might have 5-10 listening sockets. A sudden change in tcp_listen could indicate a service crashing (decrease) or an unauthorised service starting (increase). Monitor for unexpected changes from your baseline.
