Building PostgreSQL Connection Pool Monitoring Through /proc/net/tcp: Step-by-Step Socket State Analysis Tutorial

· Server Scout

Your PostgreSQL application reports "connection pool exhausted" errors at 2 PM on a Tuesday. The database looks fine - CPU at 40%, memory normal, disk I/O steady. By the time your application monitoring catches the problem, users are already seeing timeouts.

The warning signs were there 3 minutes earlier, hidden in your system's network stack. Socket states in /proc/net/tcp show connection patterns that predict pool exhaustion before any application-level monitoring tool reports trouble.

This tutorial walks through building comprehensive PostgreSQL connection pool monitoring using system-level analysis. You'll learn to read socket states, establish meaningful baselines, and create early warning systems that give your team time to respond before users notice problems.

Understanding Connection Pool Exhaustion Warning Signs

Connection pool exhaustion doesn't happen instantly. It builds gradually as connections accumulate in various TCP states, creating a measurable pattern before the pool reaches capacity.

Reading Socket States in /proc/net/tcp

The /proc/net/tcp file contains real-time socket information that reveals connection health before application metrics detect issues. Each line represents a TCP socket with its current state encoded as a hexadecimal value (IPv6 sockets appear separately in /proc/net/tcp6):

  • 01 = ESTABLISHED (active connection)
  • 06 = TIME_WAIT (connection closed, waiting for cleanup)
  • 08 = CLOSE_WAIT (application hasn't closed connection)
  • 0A = LISTEN (accepting new connections)

For PostgreSQL monitoring, focus on ESTABLISHED and TIME_WAIT states. ESTABLISHED connections count toward your pool limit, while TIME_WAIT accumulation suggests connection handling problems.
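These state tallies can be pulled straight from the file with a short awk pass. A minimal sketch, assuming PostgreSQL's default port 5432 (which appears as the hex string 1538 in the address columns); the function takes the table file as an argument so it can also be tried against saved samples:

```shell
#!/bin/sh
# Tally TCP states for sockets on PostgreSQL's default port (5432 = hex 1538).
# Field layout in /proc/net/tcp: $2 = local_address, $3 = rem_address, $4 = state.
count_pg_states() {
  awk '($2 ~ /:1538$/ || $3 ~ /:1538$/) { states[$4]++ }
       END { for (s in states) printf "%s %d\n", s, states[s] }' "$1"
}

count_pg_states /proc/net/tcp
```

Output is one line per observed state, e.g. "01 42" for 42 ESTABLISHED sockets.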

Identifying PostgreSQL Connection Patterns

PostgreSQL typically runs on port 5432. In /proc/net/tcp, addresses appear in hexadecimal format. Port 5432 converts to hex as 1538. Look for entries where the local or remote address contains :1538.
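If your database listens on a non-default port, the hex form can be computed with printf; the 6432 example assumes a PgBouncer-style setup:

```shell
# Convert a decimal port number to the uppercase hex form used in /proc/net/tcp.
printf '%04X\n' 5432   # 1538
printf '%04X\n' 6432   # 1920 (e.g. a PgBouncer port)
```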

Healthy PostgreSQL connection patterns show:

  • Steady ESTABLISHED connection count during normal operation
  • Quick TIME_WAIT cleanup (sockets disappear within 60 seconds)
  • Minimal CLOSE_WAIT accumulation

Problematic patterns include:

  • Growing ESTABLISHED count approaching max_connections
  • TIME_WAIT states accumulating faster than cleanup
  • CLOSE_WAIT states indicating application connection leaks

Setting Up Baseline Connection Monitoring

Effective connection pool monitoring requires understanding your normal patterns before setting alert thresholds.

Establishing Normal Connection Thresholds

Start by collecting baseline data during typical operation periods. Create a simple monitoring script that counts PostgreSQL connections by state every minute for a week:

# Count ESTABLISHED connections to PostgreSQL
# (in /proc/net/tcp, $2 = local address, $3 = remote address, $4 = state)
established_count=$(awk '($2 ~ /:1538$/ || $3 ~ /:1538$/) && $4 == "01" {count++} END {print count+0}' /proc/net/tcp)

# Count TIME_WAIT connections
time_wait_count=$(awk '($2 ~ /:1538$/ || $3 ~ /:1538$/) && $4 == "06" {count++} END {print count+0}' /proc/net/tcp)

Track these metrics alongside your application load - requests per minute, active user sessions, background job queue depth. Connection patterns correlate with application activity, not just database performance.

During baseline collection, note:

  • Peak connection count during busy periods
  • How quickly connections return to baseline after traffic spikes
  • Whether TIME_WAIT accumulation correlates with connection pool resets
  • Daily and weekly patterns in connection usage
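The collection loop above can be a one-line cron job rather than a daemon. A sketch with assumptions to adjust: PostgreSQL on the default port (hex 1538) and an illustrative LOG path:

```shell
#!/bin/sh
# Per-minute baseline sampler (run from cron). Appends one CSV row:
#   timestamp,established,time_wait,close_wait
# LOG path and port (5432 = hex 1538) are assumptions to adjust.
LOG="${LOG:-/tmp/pg_socket_baseline.csv}"

awk -v ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)" '
  ($2 ~ /:1538$/ || $3 ~ /:1538$/) {
    if ($4 == "01") est++
    else if ($4 == "06") tw++
    else if ($4 == "08") cw++
  }
  END { printf "%s,%d,%d,%d\n", ts, est+0, tw+0, cw+0 }
' /proc/net/tcp >> "$LOG"
```

Schedule it with a crontab line such as `* * * * * LOG=/var/log/pg_socket_baseline.csv /usr/local/bin/pg_socket_sample.sh` to get one sample per minute.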

Creating Alert Boundaries

Set your first alert threshold at 70% of max_connections. If PostgreSQL is configured for 100 connections, alert when ESTABLISHED connections reach 70. This provides 2-3 minutes of warning before pool exhaustion.

Create a second alert at 85% for escalation. At this level, start investigating connection leaks and consider increasing pool size temporarily.

For TIME_WAIT monitoring, alert when the count exceeds twice your normal ESTABLISHED baseline. High TIME_WAIT accumulation indicates connection churn that can exhaust pools even with normal application load.

Document these thresholds with your team and include the reasoning. Alert boundaries that make sense during setup need context when someone investigates at 3 AM.
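The two-tier thresholds above translate into a small check script. A sketch: MAX_CONN defaults to 100 here as an assumption, and in practice you would read it from your postgresql.conf:

```shell
#!/bin/sh
# Compare the current ESTABLISHED count against warning (70%) and
# critical (85%) fractions of max_connections.
# MAX_CONN is an assumed default; set it to your real configured value.
MAX_CONN="${MAX_CONN:-100}"

check_threshold() {
  est=$1
  warn=$((MAX_CONN * 70 / 100))
  crit=$((MAX_CONN * 85 / 100))
  if [ "$est" -ge "$crit" ]; then echo "CRITICAL"
  elif [ "$est" -ge "$warn" ]; then echo "WARNING"
  else echo "OK"
  fi
}

established=$(awk '($2 ~ /:1538$/ || $3 ~ /:1538$/) && $4 == "01" {c++} END {print c+0}' /proc/net/tcp)
check_threshold "$established"
```

A non-OK result is what you would wire into your alerting pipeline, alongside the reasoning documented for the 3 AM investigator.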

Practical Socket State Analysis Examples

Real-world connection pool problems create distinctive patterns in socket state data.

TIME_WAIT State Accumulation Patterns

A poorly configured connection pool might establish new connections for each query instead of reusing existing ones. This creates a characteristic pattern:

  • ESTABLISHED connections stay low (5-10 connections)
  • TIME_WAIT connections accumulate rapidly (50+ connections)
  • Database performance appears normal
  • Application response times increase gradually

This pattern indicates connection pool configuration issues, not database problems. The solution involves fixing pool settings, not database tuning.
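One way to flag this pattern automatically is a simple ratio test between the two counts. A sketch; the 5x multiplier is an illustrative threshold to tune against your own baseline, not a standard:

```shell
#!/bin/sh
# Flag connection churn: TIME_WAIT far exceeding ESTABLISHED suggests the
# pool is opening a fresh connection per query instead of reusing them.
# The 5x multiplier is an assumed, illustrative threshold.
churn_check() {
  est=$1; tw=$2
  if [ "$est" -gt 0 ] && [ "$tw" -ge $((est * 5)) ]; then
    echo "churn suspected"
  else
    echo "ok"
  fi
}

# Counts matching the pattern above: few ESTABLISHED, many TIME_WAIT.
churn_check 8 50   # churn suspected
```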

ESTABLISHED Connection Growth Trends

Connection leaks show different patterns. ESTABLISHED connections grow steadily without corresponding TIME_WAIT cleanup:

  • ESTABLISHED count increases throughout the day
  • TIME_WAIT count remains normal
  • Memory usage in application servers grows slowly
  • Database shows increasing connection count but normal query performance

This pattern suggests application code that opens connections without proper cleanup. Look for missing connection.close() calls or exception handling that bypasses connection cleanup.
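CLOSE_WAIT sockets (state 08) are the system-level fingerprint of this leak: the database closed its side, but the application never called close. A quick tally, again assuming the default port:

```shell
#!/bin/sh
# Count CLOSE_WAIT (state 08) sockets on the PostgreSQL port. A count that
# grows across successive samples is strong evidence of an application-side leak.
close_wait=$(awk '($2 ~ /:1538$/ || $3 ~ /:1538$/) && $4 == "08" {c++} END {print c+0}' /proc/net/tcp)
echo "CLOSE_WAIT sockets: $close_wait"
```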

Building Your Early Warning System

Effective PostgreSQL connection monitoring requires integration with your broader infrastructure monitoring strategy.

Automated Threshold Monitoring

Create monitoring scripts that check socket states every 60 seconds. Faster polling creates noise; slower polling misses brief spikes that indicate developing problems.

Integrate with your existing alert infrastructure. If you're using Server Scout's monitoring capabilities, the lightweight agent can track these patterns alongside CPU, memory, and disk metrics without additional resource overhead.

Store historical socket state data for trend analysis. Connection patterns change as applications evolve, and alert thresholds need regular review based on actual usage patterns.
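Even a short awk pass over the stored data gives usable numbers for threshold reviews. A sketch assuming the CSV row format timestamp,established,time_wait,close_wait:

```shell
#!/bin/sh
# Summarize the ESTABLISHED column of a baseline CSV
# (assumed format: timestamp,established,time_wait,close_wait).
summarize_established() {
  awk -F, '{ sum += $2; if ($2 > max) max = $2; n++ }
           END { if (n) printf "avg=%.1f max=%d samples=%d\n", sum/n, max, n }' "$1"
}

# Example on a small illustrative sample:
printf '2024-01-01T00:00:00Z,10,4,0\n2024-01-01T00:01:00Z,20,6,1\n2024-01-01T00:02:00Z,30,5,0\n' > /tmp/pg_baseline_sample.csv
summarize_established /tmp/pg_baseline_sample.csv   # avg=20.0 max=30 samples=3
```

Comparing the weekly average and max against your alert boundaries is a lightweight way to decide when thresholds need revising.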

Team Notification Workflows

Connection pool alerts require different response workflows than traditional infrastructure alerts. Database connection exhaustion often indicates application problems, not database issues.

Create alert runbooks that guide investigation:

  1. Check current connection count in PostgreSQL (SELECT count(*) FROM pg_stat_activity)
  2. Verify socket state patterns match historical baselines
  3. Review application logs for connection errors or unusual query patterns
  4. Check connection pool configuration in application servers

For teams managing multiple applications, organize servers into logical groups that reflect application architecture. PostgreSQL connection problems often affect multiple servers in the same application cluster.

Connection pool monitoring works best when integrated with comprehensive infrastructure visibility. Understanding server metrics holistically helps teams distinguish between database performance issues and connection management problems.

For detailed PostgreSQL monitoring implementation, including query-level analysis integration, see our complete PostgreSQL connection pool monitoring guide. The knowledge base covers advanced topics like cross-server connection correlation and integration with database-specific monitoring tools.

FAQ

How often should I check socket states for PostgreSQL monitoring?

Check every 60 seconds for production monitoring. More frequent checks create unnecessary load, while longer intervals miss brief connection spikes that predict pool exhaustion.

What's the difference between monitoring /proc/net/tcp and using PostgreSQL's built-in connection tracking?

/proc/net/tcp shows system-level socket states before database connection limits are reached, providing 2-3 minutes of early warning. PostgreSQL's pg_stat_activity view only shows database connections after they're established, and querying it requires a free connection slot - exactly what's missing during exhaustion.

Should connection pool alerts escalate differently than other infrastructure alerts?

Yes. Connection pool exhaustion often indicates application configuration or code issues rather than database hardware problems. Route these alerts to application teams alongside infrastructure teams for faster resolution.

Ready to Try Server Scout?

Start monitoring your servers and infrastructure in under 60 seconds. Free for 3 months.

Start Free Trial