Understanding your server's process and system metrics is crucial for maintaining optimal performance and diagnosing issues before they become critical. Server Scout's agent collects several key indicators that reveal how your system is managing processes, context switching, and file descriptors. These metrics work together to paint a comprehensive picture of your server's operational health.
Core Process State Metrics
Linux processes exist in various states at any given moment, and tracking these states helps identify performance bottlenecks and resource contention issues.
Running Processes
The processes_running metric shows the number of processes currently executing on a CPU core. This value comes from the procs_running field in /proc/stat and represents processes in the "R" state that are either actively using CPU time or waiting in the run queue for their turn.
On a lightly loaded server, you'll typically see 1-4 running processes. This baseline includes essential system processes and any active workloads. However, when processes_running consistently exceeds your CPU core count by a factor of two or more, it indicates CPU contention. Your processes are competing for processing time, and some are waiting longer than optimal in the run queue.
For example, on a 4-core server, sustained values above 8 running processes suggest your CPU is becoming a bottleneck. Users may experience slower response times, and batch jobs will take longer to complete.
Blocked Processes
The processes_blocked metric tracks processes waiting for I/O operations to complete. These processes are in an uninterruptible sleep state, typically waiting for disk reads, network responses, or other I/O operations. The metric comes from procs_blocked in /proc/stat.
Under normal circumstances, you should see 0-2 blocked processes. Brief spikes are entirely normal as processes perform routine I/O operations. However, sustained high values indicate I/O bottlenecks somewhere in your system.
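Both counters live in the same two lines of /proc/stat. As a minimal sketch of how an agent might read them (the function names and thresholds-as-code are illustrative, not Server Scout's actual implementation; the contention rule encodes the "twice the core count" guideline above):

```python
import re

def parse_procs_running(stat_text: str) -> int:
    """Extract the procs_running counter from /proc/stat content."""
    return int(re.search(r"^procs_running\s+(\d+)", stat_text, re.M).group(1))

def parse_procs_blocked(stat_text: str) -> int:
    """Extract the procs_blocked counter from /proc/stat content."""
    return int(re.search(r"^procs_blocked\s+(\d+)", stat_text, re.M).group(1))

def cpu_contention(running: int, cores: int) -> bool:
    """Rule of thumb: running consistently above 2x core count = contention."""
    return running > 2 * cores

# Illustrative /proc/stat excerpt (real files contain many more lines):
sample = (
    "cpu  2255 34 2290 22625563 6290 127 456 0 0 0\n"
    "ctxt 9827349\n"
    "procs_running 9\n"
    "procs_blocked 1\n"
)
print(parse_procs_running(sample), parse_procs_blocked(sample))  # 9 1
print(cpu_contention(9, cores=4))  # True: 9 > 8 on a 4-core server
```

On a live system you would read the real file with `open("/proc/stat").read()` instead of the sample string.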
Common causes of elevated blocked processes include:
- Slow disk subsystems struggling with heavy read/write operations
- Network file systems (NFS, CIFS) experiencing latency or connectivity issues
- Database servers waiting for disk-bound queries
- Backup operations saturating storage bandwidth
When investigating high blocked process counts, examine your disk I/O metrics (disk_io_read_bytes, disk_io_write_bytes) and network activity to identify the bottleneck source.
Zombie Processes
Zombies, tracked by processes_zombie, are processes that have completed execution but remain in the process table because their parent process hasn't collected their exit status. While zombies don't consume CPU time or memory, they do occupy process ID slots and can indicate application bugs.
A healthy system should show 0 zombie processes most of the time. During normal operation, you might see transient zombies that appear and disappear quickly as processes are created and destroyed. This is perfectly normal.
However, persistent or growing zombie counts indicate a problem with parent processes that aren't properly calling wait() or similar system calls to clean up their children. This typically points to:
- Poorly written applications with improper child process handling
- Parent processes that have crashed or become unresponsive
- Signal handling issues in daemon processes
To investigate zombie accumulation, use ps aux | grep Z to identify zombie processes and their parent PIDs, then examine why the parent isn't performing proper cleanup.
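The same scan can be done without ps by reading /proc directly. A sketch (function names are illustrative; the parsing trick of splitting on the last ") " handles command names that contain spaces):

```python
import os

def parse_proc_stat_line(line: str):
    """Parse pid, command, state and ppid from a /proc/<pid>/stat line.
    The comm field is parenthesised and may contain spaces, so split on
    the last ') ' rather than naively on whitespace."""
    pid_part, rest = line.split(" (", 1)
    comm, fields = rest.rsplit(") ", 1)
    state, ppid = fields.split()[0], int(fields.split()[1])
    return int(pid_part), comm, state, ppid

def find_zombies(proc_root: str = "/proc"):
    """Scan the process table for zombies, returning (pid, comm, ppid)."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(f"{proc_root}/{entry}/stat") as f:
                pid, comm, state, ppid = parse_proc_stat_line(f.read())
        except OSError:
            continue  # process exited between listdir and open
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies

# Illustrative stat line for a zombie whose parent is PID 1337:
sample = "4242 (my worker) Z 1337 4242 4242 0 -1 4194304 0\n"
print(parse_proc_stat_line(sample))  # (4242, 'my worker', 'Z', 1337)
```

The ppid in the result tells you which parent process to investigate.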
Total Process Count
The processes_total metric provides important context by showing the overall number of processes on your system. This value comes from /proc/loadavg and represents all processes regardless of their current state.
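The total sits in the fourth field of /proc/loadavg, after the slash in the runnable/total pair. A minimal parsing sketch (function name illustrative, sample values invented):

```python
def parse_loadavg(loadavg_text: str):
    """Parse /proc/loadavg: three load averages, runnable/total, last PID."""
    fields = loadavg_text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return load1, load5, load15, runnable, total

sample = "0.20 0.18 0.12 1/234 11206\n"
print(parse_loadavg(sample))  # (0.2, 0.18, 0.12, 1, 234)
```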
Process counts vary dramatically based on your server's role and applications. A typical baseline might be:
- Minimal Linux server: 50-100 processes
- Web server with PHP-FPM: 200-500 processes
- Database server: 100-200 processes
- Containerised environment: highly variable
Understanding your baseline total is crucial for interpreting other process metrics. A server normally running 500 processes will have different expectations for running and blocked processes compared to one running 50.
System Activity Indicators
Beyond process states, Server Scout monitors system-level activity that reflects how efficiently your kernel is managing resources.
Context Switching
The context_switches metric measures how frequently your CPU switches between processes or threads. This cumulative counter from /proc/stat (the ctxt field) appears as a rate in Server Scout's dashboard, showing switches per second.
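Converting the cumulative counter into a rate is simple arithmetic over two samples; a sketch assuming a 30-second sampling interval like the one mentioned later in this article (sample values invented):

```python
def ctxt_rate(prev: int, curr: int, interval_s: float) -> float:
    """Convert two cumulative ctxt samples into switches per second."""
    return (curr - prev) / interval_s

# Two readings of the ctxt field taken 30 seconds apart:
print(ctxt_rate(9_827_349, 9_977_349, 30.0))  # 5000.0 switches/sec
```

Because the counter is cumulative since boot, absolute values are meaningless on their own; only the delta between samples matters.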
Context switching is a fundamental part of multitasking systems, but excessive switching can impact performance. Normal rates vary enormously based on workload characteristics:
| Workload Type | Typical Context Switch Rate |
|---|---|
| Low-activity server | 1,000-5,000/sec |
| Web server | 10,000-50,000/sec |
| Database server | 5,000-25,000/sec |
| High-concurrency applications | 50,000-100,000+/sec |
More important than absolute values are sudden changes in context switch rates. A dramatic increase often indicates:
- Runaway process creation (fork bombs or application bugs)
- Increased application concurrency or user load
- System resource contention forcing more frequent scheduling decisions
- Changes in workload patterns or deployed applications
File Descriptor Usage
The open_fds metric tracks system-wide open file descriptors from /proc/sys/fs/file-nr. In Linux, file descriptors represent not just open files, but also network sockets, pipes, and other I/O resources.
Understanding your file descriptor usage helps prevent resource exhaustion. Most Linux systems have default limits between 65,536 and 1,048,576 file descriptors. You should investigate when usage exceeds 80% of your system's limit.
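The file-nr file holds three numbers: allocated descriptors, allocated-but-unused descriptors, and the system maximum. A sketch of checking usage against the 80% threshold (function name illustrative, sample values invented):

```python
def fd_usage(file_nr_text: str):
    """Parse /proc/sys/fs/file-nr: allocated, allocated-but-unused, max.
    Returns (in_use, limit, percent_used)."""
    allocated, unused, maximum = (int(x) for x in file_nr_text.split())
    in_use = allocated - unused
    return in_use, maximum, 100.0 * in_use / maximum

sample = "851968 0 1048576\n"
in_use, limit, pct = fd_usage(sample)
print(pct, pct > 80.0)  # 81.25 True -> past the 80% investigation threshold
```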
File descriptor leaks are a common application issue. Symptoms include:
- Steadily growing open_fds count without corresponding workload increases
- Applications eventually failing with "too many open files" errors
- Network connection failures as socket creation fails
Interpreting Metrics in Context
These process and system metrics work together to provide insights into your server's behaviour. Understanding their relationships helps you diagnose issues more effectively.
Process State Relationships
The relationship between running and blocked processes reveals your system's current bottleneck:
Active processes ≈ processes_running + processes_blocked
When running processes are high but blocked processes remain low, you're experiencing CPU pressure. Conversely, high blocked processes with moderate running processes suggest I/O bottlenecks.
The total process count provides the denominator for these comparisons. On a system with 500 total processes, 20 running processes means only 4% of processes are runnable at once; on a system with 50 total processes, those same 20 running processes represent a far higher relative load.
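The rules of thumb above can be folded into a rough triage function. This is only a sketch under the stated assumptions (running above twice the core count means CPU pressure, more than a couple of sustained blocked processes means I/O pressure); real diagnosis should confirm against CPU and disk metrics:

```python
def classify_pressure(running: int, blocked: int, cores: int) -> str:
    """Rough triage: sustained running > 2x cores = CPU pressure;
    more than a couple of blocked processes = I/O pressure."""
    cpu = running > 2 * cores
    io = blocked > 2
    if cpu and io:
        return "cpu-and-io-bound"
    if cpu:
        return "cpu-bound"
    if io:
        return "io-bound"
    return "healthy"

print(classify_pressure(running=12, blocked=1, cores=4))  # cpu-bound
print(classify_pressure(running=3, blocked=9, cores=4))   # io-bound
```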
Workload-Specific Baselines
Different server roles have characteristic process patterns:
Web Servers typically show high process counts due to worker processes (Apache prefork, PHP-FPM pools, Nginx worker processes). Context switching rates are often elevated due to request handling concurrency.
Database Servers usually have fewer total processes but may show higher blocked process counts during heavy query loads. Context switching patterns often correlate with transaction rates.
Application Servers running Java or .NET applications might show fewer processes but higher thread activity, reflected in context switching rates.
Troubleshooting Common Issues
Persistent High Running Processes
When processes_running remains consistently high:
- Check CPU utilisation metrics to confirm CPU pressure
- Identify CPU-intensive processes using system tools
- Consider whether increased capacity or workload optimisation is needed
- Look for runaway processes or infinite loops
Growing Zombie Count
For accumulating zombie processes:
- Use ps aux | grep Z to identify zombie processes
- Note the parent process IDs (PPID column)
- Investigate why parent processes aren't cleaning up children
- Consider restarting problematic parent processes after identifying the root cause
File Descriptor Leaks
When open_fds grows steadily:
- Identify processes with high file descriptor usage: lsof | awk '{print $2}' | sort | uniq -c | sort -nr
- Check application logs for file handling errors
- Review application code for proper file/socket cleanup
- Monitor the trend to determine leak severity and timeline
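Per-process descriptor counts can also be gathered straight from /proc, which avoids depending on lsof being installed. A sketch (function names illustrative):

```python
import os

def open_fd_count(pid: int, proc_root: str = "/proc") -> int:
    """Count open descriptors for one process via /proc/<pid>/fd."""
    return len(os.listdir(f"{proc_root}/{pid}/fd"))

def top_fd_consumers(limit: int = 5, proc_root: str = "/proc"):
    """Rank PIDs by descriptor count, like the lsof pipeline above."""
    counts = []
    for entry in os.listdir(proc_root):
        if entry.isdigit():
            try:
                counts.append((open_fd_count(int(entry), proc_root), int(entry)))
            except OSError:
                continue  # process exited, or fd dir needs elevated privileges
    return sorted(counts, reverse=True)[:limit]

print(top_fd_consumers(3))  # [(count, pid), ...] for the top consumers
```

Note that reading another user's /proc/&lt;pid&gt;/fd generally requires root, so an unprivileged run only ranks your own processes.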
Server Scout's 5-minute collection interval for process metrics provides the right balance between granularity and system impact. The 30-second collection for context switches and file descriptors offers more responsive monitoring for these faster-changing indicators.
Understanding these metrics helps you maintain optimal server performance and quickly identify issues before they impact users. Combined with Server Scout's other monitoring data, these process and system metrics form a comprehensive view of your server's operational health.