Network errors and dropped packets are critical indicators of network health that can signal everything from failing hardware to insufficient kernel buffers. When your server experiences network issues, these metrics often provide the first clues about whether the problem lies in the physical layer, the network interface card (NIC), or system resource constraints.
Server Scout monitors four key network error metrics every 30 seconds as part of its medium-tier collection. These metrics are read directly from /proc/net/dev, which provides per-interface statistics maintained by the Linux kernel. Understanding what these numbers mean—and when they should concern you—is essential for maintaining reliable network performance.
The Four Network Error Metrics
Server Scout tracks network errors and drops on your server's primary network interface. All four metrics are cumulative counters: they only ever increase, resetting to zero only when the interface or system restarts. In the dashboard, you'll see them displayed as rates (events per second) calculated from the differences between consecutive readings.
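The counter-to-rate conversion can be sketched in a few lines. This is an illustrative Python fragment, not Server Scout's actual implementation; the 30-second interval matches the collection tier described above.

```python
# Illustrative sketch (not Server Scout's actual code) of turning two
# consecutive cumulative counter readings into an events-per-second rate.

def rate_per_second(prev_count, curr_count, interval_s):
    """Rate derived from two cumulative readings taken interval_s apart."""
    if curr_count < prev_count:
        # Counter went backwards: reboot or counter reset. Skip this sample.
        return None
    return (curr_count - prev_count) / interval_s

# Two readings of net_rx_dropped taken 30 seconds apart:
print(rate_per_second(1200, 1260, 30.0))  # 2.0 drops per second
```

Because the counters are cumulative, a reading lower than the previous one can only mean a reset, which is why that sample is discarded rather than reported as a negative rate.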
| Metric | Description | Source | Normal Value |
|---|---|---|---|
| net_rx_errors | Receive errors including CRC, frame alignment, and other layer-2 failures | errs column (receive side) of /proc/net/dev | 0 |
| net_tx_errors | Transmit errors including carrier, heartbeat, and window errors | errs column (transmit side) of /proc/net/dev | 0 |
| net_rx_dropped | Packets dropped during receive due to buffer or processing limits | drop column (receive side) of /proc/net/dev | 0 |
| net_tx_dropped | Packets dropped during transmit due to queue overflow or driver issues | drop column (transmit side) of /proc/net/dev | 0 |
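As a concrete illustration, here is a minimal Python sketch of pulling these four counters out of /proc/net/dev text. The column positions follow the kernel's fixed layout for this file; the sample text and the interface name eth0 are assumptions for the example.

```python
# Minimal sketch of extracting the four counters from /proc/net/dev.
# Each per-interface line has 16 numeric fields after "<iface>:"; the
# receive errs/drop counters are fields 2 and 3, and the transmit
# errs/drop counters are fields 10 and 11 (0-based).

SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 183948274  142331    3    7    0     0          0         0 99283746  120014    0    2    0     0       0          0
"""

def parse_counters(text, iface="eth0"):
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines
        name, _, rest = line.partition(":")
        if name.strip() != iface:
            continue
        fields = [int(x) for x in rest.split()]
        return {
            "net_rx_errors":  fields[2],
            "net_rx_dropped": fields[3],
            "net_tx_errors":  fields[10],
            "net_tx_dropped": fields[11],
        }
    raise ValueError(f"interface {iface!r} not found")

print(parse_counters(SAMPLE))
```

On a live Linux system you would pass the contents of the real file, e.g. `parse_counters(open("/proc/net/dev").read(), "eth0")`, substituting your primary interface name.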
Why Zero is the Target
In an ideal network environment, all four metrics should remain at zero. Any non-zero values indicate that your network stack is experiencing problems, though the severity and urgency vary depending on the specific metric and rate of increase.
Unlike some server metrics where occasional spikes are normal, network errors and drops represent actual packet loss or corruption. Even small numbers can indicate underlying issues that may worsen over time or cause intermittent application problems.
Understanding Errors vs Drops
The distinction between errors and drops is crucial for diagnosing network problems effectively.
Network errors (net_rx_errors and net_tx_errors) represent fundamental failures in the network communication process:
- Receive errors occur when the network interface receives malformed packets. This includes CRC (Cyclic Redundancy Check) failures where packet corruption is detected, frame alignment errors where packet boundaries are incorrect, and other layer-2 protocol violations.
- Transmit errors happen when the NIC cannot successfully send packets. Common causes include carrier sense errors (cable disconnected), heartbeat errors (collision detection failures in older Ethernet), and window errors (timing issues).
Network errors almost always indicate hardware or physical layer problems. They suggest that the fundamental communication channel between your server and the network is compromised.
Dropped packets (net_rx_dropped and net_tx_dropped) represent capacity or resource exhaustion rather than corruption:
- Receive drops occur when the system cannot process incoming packets fast enough. The packets arrive correctly but are discarded because receive buffers are full or the CPU cannot handle the processing load.
- Transmit drops happen when outgoing packets cannot be queued for transmission, typically due to output queue overflow.
Drops often indicate that your network throughput is approaching or exceeding your system's processing capacity, rather than pointing to hardware failures.
Common Causes and Diagnosis
Physical Layer Problems
Network errors, particularly receive errors, frequently stem from physical connectivity issues:
Cable problems are surprisingly common. Damaged Ethernet cables, loose connections, or cables that exceed maximum length specifications can cause intermittent packet corruption. Even cables that appear to work may generate errors under high load or specific environmental conditions.
Switch port issues can manifest as negotiation failures between your server's NIC and the switch port. Auto-negotiation problems, duplex mismatches, or faulty switch ports often produce patterns of both receive and transmit errors.
Network interface card failures may start subtly with occasional errors before progressing to complete failure. Overheating, driver bugs, or hardware degradation can cause error rates to increase over time.
Buffer and Capacity Issues
Dropped packets typically indicate resource constraints rather than hardware problems:
Insufficient receive buffers are a common cause of net_rx_dropped increases. The Linux kernel maintains receive buffers controlled by parameters like net.core.rmem_max and net.core.rmem_default. When high-throughput network traffic exceeds these buffer sizes, packets get dropped.
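These kernel parameters are exposed under /proc/sys, where each dotted sysctl name maps directly to a file path. A hedged sketch of inspecting them (the read only works on Linux, hence the existence check):

```python
# Sketch of inspecting the receive-buffer sysctls mentioned above.
# Each dotted sysctl name maps to a file under /proc/sys; values are bytes.
from pathlib import Path

def sysctl_path(name):
    """Map a sysctl name such as net.core.rmem_max to its /proc/sys file."""
    return Path("/proc/sys", *name.split("."))

def read_sysctl(name):
    return int(sysctl_path(name).read_text())

p = sysctl_path("net.core.rmem_max")
if p.exists():  # /proc/sys is Linux-only
    print("net.core.rmem_max =", read_sysctl("net.core.rmem_max"), "bytes")
```

Raising these limits is typically done as root with `sysctl -w` (for example `sysctl -w net.core.rmem_max=16777216`); the value here is only an illustration, and any change should be load-tested before being made permanent.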
CPU processing limitations can cause receive drops when the system cannot process incoming packets quickly enough. This is particularly common on high-bandwidth connections or systems with limited CPU resources.
NIC ring buffer exhaustion occurs when the network interface's hardware buffer fills up before the kernel can process packets. Modern NICs have configurable ring buffer sizes that may need tuning for high-throughput applications.
Correlating with Other Metrics
Network error metrics become much more meaningful when analysed alongside other Server Scout metrics.
High Throughput Scenarios
When you observe high net_rx_dropped values, check the corresponding net_rx_bytes metric:
- High drops + high throughput suggests your receive buffers are too small for the traffic volume. This is often fixable through kernel parameter tuning.
- High drops + normal throughput may indicate CPU processing bottlenecks. Check cpu_percent and cpu_system to see if the system is struggling with interrupt handling.
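The correlation logic above can be sketched as a simple decision rule. The threshold below is a hypothetical placeholder, since what counts as "high throughput" depends entirely on your link speed and workload:

```python
# Hypothetical sketch of the drops-vs-throughput correlation described
# above. The ~100 Mb/s threshold is an arbitrary illustrative choice.

HIGH_THROUGHPUT_BPS = 100e6 / 8  # bytes/second, assuming ~100 Mb/s of traffic

def diagnose_rx_drops(drop_rate, rx_bytes_rate):
    """Classify a receive-drop observation using the paired throughput rate."""
    if drop_rate == 0:
        return "healthy"
    if rx_bytes_rate >= HIGH_THROUGHPUT_BPS:
        return "receive buffers likely too small for the traffic volume"
    return "possible CPU or interrupt-handling bottleneck"

print(diagnose_rx_drops(0, 1e6))
print(diagnose_rx_drops(40, 50e6))  # high drops + high throughput
print(diagnose_rx_drops(40, 2e5))   # high drops + normal throughput
```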
Error Patterns Without High Load
If you see net_rx_errors or net_tx_errors increasing when net_rx_bytes and net_tx_bytes are at normal levels, this strongly suggests physical layer problems rather than capacity issues. The network isn't particularly busy, but the packets being transmitted are getting corrupted or failing to transmit properly.
Memory Pressure Correlation
Sometimes network drops correlate with memory pressure. Check mem_percent and mem_available_mb when investigating dropped packets, as insufficient system memory can impact network buffer allocation.
When Drops Might Be Acceptable
While the goal is zero network errors and drops, reality is sometimes more nuanced.
Brief traffic bursts in high-throughput environments might cause temporary receive drops that don't impact application performance. If your applications use TCP (which handles retransmission automatically), occasional drops during peak traffic may be tolerable.
Acceptable drop scenarios include:
- Infrequent, brief spikes during known high-traffic periods
- Applications designed to handle packet loss gracefully
- Non-critical traffic where occasional retransmission is acceptable
Unacceptable drop patterns include:
- Sustained dropping over extended periods
- Drops occurring during normal traffic levels
- Any drops affecting critical real-time applications
Monitoring and Response Strategy
Effective network error monitoring requires both immediate alerting and trend analysis.
Immediate attention is warranted when:
- Any error metric shows sustained non-zero values
- Drop rates exceed your application's tolerance thresholds
- Errors or drops correlate with application performance problems
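A "sustained non-zero" condition can be checked over a sliding window of recent per-interval rates. The five-sample window below is an assumption for illustration, not a Server Scout default:

```python
# Hypothetical sketch of a "sustained non-zero" alert check. The window
# of 5 consecutive samples is an illustrative choice, not a product default.

def sustained_nonzero(rates, window=5):
    """True if the most recent `window` samples are all above zero."""
    return len(rates) >= window and all(r > 0 for r in rates[-window:])

print(sustained_nonzero([0, 0, 1, 2, 1, 3, 2]))  # last five samples all non-zero
print(sustained_nonzero([0, 1, 0, 2, 1]))        # a zero inside the window
```

Requiring several consecutive non-zero samples filters out a single transient blip while still firing quickly on genuine sustained errors.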
Trend monitoring helps identify:
- Gradual increases in error rates suggesting hardware degradation
- Patterns correlating with specific times or traffic loads
- Seasonal or cyclical variations in network performance
Server Scout's dashboard provides multiple time ranges for this analysis. Use the 1-hour view for immediate troubleshooting, the 24-hour view for daily pattern analysis, and the 7-day view for identifying longer-term trends.
The 30-second collection interval in Server Scout's medium tier lets you catch brief network issues that monitoring systems with longer collection intervals might miss.
Next Steps
When network error metrics indicate problems, systematic troubleshooting helps identify root causes quickly. Start with physical layer verification—check cables, connections, and switch port status. Then examine system resources and kernel network parameters if drops are the primary issue.
Remember that network errors rarely resolve themselves. Unlike temporary CPU spikes or memory pressure that may be application-related, network errors typically indicate hardware or configuration issues that require active intervention. Early detection and response prevent minor network issues from becoming major outages.