Network Error Metrics Explained

Network errors and dropped packets are critical indicators of network health that can signal everything from failing hardware to insufficient kernel buffers. When your server experiences network issues, these metrics often provide the first clues about whether the problem lies in the physical layer, the network interface card (NIC), or system resource constraints.

Server Scout monitors four key network error metrics every 30 seconds as part of its medium-tier collection. These metrics are read directly from /proc/net/dev, which provides per-interface statistics maintained by the Linux kernel. Understanding what these numbers mean, and when they should concern you, is essential for maintaining reliable network performance.

The Four Network Error Metrics

Server Scout tracks network errors and drops on your server's primary network interface. All four metrics are cumulative counters, meaning they increase over time. In the dashboard, you'll see them displayed as rates (events per second) calculated from the differences between consecutive readings.
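As a sketch of how a cumulative counter becomes a rate, the same values can be sampled from sysfs (which mirrors /proc/net/dev) and differenced by hand. The loopback interface and the one-second interval below are placeholders; the dashboard applies the same calculation to its 30-second readings.

```shell
# Read a cumulative drop counter twice and divide the delta by the
# sampling interval. "lo" and INTERVAL=1 are placeholder choices.
IFACE=lo
INTERVAL=1
d1=$(cat "/sys/class/net/$IFACE/statistics/rx_dropped")
sleep "$INTERVAL"
d2=$(cat "/sys/class/net/$IFACE/statistics/rx_dropped")
echo "rx_dropped rate: $(( (d2 - d1) / INTERVAL )) events/sec"
```

Because the counters only ever increase (until a reboot or driver reload resets them), the delta is what matters, not the absolute value.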

| Metric | Description | Source | Normal Value |
| --- | --- | --- | --- |
| net_rx_errors | Receive errors including CRC, frame alignment, and layer-2 failures | /proc/net/dev errs column | 0 |
| net_tx_errors | Transmit errors including carrier, heartbeat, and window errors | /proc/net/dev errs column | 0 |
| net_rx_dropped | Packets dropped during receive due to buffer or processing limits | /proc/net/dev drop column | 0 |
| net_tx_dropped | Packets dropped during transmit due to queue overflow or driver issues | /proc/net/dev drop column | 0 |
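To see where these columns live, here is a minimal shell sketch that pulls the errs and drop fields for one interface out of /proc/net/dev. The interface name is an assumption; the column positions follow the kernel's fixed layout (eight receive columns, then eight transmit columns).

```shell
# Extract rx/tx errs and drop counters for one interface from the
# /proc/net/dev table. Columns after the interface name are:
# rx: bytes packets errs drop fifo frame compressed multicast
# tx: bytes packets errs drop fifo colls carrier compressed
parse_netdev() {
  awk -F'[: ]+' -v ifc="$1" '{
    i = ($1 == "") ? 2 : 1     # skip empty field caused by leading spaces
    if ($i == ifc)
      printf "rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", \
             $(i+3), $(i+4), $(i+11), $(i+12)
  }'
}

# Run against the live kernel table ("eth0" is a placeholder name):
parse_netdev eth0 < /proc/net/dev
```

If the named interface does not exist, the function simply prints nothing.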

Why Zero is the Target

In an ideal network environment, all four metrics should remain at zero. Any non-zero values indicate that your network stack is experiencing problems, though the severity and urgency vary depending on the specific metric and rate of increase.

Unlike some server metrics where occasional spikes are normal, network errors and drops represent actual packet loss or corruption. Even small numbers can indicate underlying issues that may worsen over time or cause intermittent application problems.

Understanding Errors vs Drops

The distinction between errors and drops is crucial for diagnosing network problems effectively.

Network errors (net_rx_errors and net_tx_errors) represent fundamental failures in the network communication process:

  • Receive errors occur when the network interface receives malformed packets. This includes CRC (Cyclic Redundancy Check) failures where packet corruption is detected, frame alignment errors where packet boundaries are incorrect, and other layer-2 protocol violations.
  • Transmit errors happen when the NIC cannot successfully send packets. Common causes include carrier sense errors (cable disconnected), heartbeat errors (collision detection failures in older Ethernet), and window errors (timing issues).

Network errors almost always indicate hardware or physical layer problems. They suggest that the fundamental communication channel between your server and the network is compromised.

Dropped packets (net_rx_dropped and net_tx_dropped) represent capacity or resource exhaustion rather than corruption:

  • Receive drops occur when the system cannot process incoming packets fast enough. The packets arrive correctly but are discarded because receive buffers are full or the CPU cannot handle the processing load.
  • Transmit drops happen when outgoing packets cannot be queued for transmission, typically due to output queue overflow.

Drops often indicate that your network throughput is approaching or exceeding your system's processing capacity, rather than pointing to hardware failures.

Common Causes and Diagnosis

Physical Layer Problems

Network errors, particularly receive errors, frequently stem from physical connectivity issues:

Cable problems are surprisingly common. Damaged Ethernet cables, loose connections, or cables that exceed maximum length specifications can cause intermittent packet corruption. Even cables that appear to work may generate errors under high load or specific environmental conditions.

Switch port issues can manifest as negotiation failures between your server's NIC and the switch port. Auto-negotiation problems, duplex mismatches, or faulty switch ports often produce patterns of both receive and transmit errors.
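A quick way to check link state, negotiated speed, and duplex without extra tooling is sysfs; `ethtool eth0` reports the same details plus auto-negotiation status. The interface name here is a placeholder, and some attributes are unreadable while a link is down, so the sketch falls back gracefully.

```shell
# Physical-layer sanity check via sysfs ("eth0" is a placeholder):
#   carrier: 1 = link up, 0 = link down
#   speed:   negotiated speed in Mb/s
#   duplex:  should be "full" on modern links; "half" suggests a mismatch
IFACE=eth0
for f in carrier speed duplex; do
  cat "/sys/class/net/$IFACE/$f" 2>/dev/null || echo "$f: unavailable"
done
```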

Network interface card failures may start subtly with occasional errors before progressing to complete failure. Overheating, driver bugs, or hardware degradation can cause error rates to increase over time.

Buffer and Capacity Issues

Dropped packets typically indicate resource constraints rather than hardware problems:

Insufficient receive buffers are a common cause of net_rx_dropped increases. The Linux kernel maintains receive buffers controlled by parameters like net.core.rmem_max and net.core.rmem_default. When high-throughput network traffic exceeds these buffer sizes, packets get dropped.
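You can inspect the current limits directly; the `sysctl` command reads the same /proc/sys files. The value in the commented write below is illustrative, not a tuning recommendation.

```shell
# Current kernel receive-buffer default and ceiling, in bytes
# (equivalent to: sysctl net.core.rmem_default net.core.rmem_max):
cat /proc/sys/net/core/rmem_default
cat /proc/sys/net/core/rmem_max

# To raise the ceiling at runtime (example value only):
#   sysctl -w net.core.rmem_max=16777216
```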

CPU processing limitations can cause receive drops when the system cannot process incoming packets quickly enough. This is particularly common on high-bandwidth connections or systems with limited CPU resources.

NIC ring buffer exhaustion occurs when the network interface's hardware buffer fills up before the kernel can process packets. Modern NICs have configurable ring buffer sizes that may need tuning for high-throughput applications.

Correlating with Other Metrics

Network error metrics become much more meaningful when analysed alongside other Server Scout metrics.

High Throughput Scenarios

When you observe high net_rx_dropped values, check the corresponding net_rx_bytes metric:

  • High drops + high throughput suggests your receive buffers are too small for the traffic volume. This is often fixable through kernel parameter tuning.
  • High drops + normal throughput may indicate CPU processing bottlenecks. Check cpu_percent and cpu_system to see if the system is struggling with interrupt handling.
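One way to check whether interrupt handling is the bottleneck is the per-CPU NET_RX softirq counters: if one column grows far faster than the others, packet processing is pinned to a single core even though overall CPU usage looks moderate.

```shell
# Header row plus the NET_RX softirq counts, one column per CPU:
grep -E 'CPU|NET_RX' /proc/softirqs
```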

Error Patterns Without High Load

If you see net_rx_errors or net_tx_errors increasing when net_rx_bytes and net_tx_bytes are at normal levels, this strongly suggests physical layer problems rather than capacity issues. The network isn't particularly busy, but the packets being transmitted are getting corrupted or failing to transmit properly.

Memory Pressure Correlation

Sometimes network drops correlate with memory pressure. Check mem_percent and mem_available_mb when investigating dropped packets, as insufficient system memory can impact network buffer allocation.

When Drops Might Be Acceptable

While the goal is zero network errors and drops, reality is sometimes more nuanced.

Brief traffic bursts in high-throughput environments might cause temporary receive drops that don't impact application performance. If your applications use TCP (which handles retransmission automatically), occasional drops during peak traffic may be tolerable.

Acceptable drop scenarios include:

  • Infrequent, brief spikes during known high-traffic periods
  • Applications designed to handle packet loss gracefully
  • Non-critical traffic where occasional retransmission is acceptable

Unacceptable drop patterns include:

  • Sustained dropping over extended periods
  • Drops occurring during normal traffic levels
  • Any drops affecting critical real-time applications

Monitoring and Response Strategy

Effective network error monitoring requires both immediate alerting and trend analysis.

Immediate attention is warranted when:

  • Any error metric shows sustained non-zero values
  • Drop rates exceed your application's tolerance thresholds
  • Errors or drops correlate with application performance problems

Trend monitoring helps identify:

  • Gradual increases in error rates suggesting hardware degradation
  • Patterns correlating with specific times or traffic loads
  • Seasonal or cyclical variations in network performance

Server Scout's dashboard provides multiple time ranges for this analysis. Use the 1-hour view for immediate troubleshooting, the 24-hour view for daily pattern analysis, and the 7-day view for identifying longer-term trends.

The 30-second collection interval in Server Scout's medium tier ensures you can catch network issues that might be missed by monitoring systems with longer collection intervals.

Next Steps

When network error metrics indicate problems, systematic troubleshooting helps identify root causes quickly. Start with physical layer verification—check cables, connections, and switch port status. Then examine system resources and kernel network parameters if drops are the primary issue.

Remember that network errors rarely resolve themselves. Unlike temporary CPU spikes or memory pressure that may be application-related, network errors typically indicate hardware or configuration issues that require active intervention. Early detection and response prevent minor network issues from becoming major outages.


Frequently Asked Questions

What do network errors indicate on a Linux server?

Network errors (net_rx_errors, net_tx_errors) indicate packets that were corrupted or malformed during transmission. Any non-zero error count deserves investigation. Common causes include faulty network cables, failing network interface cards, driver bugs, or duplex/speed mismatches between the server and switch. Even low error rates can indicate hardware degradation that will worsen over time.

What is the difference between network errors and dropped packets?

Errors (net_rx_errors, net_tx_errors) indicate corrupted or malformed packets, typically caused by hardware issues. Dropped packets (net_rx_dropped, net_tx_dropped) indicate packets discarded by the kernel, usually due to full receive buffers, traffic shaping, or firewall rules. Errors point to physical layer problems; drops point to capacity or configuration issues.

Should network error counts always be zero?

Ideally, yes. Network error and drop counters should remain at zero on a healthy system. Even a small steady increase indicates a problem that should be investigated. Since these are cumulative counters, a static non-zero value from a past event is not concerning. Watch for the rate of increase rather than the absolute value. The dashboard shows these as rates to make this easier.

How do I troubleshoot high dropped packets?

For RX drops, check if the network interface receive buffer is full using ethtool, and increase ring buffer sizes if needed. Check for firewall rules dropping traffic with iptables counters. For TX drops, investigate network congestion, traffic shaping rules, or link speed mismatches. The Linux kernel drops packets when it cannot process them fast enough, so the root cause is usually a capacity issue.
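The steps above can be sketched as ad-hoc commands. The interface name is a placeholder, and the ethtool/iptables lines are shown as comments because they require real hardware and root privileges.

```shell
# Per-interface counters are exposed individually under sysfs, which is
# handy for quick scripted checks ("lo" is a placeholder interface):
IFACE=lo
for f in rx_errors tx_errors rx_dropped tx_dropped; do
  printf '%s=%s\n' "$f" "$(cat "/sys/class/net/$IFACE/statistics/$f")"
done

# Deeper diagnosis (run manually on the affected host):
#   ethtool -g "$IFACE"          # current vs. maximum ring buffer sizes
#   ethtool -G "$IFACE" rx 4096  # enlarge the RX ring, driver permitting
#   ethtool -S "$IFACE"          # driver-level counters with drop reasons
#   iptables -L -v -n            # per-rule packet/byte counters
```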
