
Container Memory Limits That Don't Match Reality: Debugging cgroups v2 Resource Reporting in Production

By Server Scout

Last week, one of our hosting clients reported that their web application's containers were being OOM-killed, even though docker stats showed memory usage at only 60% of the configured limit. The host monitoring, however, painted a different picture entirely.

This disconnect between container-reported memory usage and actual host consumption has become more pronounced with the shift to cgroups v2, and it's breaking alert thresholds across production environments.

The Memory Mystery: When Production Alerts Don't Match Docker Stats

The client was running a WordPress application stack with Redis, MySQL, and PHP-FPM containers, each limited to 512MB. Docker stats consistently showed:

CONTAINER           MEM USAGE / LIMIT     MEM %
wordpress-php       310MB / 512MB         60.5%
wordpress-redis     156MB / 512MB         30.4%
wordpress-mysql     445MB / 512MB         86.9%

Meanwhile, the host was reporting memory pressure, and containers were being killed seemingly at random. The alert from our monitoring dashboard was firing for host memory usage at 95%, but the container metrics suggested plenty of headroom.

Diagnosing this required understanding exactly how Docker measures memory usage versus what the Linux kernel actually allocates.

Understanding Container Memory Accounting Fundamentals

The fundamental issue lies in how memory accounting differs between the container runtime's perspective and the kernel's actual memory management.

How Docker Reports Memory Usage

Docker stats pulls its memory figure from cgroup accounting: it reads the cgroup's total usage counter and subtracts the inactive file cache to approximate the working set. This gives you active memory usage but ignores several critical components:

# What docker stats actually reads (cgroups v1 paths)
cat /sys/fs/cgroup/memory/docker/[container_id]/memory.usage_in_bytes
grep total_inactive_file /sys/fs/cgroup/memory/docker/[container_id]/memory.stat

This figure excludes shared library pages that may be counted once per container, kernel buffers allocated on the container's behalf, and memory-mapped files that aren't currently resident.
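The subtraction Docker performs can be reproduced by hand. The sketch below assumes a cgroups v1 host; `CID` is a placeholder for a real container ID, and the helper just mirrors the CLI's arithmetic (usage counter minus inactive file cache):

```shell
# Reproduce the docker stats memory figure on a cgroups v1 host.
docker_stats_usage() {
  # usage_in_bytes minus inactive file cache, as the Docker CLI computes it
  local usage_bytes=$1 inactive_file_bytes=$2
  echo $(( usage_bytes - inactive_file_bytes ))
}

# CID is a hypothetical container ID; adjust the path for your host.
CGDIR="/sys/fs/cgroup/memory/docker/${CID:-}"
if [ -n "${CID:-}" ] && [ -d "$CGDIR" ]; then
  usage=$(cat "$CGDIR/memory.usage_in_bytes")
  inactive=$(awk '/^total_inactive_file / {print $2}' "$CGDIR/memory.stat")
  docker_stats_usage "$usage" "$inactive"
fi
```

Comparing this number against the raw usage counter shows exactly how much page cache Docker is discounting for a given container.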

The Host's Perspective: What the Kernel Actually Sees

From the kernel's viewpoint, each container process consumes memory through multiple mechanisms that don't appear in the simplified Docker metrics:

  • Memory mapped files and shared libraries
  • Kernel buffers for network and filesystem operations
  • Anonymous memory pages that haven't been written to yet
  • Swap space that's been allocated but not necessarily used

The kernel tracks total memory commitment per cgroup, which often exceeds what container tools report as "usage".
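To see that fuller attribution, the per-type counters in a cgroups v2 `memory.stat` file can be filtered directly. The path below is an assumption for a systemd-managed Docker host, and `CID` is a placeholder:

```shell
# Filter a v2 memory.stat stream down to the kernel-side counters that
# docker stats folds away: anonymous pages, page cache, slab allocations,
# kernel stacks, and socket buffers.
cg_mem_breakdown() {
  grep -E '^(anon|file|kernel_stack|slab|sock) '
}

# Hypothetical path for a systemd-managed Docker container:
STAT="/sys/fs/cgroup/system.slice/docker-${CID:-unknown}.scope/memory.stat"
[ -r "$STAT" ] && cg_mem_breakdown < "$STAT"
```

Summing these fields typically lands much closer to what the host attributes to the container than the single docker stats number does.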

Cgroups v1 vs v2: Why Your Debugging Commands Changed

The migration to cgroups v2 changed not just the file paths but the fundamental memory accounting model. Many debugging techniques that worked reliably under cgroups v1 now provide incomplete or misleading information.

Under cgroups v2, memory pressure is calculated differently, and the relationship between memory.current and memory.max doesn't match the old memory.usage_in_bytes and memory.limit_in_bytes behaviour.

# cgroups v2 memory inspection
cat /sys/fs/cgroup/system.slice/docker-[container_id].scope/memory.current
cat /sys/fs/cgroup/system.slice/docker-[container_id].scope/memory.max

This change explains why many organisations found their container memory alerts became unreliable after host OS upgrades.
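On a v2 host, the docker stats figure corresponds to memory.current minus the inactive_file counter in memory.stat. This sketch mirrors that calculation; the scope path is an assumption for systemd-managed Docker, and `CID` is a placeholder:

```shell
# Mirror docker stats' memory figure under cgroups v2:
# memory.current minus the inactive_file counter from memory.stat.
v2_docker_usage() {
  local current=$1 inactive_file=$2
  echo $(( current - inactive_file ))
}

CG="/sys/fs/cgroup/system.slice/docker-${CID:-unknown}.scope"
if [ -r "$CG/memory.current" ]; then
  cur=$(cat "$CG/memory.current")
  inact=$(awk '/^inactive_file / {print $2}' "$CG/memory.stat")
  v2_docker_usage "$cur" "$inact"
fi
```

The gap between memory.current and this derived figure is reclaimable page cache, which is exactly what evaporates first when the host comes under pressure.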

Real-World Debugging: A Multi-Container Memory Investigation

Returning to our client's problem, we needed to establish ground truth about memory consumption across the entire container stack.

Step 1: Establishing Ground Truth with SystemD and Cgroups

First, we examined the actual memory pressure from the systemd perspective:

sudo systemctl status docker
grep -E '(MemTotal|MemFree|MemAvailable)' /proc/meminfo

This revealed that while individual containers appeared to have headroom, the cumulative memory pressure was pushing the host into swap.
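Pressure stall information (PSI) makes this cumulative pressure directly observable, independent of any per-container numbers. A minimal sketch, assuming a kernel of 4.20 or later with PSI enabled; the headroom helper is an illustrative calculation, not a Docker or kernel API:

```shell
# PSI shows host-wide time tasks spent stalled waiting for memory
# (kernel >= 4.20 with CONFIG_PSI enabled).
[ -r /proc/pressure/memory ] && cat /proc/pressure/memory

mem_headroom_pct() {
  # percentage of RAM the kernel estimates is still allocatable
  local avail_kb=$1 total_kb=$2
  echo $(( avail_kb * 100 / total_kb ))
}

if [ -r /proc/meminfo ]; then
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  mem_headroom_pct "$avail" "$total"
fi
```

A rising "some" value in /proc/pressure/memory while every container still reports headroom is the signature of exactly the discrepancy this client hit.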

Step 2: Identifying Memory Attribution Problems

The breakthrough came when we examined memory usage at the process level within each container:

sudo docker exec wordpress-php cat /proc/1/status | grep -E '(VmSize|VmRSS|VmData)'

Here we discovered that the PHP-FPM master process was allocating significantly more virtual memory than Docker stats indicated, and shared libraries were being counted multiple times across the container stack.
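Checking only PID 1 can understate the picture when, as with PHP-FPM, a master process forks many workers. One way to total resident memory across every process in a container is to sum the VmRSS lines from all of their status files; `sum_vmrss_kb` is a hypothetical helper, not a Docker feature:

```shell
# Sum the resident set of every process a container runs; docker's single
# per-container figure hides how that total is distributed across workers.
sum_vmrss_kb() {
  # reads /proc/<pid>/status content on stdin and totals the VmRSS lines (kB)
  awk '/^VmRSS:/ {total += $2} END {print total + 0}'
}

# Hypothetical usage against the client's PHP container:
#   docker exec wordpress-php sh -c 'cat /proc/[0-9]*/status' | sum_vmrss_kb
```

Note that this sum double-counts pages shared between workers, which is itself a useful signal: a large gap between it and the cgroup's own counter points at shared memory.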

Step 3: Tracking Down the Memory Leak Sources

By comparing the kernel's memory attribution with Docker's internal accounting, we identified two issues:

  1. PHP-FPM was pre-allocating memory pools that didn't show up in RSS calculations
  2. The Redis container was using memory-mapped files for persistence that weren't included in Docker's usage figures

Adjusting the memory limits based on actual kernel allocation rather than Docker stats resolved the OOM kills.
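One way to size the new limits is to start from the kernel-observed peak (memory.peak on recent cgroups v2 kernels) and add headroom. The 25% margin below is an illustrative sizing rule we are assuming, not a Docker recommendation:

```shell
# Hypothetical sizing rule: kernel-observed peak usage plus 25% headroom,
# rather than a limit derived from docker stats.
new_limit_bytes() {
  local peak_bytes=$1
  echo $(( peak_bytes + peak_bytes / 4 ))
}

# e.g. apply to a live container without restarting it:
#   docker update --memory "$(new_limit_bytes 536870912)" wordpress-php
```

docker update changes the cgroup limit in place, so the effect on OOM behaviour can be observed immediately under real traffic.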

Building Reliable Memory Monitoring for Container Environments

This investigation highlighted why traditional monitoring approaches fail in containerised environments. Docker stats provides a simplified view that's useful for development but insufficient for production capacity planning.

Effective container memory monitoring requires tracking both the container runtime's perspective and the host kernel's actual allocation. This is exactly the type of discrepancy that proper monitoring architecture needs to account for, providing visibility into both container metrics and underlying host resource consumption.

For organisations running mixed container and traditional workloads, starting with host-level monitoring often reveals resource contention that container-only monitoring misses entirely.

The solution isn't to ignore Docker stats, but to supplement them with host-level memory pressure monitoring and cgroup-native memory accounting. This dual approach catches both container-specific resource issues and system-wide memory pressure that affects the entire host.
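The cgroup-native half of that approach can be as simple as alerting on memory.current relative to memory.max. A minimal sketch, assuming a systemd-managed v2 host; `CID` and the 90% threshold are placeholder choices:

```shell
# Alert when a cgroup's kernel-tracked usage nears its hard limit,
# regardless of what docker stats claims.
near_limit() {
  local current=$1 max=$2 threshold_pct=${3:-90}
  [ "$max" = "max" ] && return 1   # "max" means no limit is configured
  [ $(( current * 100 / max )) -ge "$threshold_pct" ]
}

CG="/sys/fs/cgroup/system.slice/docker-${CID:-unknown}.scope"
if [ -r "$CG/memory.current" ]; then
  near_limit "$(cat "$CG/memory.current")" "$(cat "$CG/memory.max")" \
    && echo "WARN: container near cgroup memory limit"
fi
```

Paired with a host-level PSI check, this catches both the container that is genuinely close to its ceiling and the host that is running out of room while every container looks fine.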

Modern container orchestration requires monitoring that understands the gap between what containers report and what kernels actually allocate. The kernel's cgroup v2 documentation provides the technical foundation for building this understanding into your monitoring strategy.

FAQ

Why does docker stats show different memory usage than the host system?

Docker stats reports the cgroup's usage counter minus inactive file cache, excluding shared libraries, kernel buffers, reclaimed memory-mapped file pages, and pre-allocated memory pools that the kernel still tracks against the container's memory limit.

How does cgroups v2 change container memory monitoring?

Cgroups v2 uses different file paths (/sys/fs/cgroup/system.slice/docker-[id].scope/) and changed memory pressure calculations, making many cgroups v1 debugging commands provide incomplete information about actual memory usage.

What's the most reliable way to monitor container memory usage?

Combine Docker stats with host-level memory pressure monitoring and direct cgroup memory accounting. Track both container runtime metrics and kernel-level memory allocation to catch discrepancies that lead to unexpected OOM kills.

Ready to Try Server Scout?

Start monitoring your servers and infrastructure in under 60 seconds. Free for 3 months.

Start Free Trial