Your Kubernetes cluster reports healthy memory usage through kubectl top pods, yet containers keep getting OOMKilled with no advance warning. The Kubernetes metrics-server scrapes resource usage at a coarse interval (15 seconds by default, often longer in practice), so it misses the rapid memory pressure spikes that kill production workloads.
Direct cgroups v2 filesystem monitoring can surface memory pressure two to three minutes before pod termination. The Linux kernel maintains real-time memory contention data under /sys/fs/cgroup that standard Kubernetes tooling never exposes.
Understanding cgroups v2 Memory Pressure Signals
Kubernetes pods run inside cgroups that track precise memory usage and pressure. The cgroups v2 Pressure Stall Information (PSI) system provides microsecond-precision data on memory contention.
Locating Pod Memory Control Files
With the systemd cgroup driver, pod memory control files live under /sys/fs/cgroup/kubepods.slice/, where each container appears as a cri-containerd-<id>.scope directory; with the cgroupfs driver they sit under /sys/fs/cgroup/kubepods/. Each pod gets a unique cgroup directory containing memory monitoring files:
find /sys/fs/cgroup -name "memory.pressure" -path "*containerd*" | head -5
The memory.pressure file contains PSI metrics showing memory allocation delays. When applications request memory faster than the kernel can provide it, PSI values spike before OOM conditions trigger.
Interpreting memory.pressure PSI Values
PSI data appears as three time-averaged percentages:
some avg10=23.45 avg60=18.32 avg300=12.67 total=2847293
full avg10=8.21 avg60=5.43 avg300=3.12 total=1234567
The some avg10 value is the percentage of time, over the last 10 seconds, during which at least one task stalled waiting for memory. Sustained values above 40.00 signal imminent OOM risk. The full avg10 measurement captures complete stalls, where every non-idle task in the cgroup waits on memory allocation at once.
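Extracting the avg10 figure from a PSI line takes only a little field splitting. The sketch below parses the sample line shown above (values are illustrative, and the 40.00 threshold is the rule of thumb from this article, not a kernel constant):

```shell
# Sample PSI line in the format produced by memory.pressure
psi_line='some avg10=23.45 avg60=18.32 avg300=12.67 total=2847293'

# Walk the fields, find avg10=<value>, and strip the key
avg10=$(echo "$psi_line" | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^avg10=/) {split($i, kv, "="); print kv[2]}}')

# Compare against the 40.00 warning threshold; awk avoids a bc dependency
if awk -v v="$avg10" 'BEGIN {exit !(v > 40.0)}'; then
  echo "WARNING: avg10=$avg10 exceeds threshold"
else
  echo "OK: avg10=$avg10"
fi
```

On a live system, replace the sample line with `head -1 <cgroup-path>/memory.pressure` to read the `some` row directly.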
Setting Up Automated Memory.Events Monitoring
The memory.events file tracks OOM kill events and memory reclaim statistics. This counter-based system provides definitive proof of memory pressure progression.
Parsing memory.events Counters
Memory events appear as simple key-value pairs:
low 0
high 2847
max 12
oom 0
oom_kill 3
The oom_kill counter increments the moment the kernel kills a process inside the cgroup, which in practice can precede full pod termination by 60-180 seconds. Monitoring this file alongside PSI provides the earliest definitive warning of impending container death.
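Because the counters only ever increase, the useful signal is the delta between two reads. A minimal sketch, using hard-coded sample snapshots in place of two real reads of a pod's memory.events file:

```shell
# Two snapshots of memory.events content (sample values for illustration;
# real data comes from reading the pod's memory.events file twice)
events_before='low 0
high 2847
max 12
oom 0
oom_kill 3'
events_after='low 0
high 2901
max 15
oom 1
oom_kill 4'

# Pull a named counter out of a snapshot
get_counter() { echo "$1" | awk -v k="$2" '$1 == k {print $2}'; }

delta=$(( $(get_counter "$events_after" oom_kill) - $(get_counter "$events_before" oom_kill) ))
if [ "$delta" -gt 0 ]; then
  echo "oom_kill increased by $delta since last check"
fi
```

Persisting the previous snapshot between polls (a temp file per cgroup works) turns this into the delta tracking the FAQ below recommends.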
Creating Threshold-Based Alerting
This bash monitoring script checks memory pressure across all container cgroups:
#!/bin/bash
for cgroup in $(find /sys/fs/cgroup -name memory.pressure -path "*containerd*"); do
    podpath=$(dirname "$cgroup")
    pressure=$(awk '/^some/ {print $2}' "$cgroup" | cut -d'=' -f2)
    if (( $(echo "$pressure > 40.0" | bc -l) )); then
        echo "HIGH PRESSURE: $podpath - $pressure%"
    fi
done
Real-Time Analysis Techniques
Continuous memory pressure monitoring requires tracking both PSI values and event counters. Memory contention patterns reveal workload characteristics that predict OOM conditions.
Bash Script for Continuous Monitoring
Real-time memory pressure detection combines PSI monitoring with event counter tracking. The script maintains baseline measurements and alerts on significant deviations.
Memory reclaim frequency analysis shows when containers exhaust available memory faster than the kernel can free it. High reclaim rates precede OOM kills by several minutes.
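One way to implement the "alert on significant deviations" idea is a simple baseline comparison. This is a sketch under assumptions: the 1.5x factor and the sample readings are illustrative choices, not values from the article.

```shell
# Hypothetical deviation check: flag when the current avg10 reading exceeds
# a previously recorded baseline by more than a chosen factor
baseline=10.0
current=16.2

# args: current baseline factor; exit 0 (true) when current > baseline * factor
exceeds_baseline() {
  awk -v c="$1" -v b="$2" -v f="$3" 'BEGIN {exit !(c > b * f)}'
}

if exceeds_baseline "$current" "$baseline" 1.5; then
  echo "DEVIATION: avg10 $current is more than 1.5x baseline $baseline"
fi
```

In a polling loop, the baseline would be an average of recent readings per cgroup rather than a fixed number.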
Correlating Pressure with Workload Patterns
Memory pressure correlates with application behaviour patterns. Java applications show pressure spikes during garbage collection. Database workloads exhibit pressure during large query execution. Web applications demonstrate pressure patterns matching request volume.
The "Container Memory Pressure Hidden in cgroups" investigation revealed how standard tools miss these patterns entirely.
Advanced Detection Methods
Memory pressure analysis requires distinguishing between cache pressure and working memory exhaustion. Cache pressure indicates healthy memory usage, while working memory pressure signals impending failures.
Memory Reclaim Frequency Analysis
The memory.stat file contains reclaim frequency data. High reclaim rates combined with elevated PSI values indicate terminal memory pressure. Applications that trigger frequent reclaim events consume memory faster than sustainable rates.
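The relevant memory.stat fields here are pgscan (pages the kernel scanned looking for reclaimable memory) and pgsteal (pages it actually reclaimed). A sketch of the rate calculation, using sample snapshots in place of two timed reads of a real memory.stat file:

```shell
# Sample memory.stat fragments (real files contain many more fields)
stat_t0='pgscan 120000
pgsteal 95000'
stat_t1='pgscan 410000
pgsteal 150000'

# Pull a named field out of a memory.stat snapshot
field() { echo "$1" | awk -v k="$2" '$1 == k {print $2}'; }

scanned=$(( $(field "$stat_t1" pgscan) - $(field "$stat_t0" pgscan) ))
reclaimed=$(( $(field "$stat_t1" pgsteal) - $(field "$stat_t0" pgsteal) ))

# A large scan delta with a comparatively small steal delta means the kernel
# is working hard but freeing little: a warning sign of terminal pressure
echo "scanned=$scanned reclaimed=$reclaimed"
```

Dividing each delta by the polling interval yields per-second rates suitable for threshold alerting.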
Working Memory vs Cache Pressure
Linux distinguishes between file cache and anonymous memory pressure. File cache pressure rarely causes OOM kills. Anonymous memory pressure directly threatens application stability. Monitoring both types separately improves prediction accuracy.
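The memory.stat file reports anonymous and file-backed usage separately, in bytes, under the anon and file keys. A minimal sketch of the split, with illustrative sample values standing in for a real read:

```shell
# Sample anon/file lines from memory.stat (bytes; values are illustrative)
stat='anon 734003200
file 104857600'

anon=$(echo "$stat" | awk '$1 == "anon" {print $2}')
file=$(echo "$stat" | awk '$1 == "file" {print $2}')

# A high anonymous share means pressure comes from working memory,
# which the kernel cannot simply drop the way it drops clean file cache
anon_pct=$(( anon * 100 / (anon + file) ))
echo "anonymous memory share: ${anon_pct}%"
```

Tracking this percentage alongside PSI separates benign cache churn from genuinely dangerous working-set growth.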
Combining cgroups v2 analysis with traditional monitoring provides comprehensive memory visibility. Server Scout's memory monitoring features complement filesystem-level analysis with historical trending and automated alerts.
Linux kernel documentation at kernel.org details the complete PSI implementation.
Filesystem-based memory monitoring provides the granular visibility that Kubernetes metrics-server cannot match. Direct cgroups v2 analysis detects memory pressure before applications crash, giving operations teams crucial minutes to respond to emerging problems.
FAQ
How often should I check memory.pressure files?
Check every 10-15 seconds for production workloads. The kernel recomputes the PSI averages roughly every two seconds, but polling more frequently than every 10 seconds provides minimal benefit while increasing system overhead.
Can memory.events counters decrease?
No, memory.events counters only increment. They reset to zero when containers restart. Track counter deltas over time to identify new pressure events rather than monitoring absolute values.
Do these techniques work with Docker without Kubernetes?
Yes, Docker containers on modern hosts also use cgroups v2. With the systemd cgroup driver, look for memory control files under /sys/fs/cgroup/system.slice/docker-<container-id>.scope/ instead of the containerd paths. The monitoring techniques remain identical.