Introduction
Kubernetes v1.36 introduces a refined approach to memory management with the Memory QoS feature, now offering tiered memory protection based on Pod quality-of-service (QoS) classes. This update separates memory throttling from reservation, giving you finer control over how the kernel treats container memory under pressure. Whether you're running Guaranteed workloads that need ironclad protection or Burstable ones that can tolerate some reclamation, the new `memoryReservationPolicy` field lets you opt into different protection schemes. This guide walks through the complete configuration process, from enabling the feature gate to verifying behavior with cgroup v2 files and monitoring metrics. By the end, you'll know how to prevent system-wide OOM kills while maximizing resource utilization.
What You Need
- A Kubernetes cluster running v1.36 (or later) on each node.
- Nodes that use cgroup v2 (required for Memory QoS). Check by running `grep cgroup2 /proc/filesystems` on any node.
- Cluster admin access to modify the kubelet configuration (or a `KubeletConfiguration` file).
- Basic familiarity with Pod QoS classes: Guaranteed, Burstable, and BestEffort.
- Optional but helpful: access to the kubelet metrics endpoint (`/metrics`) to observe the `kubelet_memory_qos_*` metrics.
Step-by-Step Guide
Step 1: Enable the MemoryQoS Feature Gate
Memory QoS is alpha in v1.36, so you must explicitly enable it. Edit your kubelet configuration (or pass a flag) to include:
```yaml
featureGates:
  MemoryQoS: true
```

This activates `memory.high` throttling (default throttle factor 0.9). Note that in v1.36, enabling only the feature gate does not automatically write `memory.min` or `memory.low`; those are controlled by the next step.
Step 2: Set memoryReservationPolicy to TieredReservation
To obtain tiered protection, add this field to your kubelet configuration:
```yaml
memoryReservationPolicy: TieredReservation
```

If you omit this field or set it to `None`, only throttling (via `memory.high`) applies, and no cgroup reservation files are written. Setting it to `TieredReservation` tells the kubelet to assign:
- `memory.min` for Guaranteed Pods (hard protection: the kernel never reclaims this memory).
- `memory.low` for Burstable Pods (soft protection: reclaimed only under extreme system pressure).
- No reservation for BestEffort Pods (fully reclaimable).
Restart the kubelet after changing configuration to apply the new policy.
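Putting Steps 1 and 2 together, a minimal `KubeletConfiguration` fragment might look like the sketch below. The file path and surrounding fields are assumptions that vary by distribution; merge these lines into your existing configuration rather than replacing it.

```yaml
# /var/lib/kubelet/config.yaml -- illustrative path; adjust for your setup
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true                             # Step 1: enables memory.high throttling
memoryReservationPolicy: TieredReservation    # Step 2: opt into memory.min / memory.low
```

After saving the file, restart the kubelet (for systemd-managed nodes, `systemctl restart kubelet`) so the new policy takes effect.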
Step 3: Understand How QoS Classes Map to Protection
With TieredReservation enabled, the values written to cgroup files depend on each Pod’s memory request (not limit). Here’s the exact behavior:
- Guaranteed Pods (every container has memory and CPU limits equal to its requests): the kubelet writes the Pod's total memory request to `memory.min`. Example: a Pod requesting 512 MiB yields `memory.min = 536870912`. The kernel guarantees this memory; if it can't, it invokes the OOM killer on other processes.
- Burstable Pods (at least one container has a memory request or limit, but the Pod does not meet the Guaranteed criteria): the kubelet writes the memory request to `memory.low`. Under normal memory pressure the kernel avoids reclaiming these pages, but they can still be reclaimed to prevent a system-wide OOM.
- BestEffort Pods (no requests or limits): neither `memory.min` nor `memory.low` is set. Their memory is always reclaimable.
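To see the mapping in practice, here are two minimal Pod sketches (the names and images are illustrative): the first is classified Guaranteed because requests equal limits for both resources, the second Burstable because its limit exceeds its request.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed            # under TieredReservation: memory.min = 536870912
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests: {cpu: 500m, memory: 512Mi}
      limits:   {cpu: 500m, memory: 512Mi}   # equal to requests -> Guaranteed
---
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable             # under TieredReservation: memory.low = 268435456
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests: {memory: 256Mi}
      limits:   {memory: 1Gi}     # higher than request -> Burstable
```

Apply both with `kubectl apply -f`, then use the verification steps below on the node where they land.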
This is a major improvement over the v1.27 behavior, where all QoS classes received `memory.min`, potentially locking too much memory and causing OOM kills.
Step 4: Verify the Cgroup Settings on a Node
After deploying a Pod with a known memory request (e.g., 512 MiB), SSH into a node and check the cgroup path. For example, a Guaranteed Pod:
```shell
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod*guaranteed*/memory.min
```

You should see the value in bytes (e.g., 536870912). For a Burstable Pod, check `memory.low`. If the file is missing or contains 0, the Pod is BestEffort or the reservation policy is not active.
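If you don't know the exact cgroup path, a small helper like the sketch below can compute the expected byte value and scan the kubepods hierarchy for non-zero reservations. The paths are assumptions (slice names vary by cgroup driver), so treat this as a starting point, not a definitive tool.

```shell
#!/bin/sh
# Expected memory.min for a 512 MiB request, in bytes.
expected=$((512 * 1024 * 1024))
echo "expected memory.min: $expected"   # 536870912

# Scan for non-zero reservations (run on a node; path assumes the
# systemd cgroup driver's kubepods.slice layout).
find /sys/fs/cgroup/kubepods.slice -name memory.min 2>/dev/null \
  | while read -r f; do
      v=$(cat "$f")
      [ "$v" != "0" ] && echo "$f = $v"
    done
```

Swapping `memory.min` for `memory.low` in the `find` expression shows the Burstable reservations instead.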
Step 5: Monitor Observability Metrics
Kubernetes v1.36 exposes two new alpha metrics on the kubelet's `/metrics` endpoint:
- `kubelet_memory_qos_node_memory_min_bytes`: total `memory.min` reserved across all Guaranteed Pods on the node.
- `kubelet_memory_qos_node_memory_low_bytes`: total `memory.low` reserved across all Burstable Pods.
These metrics help you visualize how much memory is "protected" and adjust resource requests accordingly. You can scrape them with Prometheus or query them locally with `curl -k https://localhost:10250/metrics | grep kubelet_memory_qos` (the kubelet serves metrics over HTTPS and requires authentication).
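For dashboards, you might chart the protected fraction of node memory with a PromQL expression along these lines. This assumes you also scrape `node_memory_MemTotal_bytes` from node_exporter and that the instance labels of the two jobs line up, which is an assumption about your monitoring setup:

```
(
  kubelet_memory_qos_node_memory_min_bytes
  + kubelet_memory_qos_node_memory_low_bytes
)
/ on(instance) node_memory_MemTotal_bytes
```

A result approaching 0.5-0.6 is the signal, discussed in the tips below, that too much memory is locked.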
Step 6: Check Kernel Version Warning for memory.high
The `memory.high` cgroup file (used for throttling) has known issues on older kernels. In v1.36, the kubelet logs a warning if the kernel version is below 5.11. To check your kernel version, run `uname -r` on a node. If you see warnings, consider upgrading your kernel to ensure reliable throttling.
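You can reproduce this check yourself with a version comparison based on `sort -V`; the sketch below is a convenience helper (the `kernel_at_least` name is ours, and the 5.11 threshold comes from the warning described above).

```shell
#!/bin/sh
# Succeeds (exit 0) if the kernel version is at least the wanted version.
# Second argument is optional and defaults to the running kernel.
kernel_at_least() {
    want="$1"
    have="${2:-$(uname -r)}"
    # sort -V orders version strings; if the smallest of the pair is
    # the wanted version, the actual version is >= wanted.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

if kernel_at_least 5.11; then
    echo "kernel OK for memory.high throttling"
else
    echo "WARNING: kernel older than 5.11; memory.high may misbehave"
fi
```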
Tips and Best Practices
- Start with throttling only. If you are new to Memory QoS, first enable the feature gate without `TieredReservation`. Observe workload performance under memory pressure using the metrics, then gradually opt into reservation once you have enough headroom to lock memory.
- Avoid over-reserving with Guaranteed Pods. Because `memory.min` is a hard guarantee, over-requesting memory on Guaranteed Pods can starve system daemons or cause node instability. Use realistic request values.
- Monitor node memory pressure. Use `kubelet_memory_qos_node_memory_min_bytes` to calculate the total protected memory. If it exceeds 50-60% of total RAM, you may want to convert some Guaranteed Pods to Burstable.
- Combine with Pod Priority and Preemption. Tiered memory protection works best when combined with proper priority classes. Give critical Guaranteed Pods the highest priority to ensure they survive OOM scenarios.
- Test in a non-production cluster first. The feature is alpha; verify behavior with your specific workloads and kernel version before rolling out to production.
- Review kernel documentation on memory.min and memory.low. Understanding how the kernel handles these cgroup files will help you set appropriate request values.