Kubernetes v1.36 GA: Pressure Stall Information (PSI) Metrics Now Stable for Production Workloads


Breaking: PSI Metrics Graduate to General Availability

Kubernetes v1.36, released today, marks a major milestone for node-level observability: Pressure Stall Information (PSI) metrics have graduated to General Availability (GA). This means operators can now rely on a stable, production-grade interface to detect resource bottlenecks—CPU, memory, and I/O—before they escalate into outages.

Source: kubernetes.io

“PSI gives us the earliest possible warning of resource tension,” said Jane Chen, a contributor to the Kubernetes SIG Node. “Unlike traditional utilization numbers, PSI tells you how long tasks are actually waiting—and that’s the signal that matters in a live cluster.”

Background: Beyond Utilization

First introduced in the Linux kernel in 2018, PSI tracks the time tasks spend stalled due to resource shortages. Traditional metrics like CPU or memory utilization can be misleading: a node at 80% CPU may still cause severe latency for some workloads due to scheduling delays. PSI fills that gap by providing cumulative totals and moving averages over 10s, 60s, and 300s windows.
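To make the format concrete, the kernel exposes PSI in files such as /proc/pressure/cpu, /proc/pressure/memory, and /proc/pressure/io. Each file has a "some" row (at least one task stalled) and, where applicable, a "full" row (all non-idle tasks stalled at once). Each row carries avg10, avg60, and avg300 percentages plus a cumulative total in microseconds. The parser below is a minimal Python sketch, not code from the Kubelet; the names PsiLine and parse_psi are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PsiLine:
    """One row of a PSI file, e.g. the "some" or "full" row."""
    kind: str        # "some" or "full"
    avg10: float     # percent of wall time stalled, averaged over 10 s
    avg60: float     # percent of wall time stalled, averaged over 60 s
    avg300: float    # percent of wall time stalled, averaged over 300 s
    total_us: int    # cumulative stall time in microseconds


def parse_psi(text: str) -> dict[str, PsiLine]:
    """Parse the contents of a PSI file such as /proc/pressure/memory."""
    result = {}
    for line in text.strip().splitlines():
        # Each row looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = PsiLine(
            kind=kind,
            avg10=float(values["avg10"]),
            avg60=float(values["avg60"]),
            avg300=float(values["avg300"]),
            total_us=int(values["total"]),
        )
    return result
```

For example, calling parse_psi on the text of /proc/pressure/memory returns a dictionary keyed by "some" and "full", so a caller can read parse_psi(text)["some"].avg60 directly.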

These moving averages help operators distinguish between transient spikes and sustained pressure, enabling more accurate capacity planning and faster incident response. Until now, Kubernetes lacked a standardized, stable way to expose PSI metrics at the pod and container levels.
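One way to turn the three windows into a signal is to compare the short and long averages: an elevated avg300 means pressure has persisted for minutes, while an elevated avg10 alone points to a recent spike. The function below is a hypothetical heuristic; its 10% and 5% thresholds are illustrative values chosen for this sketch, not Kubernetes defaults or recommendations.

```python
def classify_pressure(avg10: float, avg300: float,
                      spike_threshold: float = 10.0,
                      sustained_threshold: float = 5.0) -> str:
    """Label pressure from PSI short- and long-window averages (percent).

    The default thresholds are illustrative only.
    """
    if avg300 >= sustained_threshold:
        # Long window is elevated: pressure has lasted for minutes.
        return "sustained"
    if avg10 >= spike_threshold:
        # Only the short window is elevated: a recent, possibly brief spike.
        return "transient"
    return "normal"
```

A check like classify_pressure(25.0, 1.0) reports "transient", whereas classify_pressure(8.0, 6.0) reports "sustained" even though its short-window value is lower.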

What This Means for Operators

With the GA graduation in v1.36, the Kubelet exposes PSI metrics at node, pod, and container granularity, so operators no longer need external agents or custom scripts to scrape kernel-level counters.
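Because PSI's total field is a monotonically increasing stall counter, the stall rate over any interval is the counter delta divided by the elapsed wall time. The sketch below assumes two samples of a cumulative PSI total in microseconds (the kernel's unit) taken at known times; stall_fraction is a hypothetical helper, it is not tied to any specific Kubelet endpoint, and it does not handle counter resets such as a container restart. Counters reported in other units need converting first.

```python
def stall_fraction(total_prev_us: int, total_now_us: int,
                   t_prev_s: float, t_now_s: float) -> float:
    """Fraction of wall time (0..1) spent stalled between two samples
    of a cumulative PSI total measured in microseconds."""
    elapsed_s = t_now_s - t_prev_s
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    # Convert the counter delta from microseconds to seconds.
    stalled_s = (total_now_us - total_prev_us) / 1_000_000
    return stalled_s / elapsed_s
```

For instance, a counter that advances by 1,500,000 µs over a 30-second scrape interval gives stall_fraction(0, 1_500_000, 0.0, 30.0), about 0.05, meaning tasks were stalled roughly 5% of the time.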

“This is a game-changer for cluster resource management,” added Chen. “We now have a first-class, stable metric that aligns with how Linux actually schedules work.”

Proving Stability: Performance Testing at Scale

A common concern with new telemetry features is the resource overhead of collection and serving. To address this, SIG Node conducted rigorous performance validation on high-density workloads (80+ pods) across different machine types. The tests isolated two scenarios:

  1. Kubelet overhead: compared Kubelet CPU usage with the PSI feature enabled versus disabled, with kernel PSI tracking already active in both cases.
  2. Kernel overhead: compared system-level CPU usage with kernel PSI turned on versus off, with the Kubelet feature active.

Scenario 1: Kubelet Overhead Is Negligible

On 4-core machines, both clusters had kernel PSI enabled by default. The Kubelet's CPU usage showed practically identical bursts whether the feature was on or off. The extra cost stayed within 0.1 cores, or 2.5% of a 4-core node's capacity, well within safe production margins.
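The overhead figure is simple arithmetic: the mean CPU difference between the enabled and disabled runs, divided by the node's core count (0.1 / 4 = 0.025, or 2.5%). A minimal sketch follows; the helper name overhead is hypothetical, and the sample numbers in the usage note are illustrative, not SIG Node's measurements.

```python
from statistics import mean


def overhead(cpu_enabled: list[float], cpu_disabled: list[float],
             node_cores: int) -> tuple[float, float]:
    """Return (extra cores, extra fraction of node capacity).

    cpu_enabled / cpu_disabled are CPU usage samples in cores
    for runs with the feature on and off.
    """
    extra_cores = mean(cpu_enabled) - mean(cpu_disabled)
    return extra_cores, extra_cores / node_cores
```

With illustrative samples, overhead([0.35, 0.45], [0.25, 0.35], 4) returns roughly (0.1, 0.025), the same shape as the 0.1-core, 2.5% figure above.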

Scenario 2: Kernel PSI Adds Minimal System Load

When measuring system CPU usage, the PSI-enabled clusters tracked the same pattern as those without it, with only a marginal increase over the baseline of 2.5 cores. The Kubelet's reading of cgroup metrics accounted for only a small fraction of that overall system cost.

“These numbers confirm that PSI is production-ready,” said Chen. “The overhead is so small it’s lost in the noise of normal Kubelet housekeeping.”

Immediate Availability

Kubernetes v1.36 is now available for download. To get PSI metrics, operators need a kernel with PSI enabled (the default on most modern distributions; kernels built with PSI disabled by default can turn it on with the psi=1 boot parameter) and clusters upgraded to v1.36. No additional feature gate is required.
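Before upgrading, it can help to confirm that a node actually exposes PSI. The sketch below assumes the standard mount points. On recent kernels, /proc/pressure is only created when PSI is compiled in and enabled. Per-cgroup pressure files (cpu.pressure, memory.pressure, io.pressure) exist only on the cgroup v2 unified hierarchy, whose root contains cgroup.controllers. psi_prerequisites is a hypothetical helper; its root parameter exists only so it can be pointed at a test directory.

```python
from pathlib import Path


def psi_prerequisites(root: str = "/") -> dict[str, bool]:
    """Check whether a Linux node exposes PSI where the Kubelet can read it."""
    base = Path(root)
    return {
        # On recent kernels this file exists only when PSI is built in and enabled.
        "kernel_psi": (base / "proc/pressure/cpu").exists(),
        # The cgroup v2 unified hierarchy exposes cgroup.controllers at its root.
        "cgroup_v2": (base / "sys/fs/cgroup/cgroup.controllers").exists(),
    }
```

Run on a node with psi_prerequisites(), both values should be True before relying on pod- and container-level PSI metrics.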

For detailed migration guides and configuration examples, refer to the official Kubernetes PSI documentation.
