HPA workload has reached the threshold but scaling is not triggered: kube-controller-manager applies a tolerance of 0.1 to the HPA calculation, and no scaling is triggered while the ratio stays within the tolerance range

Problem Description

The HPA threshold is set to n. The current usage of the workload associated with the HPA has already reached 101% * n, but no scale-up has been triggered.

Alert Information

None

Effective Troubleshooting Steps

Use the kubectl top command (for example, kubectl top pod -n <namespace>) to view the current resource usage of the workload.

Check whether the ratio of current workload usage to expected workload usage (request * threshold percentage) falls within the range 0.9 to 1.1. While the ratio is within this range, no scaling operation is triggered.
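The check in the second step can be sketched in Python. This is a minimal illustration and not the controller's own code: the function name, parameter names, and sample values (a 500m request, an 80% threshold, 404m of measured usage) are hypothetical, and the tolerance is assumed to be the default of 0.1.

```python
def within_tolerance(current_usage_m, request_m, threshold_pct, tolerance=0.1):
    """Return (ratio, skipped) for a utilization-based HPA target.

    current_usage_m: average per-pod usage in millicores (e.g. read from kubectl top)
    request_m:       per-pod resource request in millicores
    threshold_pct:   HPA target utilization in percent
    tolerance:       assumed default of 0.1
    """
    # Expected usage = request * threshold percentage.
    expected_usage_m = request_m * threshold_pct / 100.0
    ratio = current_usage_m / expected_usage_m
    # Scaling is skipped while the ratio stays close enough to 1.0.
    return ratio, abs(ratio - 1.0) <= tolerance


# Hypothetical sample: usage is 101% of the expected value.
ratio, skipped = within_tolerance(current_usage_m=404, request_m=500, threshold_pct=80)
print(f"ratio={ratio:.2f}, scaling skipped={skipped}")
```

With these sample values the ratio is 1.01, so the sketch reports that scaling is skipped, which matches the symptom in the problem description.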

Root Cause

When calculating the HPA scaling ratio, kube-controller-manager applies a default tolerance of 0.1 (set through the --horizontal-pod-autoscaler-tolerance flag; it cannot be 0) to prevent frequent scaling operations caused by system jitter.

The official Kubernetes documentation states the following:

At the most basic level, the HorizontalPodAutoscaler controller operates on the ratio between the current metric value and the desired metric value.

Desired replicas = ceil[current replicas * (current metric value / desired metric value)]

For example, if the current metric value is 200m and the desired value is 100m, the replica count is doubled, since 200.0 / 100.0 == 2.0. If the current value is 50m instead, the replica count is halved, since 50.0 / 100.0 == 0.5. If the ratio is close enough to 1.0 (within a globally configurable tolerance, 0.1 by default), the control plane skips the scaling operation.
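The quoted formula and its tolerance check can be worked through with a short Python sketch. It is a simplified illustration under stated assumptions, not the kube-controller-manager implementation: the function name is hypothetical, the starting replica count of 4 is an arbitrary sample, and the sketch ignores minReplicas/maxReplicas, unready pods, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, desired_metric, tolerance=0.1):
    """Simplified form of: ceil[current replicas * (current metric / desired metric)]."""
    ratio = current_metric / desired_metric
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Sample values in millicores, starting from a hypothetical 4 replicas.
print(desired_replicas(4, 200, 100))  # ratio 2.0: replica count doubles
print(desired_replicas(4, 50, 100))   # ratio 0.5: replica count is halved
print(desired_replicas(4, 101, 100))  # ratio 1.01: inside the tolerance, no change
```

The third call corresponds to the case in this article: usage at 101% of the target leaves the replica count at 4.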

Solution

Not triggering a scaling operation while the ratio is within the tolerance range is expected behavior and part of the native Kubernetes mechanism. No action is required.

Scope of Operation Impact

Not applicable

Is This a Temporary Solution

Not applicable

Recommendations and Summary

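The tolerance in HPA mitigates the oscillation caused by metric fluctuations. Operators should still take it into account when choosing thresholds: with the default tolerance of 0.1, scale-up is triggered only after usage exceeds 110% of the expected value, and scale-down only after it drops below 90%.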

Troubleshooting Content

NA

Original Link

https://support.sangfor.com.cn/cases/list?product_id=37&type=1&category_id=28936&isOpen=true