
Resource management

Last updated: 2023-09-26 16:29:04

Why should containers set Request/Limit?

During runtime, containers typically consume CPU and memory resources. Without any configuration, the maximum amount of resources a container can use is determined by the allocatable resources of the node it resides on.
A single node typically runs multiple containers. If there is only one container on a node, the idle resources on the node are wasted when the container does not fully consume them. A modern personal computer can usually run hundreds of processes, and similarly, a node typically runs many containers. However, this raises another issue: containers may compete for resources, while the node's resources are fixed.
Limit is therefore needed to cap the resource usage of each container. In Kubernetes, Limit defines the maximum amount of resources a container can consume. If a container tries to use more CPU than its Limit, its CPU usage is throttled; if it uses more memory than its Limit, it may be terminated (OOM-killed).
Is setting Limit alone enough to control containers? Consider this scenario: a 10-core node runs 100 containers, and each container requires at least 1 core to start and operate normally. None of the containers on the node can function properly. Therefore, Kubernetes uses Request to guarantee a minimum resource allocation for each container.
In summary, Kubernetes uses Request and Limit to both ensure and restrict the amount of CPU and memory resources consumed by a container.
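As a minimal sketch of how the two values are declared, the shell function below prints a Pod manifest whose container sets both a Request and a Limit. The Pod name `demo`, the container name `app`, the image `nginx`, and the specific quantities are illustrative assumptions, not values taken from this page.

```shell
# Print a Pod manifest that declares both Request and Limit.
# All names and quantities here are illustrative assumptions.
print_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:        # minimum the container is guaranteed
        cpu: 250m
        memory: 64Mi
      limits:          # maximum the container may consume
        cpu: 500m
        memory: 128Mi
EOF
}

print_pod_manifest
# To create the Pod, pipe the output to kubectl:
#   print_pod_manifest | kubectl apply -f -
```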

What are the units for CPU and memory?

CPU

The default unit for CPU is cores. You can also use decimal values for CPU resources. When you define a container's CPU Request as 0.5, the requested CPU is half of what would be requested for 1.0 core. Additionally, 0.5 is equivalent to 500m, which can be considered as "500 millicpu" and read as "five hundred milli-cores."
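The equivalence between decimal cores and the `m` suffix can be sketched with a small helper. The function name `cpu_to_milli` is hypothetical, and the sketch assumes only the two forms described above (a plain number of cores, or a value ending in `m`).

```shell
# Convert a Kubernetes CPU quantity ("0.5", "2", "500m") to millicores.
cpu_to_milli() {
  case "$1" in
    # Already expressed in millicores: strip the suffix.
    *m) echo "${1%m}" ;;
    # Expressed in cores: multiply by 1000.
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

cpu_to_milli 0.5    # 500
cpu_to_milli 500m   # 500
cpu_to_milli 2      # 2000
```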

Memory

The default memory unit is bytes. You can also use plain integers or integers with a quantity unit suffix to represent memory. For example, the following expressions represent approximately the same value:
128974848, 129e6, 129M, 128974848000m, 123Mi
Note
Pay attention to the case of the suffix. For example, if you request 400m of memory, the actual requested value is 0.4 bytes. Someone who types that probably meant to request 400 mebibytes (400Mi) or 400 megabytes (400M).
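To check how the suffixes relate, the following sketch converts a memory quantity to bytes. The helper name `mem_to_bytes` is hypothetical, and it handles only plain numbers and the suffixes `m`, `k`, `M`, `G`, `Ki`, `Mi`, and `Gi`; it is not a full parser for Kubernetes quantities.

```shell
# Convert a Kubernetes memory quantity to bytes (partial sketch).
# Suffixes are case-sensitive: "M" is mega (10^6), "m" is milli (10^-3).
mem_to_bytes() {
  awk -v q="$1" 'BEGIN {
    n = length(q); num = q; mult = 1; div = 1
    if      (q ~ /Ki$/) { mult = 1024;       num = substr(q, 1, n - 2) }
    else if (q ~ /Mi$/) { mult = 1048576;    num = substr(q, 1, n - 2) }
    else if (q ~ /Gi$/) { mult = 1073741824; num = substr(q, 1, n - 2) }
    else if (q ~ /k$/)  { mult = 1000;       num = substr(q, 1, n - 1) }
    else if (q ~ /M$/)  { mult = 1000000;    num = substr(q, 1, n - 1) }
    else if (q ~ /G$/)  { mult = 1000000000; num = substr(q, 1, n - 1) }
    else if (q ~ /m$/)  { div = 1000;        num = substr(q, 1, n - 1) }
    printf "%.15g\n", num * mult / div
  }'
}

mem_to_bytes 123Mi           # 128974848
mem_to_bytes 128974848000m   # 128974848
mem_to_bytes 129M            # 129000000
mem_to_bytes 400m            # 0.4 (almost certainly a typo for 400M or 400Mi)
```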

How should one understand Request/Limit?

In Kubernetes, Request/Limit is implemented using CPU Share and CPU Quota technologies.

CPU Share

Suppose multiple containers are running on one machine. How are resources allocated among them? To answer this, you need to understand the concept of CPU Shares.
CPU Shares is a feature of Linux Control Groups (cgroup) that controls the amount of CPU time available for processes within a container. CPU time refers to the amount of time the CPU spends processing instructions for computer programs or operating systems, rather than the concept of time in everyday life. For example, when a process enters an interrupt, suspension, or sleep state, CPU time does not increase. However, when the process resumes operation, CPU time continues to increase from the point before the interruption.

Characteristics of CPU Shares

1. CPU Shares is a relative concept, not an absolute one. A container's CPU Shares value is used only to weigh its CPU time against that of other containers; the number by itself has no inherent meaning. For example, setting the CPU Shares of container A to 512 says nothing about how much CPU time A will receive. If the CPU Shares of another container B is set to 1024, B receives twice the CPU time of A, which still gives only their relative amounts, not the actual CPU time of each. If A and B run simultaneously on a 3-core device, they theoretically receive 1 core and 2 cores of CPU time, respectively. On a 6-core device they receive 2 cores and 4 cores, and on a 0.3-core device, 0.1 core and 0.2 core.
2. CPU Shares come into play only when there is resource contention.
CPU Shares values you set will only be used to allocate CPU cores when both containers A and B are expected to run at the same time.
If only one container is running, it can utilize all available CPU resources.
When multiple containers are running concurrently, the allocatable CPU cores on the node are distributed based on the configured CPU Shares values.
For non-running containers, even if CPU Shares are configured, it will not affect the allocation of CPU time for running containers.
3. The purpose of CPU Shares design is to maximize CPU resource utilization.
Regardless of the CPU Shares value, any container has the potential to utilize all CPU resources on a node.
In the event of CPU contention, CPU Shares can be used to determine the amount of CPU time allocated to each container.
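The proportional split described in these characteristics can be worked through with a short sketch. The function name `split_by_shares` is hypothetical, and the model is idealized: it assumes every listed container is busy and wants as much CPU as it can get.

```shell
# Split a node's cores among running containers in proportion to their
# CPU Shares. Usage: split_by_shares <cores> <shares>...
split_by_shares() {
  cores=$1; shift
  total=0
  # Sum the shares of all containers that are currently running.
  for s in "$@"; do total=$((total + s)); done
  # Each container receives cores * (its shares / total shares).
  for s in "$@"; do
    awk -v c="$cores" -v s="$s" -v t="$total" \
      'BEGIN { printf "%.2f\n", c * s / t }'
  done
}

split_by_shares 3 512 1024   # prints 1.00 then 2.00
split_by_shares 6 512 1024   # prints 2.00 then 4.00
split_by_shares 3 512        # a lone container can use all 3 cores: 3.00
```

A container that is not running is simply left out of the argument list, which is why its configured CPU Shares has no effect on the others.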

CPU Quota

CPU Quota is used to limit the maximum resource usage of a container. Even if a node has remaining resources, the container cannot use more than the specified CPU Quota value.
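On Linux, a CPU quota is expressed as an amount of CPU time allowed per scheduling period. The sketch below shows the arithmetic, assuming the commonly used period of 100000 microseconds (100 ms); the helper name `limit_to_cfs_quota` is hypothetical, and the period configured on a given node may differ.

```shell
# Translate a CPU Limit in millicores into a CFS quota in microseconds.
# The second argument is the period; 100000 us (100 ms) is assumed by default.
limit_to_cfs_quota() {
  milli=$1
  period_us=${2:-100000}
  echo $(( milli * period_us / 1000 ))
}

limit_to_cfs_quota 500    # 50000: half of every 100 ms period
limit_to_cfs_quota 2000   # 200000: two full cores' worth of CPU time
```

Once a container has used its quota within a period, it is throttled until the next period begins, regardless of how idle the node is.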

CPU Share in Kubernetes

In Kubernetes, Request/Limit is implemented through CPU Share and CPU Quota. However, Request has additional implications:
1. Request is an absolute value, not a relative one, ensuring the minimum available resources for a container.
2. Request is used by the scheduler to make decisions: among the cluster nodes, it looks for a suitable node whose remaining allocatable resources are at least the current Pod's Request, and schedules the Pod on that node.
3. During resource contention, the concept of relative CPU Share values is utilized to allocate CPU resources.
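The points above can be sketched as two small helpers. Both names are hypothetical. The first assumes the conventional cgroup v1 mapping in which 1 core corresponds to 1024 CPU Shares, with a floor of 2; the second is a deliberately simplified, CPU-only version of the scheduler's fit check.

```shell
# Convert a CPU Request in millicores to a cgroup v1 cpu.shares value,
# assuming 1 core = 1024 shares and a minimum of 2.
request_to_shares() {
  shares=$(( $1 * 1024 / 1000 ))
  if [ "$shares" -lt 2 ]; then shares=2; fi
  echo "$shares"
}

# Simplified fit check (CPU only): a Pod fits when its Request does not
# exceed the node's allocatable CPU minus the Requests already placed there.
# Usage: pod_fits <allocatable_milli> <already_requested_milli> <pod_request_milli>
pod_fits() {
  [ "$3" -le $(( $1 - $2 )) ]
}

request_to_shares 50     # 51
request_to_shares 1000   # 1024
request_to_shares 0      # 2 (no Request set)
pod_fits 4000 3800 500 || echo "does not fit"
```

Because the shares are derived from absolute Request values, containers under contention receive CPU time in proportion to their Requests.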

Pod without Request

Pods without a Request can be scheduled to any node, because a Request of zero always fits within a node's remaining allocatable resources. However, in the event of resource contention, such a Pod has the lowest weight when CPU time is divided and may receive almost none, potentially leaving it starved of resources indefinitely.

Practical Application

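Create an interactive busybox Pod in the terminal (the --requests flag has been removed from kubectl run in newer kubectl releases; with those, set the Request in a Pod manifest instead):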
kubectl run -i --tty --rm busybox \
--image=busybox \
--restart=Never \
--requests='cpu=50m,memory=50Mi' -- sh
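Run the following commands in the interactive terminal so that the Pod uses as much CPU and memory as it can on the current node. The dd command comes first because the loop never returns:
# Fill the in-memory /dev/shm filesystem to consume memory.
dd if=/dev/zero of=/dev/shm/fill bs=1k count=1024k
# Spin forever to consume CPU.
while true; do true; done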
In another terminal, view the current resource usage of the Pod.
kubectl top pods
NAME      CPU(cores)   MEMORY(bytes)
busybox   460m         65Mi
Both the CPU and memory usage of the busybox Pod exceed its Request. In theory, since busybox is running an infinite loop, it should consume all the CPU available to it. Why does it consume only 460m? Because other Pods and processes are running on the same node, and they compete with it for CPU time through CPU Shares.

Summary

Kubernetes uses Request to guarantee the minimum resources available to containers.
Kubernetes uses Limit to restrict the maximum resource consumption of containers.
When entering values, be aware of the default units: CPU default unit is cores; memory default unit is bytes.
In the event of CPU contention, Kubernetes allocates CPU time in proportion to the Request values specified by the competing containers.