Please Don’t Evict My Pod; Eviction Policy

Let’s quickly revisit the takeaway from the last post: a pod's QoS class determines which pods the Kubelet removes first from a node during eviction. There are three QoS classes: BestEffort, Burstable, and Guaranteed. BestEffort pods are evicted first, Burstable pods second, and Guaranteed pods last.
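As a refresher, here is a minimal sketch of how a QoS class follows from a pod's resource requests and limits. It is a simplified reading of the documented rules, not the Kubelet's code; the function name `qos_class` and the dict layout are assumptions made for illustration, and quantities are compared as raw strings.

```python
def qos_class(containers):
    """Return the QoS class for a pod, given its containers' resources.

    Each container is a dict with optional "requests" and "limits" dicts
    keyed by "cpu" and "memory" (hypothetical layout for this sketch).
    """
    has_any = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is set, the request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Eviction order described above: BestEffort first, Guaranteed last.
EVICTION_ORDER = ["BestEffort", "Burstable", "Guaranteed"]
```

A pod with no requests or limits comes out as BestEffort, a pod with only a CPU request as Burstable, and a pod whose CPU and memory limits equal its requests as Guaranteed.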

In this post, we are going to cover node resources, what causes eviction, and what soft and hard eviction thresholds are. Let’s start with node resources.

Node Resources

Node resources are the capacity of a node: the number of CPUs, the amount of memory, and the disk space available for data persistence. Node resources are finite. Node capacity is part of the NodeStatus object, and a node reports its capacity when it registers with the K8s cluster. There are two categories of node resources:

  • Shareable: compressible resources that multiple processes can share, such as CPU or network bandwidth.
  • Non-shareable: incompressible compute resources such as memory or disk; a shortage of these leads to node instability.

A shortage of shareable resources leads to throttling of processes; overuse of non-shareable resources, however, triggers system maintenance programs such as the OOM Killer.

System maintenance programs are compute-intensive and can stall the node for a while, which makes the node unavailable to the cluster. To avoid this, the Kubelet takes a proactive approach: it monitors the node's resources and evicts some pods when they run short. The following commands help observe resource usage per node.



# Show capacity, allocatable capacity, and allocated resources (requests and limits) for one node.
kubectl describe node <node_name>
# Show current usage of all nodes in the cluster (requires a metrics source such as metrics-server).
kubectl top nodes

Kubernetes Eviction Policy

The Kubelet proactively monitors the node's compute resources. When they are starved or exhausted, the Kubelet tries to reclaim resources by evicting (failing) pods: it terminates all of a pod's containers and transitions its PodPhase to Failed. Let's see how the Kubelet implements monitoring and eviction.

Eviction Signals

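The Kubelet is configured with thresholds on resource signals, and crossing a threshold triggers the eviction of pods from the node. Currently, the Kubelet supports the following eviction signals:

  • memory.available: memory available on the node.
  • nodefs.available: free space on the nodefs filesystem.
  • nodefs.inodesFree: free inodes on the nodefs filesystem.
  • imagefs.available: free space on the imagefs filesystem.
  • imagefs.inodesFree: free inodes on the imagefs filesystem.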

Threshold values are either literal quantities or percentages. A percentage is calculated against the total capacity of the node. As per the official Kubernetes documentation, the Kubelet supports only two filesystems:

  • The nodefs filesystem that Kubelet uses for volumes, daemon logs, etc.
  • The imagefs filesystem that container runtimes use for storing images and container writable layers. It is optional.

The Kubelet auto-discovers these filesystems using cAdvisor and does not care about any other filesystems.
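To make the difference between literal and percentage thresholds concrete, here is a small sketch that turns either form into an absolute number. The helper `resolve_threshold` and the `UNITS` table are hypothetical names for this illustration and cover only a few binary suffixes; the Kubelet's own parsing is more complete.

```python
# Binary suffixes handled by this sketch (an assumption, not a full list).
UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def resolve_threshold(value, capacity):
    """Convert a threshold such as "100Mi" or "10%" into an absolute number.

    capacity is the node's total for the resource (bytes or inodes) and is
    only used for percentage thresholds.
    """
    if value.endswith("%"):
        return capacity * float(value[:-1]) / 100
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)  # plain number with no suffix
```

For example, `resolve_threshold("10%", capacity)` on a 100Gi disk gives the same value as `resolve_threshold("10Gi", capacity)`, while `"100Mi"` is independent of capacity.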

The Kubelet supports two categories of eviction thresholds: soft and hard.

  • A soft eviction threshold combines two values: the threshold itself and an administrator-specified grace period. The grace period is mandatory for every soft threshold, and the Kubelet evicts pods only after the threshold has been exceeded for that long. When a soft threshold is met, we can also define the maximum pod termination grace period to use.
  • A hard eviction threshold is similar to a soft one but has no grace period: once it is reached, the Kubelet immediately evicts pods from the node with no graceful termination.
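The difference between the two categories can be sketched as a single check in which a hard threshold is simply a threshold with a zero grace period. This is an illustration only: `Threshold` and `should_evict` are hypothetical names, times are plain seconds, and the Kubelet's actual bookkeeping is more involved.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    signal: str
    minimum: float              # breached when the observed value drops below this
    grace_period_s: float = 0   # 0 models a hard threshold


def should_evict(threshold, observed, now, breached_since):
    """Return (evict, breached_since) for one observation.

    breached_since is the time the threshold was first seen breached,
    or None if it is not currently breached.
    """
    if observed >= threshold.minimum:
        return False, None      # recovered: reset the grace-period clock
    if breached_since is None:
        breached_since = now
    evict = (now - breached_since) >= threshold.grace_period_s
    return evict, breached_since
```

With a 90-second soft threshold, a breach observed at t=0 and still present at t=60 does not yet trigger eviction, but the same breach at t=90 does; a hard threshold triggers on the first breached observation.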

The Kubelet has the following default hard eviction thresholds:

  • memory.available < 100Mi
  • nodefs.available < 10%
  • nodefs.inodesFree < 5%
  • imagefs.available < 15%

The Kubelet evaluates eviction thresholds every ten seconds by default, or at the interval set by the housekeeping-interval argument. The Kubelet maps the soft and hard eviction signals to node conditions.
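The mapping from signals to node conditions can be sketched as follows. In the Kubernetes documentation, memory.available maps to the MemoryPressure condition and the nodefs and imagefs signals map to DiskPressure. The thresholds are the defaults listed above; `node_conditions` and the shape of the `observed` dict are assumptions made for this example.

```python
MI = 1024 ** 2

# Default hard thresholds from the list above: an absolute byte count for
# memory, and a fraction of capacity for the filesystem signals.
DEFAULT_HARD = {
    "memory.available": ("absolute", 100 * MI),
    "nodefs.available": ("fraction", 0.10),
    "nodefs.inodesFree": ("fraction", 0.05),
    "imagefs.available": ("fraction", 0.15),
}

# Node condition reported for each signal.
CONDITION = {
    "memory.available": "MemoryPressure",
    "nodefs.available": "DiskPressure",
    "nodefs.inodesFree": "DiskPressure",
    "imagefs.available": "DiskPressure",
}


def node_conditions(observed):
    """observed maps signal -> (available, capacity); returns a set of conditions."""
    conditions = set()
    for signal, (kind, limit) in DEFAULT_HARD.items():
        if signal not in observed:
            continue  # e.g. the node has no separate imagefs
        available, capacity = observed[signal]
        minimum = limit if kind == "absolute" else limit * capacity
        if available < minimum:
            conditions.add(CONDITION[signal])
    return conditions
```

A node with 80Mi of memory available and a half-empty disk would report only MemoryPressure under these defaults.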

Summing It All Up

If the node remains starved of a resource, the Kubelet starts reclaiming it by evicting end-user pods.

The Kubelet ranks pods for eviction first by whether or not their usage of the starved resource exceeds requests, then by priority, and then by the consumption of the starved compute resource relative to the pods' scheduling requests.
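The three-step ranking above can be expressed as a sort key. This is a sketch under simplifying assumptions, not the Kubelet's implementation: `PodUsage` and `rank_for_eviction` are hypothetical names, and usage and requests are plain integers for a single starved resource.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int
    request: int   # requested amount of the starved resource
    usage: int     # current usage of the starved resource


def rank_for_eviction(pods):
    """Return pods ordered from first-to-evict to last-to-evict."""
    return sorted(
        pods,
        key=lambda p: (
            p.usage <= p.request,     # pods exceeding requests sort first (False < True)
            p.priority,               # then lower priority first
            -(p.usage - p.request),   # then largest usage above request first
        ),
    )
```

Given two pods that both exceed their requests, the one with the lower priority is ranked first; pods that stay within their requests come after every pod that exceeds them.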

I am resting this post here, although several important details of the eviction policy remain uncovered. This post should provide enough context on pod eviction. If I feel there is something important to add, I will edit this post or write a FAQ post on the topic.

In the next post, I plan to write about priority classes, pod disruption budgets, and several other important topics related to pod eviction. As usual, I welcome appreciation, questions, and comments on the post. Thanks :)
