Please Don’t Evict My Pod: Priority and Budget Disruption

By Abhishek Sharma · Apr. 16, 20 · Presentation
In this post, we cover the pod PriorityClass, the PodDisruptionBudget, and how these constructs relate to pod eviction. Let's start with the pod priority class.

PriorityClass and Preemption

PriorityClass has been a stable Kubernetes object since version 1.14. It belongs to the scheduling.k8s.io API group and defines a mapping between a priority class name and an integer priority value. The rule is simple: the higher the integer, the higher the priority. For example, given one PriorityClass with a value of ten and another with a value of twenty, the latter holds the higher priority.

PriorityClass is a non-namespaced object with an optional boolean field named globalDefault. Only one PriorityClass in a cluster can set globalDefault=true; its integer value then becomes the default priority for every pod whose definition has no priorityClassName. If no PriorityClass has globalDefault=true, the default pod priority is zero.

If we later add an object with globalDefault=true, all new pods without a priorityClassName get a priority equal to that object's integer value; pods that already exist keep their priority of zero. A Kubernetes cluster ships with two built-in PriorityClasses: system-cluster-critical and system-node-critical. system-node-critical is the highest available priority, higher even than system-cluster-critical.
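
As a rough illustration of the globalDefault behaviour described above, the manifest below sketches a cluster-wide default PriorityClass. The name default-priority and the value 1000 are assumptions chosen for this example, not values taken from a real cluster.

```yaml
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority   # hypothetical name
value: 1000                # hypothetical value; user-defined classes may go up to 1,000,000,000
globalDefault: true        # at most one PriorityClass in the cluster may set this
description: "Default priority for pods that do not set priorityClassName."
```

Under these assumptions, pods created after this object exists and without a priorityClassName would get a priority of 1000, while pods that were already running would keep a priority of zero.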

Let’s see how pod priority affects the behaviour of kube-scheduler and leads to the eviction of other pods from a node. Kube-scheduler tries to place each newly created pod on a node; if no node has the resources the pod requires, the preemption logic comes into play. Based on the pod's priority, kube-scheduler looks for a node where evicting lower-priority pods would free enough resources for the pending pod to run.

The preemption process evicts lower-priority pods from a node so that a higher-priority pod can be scheduled there.

A PriorityClass object has a field named preemptionPolicy, which controls whether pods of that class may preempt others. The default is preemptionPolicy=PreemptLowerPriority, which allows pods of that PriorityClass to preempt lower-priority pods. With preemptionPolicy=Never, pods of that PriorityClass never preempt other pods. Here is an example of a preempting and a non-preempting class, each with a pod that uses it:

```yaml
---
# Pods of this class may preempt lower-priority pods (the default policy).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-preempting
value: 1000000
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "This priority class will cause other lower priority pods to be preempted."
---
# Same priority value, but pods of this class never preempt other pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "This priority class will not cause other pods to be preempted."
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-preempting
  labels:
    env: test
spec:
  containers:
  - name: nginx-preempting
    image: nginx   # public nginx image
    imagePullPolicy: IfNotPresent
  priorityClassName: high-priority-preempting
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-nonpreempting
  labels:
    env: test
spec:
  containers:
  - name: nginx-nonpreempting
    image: nginx   # public nginx image
    imagePullPolicy: IfNotPresent
  priorityClassName: high-priority-nonpreempting
```


Let's pause on preemption here; we will revisit it once we have covered the pod disruption budget.

Pod Disruption Budget

PodDisruptionBudget (PDB) is a Kubernetes object that works at the application level. A PDB limits how many pods of a replicated application can be down simultaneously because of voluntary disruptions; it indicates how much disruption the application can tolerate at a given time. One of the best use cases is an application that relies on quorum, such as ZooKeeper. Below is a PDB definition that requires at least two matching pods to remain available.

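```yaml
# policy/v1 is available from Kubernetes 1.21; older clusters use policy/v1beta1,
# which was removed in Kubernetes 1.25.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2          # keep at least two matching pods running
  selector:
    matchLabels:
      app: zookeeper
```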

Commands for PDB

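```shell
# List all PodDisruptionBudgets in the current namespace
kubectl get poddisruptionbudgets
# Show the full definition and status of the zk-pdb budget
kubectl get poddisruptionbudgets zk-pdb -o yaml
```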


An application's PDB is an important aspect that is taken into consideration when performing voluntary disruptions in a K8s cluster. A disruption that would violate the budget is halted until the budget can be maintained. PDB is very helpful during cluster activities such as a node drain or rebalancing the cluster with projects like Descheduler, but is PDB useful in preemption too?

Preemption respects PDBs on a best-effort basis: the scheduler tries to pick victims whose eviction does not violate any PDB. If no such victims are available, preemption still happens and the PDB is violated. To test PDB and eviction behaviour, you can try the kubectl-evict-pod plugin.

Warning: In a cluster where not all users are trusted, a malicious user could create pods at the highest possible priorities, causing other pods to be evicted or left pending. Improper use of PriorityClass can also cause cascading failures that end in a production outage, as in an incident shared by the Grafana community.
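
One way to limit that risk is a ResourceQuota scoped to a PriorityClass, which caps how many pods in a namespace may use that class. The fragment below is a hedged sketch: the namespace team-a and the limit of ten pods are assumptions, and the class name reuses high-priority-preempting from the earlier example.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-pods   # hypothetical name
  namespace: team-a          # hypothetical namespace; it must already exist
spec:
  hard:
    pods: "10"               # hypothetical cap on pods using the class below
  scopeSelector:
    matchExpressions:
    - scopeName: PriorityClass
      operator: In
      values: ["high-priority-preempting"]
```

The quota only counts pods whose priorityClassName matches one of the listed values, so other pods in the namespace are unaffected by it.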

How Pod PriorityClass, QoS Class, and Eviction Policy Are Linked

PriorityClass and QoS class are two independent, unrelated features of a pod. No rule ties a pod's QoS class to its priority. Hence, when scheduling a high-priority pod, the scheduler can preempt a pod of the Guaranteed QoS class because its priority is lower.
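
To make that independence concrete, here is a minimal sketch of a pod that is Guaranteed by QoS class yet low in priority. The class name low-priority, its value of 100, the pod name, and the resource figures are all assumptions made for illustration.

```yaml
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority                  # hypothetical name
value: 100                            # hypothetical value
globalDefault: false
description: "Low priority; pods in this class are likely preemption victims."
---
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-but-low-priority   # hypothetical name
spec:
  priorityClassName: low-priority
  containers:
  - name: app
    image: nginx
    resources:
      # Requests equal to limits for CPU and memory in every container
      # give the pod the Guaranteed QoS class.
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
```

The pod's status.qosClass field would report Guaranteed, yet a pending pod of a higher PriorityClass could still preempt it.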

The only component that considers both QoS and Pod priority is kubelet out-of-resource eviction. The kubelet ranks Pods for eviction first by whether or not their usage of the starved resource exceeds requests, then by priority, and then by the consumption of the starved compute resource relative to the Pods’ scheduling requests.
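
The kubelet applies this ranking only once a node crosses an eviction threshold. As a hedged sketch, the kubelet configuration fragment below sets hard eviction thresholds; the numbers are assumptions for illustration, not recommendations.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"   # hypothetical threshold: evict when free memory drops below this
  nodefs.available: "10%"     # hypothetical threshold: evict when node filesystem space drops below this
```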

Putting it All Together

Pod priority, QoS class, and eviction policy together form a delicate balance in a K8s cluster. Adding new objects without considering their effect on the others can destabilize the cluster and lead to catastrophe. In another post, I will share some best practices that help manage the cluster state with fewer evictions.
