Kubernetes: Lifecycle of a Pod
Looking at the lifecycle of a Kubernetes Pod helps you understand what is going on with the smallest unit of work that can be scheduled in the popular container orchestrator.
Recently, I spoke about “Lifecycle of a Pod” at the Pune Kubernetes Meetup. This post closely follows that talk.
A Pod is the smallest unit of work that can be scheduled in Kubernetes. A Pod encapsulates one or more application containers, storage resources, a unique network IP, and options that govern how the containers should run. Ideally, Pods are not deployed directly on a cluster; higher-level abstractions are used instead. Applications are generally deployed via constructs such as Deployments, ReplicaSets, DaemonSets, StatefulSets, or Jobs. Direct interaction with Pods usually happens while troubleshooting issues, which is why understanding Pods is important.
States of a Pod
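Through its lifecycle, a Pod can be in the following states:
Pending: The Pod has been accepted by the Kubernetes system, but one or more of its containers have not been created yet.
Running: The Pod has been bound to a node, all of its containers have been created, and at least one container is running (or is in the process of starting or restarting).
Succeeded: All containers in the Pod have exited with status 0 and will not be restarted.
Failed: All containers in the Pod have exited, and at least one container has exited with a non-zero status.
CrashLoopBackOff: A container fails to start (or crashes soon after starting), and the kubelet keeps restarting it with an exponentially increasing back-off delay capped at five minutes. Strictly speaking, this is a container status reason shown by kubectl rather than a Pod phase.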
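The phase rules above can be approximated in a few lines of Python. This is a toy model for illustration only, not the kubelet's actual logic: the `Container` class, the `pod_phase` function, and the `will_restart` flag (a stand-in for the effect of the Pod's restart policy) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class Container:
    created: bool = False
    running: bool = False
    exit_code: Optional[int] = None  # None until the container has exited


def pod_phase(bound_to_node: bool, containers: List[Container], will_restart: bool) -> Phase:
    """Toy approximation of the Pod phase rules (hypothetical, simplified)."""
    # Pending: not scheduled yet, or some container has not been created.
    if not bound_to_node or not all(c.created for c in containers):
        return Phase.PENDING
    # Running: at least one container is still running.
    if any(c.running for c in containers):
        return Phase.RUNNING
    all_exited = all(c.exit_code is not None for c in containers)
    if all_exited and not will_restart:
        # Terminal phases: everything has exited and nothing will be restarted.
        if all(c.exit_code == 0 for c in containers):
            return Phase.SUCCEEDED
        return Phase.FAILED
    # Containers are starting or restarting (e.g. during CrashLoopBackOff):
    # the Pod phase is still reported as Running.
    return Phase.RUNNING
```

Note that a container stuck in CrashLoopBackOff does not get a phase of its own in this model; the Pod simply stays in Running while restarts continue.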
Birth of a Pod
Now let’s look at the events that lead to the creation of a Pod.
Image courtesy: Joe Beda’s Blog.
- kubectl or any other API client submits the Pod spec to the API server.
- The API server writes the Pod object to the etcd data store. Once the write succeeds, etcd acknowledges it to the API server, which in turn acknowledges the client.
- The API server now reflects the change in state of etcd.
- All Kubernetes components use watches on the API server to be notified of relevant changes.
- In this case, the kube-scheduler (via its watcher) sees that a new Pod object is created on API server but is not bound to any node.
- kube-scheduler assigns a node to the pod and updates the API server.
- This change is then propagated to the etcd data store. The API server also reflects this node assignment on its Pod object.
- The kubelet on every node also watches the API server. The kubelet on the destination node sees that a new Pod has been assigned to it.
- The kubelet starts the Pod's containers on its node by calling the container runtime (Docker, in this case) and reports the container state back to the API server.
- The API server persists the pod state into etcd.
- Once etcd sends the acknowledgment of a successful write, the API server sends an acknowledgment back to kubelet indicating that the event is accepted.
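To make the watch-driven flow concrete, here is a toy, single-process simulation in Python. `ToyApiServer` (a dict standing in for etcd plus a list of watch callbacks), `make_scheduler`, and `make_kubelet` are hypothetical names, not the Kubernetes client API, and the "scheduler" simply picks the first node instead of filtering and scoring nodes as the real kube-scheduler does.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

Pod = Dict[str, Optional[str]]


class ToyApiServer:
    """Hypothetical stand-in for the API server backed by etcd."""

    def __init__(self) -> None:
        self.etcd: Dict[str, Pod] = {}                   # "persisted" Pod objects by name
        self.watchers: List[Callable[[Pod], None]] = []  # components watching for changes
        self.changes: Deque[Pod] = deque()               # changes not yet delivered
        self.log: List[str] = []

    def watch(self, callback: Callable[[Pod], None]) -> None:
        self.watchers.append(callback)

    def write(self, pod: Pod, who: str) -> None:
        # Persist first; once the "etcd write" succeeds, the change becomes visible.
        self.etcd[pod["name"]] = dict(pod)
        self.log.append(f"{who} -> {pod['name']} node={pod['node']} phase={pod['phase']}")
        self.changes.append(dict(pod))

    def dispatch(self) -> None:
        # Deliver every change to every watcher until nothing new happens.
        while self.changes:
            pod = self.changes.popleft()
            for notify in self.watchers:
                notify(pod)


def make_scheduler(api: ToyApiServer, nodes: List[str]) -> Callable[[Pod], None]:
    def on_change(pod: Pod) -> None:
        if pod["node"] is None:  # a new Pod that is not bound to any node yet
            # Placeholder policy: a real scheduler filters and scores nodes.
            api.write({**pod, "node": nodes[0]}, who="kube-scheduler")
    return on_change


def make_kubelet(api: ToyApiServer, node_name: str) -> Callable[[Pod], None]:
    def on_change(pod: Pod) -> None:
        if pod["node"] == node_name and pod["phase"] == "Pending":
            # A real kubelet would call the container runtime here.
            api.write({**pod, "phase": "Running"}, who=f"kubelet/{node_name}")
    return on_change


api = ToyApiServer()
api.watch(make_scheduler(api, ["node-1", "node-2"]))
api.watch(make_kubelet(api, "node-1"))
api.watch(make_kubelet(api, "node-2"))
api.write({"name": "web", "node": None, "phase": "Pending"}, who="kubectl")
api.dispatch()
print("\n".join(api.log))
```

The printed log shows three writes in order: kubectl creating the Pod, kube-scheduler binding it to `node-1`, and the kubelet on `node-1` reporting it as Running. No component talks to another directly; each one only reacts to changes it sees through its watch.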
Activities During a Pod’s Life
Init containers are containers that run before the main application containers are started. They have two important characteristics:
- They always run to completion.
- Each init container must complete before the next one is started.
Init containers are useful when some initial actions need to run before the main containers in the Pod start.
For example, they can copy config files and update config values. Init containers use different Linux namespaces, so they have a different view of the filesystem than the app containers; this means they can be given access to secrets that should not be shared with the app container.
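The two characteristics above (each init container runs to completion, one at a time, before anything else starts) can be sketched as a toy Python model. The `start_pod` function and the container names are hypothetical; in a real cluster a failed init container is retried by the kubelet according to the Pod's restart policy, whereas this sketch simply stops.

```python
from typing import Callable, List, Tuple

# (container name, function that "runs" the container and returns its exit code)
Step = Tuple[str, Callable[[], int]]


def start_pod(init_containers: List[Step], app_containers: List[str]) -> List[str]:
    """Toy model: run init containers sequentially, then start the app containers."""
    events = []
    for name, run in init_containers:
        code = run()  # runs to completion before the next init container starts
        events.append(f"init {name} exited {code}")
        if code != 0:
            # Simplification: no retries; the app containers never start.
            events.append("app containers not started")
            return events
    events.extend(f"start {name}" for name in app_containers)
    return events
```

For example, `start_pod([("copy-config", lambda: 0), ("render-config", lambda: 0)], ["app"])` records both init containers before `start app`, while a failing `copy-config` prevents `render-config` and `app` from running at all.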
The kubelet can run code triggered by Container Lifecycle Hooks. This lets the user run specific code during specific events in a container's lifecycle.
For example, a graceful shutdown script can be run before a container is terminated.
Two hooks are exposed:
PostStart : This hook is executed immediately after a container is created, but there is no guarantee that it will run before the container ENTRYPOINT.
PreStop : This hook is executed immediately before a container is terminated. It is a blocking call, which means the hook must complete before the TERM signal can be sent to the container.
Neither hook takes any parameters. A hook can be implemented with one of two types of handlers:
Exec : runs a specific command inside the container and the resources consumed by the command are counted against the container.
HTTP : executes an HTTP request against a specific endpoint on the container.
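In a Pod manifest, both hooks live under a container's `lifecycle` field, with `exec` and `httpGet` as the two handler types. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts on standard input. The Pod name, image, command, and the `/shutdown` endpoint on port 8080 are hypothetical placeholders.

```python
import json

# Hypothetical Pod with one container that uses both lifecycle hooks.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "lifecycle-demo"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "registry.example.com/app:1.0",  # placeholder image
                "lifecycle": {
                    # Exec handler: runs inside the container; its resource
                    # usage counts against the container.
                    "postStart": {
                        "exec": {"command": ["/bin/sh", "-c", "echo started > /tmp/started"]}
                    },
                    # HTTP handler: an HTTP GET to this path and port on the container.
                    "preStop": {"httpGet": {"path": "/shutdown", "port": 8080}},
                },
            }
        ]
    },
}

# Print JSON that can be piped into `kubectl apply -f -`.
print(json.dumps(pod, indent=2))
```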
Apart from lifecycle hooks, another important thing that happens during a Pod's lifetime is the execution of container probes.
Container probes are diagnostics performed by kubelet on the container. There are two kinds of probes which kubelet can run on running containers:
livenessProbe : Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is then subject to the Pod's restart policy.
readinessProbe : Indicates whether the container is ready to serve requests. If this probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services that match the Pod.
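The different consequences of the two probes can be summarized in a toy Python model. The function names are hypothetical and timing details are ignored; it assumes, as the kubelet does, that a liveness failure is treated like a container failure, so the container is restarted unless the restart policy is `Never`.

```python
from typing import Set

RESTART_POLICIES = ("Always", "OnFailure", "Never")


def on_liveness_failure(restart_policy: str) -> str:
    """The kubelet kills the container; the restart policy decides what happens next."""
    if restart_policy not in RESTART_POLICIES:
        raise ValueError(f"unknown restart policy: {restart_policy}")
    # A liveness failure counts as a container failure, so both Always and
    # OnFailure restart the container; Never leaves it stopped.
    return "kill, do not restart" if restart_policy == "Never" else "kill and restart"


def on_readiness_result(ready: bool, pod_ip: str, endpoints: Set[str]) -> Set[str]:
    """Endpoints-controller view: only ready Pods stay in a Service's endpoints.

    The container itself is not restarted when readiness fails.
    """
    return (endpoints | {pod_ip}) if ready else (endpoints - {pod_ip})
```

The design point is the split of responsibility: liveness is acted on by the kubelet on the node, while readiness only changes which Pod IPs receive Service traffic.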
There are three ways to implement a probe:
ExecAction : Executes a command inside the container. The diagnostic is considered successful if the command returns 0.
TCPSocketAction : Performs a TCP socket check against the container IP and specified port. The diagnostic is considered successful if the port is open.
HTTPGetAction : Performs an HTTP GET request against the container IP on the specified port and path. The diagnostic is considered successful if the response has a status code greater than or equal to 200 and less than 400.
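The three success criteria can be sketched with the Python standard library. These functions are hypothetical illustrations of the rules, not the kubelet's implementation. `http.client` does not follow redirects, so a 3xx response is judged by its own status code, which counts as success under the 200 to 399 rule.

```python
import http.client
import socket
import subprocess
from typing import List


def exec_probe(command: List[str], timeout: float = 1.0) -> bool:
    """ExecAction-style check: success if the command exits with status 0."""
    try:
        result = subprocess.run(command, timeout=timeout,
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCPSocketAction-style check: success if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status_ok(status_code: int) -> bool:
    """HTTPGetAction rule: 200 <= status < 400 counts as success."""
    return 200 <= status_code < 400


def http_probe(host: str, port: int, path: str = "/", timeout: float = 1.0) -> bool:
    """HTTPGetAction-style check: send a GET and apply the status-code rule."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return http_status_ok(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```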
Termination of a Pod
- The user sends a command to delete a Pod.
- The Pod object in the API server is updated with the grace period (30 seconds by default) and the time beyond which the Pod is considered “dead”.
- The following actions then happen in parallel:
  - The Pod shows up as “Terminating” when listed in client commands.
  - When the kubelet sees that the Pod has been marked as terminating (because the deadline from the previous step has been set), it begins the Pod shutdown process:
    - If a container in the Pod defines a preStop hook, the hook is invoked inside that container. If the preStop hook is still running when the grace period expires, the grace period is extended once by a small amount (2 seconds).
    - The processes in the Pod are sent the TERM signal.
  - The endpoints controller sees that the Pod is being deleted and removes it from the endpoints of all Services that the Pod was serving.
- When the grace period expires, any processes still running in the Pod are killed with SIGKILL.
- The Kubelet will finish deleting the Pod on the API server by setting grace period 0 (immediate deletion). The Pod disappears from the API and is no longer visible from the client.
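The shutdown sequence for a single process (optional preStop step, then TERM, then SIGKILL once the grace period expires) can be mimicked with Python's `subprocess` and `signal` modules. This is a POSIX-only sketch: `stop_with_grace`, `start_child`, and the two child scripts are hypothetical, the grace period is counted from before the preStop step runs, and the one-time 2-second extension for a slow preStop hook is not modeled.

```python
import signal
import subprocess
import sys
import time
from typing import Callable, Optional


def stop_with_grace(proc: subprocess.Popen, grace_period: float,
                    pre_stop: Optional[Callable[[], None]] = None) -> int:
    """Toy model of Pod shutdown for one process (POSIX only)."""
    deadline = time.monotonic() + grace_period  # the grace period covers preStop + TERM
    if pre_stop is not None:
        pre_stop()                               # blocking: TERM waits for the hook
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=max(0.0, deadline - time.monotonic()))
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGKILL)         # grace period expired: force kill
        return proc.wait()


def start_child(code: str) -> subprocess.Popen:
    """Start a Python child and wait until it reports its handler is installed."""
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()
    return proc


# Well-behaved app: exits cleanly on SIGTERM.
HANDLES_TERM = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)
# Misbehaving app: ignores SIGTERM, so only SIGKILL stops it.
IGNORES_TERM = HANDLES_TERM.replace("lambda *_: sys.exit(0)", "signal.SIG_IGN")
```

With these definitions, `stop_with_grace(start_child(HANDLES_TERM), grace_period=5)` should return 0 because the child exits cleanly on TERM, while `stop_with_grace(start_child(IGNORES_TERM), grace_period=0.5)` should return -9 (`-signal.SIGKILL`), since a process that ignores TERM is only stopped by SIGKILL. This is why applications running in Pods should handle SIGTERM themselves.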
The idea for this post came from an excellent post by Kubernetes co-founder Joe Beda, in which he explained the major components of the Kubernetes architecture, the concept of a watch (which is crucial to understanding how the API server and etcd work together), and the birth of a Pod.
We can see there are multiple ways to control the events that happen during a Pod’s lifetime. Init containers can remove a lot of complexity related to bootstrapping containers and thus keep the logic within the main containers simple. Similarly, a PostStart lifecycle hook can run any code (such as registering with a monitoring system or a service mesh) that needs to run once the container starts. Liveness and readiness probes help restart unhealthy containers and keep traffic away from Pods that are not ready, before they start disrupting customers. Graceful shutdowns can be run as a PreStop lifecycle hook, allowing for a much more elegant exit. Knowing these control mechanisms helps in designing Pods better for the use cases they support.
Published at DZone with permission of Harshal Shah, DZone MVB. See the original article here.