
Kubernetes — Replication, and Self-Healing

The benefit of using replication for your microservices and how the Kubernetes cluster can automatically recover from a service failure.

By Sudip Sengupta · Oct. 15, 2020 · Analysis

To start with, I would like to explain what self-healing means in Kubernetes. Self-healing is a fantastic feature of Kubernetes that automatically recovers from service or node failures. In this article, we will look at the benefits of using replication for your microservices and how a Kubernetes cluster can automatically recover from a service failure.

Prerequisite

One of the great features of Kubernetes is the ability to replicate pods and their underlying containers across the cluster. So, before we explore the self-healing behavior, please make sure you have replication configured. Here is a simple deployment file that deploys an nginx container with a replication factor of 3:

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-example
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.4
        ports:
        - containerPort: 80

All right, so let's create the deployment:

Shell
 
kubectl create -f deployment.yaml


Now, let's check whether our nginx-deployment-example was created:

Shell
 
kubectl get deployments -n default


You should see the nginx-deployment-example deployment in the default namespace. To see the pods it created, run the following command:

Shell
 
kubectl get pods -n default


We will see our 3 nginx-deployment-example pods:

Shell
 
NAME                                      READY   STATUS    RESTARTS   AGE
nginx-deployment-example-f4cd8584-f494x   1/1     Running   0          94s
nginx-deployment-example-f4cd8584-qvkbg   1/1     Running   0          94s
nginx-deployment-example-f4cd8584-z2bzb   1/1     Running   0          94s


Self-Healing

Kubernetes ensures that the actual state of the cluster and the desired state of the cluster are always in sync. This is made possible through continuous monitoring within the Kubernetes cluster. Whenever the state of the cluster drifts from what has been defined, the various components of Kubernetes work to bring it back to the defined state. This automated recovery is often referred to as self-healing.
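You can watch this reconciliation from the control plane's point of view by comparing the desired replica count with the ready count (a minimal check, assuming the nginx-deployment-example deployment created above):

```shell
# Compare desired vs. ready replicas; both should read 3 once the
# controller has reconciled the deployment
kubectl get deployment nginx-deployment-example \
  -o jsonpath='{.spec.replicas} {.status.readyReplicas}'
```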
So, let's take one of the pods listed above and see what happens when we delete it:

Shell
 
kubectl delete pod nginx-deployment-example-f4cd8584-f494x


And after a few seconds, we see that our pod was deleted:
pod "nginx-deployment-example-f4cd8584-f494x" deleted
Let's go ahead and list the pods one more time:

Shell
 
kubectl get pods -n default
NAME                                      READY   STATUS    RESTARTS   AGE
nginx-deployment-example-f4cd8584-qvkbg   1/1     Running   0          109s
nginx-deployment-example-f4cd8584-sgfqq   1/1     Running   0          5s
nginx-deployment-example-f4cd8584-z2bzb   1/1     Running   0          109s


And we see that the pod nginx-deployment-example-f4cd8584-sgfqq was automatically created to replace the deleted pod nginx-deployment-example-f4cd8584-f494x. The reason is that the nginx deployment is set to have 3 replicas, so even though one of them was deleted, the Kubernetes cluster works to ensure that the actual state matches the desired state.
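The desired state itself is just a field on the deployment, so you can change it and let the controller converge on the new count (a short sketch, using the same deployment name as above):

```shell
# Raise the desired replica count to 5; the ReplicaSet controller
# creates 2 additional pods to match
kubectl scale deployment nginx-deployment-example --replicas=5

# Scale back down; the 2 extra pods are terminated
kubectl scale deployment nginx-deployment-example --replicas=3
```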
Now, let's consider the case where there is an actual node failure in your cluster.

First, let's check our nodes:

Shell
 
kubectl get nodes


You will see your Master and Worker nodes. Now, let's figure out which node each of our pods is running on. To do that, we can describe a pod:

Shell
 
kubectl describe pod nginx-deployment-example-f4cd8584-qvkbg


Under Events, you can see which node the pod was assigned to. If we scroll up, the Node field shows the same assignment. Once you've identified the nodes that your pods are running on, pick one of them and simulate a node failure by shutting down the server.
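If you only need the node assignment, describing each pod is not strictly necessary; the wide output format adds a NODE column to the pod listing:

```shell
# -o wide appends NODE and IP columns, showing where each pod runs
kubectl get pods -n default -o wide
```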

Once the node has been shut down, let's head back to the Master and check the status of the nodes:

Shell
 
kubectl get nodes


We see that the cluster knows that one node is down. Let's also list our pods:

Shell
 
kubectl get pods -n default


You will see that the state of the pod that was running on the "failed" node is Unknown, and that another pod has taken its place. Let's go ahead and list the deployments:

Shell
 
kubectl get deployments -n default


And as we expected, we have 3 pods available, which is in sync with our desired number of pods.

Now, let's try to describe the Unknown pod:

Shell
 
kubectl describe pod <pod_in_unknown_state>


You will see that the status of the Unknown pod is Terminating, that the termination grace period is 30s by default, and that the reason is NodeLost. You will also see a message specifying that the node running our pod is unresponsive.
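If a pod remains stuck in Terminating long after the node is lost, it can be removed without waiting for the grace period. Use this with care: it only deletes the pod object from the API server, and the placeholder name must be replaced with your actual pod name:

```shell
# Skip the 30s grace period and remove the pod object immediately
kubectl delete pod <pod_in_unknown_state> --grace-period=0 --force
```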

Now, let's start our "failed" node and wait till it successfully rejoins the cluster.

Alright, once the "failed" node is up and running, let's go ahead and check the status of our deployment:

Shell
 
kubectl get deployments -n default


We will see that our pod count is in sync. So, let's list our pods out:

Shell
 
kubectl get pods -n default


We will see that our Kubernetes cluster has finally terminated the old pod, and we are left with our desired count of 3 pods.
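Replication recovers from pod and node failures, but Kubernetes can also self-heal at the container level: a liveness probe tells the kubelet to restart a container that stops responding. Here is a minimal sketch for the nginx container in our deployment; the probe path and timings are illustrative assumptions, not values from the walkthrough above:

```yaml
# Nested under spec.template.spec.containers[0] of the deployment
livenessProbe:
  httpGet:
    path: /                 # nginx serves its default page here
    port: 80
  initialDelaySeconds: 5    # give nginx time to start
  periodSeconds: 10         # probe every 10 seconds
```

If the probe fails repeatedly, the kubelet restarts the container in place, without involving the ReplicaSet.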

Conclusion

As we have seen, Kubernetes takes a self-healing approach to infrastructure that reduces the criticality of failures and makes fire drills less common. Kubernetes heals itself whenever there is a discrepancy, ensuring the cluster always matches the declared state. In other words, if a deviation is detected, Kubernetes kicks in and fixes it. For example, if a pod goes down, a new one is deployed to match the desired state.
