Kubernetes in Production: Best Practices to Follow
We all know Kubernetes is hard! Here are some best practices to follow while using it in production. Following these should ensure greater security and efficiency.
No doubt, DevOps has come a long way! Platforms like Docker and Kubernetes have enabled companies to ship their software faster than ever. With the ever-growing usage of containers to build and ship software, Kubernetes has gained colossal popularity among software enterprises as the de facto container orchestration tool.
Kubernetes has excellent features that support scaling, zero-downtime deployments, service discovery, automatic rollouts and rollbacks, and more. To manage container deployments at scale, Kubernetes is a must: it enables flexible distribution of resources and workloads. Kubernetes in production is a great solution, but it takes some time to set up and become familiar with. Since many companies want to use Kubernetes in production these days, it is essential to prioritize some best practices. In this article, we will discuss some Kubernetes best practices for production.
Kubernetes in Production
Kubernetes has a complex and steep learning curve, and it is loaded with feature-rich capabilities. Production operations should be handled with the utmost care and priority. If you face a shortage of in-house talent, you can always outsource to KaaS (Kubernetes-as-a-Service) providers, who will take care of many of these best practices for you. But suppose you are managing Kubernetes in production all by yourself. In that case, it is very important to pay attention to best practices, especially around observability, logging, cluster monitoring, and security configurations.
As many of us know, running containers in production is not easy; it requires considerable effort and computing resources. There are many orchestration platforms on the market, but Kubernetes has gained enormous traction and the support of the major cloud providers.
All in all, Kubernetes, containerization, and microservices are shiny things, but they introduce security challenges. Kubernetes pods can be spun up quickly across all infrastructure classes, leading to much more internal traffic between pods and hence posing a security concern. The attack surface of Kubernetes is also usually larger. You must consider that the highly dynamic, ephemeral environment of Kubernetes does not blend well with legacy security tools.
You can read my other article on how hard Kubernetes is.
Gartner predicts that by 2022, more than 75% of global organizations will be running containerized applications in production, up from less than 30% today. By 2025, more than 85% of global organizations will be driving containerized applications in production, which is a significant increase from fewer than 35% in 2019. Cloud-native applications require a high degree of infrastructure automation, DevOps, and specialized operations skills, which are tough to find in enterprise IT organizations.
Developing a Kubernetes strategy that applies best practices across security, monitoring, networking, governance, storage, container life cycle management, and platform selection is a must. Let us see some Kubernetes production best practices.
Running Kubernetes in production is not easy; there are several aspects that should be taken care of.
Do Health Checks with Readiness and Liveness Probes
It can be complicated to manage large and distributed systems, particularly when something goes wrong. To make sure app instances are working, it is crucial to set up Kubernetes health checks.
Creating custom health checks allows you to tailor them to your environment and needs.
Readiness probes are intended to let Kubernetes know whether the app is ready to serve traffic. Kubernetes always makes sure the readiness probe passes before allowing a Service to send traffic to the pod.
How can you know if your app is alive or dead? Liveness probes let you do that. If your app is dead, Kubernetes removes the old pod and replaces it with a new one.
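As an illustration, a Pod spec with both probes might look like the following sketch (the image name and the `/ready` and `/healthz` endpoint paths are hypothetical and depend on your application):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
    - name: web-app
      image: registry.example.com/web-app:1.0   # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:             # gate traffic until the app is ready
        httpGet:
          path: /ready            # assumed readiness endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:              # restart the container if it hangs
        httpGet:
          path: /healthz          # assumed health endpoint
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
```

If the liveness probe fails repeatedly, the kubelet restarts the container; if the readiness probe fails, the pod is simply removed from the Service's endpoints until it recovers.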
It is a good practice to specify resource requests and limits for individual containers. Another good practice is to divide Kubernetes environments into separate namespaces for different teams, departments, applications, and clients.
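A minimal sketch combining both practices might look like this (the `team-a` namespace and image name are hypothetical; the request and limit values are illustrative and should be tuned to your workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  namespace: team-a              # hypothetical per-team namespace
spec:
  containers:
    - name: api-server
      image: registry.example.com/api-server:1.2   # hypothetical image
      resources:
        requests:                # what the scheduler reserves for the pod
          cpu: "250m"
          memory: "256Mi"
        limits:                  # hard cap enforced at runtime
          cpu: "500m"
          memory: "512Mi"
```

Requests drive scheduling decisions, while limits cap what the container may consume; a container exceeding its memory limit is OOM-killed.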
Kubernetes resource usage
Kubernetes resource usage refers to the amount of resources utilized by containers and pods in a production Kubernetes environment.
Hence, it is very important to keep an eye on the resource usage of pods and containers. One apparent reason for this is cost, because more usage translates to more cost.
Ops teams usually want to optimize and maximize the percentage of resources consumed by pods. Resource usage is one such indicator of how optimized your Kubernetes environment actually is.
You can consider an optimized Kubernetes environment to be one where the average CPU usage of the running containers is optimal.
Use Role-Based Access Control (RBAC)
RBAC stands for Role-Based Access Control. It is an approach used to restrict access and admittance for users and applications on the system/network.
RBAC has been available since Kubernetes 1.8. It uses the rbac.authorization.k8s.io API group to create authorization policies.
RBAC is used for authorization in Kubernetes. With RBAC, you can give access to a user or account, add or remove permissions, set up rules, and much more. It basically adds an extra security layer to a Kubernetes cluster, restricting who can access your production environment and cluster.
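As a sketch, the following Role and RoleBinding grant a user read-only access to pods in a single namespace (the `production` namespace and the user name `jane` are hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
  - apiGroups: [""]              # "" means the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
  - kind: User
    name: jane                   # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

For cluster-wide permissions, the equivalent objects are ClusterRole and ClusterRoleBinding.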
Cluster Provisioning and Load Balancing
Production-grade Kubernetes infrastructure usually requires certain critical components, such as highly available, multi-master, multi-etcd Kubernetes clusters. Provisioning such clusters typically involves tools such as Terraform or Ansible.
Once the clusters are set up and pods are created for running applications, those pods are fronted by load balancers, which route traffic to the services. Load balancers are not part of the open-source Kubernetes project by default, so you need to integrate tools such as the NGINX Ingress controller, HAProxy, or ELB, or any other tool that extends the Ingress plugin in Kubernetes to provide load-balancing capability.
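For example, assuming the NGINX Ingress controller is installed, a minimal Ingress routing a hypothetical hostname to a hypothetical Service might look like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx        # assumes the NGINX Ingress controller
  rules:
    - host: app.example.com      # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service   # hypothetical Service
                port:
                  number: 80
```

The Ingress controller watches for such objects and configures the underlying load balancer accordingly.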
Attach Labels to Kubernetes Objects
Labels are key/value pairs attached to objects such as pods. They are meant to be used as identifying attributes that are important and meaningful to users. When using Kubernetes in production, labels are one thing you can't neglect: they allow Kubernetes objects to be queried and operated on in bulk. Labels can also be used to identify and organize Kubernetes objects into groups. One of the best use cases for this is grouping pods based on the application they belong to. Teams can build any number of labeling conventions.
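As a sketch, a pod might carry labels like these (the label keys and values are one hypothetical convention, not a requirement):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api-7d9f        # hypothetical pod name
  labels:
    app: payments-api            # which application the pod belongs to
    tier: backend
    environment: production
spec:
  containers:
    - name: payments-api
      image: registry.example.com/payments-api:2.3   # hypothetical image
```

You can then query in bulk with label selectors, e.g. `kubectl get pods -l app=payments-api,environment=production`.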
Set Network Policies
Setting network policies is crucial when it comes to using Kubernetes.
Network policies are objects that enable you to explicitly state which traffic is permitted and which is not, so that Kubernetes can block all other unwanted and non-conforming traffic. Defining and limiting network traffic in our clusters is one of the basic and necessary security measures and is highly recommended.
Each network policy in Kubernetes defines a list of authorized connections as stated above. Whenever any network policy is created, all the pods that it refers to are qualified to make or accept the connections listed. In simple words, a network policy is basically a whitelist of authorized and allowed connections only — a connection, whether it is 'to' or 'from' a pod is permitted only if it is sanctioned by at least one of the network policies that apply to the pod.
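As an illustrative sketch, the following policy allows only pods labeled `app: frontend` to reach pods labeled `app: backend` on port 8080; all other ingress to the backend pods is blocked (the labels, namespace, and port are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production          # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend               # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # the only permitted source
      ports:
        - protocol: TCP
          port: 8080
```

Note that network policies are enforced by the cluster's network plugin, so a CNI that supports them (e.g. Calico or Cilium) is required.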
Cluster monitoring and logging
Monitoring your deployments is critical while working with Kubernetes. It is vital to ensure that configurations, performance, and traffic remain secure. Without logging and monitoring, it is impossible to diagnose issues when they happen; both also become essential for ensuring compliance.
When it comes to monitoring, it is necessary to set up logging capabilities on every layer of the architecture. The generated logs will help enable security tooling and audit functionality and will support performance analysis.
Start with stateless applications
Running stateless apps is significantly easier than running stateful apps, though this thinking is changing with the ever-increasing adoption of Kubernetes Operators. Teams new to Kubernetes are advised to begin with stateless applications.
A stateless backend is advised so that the development teams can make sure that there are no long-running connections that make it more challenging to scale. With stateless, developers can also deploy applications more efficiently with zero downtime.
It is highly believed that stateless applications make it easy to migrate and scale as and when required, according to the business needs.
Enable the use of Auto Scalers
Kubernetes has three auto-scaling abilities for deployments: horizontal pod autoscaler (HPA), vertical pod autoscaler (VPA), and cluster auto-scaling.
The horizontal pod autoscaler automatically scales the number of pods in a deployment, replication controller, replica set, or stateful set based on observed CPU utilization.
Vertical pod autoscaling recommends suitable values to be set for CPU and memory requests and limits, and it can automatically update the values.
The cluster autoscaler expands and shrinks the pool of worker nodes, adjusting the size of the Kubernetes cluster depending on current utilization.
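As a sketch of the first of these, an HPA targeting a hypothetical `web-app` Deployment at 70% average CPU utilization could look like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app                # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

The HPA relies on the metrics server (or another metrics API provider) being installed in the cluster.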
Control the Sources of Your Container Images
Have control over the sources of the images from which all containers in the cluster run. If you allow your pods to pull images from public sources, you don't know what's really running in them.
If you pull them from a trusted registry, you can apply policies on the registry to allow pulling only safe and certified images.
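One way to enforce such a policy in-cluster is with an admission policy engine. The following is a sketch assuming the Kyverno policy engine is installed and that `registry.example.com` is your hypothetical trusted registry:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce   # reject non-conforming pods
  rules:
    - name: allow-trusted-registry
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*"   # hypothetical trusted registry
```

Similar restrictions can be expressed with other admission controllers, such as OPA Gatekeeper.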
Keep evaluating the state of your applications and setup to learn and improve. For example, reviewing a container's historical memory usage might lead you to conclude that you can allocate less memory, thereby saving costs in the long run.
Protect your important services
Using Pod priority you can decide to set the importance of different services running. For example, you'd like to make sure RabbitMQ pods are more important than your application pods for better stability. Or your Ingress controller pods are more important than data processing pods to keep services available to users.
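A sketch of this idea: define a PriorityClass and reference it from the pods you consider critical (the class name, value, and example workload are hypothetical):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-messaging       # hypothetical class name
value: 1000000                   # higher value = higher priority
globalDefault: false
description: "Priority class for messaging pods such as RabbitMQ."
```

A pod opts in via `priorityClassName: critical-messaging` in its spec; under resource pressure, the scheduler may evict lower-priority pods to make room for higher-priority ones.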
Support zero-downtime upgrades of the cluster and of your services by running them all in a highly available (HA) configuration. This will also guarantee higher availability for your customers.
Use pod anti-affinity to make sure multiple replicas of a pod are scheduled on different nodes, ensuring service availability through planned and unplanned outages of cluster nodes.
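A sketch of anti-affinity in a Deployment, spreading replicas of a hypothetical `web-app` across nodes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-app   # keep replicas of this app apart
              topologyKey: kubernetes.io/hostname   # one replica per node
      containers:
        - name: web-app
          image: registry.example.com/web-app:1.0   # hypothetical image
```

The `preferredDuringSchedulingIgnoredDuringExecution` variant is a softer option when the cluster may have fewer nodes than replicas.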
Use pod disruption budgets to make sure you have a minimum number of replicas up at all times.
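A sketch of a PodDisruptionBudget that keeps at least two replicas of a hypothetical `web-app` running during voluntary disruptions such as node drains:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2                # never drop below 2 ready replicas
  selector:
    matchLabels:
      app: web-app               # hypothetical app label
```

Alternatively, `maxUnavailable` can be used instead of `minAvailable` when a relative bound fits better.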
Plan to fail
"Hardware eventually fails. Software eventually works." (Michael Hartung)
As we all know, Kubernetes has become the de facto orchestration platform in the DevOps field. Kubernetes environments need to hold up on the stormy seas of production from an availability, scalability, security, resilience, resource-management, and monitoring perspective. Since many companies are using Kubernetes in production, it becomes imperative to follow the above-mentioned best practices to smoothly and reliably scale your applications.
Published at DZone with permission of Pavan Belagatti, DZone MVB. See the original article here.