Creating an Affordable Kubernetes Cluster
If you're looking to host websites with Kubernetes without breaking the bank, take a look at how this developer did it.
I have been a Docker/Kubernetes fan for a while. When I started working at DevOn, I quickly realized how much new technology was available and started learning Docker. Later I entered the world of Kubernetes, and eventually I found myself providing Docker and Kubernetes training.
As a trainer, you always want to be a few steps ahead of your trainees, so I regularly try to challenge myself with new, cool, and twisted ideas to extend my knowledge.
I privately host several websites for myself, my girlfriend, and a few small customers, and I still do this on a shared VM. I often tell customers and trainees that you should use immutable infrastructure whenever possible, and my VM is definitely not immutable, so this is not a good example of “practice what you preach.”
Immutable Infrastructure for the Win
My goal was to phase out my VM and move all websites and services to a Kubernetes cluster. Setting up a Kubernetes cluster on the public cloud of one of the big players (GCP, Azure, AWS) is easy, but it can also get expensive, especially for a private individual on a limited budget. A Kubernetes cluster generally requires several nodes (VMs running the containers and services), and if you want to expose services on the public internet, you need to provision a cloud load balancer. Monthly costs can quickly rise toward $40, which I am definitely not going to pay for hosting just a few sites.
| Resource(s) | Price |
| --- | --- |
| Kubernetes Nodes (VM) | $10 x 2 |
| Cloud Load Balancer | $20 |
| Total | $40+ |
Goals
So here started my journey to come up with an affordable Kubernetes cluster. I first set some goals:
- Affordable
  - Monthly costs close to my current VM costs (approx. $14)
- Near-zero cluster maintenance
  - I hate doing server maintenance, and I don’t have the discipline to keep up with it
- High stability and availability
  - Services should be up and running all the time, and in case of problems the system should repair itself whenever possible
Solution
Fast-forward: I tried out a lot of stuff and learned a lot from other blogs (which I will refer to at the end of this blog). This is the combination of solutions that worked for me:
- Set up Kubernetes on Google Cloud, using small preemptible nodes. Preemptible VMs run for at most 24 hours and can be reclaimed by Google at any time. The cluster, however, supports autoscaling, so replacement capacity is available within about a minute. These VMs cost roughly half the regular price (see the cost table below), downtime is limited, and if availability really matters, you can consider running multiple replicas of your Kubernetes pods.
- Use nginx-ingress, provisioned with ServiceType = ClusterIP and HostNetwork = true. Instead of provisioning an expensive cloud load balancer, each Node itself acts as a public-facing reverse proxy. All you need to do is make sure at least one of the Nodes has a stable, static public IP address (see the next bullet). Create a DNS A record pointing to that Node’s IP address, and all traffic for that hostname will route to the Node, then to the Ingress controller, then to the Service, and finally to the Pod.
- Run Kubeip on the cluster. Google Cloud Kubernetes Nodes cannot be provisioned with static IP addresses, but fortunately someone created a Kubernetes service called Kubeip that works around this. Kubeip is triggered whenever a new Node is added to the cluster; it then assigns an available static IP address from a pool to that Node, replacing the existing dynamic IP. This way, all static IP addresses stay assigned to the cluster, so the services effectively remain available. Again, you may experience some downtime here, but you can get higher availability by extending your cluster with multiple Nodes and multiple static IP addresses; multiple DNS A records then give you round-robin “DNS load balancing,” which should fix the problem. (A setup sketch for all three steps follows this list.)
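To make these three steps concrete, here is a minimal setup sketch. Treat it as a starting point rather than a definitive recipe: the cluster name, zone, machine type, IP names, and hostnames are all placeholder assumptions, the Helm values reflect the stable/nginx-ingress chart and Helm 2 syntax of that era, and the address labeling follows Kubeip’s README convention.

```bash
# 1. Create a small autoscaling cluster of preemptible nodes.
#    Cluster name, zone, and machine type are assumptions; pick your own.
gcloud container clusters create budget-cluster \
  --zone europe-west4-a \
  --machine-type g1-small \
  --preemptible \
  --num-nodes 2 \
  --enable-autoscaling --min-nodes 2 --max-nodes 3

# 2. Reserve a static external IP per node for Kubeip to hand out, and
#    label it so Kubeip can find it (repeat per address; the label value
#    must match the KUBEIP_LABELVALUE set when deploying Kubeip).
gcloud beta compute addresses create node-ip-1 --region europe-west4
gcloud beta compute addresses update node-ip-1 \
  --region europe-west4 --update-labels kubeip=budget-cluster

# 3. Install nginx-ingress without a cloud load balancer: a ClusterIP
#    Service plus hostNetwork, so every node listens on ports 80/443.
helm install stable/nginx-ingress --name ingress \
  --set controller.service.type=ClusterIP \
  --set controller.hostNetwork=true \
  --set controller.kind=DaemonSet \
  --set controller.dnsPolicy=ClusterFirstWithHostNet

# 4. Point a DNS A record for your hostname at the reserved IP(s), then
#    route that hostname to a Service with a standard Ingress resource:
kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-site
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: www.example.com
    http:
      paths:
      - backend:
          serviceName: example-site   # hypothetical Service name
          servicePort: 80
EOF
```

With hostNetwork enabled, the ingress controller binds directly to ports 80 and 443 on each node, which is exactly why no cloud load balancer is needed.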
The Results
I’m excited about this solution. It has been running stable for more than two weeks now, at least in my experience so far.
Affordable? Yes!
The preemptible VMs cost about half the price of the regular VMs, which saves roughly $5 per VM. By configuring the nginx-ingress controller as ClusterIP with HostNetwork enabled, we don’t need an external load balancer, saving another $22 or so.
| Resource(s) | Regular Price | Preemptible Solution |
| --- | --- | --- |
| Kubernetes Nodes (VM) | $11 x 2 | $5 x 2 |
| Cloud Load Balancer | $22 | – |
| Total | $44 | $10 |
Near-Zero Cluster Maintenance? Yes!
In GCP you can choose which Kubernetes version you want to use, and GKE can even auto-upgrade your Nodes for you. Once the cluster is created, the self-organizing nature of Kubernetes basically means it will maintain itself. That is probably not entirely true, but so far it feels like very low maintenance.
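For instance, auto-upgrade and auto-repair can be switched on per node pool; a hedged sketch, where the cluster and pool names are placeholder assumptions:

```bash
# Let GKE upgrade and repair nodes automatically; names are placeholders.
gcloud container node-pools update default-pool \
  --cluster budget-cluster --zone europe-west4-a --enable-autoupgrade
gcloud container node-pools update default-pool \
  --cluster budget-cluster --zone europe-west4-a --enable-autorepair
```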
Of course, I still need to maintain my websites, containers, and dependencies, but that is now the responsibility of “the web developer.” I can now experiment with new PHP or MySQL versions for one single website in isolation, without disturbing any other website!
There is now a better separation of concerns between application management and infrastructure management.
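As an illustration of that isolation, each site’s Deployment can pin its own runtime version simply by choosing an image tag, without touching any other site. A minimal, hypothetical sketch (the site name and tag are placeholders):

```bash
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-blog            # hypothetical site name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-blog
  template:
    metadata:
      labels:
        app: my-blog
    spec:
      containers:
      - name: web
        image: php:7.3-apache   # bump only this tag to try a new PHP version
        ports:
        - containerPort: 80
EOF
```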
High Stability and Availability? Probably…
The solution has been running for a few weeks now, and each node is indeed replaced within 24 hours. Kubernetes was designed around desired-state configuration: it keeps adjusting the cluster, restarting missing services, replacing defective ones, and so on, to make sure the desired state is met.
I have promised myself to set up some monitoring in the near future to measure the downtime, if any. Theoretically, I’d expect one or two minutes of cluster downtime per day, as each Node gets replaced.
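Until that monitoring exists, the main lever for limiting downtime is running more than one replica and asking the scheduler to spread them across nodes, so losing a single preemptible node never takes a site fully offline. A sketch under those assumptions, with placeholder names:

```bash
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-site            # hypothetical site name
spec:
  replicas: 2              # survives the loss of one preemptible node
  selector:
    matchLabels:
      app: my-site
  template:
    metadata:
      labels:
        app: my-site
    spec:
      affinity:
        podAntiAffinity:   # prefer placing the replicas on different nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-site
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx:1.17
        ports:
        - containerPort: 80
EOF
```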
Improvements
There are still parts that can be improved:
- Graceful Node replacement: I’ve found Kubernetes charts that reduce the downtime problem by removing Nodes after 12 hours, before Google does, and gracefully shutting them down, allowing the cluster to move running pods from one Node to another first. Once all services/pods have been moved, the old Node is removed and a new one is created (see the drain sketch after this list).
- Show me the numbers: I still need proof that there’s no downtime, so the best option would be to create some logging/dashboards and measure any failed web requests during the day. (Spoiler alert: I ran into several short downtimes a day, for different reasons. Stay tuned for the next blog!)
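The building block behind such graceful replacement is a cordon-and-drain before the VM disappears; the charts mentioned above automate this on a schedule. A hand-rolled sketch, where the node name is a placeholder and --delete-local-data is the drain flag of that kubectl era:

```bash
# Stop scheduling new pods onto the node, then evict its pods so the
# cluster reschedules them onto other nodes first.
kubectl cordon gke-budget-cluster-default-pool-node-1
kubectl drain gke-budget-cluster-default-pool-node-1 \
  --ignore-daemonsets --delete-local-data

# Finally delete the underlying VM; the node pool will create a fresh
# preemptible replacement, resetting the 24-hour clock.
gcloud compute instances delete gke-budget-cluster-default-pool-node-1 \
  --zone europe-west4-a
```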
Ideas and Feedback …
Please let me know what you think of this solution. I’m definitely not saying that everyone should do it this way, and it probably should not be used for more mature, demanding, highly available environments, but you might want to consider it for development or even test environments.
Please let me know your ideas and experiments, and what you would and would not do!
Published at DZone with permission of Remko Seelig.