When Kelsey Hightower first entered the ops world, the “coolest” thing you could do was to deploy a server: Configure it, harden it, get it ready for use, write a bunch of scripts to monitor it in production. And that’s about it.
As Hightower began thinking about the future of ops in a software world increasingly powered by cloud, containers, and other modern technologies, he realized that ops should no longer just be about managing servers. Maybe you love writing Nagios scripts, but Hightower would rather that not be his full-time job. He thinks he — and plenty of other sysadmins and ops folks — can provide a lot more value elsewhere.
How to do that is the subject of Hightower’s recent FutureTalk presentation at New Relic’s Portland, Ore., engineering headquarters: “Kubernetes Abstractions: Building Next Generation Automation Tools.” Hightower is a developer advocate for Google Cloud Platform and an avid proponent of containers and distributed systems, including Kubernetes, Google’s open source container orchestration platform.
Of course, someone still needs to keep the servers up and running. But what if we replaced “someone” with something? Hightower explores how ops can use new platforms and abstractions, including Kubernetes, to build the tools it needs to evolve beyond the server maintenance game. He shares examples of the kinds of tools ops can build with these abstractions, and shows how those examples provide patterns for many other uses.
Building Declarative, Responsive Systems
One of the greatest opportunities these new platforms and abstractions provide, according to Hightower, is reducing the inefficiency and manual effort that come with necessary but painful operational tasks. A case in point: implementing and managing security certificates for your HTTP endpoints. That was a particular headache prior to Let’s Encrypt, and it remains a labor-intensive chore today when done manually. Tracking and remediating expiring certificates alone, for example, can be a bear, especially at scale, and is not necessarily the best use of ops’ time.
You can manage TLS certificates in a node-specific manner by writing shell scripts and so forth, but that doesn’t mean you should. “Too much work,” Hightower scoffs, especially once you move into environments running thousands of machines. “What we want to do is declare to the system that the certs must be there and anything that needs to use the certs should just declare that they want to use the certs. That way we don’t pin ourselves to an individual machine, and this is critical to building some of these next-generation tools. We have to decouple ourselves from the node. Right now, all of our tools are very node-centric. They assume we’re going to do a deployment to a node. We have to remove that.”
Hightower’s talk walks through building a tool, “kube-cert-manager,” for managing Let’s Encrypt certificates for a Kubernetes cluster. He also shares the code behind the tool via GitHub.
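To make the “declare it and let the system handle it” idea concrete, here is a minimal sketch in Python. It is not the code behind kube-cert-manager; the `Certificate` record, the `reconcile` function, and the 30-day renewal window are all assumptions made for illustration. The point is only that the tool compares what has been declared with what exists and works out the difference, with no reference to any particular machine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Certificate:
    """Hypothetical record of a certificate that has already been issued."""
    domain: str
    expires_at: datetime


def reconcile(desired_domains, issued, now, renew_before=timedelta(days=30)):
    """Return the actions needed to make issued certs match the declaration.

    desired_domains: set of domains that are declared to need a certificate.
    issued: dict mapping domain -> Certificate for certs that currently exist.
    The result is a sorted list of (action, domain) tuples, where action is
    "issue", "renew", or "delete". The 30-day window is an assumed default.
    """
    actions = []
    for domain in desired_domains:
        cert = issued.get(domain)
        if cert is None:
            actions.append(("issue", domain))      # declared but missing
        elif cert.expires_at - now <= renew_before:
            actions.append(("renew", domain))      # present but close to expiry
    for domain in issued:
        if domain not in desired_domains:
            actions.append(("delete", domain))     # exists but no longer declared
    return sorted(actions)


now = datetime(2016, 7, 1)
desired = {"api.example.com", "www.example.com"}
issued = {
    "www.example.com": Certificate("www.example.com", now + timedelta(days=10)),
    "old.example.com": Certificate("old.example.com", now + timedelta(days=80)),
}

for action, domain in reconcile(desired, issued, now):
    print(action, domain)
```

Nothing in the sketch mentions a node. Consumers simply add a domain to the declared set, and a real tool would then carry out each action against a certificate authority such as Let’s Encrypt.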
Ops Nirvana: Optimal Resource Utilization
Hightower’s kube-cert-manager establishes a model for other tools that use similar abstractions, such as a watch pattern (so that the tool reacts to change events as they happen instead of repeatedly polling for data) or a control loop (for reconciling the actual state of the cluster with the declared state). He also demos a scheduling tool to help automate another pressing challenge for many ops teams: How do you ensure you’re using your resources efficiently by matching the right workloads with the right machines?
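The watch pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hightower’s implementation: an in-memory `queue.Queue` stands in for the event stream a real tool would receive from the Kubernetes API, and `watch` and `STOP` are hypothetical names. The event types `ADDED`, `MODIFIED`, and `DELETED` mirror the ones Kubernetes watches emit.

```python
import queue

# Sentinel that ends the loop in this sketch; a real watch would run indefinitely.
STOP = object()


def watch(events, known):
    """Block on an event stream and update local state as changes arrive.

    events: queue.Queue of (event_type, name) tuples, terminated by STOP.
    known: set of object names the tool currently knows about.
    """
    while True:
        event = events.get()  # blocks until something changes; no polling
        if event is STOP:
            return known
        event_type, name = event
        if event_type in ("ADDED", "MODIFIED"):
            known.add(name)
        elif event_type == "DELETED":
            known.discard(name)


events = queue.Queue()
for item in [("ADDED", "www.example.com"),
             ("ADDED", "api.example.com"),
             ("DELETED", "www.example.com"),
             STOP]:
    events.put(item)

state = watch(events, set())
print(sorted(state))
```

In a fuller tool, each event would trigger a reconciliation pass, which is where the control loop comes in: the watch says when to look, and the loop decides what to change.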
“Just placing things on nodes based on memory and CPU is not going to be enough, especially because every company is different,” Hightower says. This becomes critical as more and more organizations move to the cloud. Given the varying costs per machine on most cloud platforms, resource optimization is crucial for managing the budget. You don’t want to run a small web server, for example, on an expensive GPU in the cluster. That simply wastes a resource that could be used for a “higher-order purpose,” Hightower notes.
“We need to build something a little bit smarter to handle this for us,” he says. And in the video below, you can watch him walk through how to do exactly that with Kubernetes, ensuring that workloads are assigned to the cheapest available machines before moving up to more expensive resources.
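As a rough illustration of cheapest-first placement, the Python sketch below picks the least expensive node that still has room for a workload. It is not the scheduler from the talk; the `Node` fields, the `schedule` function, and the prices are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a machine: its price and its remaining capacity."""
    name: str
    hourly_cost: float
    free_cpu_millis: int
    free_mem_mb: int


def schedule(nodes, cpu_millis, mem_mb):
    """Place a workload on the cheapest node that fits it.

    Returns the chosen node's name, or None if no node has enough capacity.
    The chosen node's free capacity is reduced so later calls see the change.
    """
    candidates = [
        n for n in nodes
        if n.free_cpu_millis >= cpu_millis and n.free_mem_mb >= mem_mb
    ]
    if not candidates:
        return None
    node = min(candidates, key=lambda n: n.hourly_cost)
    node.free_cpu_millis -= cpu_millis
    node.free_mem_mb -= mem_mb
    return node.name


nodes = [
    Node("gpu-1", hourly_cost=0.90, free_cpu_millis=8000, free_mem_mb=32768),
    Node("small-1", hourly_cost=0.05, free_cpu_millis=1000, free_mem_mb=2048),
]

# Three small web servers, each needing half a CPU and 1 GB of memory.
placements = [schedule(nodes, 500, 1024) for _ in range(3)]
print(placements)
```

The first two web servers fill the cheap machine, and only the third spills over to the expensive one. A production scheduler would weigh many more factors, but the ordering by cost is the core of the idea.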