Linux Container CPU: How to Optimize Real-Time and I/O-Intensive Environments
Can highly threaded, I/O-intensive Linux containers running on Kubernetes get all the CPU time they need?
Ideally, highly threaded, I/O-intensive Linux containers running on Kubernetes would have all the CPU time they need. But just how compatible is that goal with reality? To find the answer – and to optimize Linux containers – application developers and DevOps teams must understand how Linux schedules tasks and allocates CPU time.
The goal behind “real-time” containers is enabling your most important containers – those with mission-critical requirements around time-sensitive performance and reliability – to share the same hardware as non-real-time containers. But before committing to this strategy, it’s important to first determine how practical and achievable this is. In a strategy reliant on real-time containers, under-resourcing those containers could produce application performance shortfalls and hamper security efforts.
Real-time containers with optimal CPU time require several capabilities and tools. Among these must-haves are the ability to manipulate I/O request handling, a real-time container profile, a runtime environment that honors CPU requests, and a cooperative real-time operating system.
These needs can be met by utilizing a real-time scheduler in the host kernel, installing the right container runtime engine version that integrates with the OS kernel scheduler, and then configuring each container with parameters to handle special CPU requests and related requirements.
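At the container level, those special CPU requests ultimately become CFS bandwidth parameters. Docker's --cpus flag, for instance, translates a fractional CPU count into a CFS quota over a scheduling period. A minimal sketch of that arithmetic (the function name is my own, not a real API):

```python
# Sketch: how a fractional "CPUs" request maps onto CFS bandwidth
# parameters (cpu.cfs_quota_us relative to cpu.cfs_period_us).
# Docker's --cpus flag performs an equivalent translation.

DEFAULT_PERIOD_US = 100_000  # default CFS period: 100 ms

def cpus_to_cfs_quota(cpus: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """Return the cpu.cfs_quota_us value granting `cpus` worth of CPU
    time per scheduling period: quota = cpus * period."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return int(cpus * period_us)

if __name__ == "__main__":
    # 1.5 CPUs with the default 100 ms period -> 150 ms of CPU per period
    print(cpus_to_cfs_quota(1.5))   # 150000
    # Half a CPU -> 50 ms per 100 ms period
    print(cpus_to_cfs_quota(0.5))   # 50000
```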
Unfortunately, there isn’t an officially supported solution available. Developing one will require host providers, runtime engine providers, and container developers to collaborate on producing a stable and functional system. That has yet to happen. For the time being, host providers give fair warnings of the risks in altering CPU allocation. Docker, as one example, advises:
“CPU scheduling and prioritization are advanced kernel-level features. … Setting these values incorrectly can cause your host system to become unstable or unusable.”
What can you do today? Your best bet is following the current default approach (and it requires a quick history primer). Looking back to the dawn of container technology, containers initially didn’t have any resource constraints. In those salad days, containers could simply use as much as the host’s kernel scheduler allowed them.
This led to all sorts of issues, and containers were frequently shortchanged on CPU resources in the free-for-all. The remedy was a new scheduler in the Linux kernel – the Completely Fair Scheduler (CFS). Merged in Linux 2.6.23 (with CFS bandwidth control for hard caps following later, in 3.2), CFS performs the following duties:
- CFS ensures that the CPU is allocated equitably.
- If the CPU access time provided to different tasks isn’t balanced, CFS gives the shortchanged tasks the time they need to execute.
- CFS keeps track of the balance between tasks by maintaining each task’s accumulated CPU access time as a virtual runtime. The smaller a task’s virtual runtime, the greater its recognized CPU need.
- CFS uses “sleeper fairness” to make sure even tasks that aren’t currently running will still receive their fair share of CPU when required.
- CFS doesn’t use priorities directly. Instead, nice values act as weights that determine how quickly a task’s virtual runtime advances.
CFS also maintains a time-ordered red-black tree (“RB tree”) in which every runnable task is sorted by virtual runtime.
Operations on the RB tree happen in O(log n) time, with all runnable tasks sorted by the p->se.vruntime key. CFS always picks the leftmost task in the RB tree to run next, working through tasks so that each gets a turn and is allocated CPU resources.
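The leftmost-node behavior can be illustrated with a toy model. This is a deliberate simplification, not kernel code: a min-heap stands in for the kernel’s RB tree (popping the minimum is the analogue of taking the leftmost node), and plain weights stand in for the weights CFS derives from nice values:

```python
import heapq

# Toy CFS: always run the task with the smallest virtual runtime.
# A min-heap stands in for the kernel's RB tree; popping the minimum
# corresponds to taking the leftmost node.

def simulate(tasks: dict, slice_ns: float, ticks: int) -> dict:
    """tasks maps name -> weight. A higher weight makes virtual runtime
    grow more slowly, so that task is picked more often."""
    tree = [(0.0, name) for name in tasks]
    heapq.heapify(tree)
    runs = {name: 0 for name in tasks}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(tree)   # leftmost task runs
        runs[name] += 1
        vruntime += slice_ns / tasks[name]     # charge weighted runtime
        heapq.heappush(tree, (vruntime, name)) # reinsert to the "right"
    return runs

if __name__ == "__main__":
    # A weight-2 task accumulates vruntime half as fast, so it gets
    # twice the turns of a weight-1 task over time.
    print(simulate({"heavy": 2, "light": 1}, slice_ns=1.0, ticks=300))
    # → {'heavy': 200, 'light': 100}
```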
Most current Linux container runtime engines are built on the cgroup subsystem, with the OS’s CFS as the default CPU scheduler. Importantly, this means that each cgroup owns its own virtual runtime accounting within the OS scheduler. The scheduler gives a cgroup a turn, during which the cgroup consumes its CPU slices, and then passes the turn to the next cgroup’s virtual runtime.
Because of this, it’s necessary to consider cgroups in CFS not in terms of processor counts, but in terms of time slices. Tuning the CPU cgroup subsystem that controls scheduling can ensure that tasks receive relative minimum resources, and can also enforce hard caps on process tasks to make sure they can’t use more resources than are provisioned.
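Concretely, cpu.shares (cpu.weight in cgroup v2) sets the relative minimum – under full contention, a cgroup is guaranteed its share of the sibling total – while cpu.cfs_quota_us over cpu.cfs_period_us enforces the hard cap. A sketch of both calculations (helper names are my own):

```python
# Sketch of the two cgroup CPU controls discussed above:
# - cpu.shares: relative minimum under contention
# - cpu.cfs_quota_us / cpu.cfs_period_us: hard ceiling

def guaranteed_fraction(shares: dict, group: str) -> float:
    """Relative minimum CPU a cgroup is guaranteed when all siblings
    are busy: its cpu.shares divided by the sum across siblings."""
    return shares[group] / sum(shares.values())

def hard_cap_cpus(quota_us: int, period_us: int) -> float:
    """Hard cap, in CPUs, implied by cpu.cfs_quota_us / cpu.cfs_period_us."""
    return quota_us / period_us

if __name__ == "__main__":
    siblings = {"web": 1024, "batch": 512, "logs": 512}
    print(guaranteed_fraction(siblings, "web"))  # 0.5 of the CPU, minimum
    print(hard_cap_cpus(50_000, 100_000))        # capped at 0.5 CPU
```

Note the asymmetry: shares only matter under contention (an idle host lets any cgroup exceed its fraction), whereas the quota is enforced unconditionally.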
Within the CFS scheduling class, the Linux scheduler updates the running task’s virtual runtime on each scheduler tick and reinserts the task into the RB tree at the position that new virtual runtime dictates.
Unfortunately, these methods pose distinct challenges for controlling CPU allocation to containers. Under CFS, it’s not possible to designate outright higher-priority tasks. I/O-intensive tasks spend much of their time in syscalls and I/O waits: they consume short bursts of CPU, enter an I/O-wait stage, and yield to other tasks. As a result, the CFS tree tends to move these tasks to the right – slowly but surely reducing their effective priority.
The dynamic balance of the CFS tree doesn’t allow tasks in a cgroup to demand equal CPU usage. It’s important to understand that when a cgroup’s tasks go idle, the cgroup yields its unused CPU shares to a global pool of leftover CPU time that other cgroups can borrow from. At the same time, it’s inescapable that tasks queued in CFS share CPU resources.
Therefore, creating a complete real-time container under CFS is impossible. It is possible, however, to create a “soft real-time container” that can capture extra CPU and deliver solid results, accepting that its CPU allocation degrades once it passes its deadline.
To match the needs of highly threaded, I/O-intensive container applications, developer teams must thoroughly understand how CFS balances the RB tree and how to maximize the chances that key tasks remain in the RB tree’s leftmost nodes. Leveraging the Kubernetes CPU manager — which offers additional pod management and builds on CFS mechanisms — is also smart.
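Under the CPU manager’s static policy, for example, a pod is eligible for exclusive cores only when it sits in the Guaranteed QoS class (requests equal to limits) with an integer CPU request. A sketch of such a manifest built as a plain Python dict (the function name, image, and memory value are illustrative):

```python
# Sketch: a pod manifest eligible for exclusive cores under the kubelet
# CPU manager's static policy. Requirements: Guaranteed QoS class
# (requests == limits) and an integer CPU count.

def exclusive_cpu_pod(name: str, image: str, cpus: int) -> dict:
    """Build a Pod manifest whose container qualifies for CPU pinning."""
    resources = {"cpu": str(cpus), "memory": "1Gi"}  # memory value illustrative
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # requests == limits and an integer cpu -> Guaranteed QoS,
                # which the static policy requires before pinning cores
                "resources": {"requests": resources, "limits": resources},
            }],
        },
    }

if __name__ == "__main__":
    pod = exclusive_cpu_pod("rt-worker", "example/worker:latest", 2)
    res = pod["spec"]["containers"][0]["resources"]
    print(res["requests"] == res["limits"])  # True: Guaranteed QoS
```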
Keep in mind the specifics of each task have a tremendous impact on which optimization techniques will be most effective. Developer teams must experiment and carefully observe the resulting behaviors to implement soft real-time containers on Linux that will succeed in meeting their application’s needs.
Opinions expressed by DZone contributors are their own.