The Bare Metal Bet That Made Our Multiplayer Platform Hum
Switching to a bare metal–based hybrid model cut costs and improved latency. It also eliminated cloud egress pain and avoided vendor lock-in.
The cloud may be fast… but it nearly slowed us down.
When we launched Hathora in 2022, we knew the infrastructure behind multiplayer games was long overdue for reinvention. Studios like EA and Blizzard had built their own complex systems to host game servers, but for most multiplayer game studios, that approach was out of reach. Our goal was to eliminate that barrier with a platform-as-a-service built specifically for multiplayer game workloads: low-latency, stateful servers ready to handle millions of connections without the overhead of managing infrastructure.
We launched on AWS, using EKS for Kubernetes orchestration. Adoption came fast: in our first six months, more than a hundred studios signed on. That early success validated our bet on developer demand, but it also revealed a hidden cost structure that threatened our entire business model.
For larger studios in particular, cloud egress charges were spiraling far beyond compute costs. Because multiplayer games rely on high-frequency state updates (we’re talking hundreds of updates per second, per connected player), the bandwidth cost of constantly pushing those updates to clients became astronomical. In some cases, egress costs were more than four times compute costs.
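To see why egress can outgrow compute, it helps to run the arithmetic for a single server. The sketch below is a back-of-the-envelope estimate, not Hathora's cost model: the player count, update rate, payload size, and flat per-GB price are all hypothetical placeholders, and real cloud egress pricing is tiered by volume and region.

```python
# Back-of-the-envelope egress estimate for one multiplayer game server.
# Every input below is a hypothetical placeholder, not a measured figure.

SECONDS_PER_MONTH = 30 * 24 * 3600  # assume a 30-day month of continuous play


def monthly_egress_gb(players: int, updates_per_sec: float, bytes_per_update: int) -> float:
    """Data pushed from the server to clients per month, in GB (10**9 bytes)."""
    bytes_per_sec = players * updates_per_sec * bytes_per_update
    return bytes_per_sec * SECONDS_PER_MONTH / 1e9


def monthly_egress_cost(players: int, updates_per_sec: float,
                        bytes_per_update: int, price_per_gb: float) -> float:
    """Egress bill in dollars, assuming a flat per-GB rate (real pricing is tiered)."""
    return monthly_egress_gb(players, updates_per_sec, bytes_per_update) * price_per_gb


if __name__ == "__main__":
    # Hypothetical: 64 players, 100 state updates/s each, 200-byte updates, $0.09/GB.
    gb = monthly_egress_gb(64, 100, 200)
    cost = monthly_egress_cost(64, 100, 200, price_per_gb=0.09)
    print(f"{gb:,.0f} GB/month -> ${cost:,.2f}/month")
```

With these placeholder inputs, one server sends roughly 3.3 TB a month. The total scales linearly with players × update rate × payload size, so doubling the tick rate doubles the egress bill, whereas cloud compute for the same server is billed by instance time regardless of how many bytes it sends.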
We also realized that much of what public cloud platforms offer wasn’t relevant to our use case. Our workloads were ephemeral and memory-intensive, and they didn’t need databases, message queues, or serverless components. Game sessions spin up and tear down constantly, and every instance behaves identically. What we did need was fast, efficient compute close to users, without the extras that cloud platforms bake into their pricing. On top of that, EKS wasn’t delivering the performance we wanted: for latency-sensitive gaming workloads, we were hitting limits we couldn’t tune past. Between the cost and performance tradeoffs, it became clear that our infrastructure model had to evolve.
Why We Stepped off the Cloud-Only Path
To keep serving independent developers and win the trust of larger studios, we had to rethink our platform’s foundations. We needed to bring infrastructure costs down dramatically without compromising scale, availability, or player experience. And we needed to offer consistent performance globally, even during unpredictable traffic spikes tied to new releases or live events.
That all led us to an approach many would consider unconventional in 2025: bare metal. By running baseline workloads on dedicated hardware and reserving cloud capacity for on-demand bursts, we could finally break the cost-performance tradeoff. But this hybrid model introduced its own complexity: our existing setup couldn’t orchestrate workloads across bare metal and multiple cloud providers.
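The "bare metal first, cloud for bursts" policy can be illustrated with a toy placement function. This is a hypothetical model of the idea, not how Hathora actually schedules sessions: `BareMetalPool`, `CloudBurst`, and `place_session` are illustrative names, bare metal is modeled as fixed capacity, and cloud is modeled as unlimited capacity that we simply meter.

```python
from dataclasses import dataclass


@dataclass
class BareMetalPool:
    """A fixed-size pool of dedicated servers (hypothetical model)."""
    name: str
    capacity_cores: int
    used_cores: int = 0

    def try_allocate(self, cores: int) -> bool:
        """Reserve cores if they fit; return whether the reservation succeeded."""
        if self.used_cores + cores <= self.capacity_cores:
            self.used_cores += cores
            return True
        return False


@dataclass
class CloudBurst:
    """Elastic cloud capacity: always accepts, but tracks how much was used."""
    name: str
    used_cores: int = 0

    def allocate(self, cores: int) -> None:
        self.used_cores += cores


def place_session(cores: int, bare_metal: list[BareMetalPool], cloud: CloudBurst) -> str:
    """Place a game session on bare metal if any pool has room, else burst to cloud."""
    for pool in bare_metal:
        if pool.try_allocate(cores):
            return pool.name
    cloud.allocate(cores)
    return cloud.name


if __name__ == "__main__":
    # Hypothetical region: one 8-core bare metal pool; each session needs 2 cores.
    pools = [BareMetalPool("bm-region-a", capacity_cores=8)]
    cloud = CloudBurst("cloud-region-a")
    print([place_session(2, pools, cloud) for _ in range(6)])
```

Here the first four sessions fill the bare metal pool and the last two overflow to cloud. The design choice is that steady baseline demand lands on the cheaper fixed-capacity tier, and only the spike above it pays cloud rates.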
The Orchestration Layer Had to Evolve, Too
EKS had helped us get off the ground, but it wasn’t built for orchestrating containers across a mix of hardware and cloud providers. We needed something more minimal, portable, and vendor-agnostic: something purpose-built for distributed environments, not just AWS.
That search brought us to Talos Linux, an open source operating system designed specifically for Kubernetes. Talos has a stripped-down, API-driven model with no SSH access, which made it simpler to manage and improved our security. It ran just as well on bare metal as it did in virtualized cloud environments, and it was already powering large-scale production clusters.
After a successful proof of concept, we took it a step further and adopted Omni, Sidero Labs’ Kubernetes management platform. Omni gave us unified control over every node, no matter where it lived: bare metal, AWS, or GCP. With it, our small team could operate a distributed, multi-cloud fleet with the confidence of a much larger organization. As we expanded into new regions and brought on new providers, Talos and Omni helped us scale without fracturing our infrastructure model.
Today we manage more than 30,000 cores across 14 global regions, spanning two bare metal vendors and multiple public cloud platforms, all orchestrated as one cohesive system.
The Hybrid Model That Transformed Our Platform
With bare metal deployed via Omni now handling 80% of our compute and cloud filling in the gaps during spikes, we’ve achieved a level of efficiency and flexibility that simply wasn’t possible with a cloud-only architecture. For game studios, this has translated into significantly lower costs (especially for those with large or persistent game worlds) and consistently low latency regardless of global traffic patterns.
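One way to reason about what an 80/20 split does to the bill is a weighted average of per-core-hour rates. The sketch below treats the 80% figure as a share of core-hours (an assumption) and uses hypothetical placeholder rates; it is not derived from Hathora's actual pricing, and a real comparison would also fold in egress, which is where the cloud-only model hurt most.

```python
# Blended compute cost when a fixed share of core-hours runs on bare metal
# and the remainder bursts to cloud. Rates are hypothetical placeholders.


def blended_rate(bare_metal_share: float, bare_metal_rate: float, cloud_rate: float) -> float:
    """Weighted average cost per core-hour across the two capacity tiers."""
    if not 0.0 <= bare_metal_share <= 1.0:
        raise ValueError("bare_metal_share must be between 0 and 1")
    return bare_metal_share * bare_metal_rate + (1.0 - bare_metal_share) * cloud_rate


def monthly_cost(core_hours: float, bare_metal_share: float,
                 bare_metal_rate: float, cloud_rate: float) -> float:
    """Total monthly compute spend for a given number of core-hours."""
    return core_hours * blended_rate(bare_metal_share, bare_metal_rate, cloud_rate)


if __name__ == "__main__":
    hours = 1_000 * 730  # hypothetical: 1,000 cores running for a ~730-hour month
    hybrid = monthly_cost(hours, 0.8, bare_metal_rate=0.01, cloud_rate=0.04)
    cloud_only = monthly_cost(hours, 0.0, bare_metal_rate=0.01, cloud_rate=0.04)
    print(f"hybrid ${hybrid:,.0f} vs cloud-only ${cloud_only:,.0f}")
```

The savings depend entirely on the gap between the two rates and on how much of the load is steady enough to sit on bare metal; the placeholder numbers only show the shape of the calculation.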
Talos Linux, managed through Omni, lets us onboard new edge nodes quickly, no matter the provider. That flexibility makes it easy to expand into new regions or test new hardware options without locking into a single vendor. We can move fast, stay lean, and keep our infrastructure tuned precisely for the unique demands of multiplayer gaming.
What started as a shift to improve economics has turned into a long-term advantage. The hybrid model didn’t just save us money; it also gave us control, performance, and the confidence to grow globally without compromise.
From Infrastructure Burden to Competitive Advantage
From the beginning, we set out to make it easier for game studios to build multiplayer games without worrying about infrastructure. That vision hasn’t changed. What has changed is how we deliver on it. By leaning into bare metal, refining our orchestration strategy, and using the cloud where it actually makes sense, we’ve built infrastructure that matches the real-world needs of our customers. Studios can scale instantly, launch in new regions overnight, and trust that player experience won’t suffer when their games take off.
We don’t believe infrastructure should be a bottleneck, or a budget killer, for studios focused on creativity and gameplay. With a hybrid foundation built on bare metal and free of vendor lock-in, it no longer has to be.
Opinions expressed by DZone contributors are their own.