Last week, we announced the arrival of Docker's multi-stage build feature to the image builder. The main benefit? Much smaller images for faster download times.
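If you missed that post, the idea behind multi-stage builds is that compilers and build tools stay in an intermediate stage, and only the finished artifacts are copied into the final image. A minimal sketch (the Go app and image tags here are illustrative, not taken from the original post):

```dockerfile
# Build stage: full toolchain lives here and never ships to the device
FROM golang:1.8 AS build
WORKDIR /src
COPY . .
RUN go build -o /app .

# Final stage: only the compiled binary ends up in the image
FROM debian:stretch-slim
COPY --from=build /app /usr/local/bin/app
CMD ["app"]
```

The final image contains the Debian base plus a single binary, which is where the smaller downloads come from.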

This week, we're focused on build speed. We can now build your container images on bare metal ARM servers, increasing build speed by up to five times — with the potential for much bigger increases.

As our users know, part of our offering is the unusual task of building Docker containers for connected devices on cloud servers. Things get tricky when the device's architecture differs from that of the build server. For x86 devices this isn't an issue, as the architectures match. But in the growing ecosystem of small and affordable internet-connected devices, ARM architectures dominate. We currently support devices as old as ARM v5 and as modern as the 64-bit ARM v8 architecture. This poses a challenge: what is the most reliable and efficient way to build images for ARM devices when the most accessible cloud resources have an x86 architecture?

Over the last four years, we've been attacking this problem from a number of angles. Besides trying various early “ARM server” offerings that were little better than putting a single-board computer in a datacenter, the most reliable solution we've built uses the QEMU emulator on x86 servers to create ARM build environments. We’ve added a number of patches to QEMU to prevent non-deterministic crashes, and this at least gives us a degree of stability at the expense of performance. While we're proud of how well this has worked so far, it's not a perfect solution. For instance, we've seen frequent build failures for projects built in Rust and Go, as the compilers for these languages aren't fully supported by QEMU. But the biggest reason for a better approach is that emulated builds are significantly slower than builds done on native hardware.
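The emulated setup described above can be sketched as a Dockerfile that injects a statically linked QEMU user-mode binary into an ARM base image, so the x86 host can execute the ARM userland transparently via the kernel's binfmt_misc mechanism. The image name and paths below are assumptions for illustration, not our exact production setup:

```dockerfile
# ARM base image, built on an x86 host (names are illustrative)
FROM armhf/debian:jessie

# A statically linked qemu-arm-static, fetched on the host beforehand
# (e.g. from the qemu-user-static package), interprets ARM binaries
COPY qemu-arm-static /usr/bin/qemu-arm-static

# From here on, RUN steps execute under emulation: correct, but slow
RUN uname -m
```

This is also where the compiler trouble shows up: anything the emulated toolchain does that QEMU doesn't faithfully support can fail in ways native hardware never would.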

Of course, building on ARM servers, especially at the scale of hundreds or thousands of builds a day, requires a cloud infrastructure that until very recently didn't exist. ARM servers have been a topic of conversation for years, but a strong market for easy-to-scale, production-ready hosting options is only just now beginning to emerge. We've been lucky enough to work with the good folks at to get early access to their new ARM servers. These servers contain two CPU sockets, each with a beefy 32-core chip, allowing all 64 cores to be used in parallel; as our benchmarks below show, this makes a huge difference in build speeds. The 2.4GHz clock speed may not sound like much, but for an ARM chip it's huge (and light years ahead of QEMU on EC2 instances). With this computing power, we've been able to prove one of the most legitimate early use cases for ARM servers: speedy image builds for ARM devices.

How speedy? In our tests, we've seen builds complete anywhere from 1.6 to 5.9 times faster. We've run comparisons for three projects:

| Project | ARM (min:sec) | QEMU (min:sec) | ARM speedup |
| --- | --- | --- | --- |
| BoomBeastic | 7:42 | 16:06 | 2.1x |
| resin-electronjs | 4:37 | 7:19 | 1.6x |
| resin-opencv | 48:27 | 284:05 | 5.9x |
| resin-opencv (with `make -j64`*) | 14:07 | 66:10 | 4.7x |

*compiled using all available cores
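The speedup column is simply the emulated build time divided by the native build time. As a quick sanity check on the resin-opencv row:

```shell
# Speedup = emulated build time / native build time.
# Times taken from the resin-opencv row in the table above.
arm_s=$((48 * 60 + 27))      # 48:27  -> 2907 seconds on native ARM
qemu_s=$((284 * 60 + 5))     # 284:05 -> 17045 seconds under QEMU
awk -v a="$arm_s" -v q="$qemu_s" 'BEGIN { printf "%.1fx\n", q / a }'
# prints 5.9x
```

The same arithmetic reproduces every row, with the compute-heavy OpenCV build benefiting the most from native hardware.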

Native ARM builds are now automatically available to all users deploying to ARM devices, such as the Raspberry Pi, BeagleBone, ODroid, and other ARM-based boards. If your next build feels like it finished early, it's not a bug and nothing's broken. Your builds might really be five times faster! This means you can spend less time waiting and more time improving your code. From what we've seen, the drastic reduction in build times fundamentally changes the experience of deployment, making fleet updates even more seamless. This brings us another step closer to realizing our vision of making it as easy to build an IoT project as it is to build a web app.

If you want to learn more about what goes into building a Docker container for a device, we've got some information in our documentation. But as always, the best way to learn is to get started on your own project!