Charlie and the X-ray Factory: ZeroMQ at ESRF and CERN
Originally authored by Pieter Hintjens
Last week I had the pleasure of visiting not one major European physics research facility, but two. My first stop was at ESRF in Grenoble, then I went to visit CERN in Geneva. Both these organizations are moving to use ZeroMQ for their control systems. In this article I'll explain what that means.
First, the Physics
I'm not a physicist, but my kind hosts — Emmanuel Taurel at ESRF, and then Wojtek Sliwinski at CERN — explained the basics to me. ESRF and CERN both have large circular accelerators, though "large" is relative. ESRF's is the size of a small airport. CERN's is the size of a small town. They also have linear accelerators (LINACs) that act as injectors for the larger circular ones. Actually CERN has a dozen different accelerators, built over time.
The major difference — from the physics point of view, not the software — is that ESRF accelerates beams of electrons, while CERN accelerates beams of protons, because the physics experiments they're conducting on both sites are very different.
ESRF scientists basically want to shoot streams of X-rays at samples to get information about their molecular structure. I assume that it's like shining a very bright light at them.
How do you get streams of X-rays? It turns out that if you take electron beams and bend them (or perhaps it's twist them, I'm not sure), they emit X-rays along the direction they wanted to go in. I've not tried this at home, but it makes sense.
So ESRF shoots electron beams (from very big cathode ray guns, like we used to have in TVs) into a ring and speeds them up, getting them to go faster and faster, and as they spin around, they emit X-rays where they get bent by the massive magnets. There are several kinds of magnets, the size of fridges. The X-rays are guided down with more magnets, filtered and bounced off mirrors, and whatnot, and focused on the samples, where high-speed cameras record the results. The ring carries forty independent beams of electrons, and along the ESRF ring there are forty corresponding experiment rooms.
In ESRF's X-ray factory, therefore, you have a whole set of devices, hundreds of them, that create, steer, twist, bend, and focus these electrons and X-rays. As there are forty independent beams, they manage each beam more or less separately, though I've no idea how they do this. Magic, I assume.
The control room is basically filled with large panels that show the status of the whole machine as it operates. From these panels, scientists can tune their beams, pump in more electrons, focus them, get readings, and so on, through an impressive graphical user interface. It is a "drive by wire" control system, which was innovative when it came along, for before that, scientists would literally tune their magnets and mirrors by hand.
Now, at CERN, the physics is very different. CERN's mission is to create black holes, or at least very large holes in the ground, because their largest ring, the Large Hadron Collider or LHC, is 27km around. You can't see the ring itself, only the buildings on the surface, dotted along its path.
At CERN, the X-rays are a nuisance. What the LHC does is create speeding clouds of protons, which are positively-charged particles with mass (electrons being negatively-charged wavelets that sometimes act as particles but with a tiny mass compared to protons). They speed these clouds up faster and faster, in two beams, one rotating clockwise, and one anti-clockwise. They then smash these two clouds into each other, simulating the Big Bang at the origin of the Universe, and seeing what the heck emerges from that.
The CERN experiments are basically massive detectors, and the LHC has four main ones and three smaller ones. We visited the largest, ATLAS, buried underground. ATLAS is a "general purpose" detector, meaning it looks for many kinds of collisions. This is what makes it so large. The special-purpose detectors focus on just one type of collision and can be much more compact.
So the experiments at ESRF and CERN are entirely different but the two control systems follow a similar pattern. It's these two control systems that both ESRF and CERN are rewriting to take advantage of state-of-the-art technology in messaging, namely ZeroMQ.
Second, the Architecture
The ESRF control software is called Tango, and is a collaborative effort between nine different research institutes. It's open source along the cathedral model (whereas ZeroMQ is a work of the bazaar). Tango has used CORBA for its messaging and is now moving to ZeroMQ. Tango has the makings of a widely-used product, even a general-purpose industrial control system, or SCADA, since it is essentially simple and already used in multiple institutes.
The CERN control software (the "middleware") is a different animal. It's much larger and more complex, reflecting the much larger size of CERN's projects, teams, and ambitions. The CERN SCADA is also not open source. The two approaches make an interesting comparison, since there is inevitably some competition, or at least friendly rivalry, between institutes around the world.
The architecture consists of several layers:
- A set of physical devices (lasers, magnets, mirrors, rotating pigeons, and magic hats), aka the "front ends". These run some embedded stack or Linux, and talk over an array of weird and wonderful interfaces such as serial lines.
- A set of "device servers" that each manage a small group of devices, with custom drivers for each type of device. These device servers expose the devices to the world as addressable objects with properties and some methods. This is the CORBA view of the world as objects. It's not entirely sane, but not totally insane either.
- A name service, which holds a registry of devices and their network addresses (rather, that of the device server which talks to them).
- A set of interactive or batch applications that talk to the device servers using a set of patterns: asynchronous pub-sub, in which devices publish their status; asynchronous request-reply, in which applications query or set device properties; and synchronous request-reply, which is the same but blocking.
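To make the layers above a little more concrete, here is a toy in-memory model in Python. It is only a sketch: there is no network and no ZeroMQ in it, and every name (`Device`, `DeviceServer`, `NameService`, the `magnet/1` device, the address string) is made up for illustration rather than taken from Tango or the CERN middleware. It shows devices as objects with properties, a name service that maps a device name to its server, and the three interaction patterns.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class Device:
    """A front-end device exposed as an object with named properties."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = dict(properties)


class NameService:
    """Registry mapping a device name to the server that talks to it."""

    def __init__(self):
        self.registry = {}

    def register(self, device_name, server):
        self.registry[device_name] = server

    def lookup(self, device_name):
        return self.registry[device_name]


class DeviceServer:
    """Manages a small group of devices and answers requests about them."""

    def __init__(self, address):
        self.address = address          # hypothetical, unused in this toy
        self.devices = {}
        self.subscribers = []           # callbacks standing in for pub-sub
        self.pool = ThreadPoolExecutor(max_workers=2)

    def add(self, device, name_service):
        self.devices[device.name] = device
        name_service.register(device.name, self)

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def get(self, device_name, prop):
        # Synchronous request-reply: the caller blocks until the reply.
        return self.devices[device_name].properties[prop]

    def get_async(self, device_name, prop) -> Future:
        # Asynchronous request-reply: the caller gets a Future at once.
        return self.pool.submit(self.get, device_name, prop)

    def set(self, device_name, prop, value):
        self.devices[device_name].properties[prop] = value
        # Pub-sub: every status change is pushed to all subscribers.
        for callback in self.subscribers:
            callback(device_name, prop, value)
        return value


def demo():
    names = NameService()
    server = DeviceServer("tcp://hypothetical-host:5555")
    server.add(Device("magnet/1", current=10.0), names)

    events = []
    target = names.lookup("magnet/1")   # applications find servers by name
    target.subscribe(lambda dev, prop, val: events.append((dev, prop, val)))

    target.set("magnet/1", "current", 12.5)
    sync_reply = target.get("magnet/1", "current")
    async_reply = target.get_async("magnet/1", "current").result(timeout=1)
    server.pool.shutdown()
    return sync_reply, async_reply, events
```

Calling `demo()` sets one property, reads it back both ways, and collects the published change. In a real system each arrow here would be a message on the wire rather than a method call.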
There are two API layers that hide the whole messaging stack from normal developers: one for those writing device servers, and one for those writing applications. In both projects the goal is to replace CORBA without modifying the APIs too heavily. CERN has rather more freedom to break the old APIs since the LHC is currently in shutdown, so it's okay to write new applications.
What are the biggest risks and constraints in these projects? Obviously, as in any SCADA, things can't break in ways that damage the whole system. Individual devices will, sometimes, fail. People will write buggy code. But the messaging layer has to be entirely invisible.
This means, mainly, no brokers, since a central broker is one more piece that can fail and take the rest down with it. While CERN, at least, has a lot of brokers (such as ActiveMQ and RabbitMQ), they don't use these in the LHC control middleware, but for other projects. Each project team makes its own choices, and some people just really like STOMP and JMS, for example.
Throughput and latency are not yet problems, since in both projects the ambition is to replace a well-known technology with a conceptually similar one. ZeroMQ does have higher latency for the synchronous request-reply case but this doesn't matter (it never does; in any real low-latency design you will always work asynchronously).
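Why asynchronous wins is mostly arithmetic: if one round trip takes r seconds, N blocking requests cost about N × r, while N overlapping requests cost roughly r plus a little overhead. The sketch below illustrates this with Python's `asyncio`, using a sleep as a stand-in for the network. The 20 ms round trip and the function names are assumptions for illustration, and nothing here measures ZeroMQ itself.

```python
import asyncio
import time

ROUND_TRIP = 0.02   # assumed round trip in seconds, purely illustrative


async def request(device_id):
    """Stand-in for one request-reply exchange with a device server."""
    await asyncio.sleep(ROUND_TRIP)
    return device_id, "ok"


async def blocking_style(n):
    """Wait for each reply before sending the next request."""
    return [await request(i) for i in range(n)]


async def async_style(n):
    """Send all requests, then collect the replies as they arrive."""
    return await asyncio.gather(*(request(i) for i in range(n)))


def timed(coro_fn, n):
    """Run one of the styles above and return (replies, elapsed seconds)."""
    start = time.perf_counter()
    replies = asyncio.run(coro_fn(n))
    return replies, time.perf_counter() - start
```

Comparing `timed(blocking_style, 10)` with `timed(async_style, 10)` gives the same replies, but the blocking version pays the round trip ten times over. A slightly slower single round trip barely shows up once requests overlap.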
However, as ESRF and CERN gain more experience and confidence with ZeroMQ, I would expect them to move to more interesting use cases. Multicast is an obvious one, since there are thousands of devices talking to hundreds of applications. But there are a few other areas where the ZeroMQ community could help a lot over time:
- Building a reusable name server (this would be so useful I plan to make one anyhow, so that "bind" automatically registers a new service, and "connect" automatically looks it up).
- Standardizing on protocols based on ZMTP and ZRE to talk to devices so that the device server layer could be removed.
- Helping ESRF and CERN with the process of building community around their open source layers. At CERN I was fortunate to be able to present our community-building process to a hundred or so developers. At ESRF I presented much the same arguments to a packed room of thirty or forty developers.
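The first item in the list above, a name server where "bind" registers a service and "connect" looks it up, might look something like this in miniature. This is a sketch under loud assumptions: it uses plain TCP sockets from Python's standard library rather than ZeroMQ, the registry is just a dictionary in the same process, and the `bind` and `connect` helpers and the `magnet-server` name are hypothetical.

```python
import socket

# Hypothetical in-process registry standing in for a real name server.
REGISTRY = {}


def bind(service_name):
    """Listen on a free local port and register it under service_name."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))     # port 0: the OS picks a free port
    listener.listen()
    REGISTRY[service_name] = listener.getsockname()
    return listener


def connect(service_name):
    """Look the service up by name and connect to wherever it is bound."""
    address = REGISTRY[service_name]
    return socket.create_connection(address)


def demo():
    listener = bind("magnet-server")
    client = connect("magnet-server")   # the caller never sees host or port
    peer, _ = listener.accept()

    client.sendall(b"get current")
    request = peer.recv(1024)
    peer.sendall(b"12.5")
    reply = client.recv(1024)

    for sock in (client, peer, listener):
        sock.close()
    return request, reply
```

The point is only that applications deal in names, never in endpoints. A real version would need the registry to be a network service of its own, with some answer for what happens when a registered service goes away.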
It's nice to see such significant organizations as ESRF and CERN choosing ZeroMQ for their most important control systems. I'd expected a little more competition from other products but there's nothing out there with the same mix of scale (the sheer number of projects in and around ZeroMQ is impressive), maturity, and appeal to developers.
With a little patience and attention, we could see ZeroMQ become the technology of choice not just in synchrotrons and accelerators but also factories and airports and more, and ZeroMQ's protocol — ZMTP — could become the "Protocol of Things" for the Internet of Things. Which would be fitting, since the Web was invented at CERN.