Erlang: live upgrade
There are many ways to upgrade a running system, and a key goal in all of them is keeping the service available to end users for as much of the time as possible. In this context, availability is the percentage of time during which the system is running or, equivalently, the probability of finding the system running at a random moment.
Language differences
Every platform has a specific way to deal with upgrades:
- Java and the JVM initially required a restart of the virtual machine, but more complex environments now allow hot plugging new classes, or unloading, restarting and reloading single services (OSGi).
- PHP, Ruby and other interpreted languages can usually just switch a symbolic link from the folder of the old version of the application to the folder containing the new one.
- C and C++ applications require a restart unless they are designed to avoid it (like Chrome).
Erlang, by contrast, is built so that modules can be replaced while a process is running, and it has explicit semantics for when the replacement takes effect (not at some arbitrary moment after the code is loaded).
Semantics
In particular, Erlang provides the concept of the old and the new version of a module: at any time, two different, consecutive versions of a module may be running inside the same machine.
Of course, the processes that are running the old version of the module need a seam where the new code can be inserted. The switch never happens in the middle of a function execution, only at a new fully qualified call. The program never stops and the data structures passed around stay the same, so the signatures of the seam functions must remain identical.
After an upgrade:
- Any new call from a module to anotherModule:function() will result in the new version of function being called.
- Inside a module, only fully qualified calls of the form module:function() will result in the new version of the module being used.
So inside a module, all local calls remain tied to the version the process is already running.
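The difference between the two kinds of call can be sketched with a small module. This is only an illustration: the module name seam_demo, its functions and its message formats are made up for this example and are not part of the series code.

```erlang
-module(seam_demo).
-export([version/0, local_loop/0, qualified_loop/0]).

%% Hypothetical marker: bump this number and recompile to get a "new" version.
version() -> 1.

%% Local recursion: the process never leaves the version it started in.
local_loop() ->
    receive
        {version, From} ->
            From ! {local, version()},
            local_loop();
        stop ->
            ok
    end.

%% Fully qualified recursion: every iteration goes through the seam,
%% so the next iteration runs in the newest loaded version.
qualified_loop() ->
    receive
        {version, From} ->
            From ! {qualified, version()},
            ?MODULE:qualified_loop();
        stop ->
            ok
    end.
```

If both loops are spawned, version/0 is changed to return 2 and the module is recompiled once, the local loop should keep answering with the old number. The qualified loop answers with the old number one more time, because it was already waiting inside the old code, and switches to the new one from the following iteration.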
A running example
Here is a module which contains a seam for upgrade:
-module(loading_12).
-export([wait_until_correct_answer/0]).

answer(_Question) -> 0.

wait_until_correct_answer() ->
    Answer = answer("6 times 7?"),
    io:format("Answer is: ~w~n", [Answer]),
    check_answer(Answer).

check_answer(42) -> ok;
check_answer(_) ->
    timer:sleep(2000),
    loading_12:wait_until_correct_answer().

The seam is the loading_12:wait_until_correct_answer() call at the end of the module. wait_until_correct_answer/0 continues to poll answer/1 until it returns the right result.
Let's try to use the current version of the module, starting from compilation:
[18:35:56][giorgio@Galen:~/erlang-series/src]$ erl
Erlang R14B02 (erts-5.8.3) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.8.3  (abort with ^G)
1> c(loading_12).
{ok,loading_12}
We can spawn the main loop in another process, to be able to continue to work in the shell:
2> spawn(loading_12, wait_until_correct_answer, []).
Answer is: 0
<0.39.0>
The process continues printing every wrong answer it gets:
Answer is: 0
Answer is: 0
Answer is: 0
Answer is: 0
Answer is: 0
Answer is: 0

Now we can fix the module by returning 42 from answer/1, and recompile it, which will cause the new version to be loaded.
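The article does not show the edited file, so the following is a reconstruction that assumes the fix is the one-line change described above, with everything else left untouched:

```erlang
-module(loading_12).
-export([wait_until_correct_answer/0]).

%% The only change: answer/1 now returns 42 instead of 0.
answer(_Question) -> 42.

wait_until_correct_answer() ->
    Answer = answer("6 times 7?"),
    io:format("Answer is: ~w~n", [Answer]),
    check_answer(Answer).

check_answer(42) -> ok;
check_answer(_) ->
    timer:sleep(2000),
    %% The seam: a fully qualified call, resolved to the newest loaded version.
    loading_12:wait_until_correct_answer().
```

The process spawned earlier is still sleeping inside the old check_answer/1 when the new code is loaded. Once the sleep ends, its fully qualified call lands in this version.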
3> c(loading_12).
{ok,loading_12}
The main loop gets the correct answer, and stops.
Answer is: 42
4>
More upgrades
The Erlang Programming book goes as far as explaining a pattern that lets you trigger the upgrade explicitly, after the code has been loaded, by sending a message to the loop process.
This pattern consists of a special message, upgrade, that the loop process handles differently from all the others: it is the only message whose handling ends with a fully qualified call like module:loop(State) instead of the local loop(State).
This hook lets you decouple the upgrade moment from when the code is loaded in the system, keeping in mind that only two versions (old and new) of a module are allowed in the machine at any time.
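Because of the two-version limit, loading a third version requires the oldest one to be purged first. As a rough sketch, a helper along these lines can refuse to load new code while some process is still running the old version. The module name hot_reload, the function safe_reload/1 and the old_code_in_use atom are hypothetical; code:soft_purge/1 and code:load_file/1 are the standard code server calls.

```erlang
-module(hot_reload).
-export([safe_reload/1]).

%% Load the newest compiled version of Module from the code path, but only
%% if no process is still executing its old version.
safe_reload(Module) ->
    case code:soft_purge(Module) of
        true ->
            %% No old code is in use: the current version becomes the old
            %% one and the freshly compiled file becomes the current one.
            code:load_file(Module);
        false ->
            %% Hypothetical error value: some process has not yet passed
            %% through its seam, so leave everything as it is.
            {error, old_code_in_use}
    end.
```

The shell's c/1 is less cautious: it purges the old code before loading, which kills any process still running it. A loop that never goes through a fully qualified call therefore survives one recompilation but not a second one.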
Moreover, the hook can be used to convert the State data structure into the format required by the new module, like we would do in a database migration. The process will not respond while a long migration is running, but messages keep queuing in its mailbox as usual and are handled after the upgrade has finished.
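A minimal sketch of this pattern, under stated assumptions, could look like the following. The module name upgrade_loop, the integer counter used as State, the message formats and the migrate/1 function are invented for this example and are not taken from the book.

```erlang
-module(upgrade_loop).
-export([start/0, loop/1, migrate/1]).

%% Start the loop with a hypothetical initial state: a counter at 0.
start() ->
    spawn(?MODULE, loop, [0]).

loop(State) ->
    receive
        {add, N} ->
            %% Ordinary messages recurse with a local call and therefore
            %% stay in the version the process is already running.
            loop(State + N);
        {get, From} ->
            From ! {state, State},
            loop(State);
        upgrade ->
            %% The only fully qualified calls: both resolve to the newest
            %% loaded version, so the new migrate/1 converts the old State
            %% before the new loop/1 takes over.
            NewState = ?MODULE:migrate(State),
            ?MODULE:loop(NewState);
        stop ->
            ok
    end.

%% In this version the state format does not change. A later version
%% would put the conversion from the previous format here.
migrate(State) ->
    State.
```

With this structure, recompiling the module only loads the new code. The running process switches when it receives upgrade, for instance with Pid ! upgrade from the shell, and any messages sent in the meantime wait in its mailbox.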
Conclusions
When non-functional requirements impose very high availability, it becomes worthwhile to invest in systems that never stop. Entire languages like Erlang are designed for such use cases; in particular, the functional and modular approach of Erlang lets the programmer define very precise moments at which to perform the upgrade, without losing user requests in the process.
The code for this series is available on GitHub.