Book review: Feedback control for computer systems
If you're tired of Agile practices and "magic spells for dummies" kinds of books, here is a different kind of learning for you. It does contain math, not at the high school level but at the level of electrical and mechanical engineering; in fact, you could say it is one of the few texts that earn the title of software engineering book thanks to its quantitative approach.
Moreover, it is not the faux-mathematical approach of so-called software engineers who try to estimate function points and reduce the maintainability of a system to a number. But I digress.
The fundamental idea
Feedback is known in the field of software development mostly for its application to processes with human actors, not to the product. Biweekly demos gain you feedback from customers or the Product Owner, while Test-Driven Development gives you the feedback of trying to control, reuse and test an object through its API even before it exists.
In other fields, however, feedback is applied to the machines that engineers build, such as those in factories, power plants and other production processes. The simplest example of feedback is temperature control, present in all our houses: an output quantity such as the temperature of a room is controlled indirectly through an input that the control system can set (turning the furnace on or off).
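To make the thermostat example concrete, here is a minimal sketch of on/off control in Python. It is not taken from the book: the room model and all its constants (outside temperature, heat loss, furnace power, the width of the band around the setpoint) are made up for illustration.

```python
def simulate_thermostat(setpoint=20.0, band=0.5, steps=200):
    """On/off control of a toy room model (all constants are assumptions)."""
    outside = 5.0   # outdoor temperature, degrees C
    leak = 0.05     # fraction of the indoor/outdoor gap lost per step
    heat = 1.5      # degrees gained per step while the furnace is on
    temp = 15.0     # initial room temperature
    furnace_on = False
    history = []
    for _ in range(steps):
        # Feedback: the input (furnace state) is decided from the measured output.
        if temp < setpoint - band:
            furnace_on = True
        elif temp > setpoint + band:
            furnace_on = False
        # Plant: the room loses heat to the outside and gains it from the furnace.
        temp += -leak * (temp - outside) + (heat if furnace_on else 0.0)
        history.append((temp, furnace_on))
    return history


history = simulate_thermostat()
late = [t for t, _ in history[50:]]
print(min(late), max(late))  # the room temperature keeps cycling around the setpoint
```

The controller never needs to know how much heat the room loses; it only looks at the measured temperature, which is the whole point of feedback.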
The possible applications to computer systems follow directly from identifying variables that need tuning and that we can control indirectly. They have also expanded in recent years with the advent of cloud computing, now that changing variables such as the computing power of a machine has become possible. Here are some examples contained in the book as full case studies:
- cache hit rate (controlled by cache size)
- response times and request queue length (controlled by the number of servers)
- CPU cooling (controlled by the voltage applied to it)
Each case study contains a mathematical treatment and Python code for its simulation. The author has already run the simulations and collected several possible control solutions. All data and graphs can easily be reproduced on a machine with Python and Gnuplot.
Three fundamental properties
So far so good: when your response times are going up like crazy, spin up new servers in your Virtual Private Cloud. If they go down too much, shut some of them down, since their power is wasted on the current traffic.
Even ignoring the fact that systems do not usually scale linearly, several difficulties come up in every control problem; they lead to the study of the following properties:
- stability: does adding a new server make the system stable, or does the number of servers start to oscillate, shutting down two servers and then spinning up more again?
- performance: how quickly do the response times reach the desired state when you add one or more machines?
- accuracy: how precisely can the measured response time track the desired value? It will probably have some highs and lows around that line, but how large will they be?
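The three properties can be observed on a toy closed loop. The sketch below is my own, not the book's code: it assumes that latency is simply `work / servers`, allows fractional servers to keep the arithmetic simple, and uses an incremental controller whose `gain` is the only knob. The functions `simulate` and `settling_step` are hypothetical names.

```python
def simulate(gain, steps=60, target_ms=100.0, work=2000.0, servers=8.0):
    """Closed loop on a toy plant: latency = work / servers (an assumption)."""
    latencies = []
    for _ in range(steps):
        latency = work / servers
        latencies.append(latency)
        error = latency - target_ms                  # positive when we are too slow
        servers = max(1.0, servers + gain * error)   # incremental correction
    return latencies


def settling_step(latencies, target_ms=100.0, tolerance=0.05):
    """First step after which latency stays within tolerance of the target."""
    for i in range(len(latencies)):
        if all(abs(x - target_ms) <= tolerance * target_ms for x in latencies[i:]):
            return i
    return None  # never settles within the simulated horizon


for gain in (0.02, 0.1, 0.5):
    run = simulate(gain)
    residual = max(abs(x - 100.0) for x in run[-20:])
    print(gain, settling_step(run), residual)
```

In this model a small gain is stable but slow (poor performance), a moderate gain settles in a few steps, and a large gain keeps adding and removing servers forever (instability). The residual over the last steps is a crude measure of accuracy.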
Coming from an engineering university, I should also add the sensitivity property. Let's say we produced a controller for our system based on the fact that every new server can handle N concurrent requests; how does the system behave when the real parameter becomes N-1 or N-2 due to external factors? Does it react, or lose some of its properties very quickly?
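A rough way to probe sensitivity is to tune a controller for a nominal parameter and then run it against a plant where the parameter has drifted. The sketch below assumes a nominal capacity of 10 requests per server, a load of 700 requests and a target utilization of 0.7; these numbers, the utilization model and the function names are all illustrative assumptions of mine.

```python
def utilization(load, servers, capacity):
    """Toy plant: fraction of the total capacity that the load consumes."""
    return load / (servers * capacity)


def closed_loop(capacity, load=700.0, target=0.7, gain=70.0, steps=50):
    """Gain chosen for the nominal capacity of 10; the real capacity may differ."""
    servers = load / (10.0 * target)  # start from the nominal sizing (100 servers)
    for _ in range(steps):
        error = utilization(load, servers, capacity) - target
        servers += gain * error
    return servers, utilization(load, servers, capacity)


for capacity in (10.0, 9.0, 8.0):
    fixed = utilization(700.0, 100.0, capacity)  # no feedback: always 100 servers
    servers, util = closed_loop(capacity)
    print(capacity, round(fixed, 3), round(servers, 1), round(util, 3))
```

In this toy model the fixed sizing drifts away from the target as soon as the capacity drops, while the loop still reaches it with more servers, only a little more slowly than it was designed to.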
The problems and the potential of the software field
Enterprise software does not have a history of applying feedback control, but it has potential. While precise and fast sensors for measuring output variables are costly in the physical world, in software they are quick to insert into existing systems, usually for a low performance penalty.
Moreover, we can also control input variables with precision and speed: changing the cache size is an instant operation. Even when servers take time to start, we still have options for expanding our hardware that are beyond the possibilities of physical plants.
Finally, we can also take sample steps short enough to control the system every few seconds. If the characteristic time of our system is such that its outputs change in milliseconds, we can similarly build a controller that operates at the same frequency, or probably higher, since it does not carry the burden of ordinary traffic.
One advantage of computer systems is also one of the problems: control is digital, so variable values change discretely and only when the controller decides so. The model that we have of a computer system is totally opaque: it must be invented and then fit by measurement.
Physical systems usually have some standard differential equation such as the heat transfer equation that can be approximated with linear models; computer systems have few standard models and it is difficult to dream up a transfer function for them.
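As an illustration of what "invented and then fit by measurement" can look like, the sketch below assumes the simplest possible model, a first-order lag y[k+1] = a*y[k] + b*u[k] (whose discrete transfer function would be b / (z - a)), and estimates a and b by least squares. The data here is synthetic, generated from known parameters; in practice it would come from measurements of the running system, and a first-order model may well be too crude. The function names are hypothetical.

```python
def simulate_plant(a, b, inputs, y0=0.0):
    """Generate the outputs of y[k+1] = a*y[k] + b*u[k] for a list of inputs."""
    ys = [y0]
    for u in inputs:
        ys.append(a * ys[-1] + b * u)
    return ys


def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) from inputs and measured outputs."""
    y, y_next = outputs[:-1], outputs[1:]
    syy = sum(v * v for v in y)
    syu = sum(v * u for v, u in zip(y, inputs))
    suu = sum(u * u for u in inputs)
    sny = sum(n * v for n, v in zip(y_next, y))
    snu = sum(n * u for n, u in zip(y_next, inputs))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = syy * suu - syu * syu
    if det == 0:
        raise ValueError("inputs do not excite the system enough to fit it")
    a = (sny * suu - snu * syu) / det
    b = (syy * snu - syu * sny) / det
    return a, b


inputs = [1.0] * 20 + [2.0] * 20          # two input steps to excite the system
outputs = simulate_plant(0.8, 0.5, inputs)
a_hat, b_hat = fit_first_order(inputs, outputs)
print(round(a_hat, 6), round(b_hat, 6))
```

With noisy real measurements the estimates would only be approximate, and checking how well the fitted model predicts fresh data is what tells you whether the invented structure is good enough.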
Why you should read this book
If you want to incorporate feedback control inside your applications, this book is a must-read. The kinds of applications suited to these control patterns have wildly changing traffic from the rest of the world, with spikes high enough that they need an optimization process to be handled.
If you have an engineering background, you will be able to finish the book in a couple of full days of reading and exercising; otherwise, it will be much more difficult to study these concepts from scratch. The mathematical treatment is condensed and intended as a reference, not as a full course.