Here’s an interesting problem our team faced last month that was extremely infuriating. We were launching replacement haproxy instances that load balance across the nodes in our RabbitMQ cluster. We’ve done this many times before, and we set all the usual user settings under limits.d to ensure enough file descriptors are available to the haproxy process. While creating this new role, we also decided to use supervisor to supervise the haproxy process, because on an older release we had observed that haproxy didn’t automatically restart when it crashed (which in itself is a rarity).
Everything looked solid, and we began throwing some traffic at the new balancer. Eventually, we discovered something had gone horribly wrong! Tons of "connection refused" errors began showing up, and the behavior was what one would expect if the process were running out of file descriptors. Sure enough, a quick look at /proc/<pid>/limits revealed that the maximum number of open file descriptors was set to the very low value of 1024. We directed traffic back to the old balancer and began the investigation. How could this be? All of our settings were correct, so why was the limit 1024?
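The check described above is easy to script. Here is a minimal sketch, assuming a Linux host where /proc/<pid>/limits has its usual column layout (limit name, soft limit, hard limit, units). The function names are hypothetical, and the values are returned as strings because a limit may also read "unlimited".

```python
from pathlib import Path

ROW_NAME = "Max open files"


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(ROW_NAME):
            # The remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(ROW_NAME):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def max_open_files(pid):
    """Read the open-file limits of a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Calling max_open_files with the haproxy PID returns the soft and hard limits that the running process actually has, which is the number that matters regardless of what limits.d says.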
Supervisor was one new variable in the mix, so I began perusing the supervisor documentation and scanning for the number 1024 to see what might be tied to it. Sure enough, I discovered the minfds setting. Let’s take a look at what the supervisor documentation has to say about this setting.
The minimum number of file descriptors that must be available before supervisord will start successfully. A call to setrlimit will be made to attempt to raise the soft and hard limits of the supervisord process to satisfy minfds. The hard limit may only be raised if supervisord is run as root. supervisord uses file descriptors liberally, and will enter a failure mode when one cannot be obtained from the OS, so it’s useful to be able to specify a minimum value to ensure it doesn’t run out of them during execution. This option is particularly useful on Solaris, which has a low per-process fd limit by default. Default: 1024
Well, that doesn’t make much sense. If I’m reading this correctly, it’s simply saying that the number specified is the minimum that should be available, right? The devil, as they say, is in the details. If we look at the documentation on setrlimit, we’ll see that the call sets the limits to the values it is given, regardless of what they currently are. The call is basically going to set max open files to whatever minfds is defined as in supervisor, and because resource limits are inherited by child processes, every program supervisor starts runs under that limit too. Sure enough, as an experiment, I set minfds in supervisor’s configuration to a higher number, and after restarting supervisor the open file descriptor limit for the haproxy process was greatly increased and reflected what minfds was set to.
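The inheritance behind that experiment can be reproduced without supervisor at all. The sketch below is only an illustration, not supervisor's actual code: a throwaway parent process calls setrlimit on itself and then starts a child, which reports the soft limit it was handed. The names PARENT_SCRIPT and grandchild_soft_limit are hypothetical, and the sketch assumes a POSIX system where the requested value does not exceed the current hard limit.

```python
import subprocess
import sys

# Runs in a separate interpreter that stands in for the supervising process:
# it sets its own RLIMIT_NOFILE soft limit, then launches a child that prints
# the soft limit it inherited.
PARENT_SCRIPT = """
import resource, subprocess, sys
soft = int(sys.argv[1])
_, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
subprocess.run(
    [sys.executable, "-c",
     "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"],
    check=True,
)
"""


def grandchild_soft_limit(soft):
    """Return the soft open-file limit seen by a child of a process that called setrlimit."""
    result = subprocess.run(
        [sys.executable, "-c", PARENT_SCRIPT, str(soft)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

Whatever value the stand-in parent applies to itself is what its child reports, which matches what we saw with haproxy: the limit that counted was the one in effect for the supervising process, not the one configured for the user under limits.d.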
In the end, this pain also turned out to be unnecessary. We had used supervisor because it was “what we know well,” but it turned out that the newer distribution we were releasing on already managed services via systemd, which was also configured by default to respawn on failure.