Dependencies Management in a Crisis
Ayende Rahien brings forth the less attractive side of dependencies: what happens when they stop working in the middle of the night.
Typically, when people talk about dependencies, they talk about how easy it is to version, deploy, change, and replace them. There seems to be a very deep focus on the costs of dependencies during development.
Today, I want to talk about another aspect of that: the cost of dependencies when you have a crisis. In particular, consider a 2 AM support call whose root cause lies in one of your dependencies. What do you do then?
The customer sees a problem in your software, so they call you, and you are asked to resolve it. After you have narrowed the problem down to a particular dependency, you need to check whether it is your usage of the dependency that is broken or whether the dependency itself has a genuine issue.
Consider a recent support call we had. When running RavenDB on a Windows Cluster with both nodes sharing the same virtual IP, authentication didn't work. It took us a while to narrow it down to Windows authentication failing, and that is where we got stuck. Windows authentication is a wonderfully convenient tool, but when something goes wrong, just finding out what the error is requires specialized knowledge and skills. After verifying that our code appeared to use it correctly, we ended up writing a minimal reproduction of about 20 lines of code, which also reproduced the issue.
At that point, we were able to escalate to Microsoft with the help of the customer. Apparently, this is a Kerberos issue: you need to use NTLM, and there is a workaround involving some network configuration (check our docs if you really care about the details). But the key point here is that we would have had absolutely no way to figure this out on our own. Our usage of Windows authentication followed the published best practices, but in this scenario you had to do something different to get it to work.
The point here is that if we hadn't been able to escalate this to Microsoft, we would have been in serious trouble with the customer. “We can't fix this issue” is something that no one wants to hear.
As much as possible, we try to make sure that any dependency we take on is one of the following:
- Stuff that we wrote and understand.
- Open source* components that are well understood.
- Covered by a support contract that we can fall back on, with the SLA we require.
- Non-essential, or able to be disabled without major loss of functionality.
* Just taking an OSS component from some GitHub repo is a bad idea. You need to be able to trust it, which means you need to be sure that you can go into the code and either fix things or understand why they are broken.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.