I’ve come across a few teams in my career with unstable build and test environments. When they develop a feature and test it on their test servers, everything’s fine, but as soon as they move the code into production, bugs that didn’t show up before begin to manifest.
I had one client whose system was so sprawling and complex they weren’t able to reproduce production bugs on their test servers. Things got so bad they had to build a second data center.
For this company, releasing to production meant bleeding 1% of their traffic from their active data center to their inactive data center. If everything went well, an hour later they would bleed off another 1% of their traffic. And they kept doing this — as long as things went well — until after several days all of the traffic was moved over to the new data center. Then they’d start the process all over again with a new release on the other data center.
This was an extremely painful and costly approach to releasing code in an unstable environment.
I’ve heard similar stories from other developers as well, and these are not amateur efforts but multimillion-dollar businesses. I even heard a rumor that some of the bugs in Apple’s iOS operating system only show up in production and can’t be reproduced in their test environment.
These issues may have different causes, but one thing is certain: they aren’t caused by the random nature of computers, because computers aren’t random. They are entirely deterministic.
If a test server and a production server behave differently when given exactly the same inputs, then the servers themselves, in the way they are configured, are not identical.
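One way to make that kind of difference visible is to compare the two configurations directly. Here is a minimal sketch in Python, assuming each server's settings can be exported as a flat dictionary. The function name `diff_configs` and the sample settings are hypothetical, made up purely for illustration.

```python
def diff_configs(test: dict, prod: dict) -> dict:
    """Return {setting: (test_value, prod_value)} for every setting that differs.

    A setting that exists on only one server is reported with None on the
    other side.
    """
    missing = object()  # sentinel, so a setting explicitly set to None still compares correctly
    return {
        key: (test.get(key), prod.get(key))
        for key in sorted(test.keys() | prod.keys())
        if test.get(key, missing) != prod.get(key, missing)
    }


# Hypothetical settings exported from each environment.
test_env = {"db_pool_size": 10, "tls": True, "cache_ttl": 60}
prod_env = {"db_pool_size": 50, "tls": True, "log_level": "warn"}

for setting, (on_test, on_prod) in diff_configs(test_env, prod_env).items():
    print(f"{setting}: test={on_test!r} prod={on_prod!r}")
```

An empty result is the goal. Anything the function reports is a place where a bug could appear in production without ever showing up in test.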
Systems should be easily replicable and configurable, and a system’s configuration depends on a lot more than just the code in the system.
Version everything that the build depends on, and this includes much more than the source code. It includes configuration files and database schemas, stored procedures and build scripts. It may include hundreds of files other than source code. But it’s only when we version all dependencies of the build that we’re able to create a reliable and traceable build.
It’s especially important to version everything the build depends on, including installation and configuration files.
Having identical environments for build, test, and production is critical to making this work. I’ve heard a lot of horror stories in which a system administrator gets a build server working by updating its configuration but then forgets to check the file back into version control (or has no policy requiring it). The system then fails in production because the configuration change never made it to the production server. Don’t let this happen on your team. Version everything!
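A forgotten check-in like that can be caught automatically. The sketch below is one possible approach, not a prescribed tool: it fingerprints every file in a directory and reports anything that differs from the versioned copy. The names `fingerprint` and `find_drift` are hypothetical, and it assumes the versioned files and the files on the server are both available as directories on disk.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_drift(versioned: dict[str, str], deployed: dict[str, str]) -> list[str]:
    """List files that are missing, extra, or changed relative to the versioned set."""
    return sorted(
        name
        for name in versioned.keys() | deployed.keys()
        if versioned.get(name) != deployed.get(name)
    )
```

Calling `find_drift(fingerprint(checkout_dir), fingerprint(server_config_dir))` with two directories of your own returns an empty list when the server matches version control. Failing the build when the list is not empty turns a forgotten check-in into an immediate, visible error instead of a production surprise.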