There are some misconceptions and simplifications going around about Continuous Deployment. We haven't arrived at the fully Continuous version yet at Onebip, but we deploy several different projects at least once a day. More than 6 months of this experience have brought up some tips I want to share.
The need for automated tests
It doesn't matter how smart you are, how careful you are, or how little you modify: the effects of a dozen-line diff on a complex system such as a modern application are still fundamentally unpredictable (especially with legacy code around). For this reason, deployment frequency goes hand in hand with automated regression test coverage.
Continuous Deployment is akin to Agile processes: it brings out problems efficiently, but does not solve them for you. Breaking features are a symptom of the need for basic tests, and deploying early is a way of finding them as soon as they break. The cost of finding out, however, cannot be borne by manual testers, but only by machines performing the same routines over and over again.
Note that testing practices cannot substitute for good engineering, just as "just writing code" cannot substitute for architecture and constant refactoring in Agile processes: automated but unrelated tests that break very often will bring up your coupling problems before your customers do.
It is rare to have a high volume of requests on all features; for us, small countries and merchants can only be covered with tests, since there isn't enough production traffic to monitor for quick feedback on their state.
However, at the application level you are sure to have enough traffic to set up a baseline for what should happen in the minutes after a new deploy. This is easy for us, being a payment system: just track the money transacted and the attached commissions.
Applications not directly related to money can still measure the number of successful business transactions:
- Price quotations or similar documents successfully saved by all tenants.
- Number of screens of certain key classes.
- Page views on internal URLs.
Once these metrics are available, you can set up an automatic rollback as part of an atomic build. If, in the period following a deploy, these metrics deviate wildly from the expected values (like not making any new transaction in two minutes), the system should switch back to the previous version of the code and the database. It won't save you from obscure bugs, but it will stop basic sanity problems like misconfigured databases and URLs, bugs in integrations that can only be tested in production, and so on.
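A minimal sketch of the metric check that would drive such a rollback, assuming a `fetch_metric` callable that queries your monitoring system (a hypothetical name; the separate `rollback` step that redeploys the previous code and database is not shown):

```python
import time

def should_rollback(fetch_metric, baseline, window_seconds=120, interval=10, tolerance=0.5):
    """Poll a business metric in the minutes after a deploy; signal a
    rollback when it deviates wildly from the pre-deploy baseline
    (e.g. transactions per minute dropping below half of it)."""
    deadline = time.time() + window_seconds
    while time.time() < deadline:
        if fetch_metric() < baseline * tolerance:
            return True  # metric collapsed: switch back to the previous version
        time.sleep(interval)
    return False  # the new version held up for the whole observation window
```

The deploy script would then call something like `if should_rollback(...): rollback()` as the last step of the atomic build; the baseline, window, and tolerance are all assumptions to tune against your own traffic.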
New builds at each...
Continuous Deployment follows from Continuous Integration, and using Feature Branches is going to require a limit on the time a branch can spend away from master, through story splitting or periodic merging. Once the code arrives in master, however:
- New code (one or more very near commits, a pull request being merged) should be built and tested as soon as possible.
- New dependencies (libraries, being imported as git submodules, with Maven, or anything else) must trigger new builds.
- New infrastructure (server, load balancer, database secondary or replica set) must trigger new builds.
- New versions of supporting software (Apache, single PHP extensions, PHPUnit versions, binaries such as R or python)... must trigger new builds.
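One way to sketch this in a CI poller is to fingerprint every build input, not just the application code, so that a change in any of them enqueues a fresh build attributable to that change. The file names here are illustrative assumptions, not a real project layout:

```python
import hashlib
from pathlib import Path

# Hypothetical list of watched build inputs: dependency manifests,
# submodule pointers, infrastructure and provisioning descriptors.
WATCHED = ["composer.json", ".gitmodules", "infrastructure/servers.yml",
           "provisioning/php-extensions.list"]

def inputs_fingerprint(root="."):
    """Combined hash of all watched build inputs: if any one of them
    changes, the fingerprint changes and a new build should start."""
    digest = hashlib.sha256()
    for name in WATCHED:
        path = Path(root) / name
        if path.exists():
            digest.update(name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()
```

A scheduler would then compare `inputs_fingerprint()` against the fingerprint of the last built revision and trigger a build on any difference, recording which input moved.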
The goal is to find out a build is broken not only early, but also with a clear indication of the change that broke it. With this valuable information, the change can be reverted and the deployment pipeline kept green without significant downtime. Which brings us to the next point...
What if the pipeline stops?
While you're accelerating all these deployments, what happens if the pipeline stops? Once you have gotten into the habit of deploying every day, going without it is going to be difficult.
The reasons can be many (a non-deterministic test, a broken server, a disk failure, cloud services being unavailable), but downtime in the deployment pipeline is second only to downtime on the production servers: without the pipeline, you can leave production running but you cannot fix any bug or improve production in any way.
Another related problem is that the much slower feedback makes it hard to find the causes of a problem (often in the later stages of the pipeline): if the smoke tests for a single application fail for a day, 24 hours' worth of changes are going to reach the later stages (unit, integration, or end-to-end tests, maybe with more than one application involved) at the same time. When a broken build then happens in the later stages, it is not at all clear what its cause is in the midst of new code being exercised all at once.
Since the pipeline is a chain of operations, each depending on the previous one, the weakest link blocks the whole flow. This also means that a stage containing flaky tests or a very slow test suite influences the whole process, both in problem detection and in the cycle time it takes for a commit to go into production.
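The chain-of-stages structure can be sketched in a few lines: the first failing stage stops the run, so both its reliability and its speed bound the whole pipeline. Stage names and predicates here are purely illustrative:

```python
def run_pipeline(stages):
    """Run (name, stage) pairs in order; stop at the first failure so
    the broken stage is clearly identified and later stages are not
    polluted by a day's worth of accumulated changes."""
    for name, stage in stages:
        if not stage():
            return name  # the weakest link: everything after it is blocked
    return None  # all stages green: the commit can reach production
```

For example, `run_pipeline([("unit", run_unit), ("smoke", run_smoke), ("e2e", run_e2e)])` would report `"smoke"` as the blocker if the smoke tests fail, leaving the end-to-end stage untouched.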
It's not easy to perform deployments every day or on every commit: working towards that is a long journey compared to what you see described in a book. However, treating your testing environments with the same care as the production one will result in problems being replicated early: all the effort put into reliable test suites and build processes has a positive influence on the quality of your code and a negative one on the number of defects that reach the live stage.