Gene Kim recently observed that a good mental model for a continuous delivery pipeline is that a change appears in development and then pushes its way through testing gates towards production. With DevOps drawing an affinity for "pull" from Lean, do we have a problem?
We can think of the pipeline as something like an assembly line in manufacturing. Raw materials (source code changes) enter at one end, and validated software pops out the other end into production. A developer hacks up some changes and pushes those to source control. A commit trigger in source control demands a build from the CI tool, which pushes the build into a package repo. The deployment tool (which may be the same as the build tool depending on pipeline design) pushes the build into a test environment, invokes tests, then pushes it forward to the next test environment. Finally, someone chooses to deploy a tested build into production. This looks very "pushy".
Stepping back a level
What if we step out a level and treat the pipeline as a black box? What we would see is the business requesting changes to the system. Those requests go into a prioritized backlog (inventory). Developers then pull a change request, work it, run some tests to validate it, and then put it out for delivery where the business can put it into production at will. Assuming the pipeline can be passed through extremely quickly and developers rarely make mistakes, we can ignore the fact that there are stages to it and pushes within it. Developers are the bottleneck and the pipeline is just the delivery of a single change to the business based on their requests. The business always wants changes immediately and is constantly pulling them. There are about a dozen things wrong with this view, but also some truth at the core. Let's keep looking at this.
But developers make mistakes
The first bad assumption is that the tests run by the pipeline won't find anything. They very often do. In fact, this is how we stumbled into build pipelines in the first place. With continuous integration, we knew that we needed to discover bad commits more quickly. So we wanted to do a build and run the tests for each commit, to deliver feedback to the developer. We wanted to stop the line immediately in case of problems rather than let them fester towards integration hell. In the early 2000s this meant doing a compile and unit tests. By the late 2000s, it became accepted that other tests (functional, API, security, performance, etc.) should be run automatically as well to derive feedback. The DevOps unicorns today include production A/B testing and may allow the developer to push to production for that purpose. From this perspective, delivery into production is incidental. A developer has a change (which happens to be her own) and asks the pipeline to validate that the code has not gotten worse. Again assuming the tests are fast, the idea of stages in the pipeline is silly. Why not run all the tests in parallel after the compilation?
Some Tests are Slow or Costly
Pipelines remain linear because some tests are slow (or used to be slow and old assumptions have echoes in current pipeline design). In the "black box" strategy of considering the pipeline as a single validation of a change, we could consider single-piece flow. When available, the pipeline would pull a single change from the development team for validation. If the pipeline included manual testing or a production A/B test that took a week's time, one change would flow per week. This is terrible for a few fairly obvious reasons:
- Low amounts of value are delivered
- Simple, fast tests provide much of the value when asking "did this change break something" at a fraction of the cost
- The longest running tests (manual regression checks) have a batch size of one system, and are (mostly) not impacted by the number of changes in that system.
These insights explain why delivery pipelines have stages. They are not assembly lines delivering value to production. They are quality inspections. We can often have a first stage that does a fast check, providing much of the checking value immediately to developers. Deeper, more costly checks that protect production need to pull new versions of the system to validate when those resources (environments or people) are ready. Instead of pulling a new build, those checks consume builds that ran through the rapid checks successfully, to avoid duplication of effort or wasting an expensive test cycle on something we know is already broken. If there are naturally several time buckets, several stages should emerge. Perhaps build + unit tests are run with each change. API testing of the module runs every ten minutes, usually on a build that represents a single change, sometimes on a build with several changes. Long running automated functional tests or performance tests might be run twice a day on builds with a number of changes. Slow manual regression testing happens at its own pace on whatever is passing all the other tests.
While the happy path from the perspective of a change is to be promoted through the pipeline, from the perspective of the slower tests they pull a new version of the system to operate on, and the upstream processes provide a plausible candidate.
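To make the "slow tests pull a plausible candidate" idea concrete, here is a minimal sketch in Python. It is not taken from any particular CI tool; the names `Build`, `CandidatePool`, `publish`, `pull_latest_passing` and `changes_since` are hypothetical, and it assumes each build carries the changes that are new since the previous build.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Build:
    number: int
    new_changes: List[str]      # changes first included in this build
    passed_fast_checks: bool    # result of the per-change stage


@dataclass
class CandidatePool:
    """Hypothetical holding area between the fast stage and slower stages."""
    builds: List[Build] = field(default_factory=list)

    def publish(self, build: Build) -> None:
        # The fast stage pushes every build here, pass or fail.
        self.builds.append(build)

    def pull_latest_passing(self, after: int = 0) -> Optional[Build]:
        # A slow stage calls this when its environment or people are free.
        # Broken builds and builds it has already tested are skipped.
        candidates = [b for b in self.builds
                      if b.passed_fast_checks and b.number > after]
        return max(candidates, key=lambda b: b.number) if candidates else None

    def changes_since(self, last_tested: int, pulled: int) -> List[str]:
        # Everything the slow stage is implicitly testing in one pass.
        return [c for b in self.builds
                if last_tested < b.number <= pulled
                for c in b.new_changes]


pool = CandidatePool()
pool.publish(Build(1, ["add login form"], True))
pool.publish(Build(2, ["refactor session"], False))   # caught by fast checks
pool.publish(Build(3, ["fix session refactor"], True))

candidate = pool.pull_latest_passing(after=0)
print(candidate.number, pool.changes_since(0, candidate.number))
```

The upstream side still pushes on every change. The slow stage decides when to look, and it never spends a cycle on a build the fast checks already rejected.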
Real story: In 2006 we had a customer call in a panic. Dev loved their new CI tool that could crank 20 builds a day. But QA was telling them to stop doing that. Their process was to take sequential build numbers and validate them - about once per day. They wanted to do the "pull" and decide when new builds would happen. A feature was born from this tension. When QA self-serviced a build into their environment, a second build number was generated that would increment only on QA pull. Dev got 20 builds a day, and QA got sequential numbers when they pulled.
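The two-counter idea from that story can be sketched in a few lines of Python. This is only an illustration of the concept, not how the actual feature was implemented; the class and method names are made up, and the mapping from QA numbers back to dev builds is an assumption added for traceability.

```python
class BuildNumbering:
    """Two independent counters: one per CI build, one per QA pull."""

    def __init__(self) -> None:
        self.dev_build = 0      # increments on every CI build
        self.qa_build = 0       # increments only when QA pulls
        self.qa_to_dev = {}     # assumed: remember which dev build QA received

    def new_dev_build(self) -> int:
        self.dev_build += 1
        return self.dev_build

    def qa_pull(self) -> int:
        if self.dev_build == 0:
            raise RuntimeError("no build available to pull")
        self.qa_build += 1
        self.qa_to_dev[self.qa_build] = self.dev_build
        return self.qa_build


numbering = BuildNumbering()
for _ in range(20):             # a day of dev builds
    numbering.new_dev_build()
first = numbering.qa_pull()     # QA takes the latest build
for _ in range(20):             # another day of dev builds
    numbering.new_dev_build()
second = numbering.qa_pull()
print(first, second, numbering.qa_to_dev)
```

QA sees 1, 2, 3 and so on, regardless of how many builds dev produced in between.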
So Push or Pull?
In idealized CD, the flow should be that a single change initiates a build. Should that succeed, a battery of fast tests is applied to the build containing a single change. If certified, A/B testing may follow (a slow test with human testers that happen to be users) and eventually the business can choose to accept the change into full use. This is very change centric, and each process triggers a downstream process in something like a push.
For most organizations, the pipeline is a series of successively slower testing types. Where things are very fast, the push dynamic works. Once tests become too slow to run per change, the push model breaks. It would result in horrific queues. Instead, numerous changes are consolidated into a single build (given CI, this is free). Those builds are pulled by the slower tests at a cadence that works for those slower tests.
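A bit of back-of-the-envelope arithmetic shows why the queues get horrific. The sketch below is in Python with illustrative numbers only: 20 changes a day and a slow test that can run twice a day are borrowed from the examples earlier in this post, not measured from anywhere, and the function names are hypothetical.

```python
def push_backlog(changes_per_day: int, runs_per_day: int, days: int) -> int:
    """Changes still waiting if every change needs its own slow test run."""
    arrived = changes_per_day * days
    tested = min(arrived, runs_per_day * days)
    return arrived - tested


def pull_batch_size(changes_per_day: int, runs_per_day: int) -> float:
    """Average changes folded into each build the slow test pulls."""
    return changes_per_day / runs_per_day


# Push per change: the queue grows by 18 changes every day.
week_backlog = push_backlog(changes_per_day=20, runs_per_day=2, days=5)

# Pull at the test's own cadence: no queue, about 10 changes per run.
batch = pull_batch_size(changes_per_day=20, runs_per_day=2)

print(week_backlog, batch)
```

With pull, the cost moves from waiting time to batch size: a failure in the slow test now points at roughly ten changes instead of one, which is the trade the fast upstream checks are there to soften.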
Note: with most enterprise systems, the scope of mid to late test cycles is not validating a single build. It's testing a system with changes from multiple builds, and other versioned stuff such as schema changes, infrastructure updates, etc. For the purpose of this conversation, that doesn't matter, but when managing flow through pipelines, making sure that what was tested together as a system is released together as a system is important.