You have an application to slide into production on some release cadence. You’ve split it up into a number of services, perhaps.
Do all the services get deployed together? Tricky. You could have them in one repo (directory separation, with a directed-graph or recursive build system) and take advantage of atomic commits, but you’ll also ordinarily have a larger checkout and more data exchanged with each pull than for the single service you’re working on.
Google has a single repo for 25K developers. In that, there are many hundreds of separately buildable/deployable things. Each of those has its own release cadence. Some go into prod with every commit (Continuous Deployment style). Others have daily, weekly, or monthly cadences. It is all made sane by their expanding and contracting mono repo configuration and their Blaze (Bazel to you and me) build system which evolved with it.
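That expanding-and-contracting idea can be approximated in stock Git with sparse checkout. A minimal local sketch, using a toy two-service repo built on the spot (all repo and service names here are invented for illustration):

```shell
# Build a toy "monorepo" locally, then check out only one service from it
# via Git's sparse-checkout support. Nothing here touches a real repo.
set -e
git init -q mono-origin                        # stand-in for the company monorepo
cd mono-origin
mkdir -p services/api services/web
echo "api code" > services/api/app.txt
echo "web code" > services/web/app.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm "initial"
cd ..

git clone -q --no-checkout mono-origin mono-work  # clone, but materialize nothing yet
cd mono-work
git config core.sparseCheckout true
echo "services/api/" > .git/info/sparse-checkout  # contract to just one service
git read-tree -mu HEAD                            # populate the working copy
ls services                                       # only 'api' is materialized
cd ..
```

To expand the checkout again, you add more paths to the sparse-checkout file and re-run the read-tree step — which is roughly what Google’s tooling does for you automatically.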
What seems wasteful is an app/service with a certain release cadence being the product of two or more repositories — at least, it does where the product of those two repos has no other dependent apps.
Git Loves Microservices (or) the Push/Pull Bottleneck
In the microservices era, it makes sense to have one repo per microservice — at least, it does if they are separately deployable (the point of microservices). Being in a separate repo, they are by default separately buildable, of course. Being in a separate repo, they also avoid a push/pull bottleneck that comes with default Git usage. Perforce’s Git Fusion side-install didn’t have that bottleneck, but the vast majority of Git teams are not using that.
When Git gets past the push/pull bottleneck, we may be able to step back from the one-repo-per-microservice world we are in now and see team VCS use evolve in ways closer to Google’s.
Incidentally, if Google is not competing with your business idea (they won’t balkanize their ad revenue), count your lucky stars: their developer throughput is better than yours, and they could change your business and its rules of engagement from afar. They will learn your domain/vertical faster than you can learn their developer efficiency. Oh, Buzz and Google+ notwithstanding.
Twitter Conversation With Sam Newman
A correction from me — you can do a mono repo for even large teams without Buck/Bazel (Blaze): Maven, Gradle and other recursive build systems are fine choices, too (in theory). We also discussed lockstep upgrades (which I like in a mono repo) and lockstep releases, a topic entangled with this blog entry.
Why Focus on Buildable/Deployables?
Ok, so a modular build is a good thing if it allows you to choose to build only one module (or a subset of modules) — at least if that saves elapsed time on the build. It is also good if the build technology itself can work out an elapsed-time saving versus the full build.
Maven (and similar) allows a modular build structure. You can cd into one of the sub-modules and build from there (and do naughty things like -DskipTests). That is an effective way of shortening elapsed build times. Jason van Zyl (Mr. Maven) made a Smart Builder that does some deterministic quickening of builds, but I believe the competing Gradle wins for speed.
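As a sketch of those two ways of narrowing a Maven build — the flags are standard Maven, but the module name here is made up:

```shell
# Option 1: cd into the sub-module and build just it
# (the naughty -DskipTests shortens things further).
cd checkout-service
mvn install -DskipTests
cd ..

# Option 2: stay at the repo root and name the module instead.
#   -pl = projects list (which modules to build)
#   -am = also make (first build the modules it depends on)
mvn -pl checkout-service -am install
```

Option 2 is safer than option 1 when the module depends on siblings that have changed, since Maven rebuilds those dependencies first.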
Buck and Bazel (both in the image of Blaze) are directed-graph build systems. There are still intermediate buildable things that they can build, or skip building, on each invocation. Those are modules too, even if you’re less aware of them. You always build from the root with directed-graph build technologies, and the cd-to-sub-module thing of Maven/Gradle doesn’t apply.
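To illustrate the build-from-root style — the target path below is hypothetical, but the invocation shape is standard Bazel:

```shell
# Always run from the repo root; //services/checkout:server names one
# buildable target in the directed graph (hypothetical path).
bazel build //services/checkout:server

# Re-running after no changes is near-instant: every node in the graph
# is already up to date, so Bazel skips rebuilding all of them.
bazel build //services/checkout:server

# Or build every target under a directory with the ... wildcard:
bazel build //services/checkout/...
```

The elapsed-time saving comes from the graph, not from you choosing a directory to build in: Bazel rebuilds only the intermediate things whose inputs actually changed.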
Anyway, each module is a buildable thing, but the dev team has no intention of ever deploying a module on its own. Each module could be a Jar, and only a collection of those jars (say, a WAR file) makes sense to deploy. Thus, the how-many-repositories determination should hinge on things that are both buildable and deployable.