Maven in a Google Style Monorepo
Is using Maven in a monorepo situation possible? Is it even desirable? This experiment looks at making Maven into a Monorepo-type setup for more modularity.
Let's consider some data. Google’s gigantic Monorepo has:
- 86TB of history
- 9 million unique files
- One branch (Trunk Based Development)
- 25K developers all committing there
From my calcs (Googlers please correct me), there’s one commit to the trunk every 30 seconds, and their proprietary CI infrastructure keeps up with that on a per-commit basis. That’s a different story.
A Monorepo is where two or more teams with different deployable/shippable applications and/or services (and potentially different release cadences/schedules) exist in the same repo/branch: specifically, ‘the trunk’ of Trunk Based Development.
Those checkouts could get big, right?
A Level 2 Monorepo
(Martin Fowler will give this a better name, hopefully).
The checkout on the developer’s workstation expands or contracts to the smallest set of buildable/linkable modules needed to perform test and package operations. Not only that, but it does so in a 100% provable way, from a deep understanding of the directed graph of buildable things and the things that could use them.
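To make the graph reasoning concrete, here is a minimal Python sketch. The module names and the `DEPENDS_ON` map are invented for illustration; a real setup would derive the graph from the build files. It walks the graph in both directions: what a module needs in order to build, and which modules could be affected by a change to it.

```python
from collections import deque

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "app-a": ["lib-core", "lib-http"],
    "app-b": ["lib-core"],
    "lib-http": ["lib-core"],
    "lib-core": [],
}

def reachable(start, edges):
    """Return every node reachable from start (excluding start itself)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invert(edges):
    """Flip the graph so each module maps to the modules that use it."""
    users = {node: [] for node in edges}
    for node, deps in edges.items():
        for dep in deps:
            users.setdefault(dep, []).append(node)
    return users

USED_BY = invert(DEPENDS_ON)

# Modules that must be present to build lib-http.
needed_to_build = reachable("lib-http", DEPENDS_ON)
# Modules that a change to lib-core could affect.
could_break = reachable("lib-core", USED_BY)
```

The forward walk gives the "contract" side (the least you need checked out), and the inverted walk gives the "things that could use them" side.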
At some level, as ex-Googler, ex-Facebooker, and Buck committer Simon Stewart points out, this expand/contract thing is just a view or projection of a Monorepo. He says:
“The amount of boilerplate required to get the maven thing working is terrifying. Both buck and blaze let you just create a new directory, shovel classes into it, and you’re done. The “src” nonsense required to make mvn work without hoop jumping seriously raises the bar to multiple modules.”
He is right of course, but Maven is still the enterprise gorilla.
The Maven Challenge
Maven is a recursive build technology that forward declares child modules to build. Compare that to Google’s Blaze (partially open sourced as Bazel) and Facebook’s Buck, which are directed graph build systems where there are no forward declarations. It is no coincidence that expanding/contracting checkouts are easy with them, because that was a design goal of Blaze (the one that came first).
Maven’s forward module declarations look like this:
<modules>
  <!-- Maven works out the build order - phew! -->
  <module>aModule</module>
  <module>another-module</module>
  <module>this_one_has_child_modules_too</module>
</modules>
All those are directories within the current directory. Maven is going to take some coercing.
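One way to picture the coercion is to regenerate the `<modules>` block so that it lists only the directories actually present in the working copy. The following Python sketch illustrates that idea. It is not necessarily how the scripts in this proof of concept work, and `modules_block` is a hypothetical helper name.

```python
import os
from xml.sax.saxutils import escape

def modules_block(root, declared):
    """Build a <modules> fragment listing only the declared modules
    whose directory exists under root (i.e. is checked out)."""
    present = [m for m in declared if os.path.isdir(os.path.join(root, m))]
    lines = ["<modules>"]
    lines += ["  <module>%s</module>" % escape(m) for m in present]
    lines.append("</modules>")
    return "\n".join(lines)
```

Called with the full list of declared modules, it silently drops the ones a contracted checkout has left out, so Maven never sees a forward declaration for a missing directory.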
Ant and Gradle are the same, but I’ll focus on the Java enterprise gorilla — Maven.
Maven Monorepo Proof of Concept — in Git
I took Google’s Guava because it had a multi-module build that, although small, could be a surrogate for something with hundreds of modules, which could represent a company’s entire set of (Java) deployable/shippable applications and/or services.
The default checkout of this repo doesn’t build as all the pom.xml files were renamed.
No matter, run this:
Now you have POMs again.
mvn install runs as you’d expect now.
Run this on the command line:
mvn com.github.ferstl:depgraph-maven-plugin:aggregate -Dincludes=com.google.guava
This gives a Dot graph (GraphViz) that, via some colorization in OmniGraffle, looks like:
I’ve colored in two items above. Wouldn’t it be nice to modify the checkout (working copy) to have just those two modules, and Maven not choke because of missing modules?
Well, that’s possible now. Do this:
git config core.sparsecheckout true
echo '/mr' > .git/info/sparse-checkout
echo '/README.md' >> .git/info/sparse-checkout
echo '/pom*' >> .git/info/sparse-checkout
echo '/guava/' >> .git/info/sparse-checkout
echo '/guava-testlib/' >> .git/info/sparse-checkout
mr/checkout.sh
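The same sparse-checkout file could be produced from a list of module names. Here is a minimal Python sketch, assuming the fixed entries (`/mr`, `/README.md`, `/pom*`) used above. The function names are hypothetical and are not part of the repo’s scripts.

```python
import os

def sparse_checkout_lines(modules, always=("/mr", "/README.md", "/pom*")):
    """Return sparse-checkout patterns: the fixed entries first,
    then one '/<module>/' directory pattern per requested module."""
    lines = list(always)
    for module in modules:
        pattern = "/" + module.strip("/") + "/"
        if pattern not in lines:
            lines.append(pattern)
    return lines

def write_sparse_checkout(repo_root, modules):
    """Overwrite .git/info/sparse-checkout under repo_root and return its path."""
    path = os.path.join(repo_root, ".git", "info", "sparse-checkout")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write("\n".join(sparse_checkout_lines(modules)) + "\n")
    return path
```

Writing the file does not change the working copy by itself. In the commands above, that step is followed by mr/checkout.sh.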
mvn install works on just two modules now, rather than the eight before. The dependency graph looks like this:
A Word of Warning
For the love of Turing, have a lock-step version number for everything built in the Monorepo. Maybe Maven’s classic 1.0-SNAPSHOT suffices, and in your CD-esque deployment technologies, you designate something more meaningful in Jenkins (etc).
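A lock-step rule is easy to check mechanically. The sketch below is one possible check, not something from the proof of concept. It assumes every POM declares the standard Maven 4.0.0 namespace, and it reads versions literally, so property placeholders are not resolved.

```python
import os
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

def pom_version(pom_path):
    """Return the project's own <version>, falling back to <parent><version>."""
    root = ET.parse(pom_path).getroot()
    for xpath in ("m:version", "m:parent/m:version"):
        node = root.find(xpath, POM_NS)
        if node is not None and node.text:
            return node.text.strip()
    return None

def versions_in_tree(top):
    """Map each pom.xml found under top to the version it declares."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(top):
        # Skip Git metadata and Maven build output.
        dirnames[:] = [d for d in dirnames if d not in (".git", "target")]
        if "pom.xml" in filenames:
            path = os.path.join(dirpath, "pom.xml")
            found[path] = pom_version(path)
    return found

def is_lock_step(top):
    """True when every POM under top declares the same version."""
    return len(set(versions_in_tree(top).values())) <= 1
```

Run from CI against the trunk, a check like this would flag any module that drifts away from the shared version.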
That’s easy: some more Python fu that works with the first dot graph lets you conveniently modify .git/info/sparse-checkout. Like so:
mr/checkout.sh guava-testlib
# calculates that it needs guava too
# or
mr/checkout.sh guava-tests,guava-gwt
# calculates that it needs another 5: guava-testlib/test, guava,
# guava-testlib/test, guava-tests/test, and guava/test (and not
# guava-testlib at all)
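The calculation itself is a transitive walk over the dot graph. The following is a rough Python sketch of the idea, not the actual script behind mr/checkout.sh. It assumes each edge is written as `"dependent" -> "dependency"` and that node ids look like `groupId:artifactId:...`. The sample graph is made up for illustration.

```python
import re

EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')

def artifact_id(node):
    """Keep the artifactId from 'groupId:artifactId:...'; otherwise keep the id as is."""
    parts = node.split(":")
    return parts[1] if len(parts) > 1 else node

def parse_dot(dot_text):
    """Build {module: set(direct dependencies)} from quoted DOT edges."""
    graph = {}
    for src, dst in EDGE.findall(dot_text):
        a, b = artifact_id(src), artifact_id(dst)
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    return graph

def modules_needed(requested, graph):
    """Expand a comma-separated request to include all transitive dependencies."""
    needed = set()
    stack = [m.strip() for m in requested.split(",") if m.strip()]
    while stack:
        module = stack.pop()
        if module not in needed:
            needed.add(module)
            stack.extend(graph.get(module, ()))
    return sorted(needed)

# Hypothetical two-edge graph, not the real Guava output.
SAMPLE_DOT = '''
digraph G {
  "com.google.guava:guava-testlib:jar" -> "com.google.guava:guava:jar"
  "com.google.guava:guava-tests:jar" -> "com.google.guava:guava-testlib:jar"
}
'''

graph = parse_dot(SAMPLE_DOT)
print(modules_needed("guava-testlib", graph))  # ['guava', 'guava-testlib']
```

The resulting list is what would then be turned into entries in .git/info/sparse-checkout.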
Well, I haven’t convinced anyone bar Googlers and Xooglers. No matter, I’m looking forward to having Git or Mercurial push their size boundaries to get to the place Perforce, Subversion, and PlasticSCM can so that companies can bank on Monorepo setups.
While I Have Your Attention
Can I direct your attention to a portal documenting ‘Trunk Based Development’ (including Monorepos)? -> https://trunkbaseddevelopment.com
No ads, no services being sold, and mobile friendly. Well worth a look, in my opinion.
Published at DZone with permission of Paul Hammant, DZone MVB. See the original article here.