
Maven in a Google Style Monorepo

Is using Maven in a monorepo situation possible? Is it even desirable? This experiment looks at making Maven into a Monorepo-type setup for more modularity.


Let's consider some data. Google’s gigantic Monorepo has:

  • 86TB of history
  • 9 million unique files
  • One branch (Trunk Based Development)
  • 25K developers all committing there

From my calcs (Googlers please correct me), there’s one commit to the trunk every 30 seconds, and their proprietary CI infrastructure keeps up with that on a per-commit basis. That’s a different story.

Monorepo Recap

A Monorepo is where two or more teams with different deployable/shippable applications and/or services (and potentially different release cadences/schedules) exist in the same repo/branch. ‘The trunk’ of Trunk Based Development specifically.

Those checkouts could get big, right?

A Level 2 Monorepo

(Martin Fowler will give this a better name, hopefully).

The checkout on the developer’s workstation expands or contracts to the smallest set of buildable/linkable modules needed to perform test and package operations. Not only that, but it does so in a 100% provable way, from a deep understanding of the directed graph of buildable things and the things that could use them.

At some level, as ex-Googler, ex-Facebooker, and Buck committer Simon Stewart points out, this expand/contract thing is just a view or projection of a Monorepo. He says:

“The amount of boilerplate required to get the maven thing working is terrifying. Both buck and blaze let you just create a new directory, shovel classes into it, and you’re done. The “src” nonsense required to make mvn work without hoop jumping seriously raises the bar to multiple modules.”

He is right of course, but Maven is still the enterprise gorilla.

The Maven Challenge

Maven is a recursive build technology that forward-declares the child modules to build. Compare that to Google’s Blaze (partially open-sourced as Bazel) and Facebook’s Buck, which are directed-graph build systems with no forward declarations. It is no coincidence that expanding/contracting checkouts are easy with them, because that was a design goal of Blaze (the one that came first).
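To make the directed-graph idea concrete, here is a minimal Python sketch. It is not how Blaze or Buck are implemented; it only assumes that each target declares what it depends on, so a build order can be derived without any parent listing its children. The `DEPS` map is hypothetical, using the one relationship this article mentions later (guava-testlib needs guava).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical targets: each one names only what it depends on.
# Nothing "above" a target has to forward-declare it as a child.
DEPS = {
    "guava": set(),
    "guava-testlib": {"guava"},
}


def build_order(deps):
    """Return targets ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(build_order(DEPS))
```

Removing a target from the map simply removes it from the order; there is no parent declaration left behind to go stale.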

Maven’s forward module declarations look like this:

<modules>
    <!-- Maven works out the build order - phew! -->
    <module>aModule</module>
    <module>another-module</module>
    <module>this_one_has_child_modules_too</module>  
</modules>


All those are directories within the current directory. Maven is going to take some coercing.
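A small Python sketch of why a contracted checkout is a problem for those forward declarations: it reads the `<module>` entries from a POM and reports the ones with no matching directory. The function names are made up for illustration, and the sample POM is just the snippet above wrapped in a `<project>` element.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Real POMs usually declare this namespace; bare snippets may not.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"

SAMPLE_POM = """<project>
  <modules>
    <module>aModule</module>
    <module>another-module</module>
    <module>this_one_has_child_modules_too</module>
  </modules>
</project>"""


def declared_modules(pom_text):
    """Return the <module> entries of a POM, with or without the POM namespace."""
    root = ET.fromstring(pom_text)
    found = root.findall(f".//{POM_NS}module") or root.findall(".//module")
    return [m.text.strip() for m in found if m.text]


def missing_modules(pom_text, base_dir):
    """List declared modules that have no directory under base_dir."""
    base = Path(base_dir)
    return [m for m in declared_modules(pom_text) if not (base / m).is_dir()]


if __name__ == "__main__":
    # Run from a directory that lacks these modules, all three are reported.
    print(missing_modules(SAMPLE_POM, "."))
```

Any name this reports is a module Maven would expect to find on disk when it reads that POM.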

Ant and Gradle are the same, but I’ll focus on the Java enterprise gorilla — Maven.

Maven Monorepo Proof of Concept — in Git

See github.com/paul-hammant/googles-monorepo-demo.

I took Google’s Guava because it had a multi-module build that, although small, could be a surrogate for something with hundreds of modules, which could represent a company’s entire set of (Java) deployable/shippable applications and/or services.

The default checkout of this repo doesn’t build, as all the pom.xml files were renamed pom-template.xml.

No matter, run this:

mr/checkout.sh


Now you have POMs again. mvn install runs as you’d expect now.
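How might a script get from pom-template.xml back to a usable pom.xml? The following Python sketch is an assumption about the general approach, not a description of what mr/checkout.sh actually does: it copies each template and drops any `<module>` line whose directory is not in the working copy. The names `prune_modules` and `materialize_poms` are invented for this example.

```python
import re
from pathlib import Path

# Matches a line that holds exactly one <module>name</module> entry.
MODULE_LINE = re.compile(r"^\s*<module>\s*([^<]+?)\s*</module>\s*$")


def prune_modules(template_text, is_present):
    """Return the template minus <module> lines that is_present(name) rejects."""
    kept = []
    for line in template_text.splitlines():
        match = MODULE_LINE.match(line)
        if match and not is_present(match.group(1)):
            continue  # module directory is absent, so leave it out of the POM
        kept.append(line)
    return "\n".join(kept) + "\n"


def materialize_poms(root):
    """Write a pom.xml next to every pom-template.xml found under root."""
    for template in Path(root).rglob("pom-template.xml"):
        here = template.parent
        pom = prune_modules(template.read_text(), lambda name: (here / name).is_dir())
        (here / "pom.xml").write_text(pom)
```

Working line by line keeps comments and formatting from the template intact, at the cost of assuming one `<module>` entry per line.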

Run this on the command line:

mvn com.github.ferstl:depgraph-maven-plugin:aggregate -Dincludes=com.google.guava


This gives a Dot graph (GraphViz) that, via some colorization in OmniGraffle, looks like:

I’ve colored in two items above. Wouldn’t it be nice to modify the checkout (working copy) to have just those two modules, and Maven not choke because of missing modules?

Well, that’s possible now. Do this:

git config core.sparsecheckout true
echo '/mr' > .git/info/sparse-checkout
echo '/README.md' >> .git/info/sparse-checkout
echo '/pom*' >> .git/info/sparse-checkout
echo '/guava/' >> .git/info/sparse-checkout
echo '/guava-testlib/' >> .git/info/sparse-checkout
mr/checkout.sh
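The echo lines above follow a simple pattern: three fixed entries, then one directory pattern per module. A Python sketch of generating them, assuming the same fixed entries and the same file location; the function names are hypothetical.

```python
# Entries the commands above always include, whatever modules are chosen.
ALWAYS = ["/mr", "/README.md", "/pom*"]


def sparse_patterns(modules):
    """Return sparse-checkout lines: the fixed entries plus one directory per module."""
    return ALWAYS + [f"/{m.strip('/')}/" for m in modules]


def write_sparse_checkout(modules, path=".git/info/sparse-checkout"):
    """Overwrite the sparse-checkout file with patterns for the given modules."""
    with open(path, "w") as fh:
        fh.write("\n".join(sparse_patterns(modules)) + "\n")


if __name__ == "__main__":
    print("\n".join(sparse_patterns(["guava", "guava-testlib"])))
```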


As promised, mvn install works on just two modules now, rather than the eight before. The dependency graph looks like this:

A Word of Warning

For the love of Turing, have a lock-step version number for everything built in the Monorepo. Maybe Maven’s classic 1.0-SNAPSHOT suffices, and in your CD-esque deployment technologies, you designate something more meaningful in Jenkins (etc).
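A lock-step rule is easy to check mechanically. This Python sketch assumes each module's version is either its own `<version>` element or the one inherited from its `<parent>` block; the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace prefix such as '{http://...}version'."""
    return tag.rsplit("}", 1)[-1]


def project_version(pom_text):
    """Return the project's own <version>, falling back to its <parent>'s."""
    root = ET.fromstring(pom_text)
    children = {_local(child.tag): child for child in root}
    if "version" in children:
        return children["version"].text.strip()
    parent = children.get("parent")
    if parent is not None:
        for child in parent:
            if _local(child.tag) == "version":
                return child.text.strip()
    return None


def versions_in_lock_step(pom_texts):
    """True when every POM resolves to the same version string."""
    return len({project_version(text) for text in pom_texts}) <= 1
```

Run over every POM in the working copy, a check like this could fail the build as soon as one module drifts.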

Working out which modules to check out doesn’t have to be done by hand, either. That’s easy: some more Python fu works with the first dot graph to let you conveniently modify .git/info/sparse-checkout. Like so:

mr/checkout.sh guava-testlib
# calculates that it needs guava too

# or

mr/checkout.sh guava-tests,guava-gwt
# calculates that it needs another 5: guava-testlib/test, guava,
# guava-testlib/test, guava-tests/test, and guava/test (and not
# guava-testlib at all)
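The calculation behind those comments can be sketched in Python as a walk over the dot graph's edges. This is not the repo's actual script. It assumes edges point from a module to the thing it depends on, and the sample graph is a simplified stand-in: real depgraph output uses longer node names, and only relationships mentioned in this article are included.

```python
import re

# Matches edges written as: "from" -> "to"
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')

# Simplified, hypothetical dot text for illustration only.
SAMPLE_DOT = """
digraph G {
  "guava-testlib" -> "guava";
  "guava-gwt" -> "guava";
}
"""


def parse_edges(dot_text):
    """Map each node to the set of nodes it points at (its direct dependencies)."""
    deps = {}
    for src, dst in EDGE.findall(dot_text):
        deps.setdefault(src, set()).add(dst)
        deps.setdefault(dst, set())
    return deps


def modules_needed(deps, wanted):
    """Return `wanted` plus everything reachable from it along dependency edges."""
    needed, stack = set(), list(wanted)
    while stack:
        module = stack.pop()
        if module not in needed:
            needed.add(module)
            stack.extend(deps.get(module, ()))
    return needed


if __name__ == "__main__":
    print(sorted(modules_needed(parse_edges(SAMPLE_DOT), ["guava-testlib"])))
```

The resulting set is what would be turned into directory patterns for .git/info/sparse-checkout.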


Conclusion

Well, I haven’t convinced anyone bar Googlers and Xooglers. No matter, I’m looking forward to Git or Mercurial pushing their size boundaries to where Perforce, Subversion, and PlasticSCM already are, so that companies can bank on Monorepo setups.

While I Have Your Attention

Can I direct your attention to a portal documenting ‘Trunk Based Development’ (including Monorepos)? -> https://trunkbaseddevelopment.com

No ads, no services being sold, and mobile friendly. Well worth a look, in my opinion.


Published at DZone with permission of Paul Hammant, DZone MVB. See the original article here.
