
Managing Source Code: When Common Sense Fails


When it comes to managing source code, Viktor Sadovnikov is in favor of building tools that can work with single large source repositories.


We have been writing software for more than half a century, and during this time we've learned that dealing with a large problem all at once is not productive. Things go wrong, plans do not hold, and maintenance turns into a nightmare. We have to break large tasks into a few smaller parts whose relationships we can oversee, and keep breaking these parts down until the implementation becomes clear.

This approach makes sense and we apply it everywhere. And it does work really well... well, nearly everywhere. How do we manage source code? We call every logically complete unit that can be released, delivered, and used independently of the others a project. Therefore, we create a separate source repository for it and version it there. Of course, not all projects are trivial, so a project may consist of a few modules, but all of them are managed within one repository. In most cases, our project needs external libraries too, so we let build tools resolve these external dependencies from artifact repositories.

What Does This Approach Give Us?

  • Clean boundaries among projects
  • Code that is easier to reuse in other contexts
  • Fine-tuned access control to the sources
  • Small, fast builds and tests

In the image above (Project A with modules), we see Project A with two applications to deliver, which use three other modules. All of these modules depend on a few external libraries. Modules within the project depend on the very latest versions of each other. If the project uses Maven, the version of such a dependency is often expressed as ${project.version}: use whatever you have right now.
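A minimal sketch of how this looks in a Maven module of Project A (group, artifact, and version names here are hypothetical):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- All modules of Project A share one version through the parent -->
  <parent>
    <groupId>com.example</groupId>
    <artifactId>project-a</artifactId>
    <version>1.4.0-SNAPSHOT</version>
  </parent>
  <artifactId>weather-app-ios</artifactId>

  <dependencies>
    <!-- Sibling module: ${project.version} means "whatever is in the repo right now" -->
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>weather-core</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</project>
```

Because every module resolves to the same snapshot version, a change in one module is picked up by all its siblings on the very next build.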

Let's assume the two applications in our project are iOS and Mac weather apps, and we decided to version their code within one repository. Now we are adding an Android app, which shares some logic with the first two applications. Since its release cycle won't match the cycle of Project A, we create a separate repository for it, and move the shared components to a separate repository too (Two Projects with Shared Components in the image).

Everything is still very clear and logical. Every project has its own purpose. Changes in Shared Components do not affect the delivery projects until those decide to start using the changed code. We still keep things separate, and we can focus on one thing at a time.

However, what did we have to do before creating the repository for Project B?

  1. Create the Shared Components repository
  2. Copy the code of the components from Project A to Shared Components (with a drop in their test coverage)
  3. Release Shared Components
  4. Delete the components from Project A
  5. Add Shared Components as external dependencies to Project A

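After step 5, the modules that used to be siblings become ordinary external dependencies with a fixed, released version. A sketch of the change in Project A's pom.xml (coordinates and versions are illustrative):

```xml
<!-- Before the split: a sibling module, always at the latest code -->
<!--
<dependency>
  <groupId>com.example</groupId>
  <artifactId>weather-core</artifactId>
  <version>${project.version}</version>
</dependency>
-->

<!-- After the split: a released artifact resolved from the artifact repository -->
<dependency>
  <groupId>com.example.shared</groupId>
  <artifactId>weather-core</artifactId>
  <version>1.0.0</version>
</dependency>
```

From now on, Project A only sees changes in Shared Components when somebody bumps this version number.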
Before continuing, I'd like to ask you to answer a few questions (quickly, just for yourself):

  • Did you ever add a unit test in a shared library just to silence an IDE complaint about an unused public method?
  • Did you ever extend a class from a shared library in order to override a method and work around a bug?
  • When do you think about updating versions of external dependencies of your project?
  • What is Semantic Versioning and what is it used for?
  • What are GIT Submodules and Subtrees for?
  • What is the @Deprecated annotation for?

I think by now you already suspect where I'm going. The tendency to break source code down into multiple repositories brings serious challenges. Let me try to group them.

Cross-Project Code Changes

  • Because the scope of every project is kept small, a refactoring of the shared code very quickly turns into an incompatible code change.
  • Long chains of releases due to chains of dependencies: what happens if Project A has a problem due to a bug in Shared Components? We need to release Shared Components with a fix, change Project A to start using the fixed version, and then release Project A. This might sound doable in our example. However, the dependency chain of Spring Boot (only within Spring projects) is eight levels long. And it is that short only because all 79 Spring Boot modules live in a single project and repository.
  • The temptation to just pick a version and “stabilize” (meaning, stagnate): refusing to use the latest version of shared libraries delays the discovery of bugs.
  • The growth of obsolete and deprecated code: authors of shared libraries very quickly lose track of their users. Just to keep compatibility, obsolete and deprecated code is hardly ever removed.
  • Loss of atomicity of large cross-module commits: if somebody dares to work across projects, the committed changes are not atomic.

Dependencies Conflicts

  • External dependency versions across multiple repos are more likely to conflict. In our example, both Project A and Shared Components depend on the same external library, but it becomes difficult to keep the direct and the transitive dependency at the same level. If they diverge, the potential problem won't be noticed at compile time, but at runtime.
  • Diamond dependency is a more complicated case of the same problem.
  • These problems become even more interesting because dependencies are declared among modules, but we version and release whole projects. For example, 24 modules of OAuth for Spring Security depend on modules of Spring Boot. At the same time, however, two modules of Spring Boot depend on OAuth for Spring Security. What does this mean? If an incompatible change is introduced in one of the two, there is a risk of running into a problem at runtime until both are released again.
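A minimal sketch of the diamond in Maven terms (all coordinates here are hypothetical): Project A depends on json-parser 2.0 directly, while Shared Components was built against json-parser 1.2 and pulls it in transitively.

```xml
<dependencies>
  <!-- Direct dependency: Project A compiles against 2.0 -->
  <dependency>
    <groupId>com.example.thirdparty</groupId>
    <artifactId>json-parser</artifactId>
    <version>2.0</version>
  </dependency>
  <!-- shared-components brings in json-parser 1.2 transitively -->
  <dependency>
    <groupId>com.example.shared</groupId>
    <artifactId>shared-components</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
```

Maven's “nearest wins” mediation silently puts 2.0 on the classpath for everybody. The build succeeds, but if shared-components relies on behavior that changed between 1.2 and 2.0, the failure only appears at runtime.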

In larger projects, figuring out compatible versions of internal and external dependencies becomes somebody's full-time job.

Re-Use and Sharing of the Code

  • Multiple repositories and projects create multiple development environments with possibly different requirements. This is an additional obstacle for developers who want to start working on additional projects.
  • Not fixing bugs in “not my code”: did you notice a bug in shared code? Will you fix it, submit a change, and wait for a new release, or will you simply work around the bug in your own code?

Most OSS projects, which limit their scope, can safely ignore these problems. Inside a company, however, these problems grow together with the company's source base. That is why companies like Google, Facebook, Twitter, Digital Ocean, Salesforce, and Etsy (there must be more) use the monolithic repository approach. How is it different? Let's draw our three applications in a monorepo (All code in Monorepo in the image). Now all three applications use the very latest version of the shared components, and this resolves the problems above.

  • Developers can work with the entire source base, and the IDE will help them with refactoring.
  • Noticed a bug in shared code? Fix it and release your application. There is no need for releases in between.
  • Changed shared code is used right away by all applications—immediate feedback on changes.
  • Obsolete and deprecated code can be removed; your IDE and tests know its users.
  • All hassle with internal dependencies is gone; use the latest, which is the same for all modules.
  • Code analyzers start seeing all the code. There's no copying and pasting across projects.
  • There are still external dependencies, but things like <dependencyManagement> help keep them aligned within one project.
  • Developers have to create only one development environment, and its increased complexity might make you think about unifying the requirements of the applications.
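The alignment of external dependencies mentioned above can be sketched with a <dependencyManagement> section in the monorepo's parent POM (coordinates are illustrative):

```xml
<!-- Parent pom.xml: the external version is pinned once for all modules -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.example.thirdparty</groupId>
      <artifactId>json-parser</artifactId>
      <version>2.0.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Each module then declares the dependency without a version, so upgrading the library means changing exactly one line for the whole repository.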

By no means do monolithic repositories imply the creation of large monolithic applications. A monolithic repository is an approach to handling the code, to making it visible and reusable. All practices for packaging and deploying applications remain. It is perfectly possible to deliver microservices from a monolithic repository, just as, unfortunately, it is possible to compose monolithic giants from hundreds of small repositories.

Yes, this approach has its own problems, too.

  • Popular VCSs do not allow fine-tuned authorization for changing code in specific directories.
  • When the code base grows even further, they stop handling the volume well.
  • They do not support selective checkout, and this might overload the IDE.
  • Builds become heavier and slower.

However, these are implementation problems, problems of the development tools we've built to work with small repositories. Should we use Git in all situations? Should we run a "clean" build every time? Large companies have been creating their own tooling, and it is gradually becoming publicly available. Here are a few examples:

  • Bazel is Google's own build tool, now publicly available in Beta.
  • Buck is a build system developed and used by Facebook.
  • Pants is a collaborative open-source project built and used by Twitter, Foursquare, Square, and Medium.

I'm not claiming the monolithic repository to be a panacea for all software development problems. However, if your company is growing, has an open and collaborative culture, and cares about the health of its code base, it is a very viable approach to handling the code.



Opinions expressed by DZone contributors are their own.
