The 2016 Git Retrospective: Submodules

Submodules get a bad rap because they are complex to use and easy to break, but they have their uses and are, I think, the best choice for vendoring dependencies.

Welcome to part five of our Git in 2016 retrospective series! In part four, we looked at improvements made to Git's rebase command. This time, we'll be looking at some enhancements made to one of Git's more contentious features: submodules.

Submodules allow you to reference and include other Git repositories from inside your Git repository. This is commonly used by some projects to manage source dependencies that are also tracked in Git, or by some companies as an alternative to a monorepo containing a collection of related projects. 

Submodules get a bit of a bad rap due to some usage complexities and the fact that it's reasonably easy to break them with an errant command.

However, they do have their uses and are, I think, still the best choice for vendoring dependencies. Fortunately, 2016 was a great year to be a submodule user, with some great performance and feature improvements landing across several releases:

In March, Git v2.8 brought the ability to fetch submodules in parallel by combining the --jobs option with --recurse-submodules. In June, Git v2.9 sped things up even more with the --shallow-submodules option for git clone. November's Git v2.11 release improved performance even further with submodule alternates and also included a new strategy for diffing submodules.

Parallelized Fetching

When cloning or fetching a repository, appending the --recurse-submodules option means any referenced submodules will be cloned or updated as well. Traditionally, this was done serially, with each submodule being fetched one at a time. As of Git v2.8, you can append the --jobs=n option to fetch up to n submodules in parallel.

I recommend configuring this option permanently with:

$ git config --global submodule.fetchJobs 4

...or whatever degree of parallelization you choose to use.
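
If you would rather try the option on a single command first, the sketch below builds a throwaway superproject with two local submodules and clones it with --jobs. Every path and repository name in it (liba, libb, super, clone) is hypothetical, and the -c protocol.file.allow=always setting is an assumption needed only because this sandbox uses local file:// URLs, which recent Git versions refuse for submodules by default. On repositories this small you won't see a speedup; the point is only the shape of the commands.

```shell
#!/bin/sh
# Sandbox sketch: all names and paths here are hypothetical.
set -eu
tmp=$(mktemp -d)

# Throwaway identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Assumption: only needed because the submodules live at local file:// URLs.
allow="-c protocol.file.allow=always"

# Two tiny repositories to act as dependencies.
for name in liba libb; do
    git init -q "$tmp/$name"
    echo "$name" > "$tmp/$name/README"
    git -C "$tmp/$name" add README
    git -C "$tmp/$name" commit -q -m "Initial commit of $name"
done

# A superproject that references both as submodules.
git init -q "$tmp/super"
git $allow -C "$tmp/super" submodule --quiet add "file://$tmp/liba" liba
git $allow -C "$tmp/super" submodule --quiet add "file://$tmp/libb" libb
git -C "$tmp/super" commit -q -m "Add submodules"

# Clone the superproject, fetching up to two submodules at a time.
git $allow clone -q --recurse-submodules --jobs=2 "file://$tmp/super" "$tmp/clone"

# Later fetches accept the same option.
git $allow -C "$tmp/clone" fetch -q --recurse-submodules --jobs=2
```

With a real remote, the last two commands are the only ones you need; the rest is scaffolding for the sandbox.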

Shallow Submodules

Git v2.9 introduced the git clone --shallow-submodules flag. Used together with --recurse-submodules, it allows you to grab a full clone of your repository and then recursively shallow clone any referenced submodules to a depth of one commit. This is useful if you don’t need the full history of your project’s dependencies.

For example, consider a repository with a mixture of submodules containing vendored dependencies and other projects that you own. You may wish to clone with shallow submodules initially and then selectively deepen the few projects you want to work with.

Another scenario would be configuring a continuous integration or deployment job. Git needs the super repository as well as the latest commit from each of your submodules in order to actually perform the build. However, you probably don’t need the full history for every submodule, so retrieving just the latest commit will save you both time and bandwidth.
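
The "clone shallow, then selectively deepen" workflow can be sketched in a sandbox like this. The repository names (lib, super, clone) and paths are hypothetical, the file:// URLs are there because Git ignores depth limits for plain local paths, and -c protocol.file.allow=always is an assumption needed only for those local URLs. Deepening is done here with git fetch --unshallow inside the submodule, which is one way to do it rather than the only one.

```shell
#!/bin/sh
# Sandbox sketch: all names and paths here are hypothetical.
set -eu
tmp=$(mktemp -d)

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Assumption: only needed because the submodule lives at a local file:// URL.
allow="-c protocol.file.allow=always"

# A dependency with three commits of history.
git init -q "$tmp/lib"
for n in 1 2 3; do
    echo "revision $n" > "$tmp/lib/README"
    git -C "$tmp/lib" add README
    git -C "$tmp/lib" commit -q -m "Revision $n"
done

# A superproject that pins the dependency's latest commit.
git init -q "$tmp/super"
git $allow -C "$tmp/super" submodule --quiet add "file://$tmp/lib" lib
git -C "$tmp/super" commit -q -m "Add lib as a submodule"

# Full clone of the superproject, depth-one clones of its submodules.
git $allow clone -q --recurse-submodules --shallow-submodules \
    "file://$tmp/super" "$tmp/clone"
shallow_count=$(git -C "$tmp/clone/lib" rev-list --count HEAD)

# Deepen only the submodule you actually want to work on.
git -C "$tmp/clone/lib" fetch -q --unshallow
full_count=$(git -C "$tmp/clone/lib" rev-list --count HEAD)

echo "commits visible before deepening: $shallow_count, after: $full_count"
```

For a CI job you would stop after the clone step, since the build only needs the pinned commit of each submodule.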

Submodule Alternates

The --reference option can be used with git clone to specify another local repository as an alternate object store to save recopying objects over the network that you already have locally. The syntax is:

$ git clone --reference <local repo> <url>

As of Git v2.11, you can use the --reference option in combination with --recurse-submodules to set up submodule alternates pointing to submodules from another local repository. The syntax is:

$ git clone --recurse-submodules --reference <local repo> <url>

This can potentially save a huge amount of bandwidth and local disk space, but it will fail if the referenced local repository does not have all the required submodules of the remote repository that you’re cloning from.

Fortunately, the handy --reference-if-able option will fail gracefully and fall back to a normal clone for any submodules that are missing from the referenced local repository:

$ git clone --recurse-submodules --reference-if-able <local repo> <url>
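
To see the fallback behave as described, the sandbox sketch below uses a reference repository that was cloned without its submodule, then clones again with --reference-if-able. The names (lib, super, cache, clone) and paths are hypothetical, and -c protocol.file.allow=always is an assumption needed only because the sandbox uses local file:// URLs.

```shell
#!/bin/sh
# Sandbox sketch: all names and paths here are hypothetical.
set -eu
tmp=$(mktemp -d)

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Assumption: only needed because the submodule lives at a local file:// URL.
allow="-c protocol.file.allow=always"

# A dependency and a superproject that references it.
git init -q "$tmp/lib"
echo "lib" > "$tmp/lib/README"
git -C "$tmp/lib" add README
git -C "$tmp/lib" commit -q -m "Initial commit of lib"

git init -q "$tmp/super"
git $allow -C "$tmp/super" submodule --quiet add "file://$tmp/lib" lib
git -C "$tmp/super" commit -q -m "Add lib as a submodule"

# A local reference copy cloned WITHOUT submodules, so it has no copy of lib.
git clone -q "file://$tmp/super" "$tmp/cache"

# Borrow objects from the cache where possible; lib is missing there,
# so it falls back to a normal clone instead of aborting.
git $allow clone -q --recurse-submodules --reference-if-able "$tmp/cache" \
    "file://$tmp/super" "$tmp/clone"

# The superproject records the cache as an alternate object store.
cat "$tmp/clone/.git/objects/info/alternates"
```

Keep in mind that a clone set up this way depends on the referenced repository staying in place, since it borrows objects from it rather than copying them.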

Submodule Diffs

Prior to Git v2.11, Git had two modes for displaying diffs of commits that updated your repository’s submodules:

git diff --submodule=short displays the old commit and new commit from the submodule referenced by your project (this is also the default if you omit the --submodule option altogether):

$ git diff 25ad6a3~1 25ad6a3

diff --git a/src/liblibc b/src/liblibc
index 5a17b4a733a2..ebeab042e6bb 160000
--- a/src/liblibc
+++ b/src/liblibc
@@ -1 +1 @@
-Subproject commit 5a17b4a733a22d445fdd63326f826fcd8a584328
+Subproject commit ebeab042e6bb14a447627b57ed9a493e2cc0e095

git diff --submodule=log is slightly more verbose, displaying the summary line from the commit message of any new or removed commits in the updated submodule:

$ git diff 25ad6a3~1 25ad6a3 --submodule=log

Submodule src/liblibc 5a17b4a733a2..ebeab042e6bb:
  > Auto merge of #426 - alexcrichton:s390x, r=alexcrichton
  > Auto merge of #425 - alexcrichton:appveyor-target, r=alexcrichton
  > Auto merge of #424 - mmatyas:android_afnetlink, r=alexcrichton
  > Auto merge of #422 - alexcrichton:workspaces, r=alexcrichton
  > Auto merge of #420 - kallisti5:master, r=alexcrichton

Git v2.11 introduces a third, much more useful option: --submodule=diff. This displays a full diff of all changes in the updated submodule:

$ git diff 25ad6a3~1 25ad6a3 --submodule=diff

Submodule src/liblibc 5a17b4a733a2..ebeab042e6bb:
diff --git a/src/liblibc/.travis.yml b/src/liblibc/.travis.yml
index 47a50c7721ea..e02f9ca2568f 100644
--- a/src/liblibc/.travis.yml
+++ b/src/liblibc/.travis.yml
@@ -33,7 +33,7 @@ matrix:
     # build documentation
     - os: linux
       env: TARGET=x86_64-unknown-linux-gnu
-      rust: stable
+      rust: nightly
       script: sh ci/dox.sh
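
If you find yourself passing the same --submodule value every time, the diff.submodule configuration setting can make it the default. A minimal sketch, applied to a throwaway repository (the path is hypothetical) rather than to your global config:

```shell
#!/bin/sh
# Sandbox sketch: the repository path is hypothetical.
set -eu
repo=$(mktemp -d)
git init -q "$repo"

# Make "log" the default submodule diff format for this repository only.
# Use "diff" instead for the inline format, which needs Git v2.11 or later.
git -C "$repo" config diff.submodule log

# Confirm the setting; an explicit --submodule=<format> still overrides it.
git -C "$repo" config --get diff.submodule
```

Add --global to the config command if you want the same default in every repository.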


Next Up: Stash It, 2017 Style

I'd still typically recommend submodules only as a last resort, as dependency management systems are typically more effective for combining projects. However, with the raft of improvements from 2016, they've certainly become a lot more palatable to use when you do need them! Stay tuned for the final article in our retrospective series on improvements made to another unique Git feature: the git stash command.

As always, if you've got some Git tips to share, let me know on Twitter! I'm @kannonboy.

If you stumbled on these articles out of order, you can check out the other topics covered in our Git in 2016 retrospective below:

Or, if you've read 'em all and still want more, check out Atlassian's Git tutorials (I'm a regular contributor there) for some tips and tricks to improve your workflow.
