How Google Does Code Review
How Google Does Code Review
Join the DZone community and get the full member experience.Join For Free
Whatever new awaits you, begin it here. In an entirely reimagined Jira.
Mondrian has been talked about a lot over the years. Guido Van Rossum, who lead the development talked in 2006 about it, and the fun he had making it:
Guido was a Googler then, but now he works for DropBox. A thoroughly nice guy.
This could be out of date as I only have context up to the end of 2008, but for the most part they only review commits that happen on the Trunk. Apart from Android, Google does Trunk Based Development (TBD) everywhere. They made the review process as lean as they could, by making reviews ordinary and continuous. In Google’s view, Perforce’s change-lists (CLs) either have or have not been reviewed. Mondrian keeps track of those. Releases, as such, don’t have a late cycle of reviews. Indeed it feels that ‘Review’ is not an entity within Mondrian, but instead another artifact of a commit (a CL). As it is not an entity in itself, there is a 1:1 mapping of reviews to CLs. I’m laboring it now, but change-lists are the only important noun going on here.
- More than one person can dive into a review
- Some people are going to be keener than others to review the work of others
- Reviews happen within minutes or hours of code-completion
- As a developer, you are naturally going to be offered reviews for things in your project (or library/framework) area..
- .. But you might also review something completely out of your assigned area
Thus to Google “Continuous Review” is merely the speedy picking up of unreviewed pending commits. By ‘pending’, I mean that the commits have not hit the trunk yet. Indeed until they ‘pass’ review, they are not going to. Perforce’s pending commits can have a unique number, and be extracted from the developer’s workstation even before they are allowed to be committed to the trunk for posterity (after successful review). Guido had to do a lot of tooling to make that smooth though.
Google’s internal Mondrian has inspired two open source equivalents
Unrelated to Mondrian:
- Atlassian’s Crucible is the big commercial success (Git, Mercurial, Subversion and Perforce)
- ReviewBoard (open source & for Subversion, Git, Mercurial, Bazaar, Perforce, ClearCase & Plastic)
- Phabricator (open source & for Git, Mercurial and Subversion)
- Collaborator" (commercial for – Git, Mercurial, Subversion, ClearCase, Perforce and TFS)
These are all installable. There are Portals like Github that have an built-in review mechanism, that I’m not going to go into.
Enterprises other than Google
Most enterprises are not going to setup pre-commit reviews, but instead to a boy-scout-pledge to not break the build and review after the commit. With Review as an entity, it’s not uncommon to group multiple commits into one code review. With Review as a larger entity, you might have a formal workflow around it, and indeed have a formal assignment and management aspect to the review.
Atlassian’s Crucible is an installable tool that works like that – assignable Reviews that can happen any time and are usually an aggregation of commits. Crucible (which I’m going to concentrate on for the remainder of this article) can review commits on any number of branches and attempts to make sense of the ancestry.
Bigger enterprises with more separate projects going on, are going to do there reviews late (if at all). Industry regulators as well as internal wishes to be more professional can drive code-reviews.
Problem – Not knowing you’ve reviewed something already
We’re going to dive into a worked examples here, for Crucible specifically, as it’s the Gorilla in the room.
Install your own Crucible (a 30 day demo is downloadable), and launch it.
Crucible With Subversion
Make a Subversion repo with this script (create_repo.sh):
#!/bin/sh svnadmin create $2 perl -p -i -e "s/# anon-access/anon-access/g" $2/conf/svnserve.conf perl -p -i -e "s/# auth-access/auth-access/g" $2/conf/svnserve.conf perl -p -i -e "s/# password-db/password-db/g" $2/conf/svnserve.conf perl -p -i -e "s/# harry = harryssecret/harry = harry/g" $2/conf/passwd svn mkdir file:///path/to/root/directory/$2/trunk -m "create trunk" svn mkdir file://$1/$2/branches -m "create branches"
Use that like so, before starting the Svn Server:
./create_repo.sh /path/to/root/directory repo_name sudo svnserve -d --foreground -r /path/to/root/directory
In Crucible Admin console add a repository for svn://127.0.0.1/repo_name
Run this shell script (svn_test.sh) to populate the directory:
#!/bin/sh mkdir -p svn_workingcopy rm -rf svn_workingcopy/* rm -rf svn_workingcopy/.svn svn --username harry --password harry co $1 svn_workingcopy cd svn_workingcopy/trunk echo "** Create trunk version of file" echo "Efficiently unleash cross-media information without\nimmediate value. Dramatically\nmaintain clicks-and-mortar solutions without\nfunctional aspects." > file.txt svn add file.txt svn --username harry --password harry commit -m "initial version on trunk" # svn --username harry --password harry copy $1/trunk $1/branches/br1 -m "branch br1 from trunk w/o mods" echo "** Create Branch .." svn --username harry --password harry copy $1/trunk ../branches/br1 if [ "$2" = "mod" ]; then echo "** .. With small Modification during branching" perl -p -i -e "s/aspects/bits and pieces/g" ../branches/br1/file.txt fi svn --username harry --password harry commit .. -m "branch br1 from trunk with one mod" echo "** Modify trunk version of file" perl -p -i -e "s/cross-media/sexed-up/g" file.txt perl -p -i -e "s/clicks-and-mortar/empirical/g" file.txt svn --username harry --password harry commit -m "follow-up change on trunk" cd ../branches svn up --username harry --password harry cd br1 echo "** Modify branch version of file" perl -p -i -e "s/cross-media/socially-awkward/g" file.txt perl -p -i -e "s/immediate/any/g" file.txt svn --username harry --password harry commit -m "divergent and potentially clashing modification on the branch" svn up .. echo "** Merge trunk to branch at HEAD, preserving 'any' from the br1 ancestor, taking 'empirical' from the trunk, and choosing neither ancestor with 'tie-dyed'" svn --username harry --password harry merge --accept=postpone $1/trunk . echo "Efficiently unleash tie-dyed information without\nany value. Dramatically\nmaintain empirical solutions without\nfunctional aspects." > file.txt if [ "$2" = "mod" ]; then perl -p -i -e "s/aspects/bits and pieces/g" file.txt fi svn --username harry --password harry resolve --accept=working file.txt svn --username harry --password harry commit -m "merge from trunk to branch, with resolved conflict (and an auto-merge)"
Then go into the web interface and take a look at the “Commit graph” for the repo:
The commit I’ve highlighted (#4) created the branch. Note yesterdays article raised a Subversion defect – the test case I’m using made the branch ‘unchanged’ from the trunk, yet Subversion registers a commit, and that commit is noisy compared to what is should be:
ph7785:svn_workingcopy paul$ svn diff -r 3:4 Index: branches/br1/file.txt =================================================================== --- branches/br1/file.txt(revision 0) +++ branches/br1/file.txt(revision 4) @@ -0,0 +1,5 @@ +Efficiently unleash cross-media information without +cross-media value. Quickly maximize timely +deliverables for real-time schemas. Dramatically +maintain clicks-and-mortar solutions without +functional solutions.
This lines didn’t change! Crucible fixes that for you though, with some clever double-checking:
Without that Crucible fix for (or masking of) the Subversion bug, you might end up reviewing everything that happened on trunk all over again, even if you reviewed it before.
There another snafu in Subversion in that the merge from revision 5 on trunk, isn’t shown into revision 7 on branch br1. I’ve added a red arrow to show what that should look like:
Atlassian note some limitations in an article Viewing the Commit Graph for a Repository (Search for ‘subversion’ in the page). They could, I suspect, eliminate that bug.
The Clashing Merge (and Ultimate Commit) – #7
Two Changes to Note
Line 1 – Versus the common ancestor which said ‘cross-media’, the trunk (before the merge) has been changed to ‘sexed-up’. Versus the common ancestor, the branch br1 (before the merge) has been changed to ‘socially-awkward’. Both of those counted for nothing though, as we arbitrated by choosing ‘tie-dyed’ instead.
Line 3 – Versus the common ancestor which had ‘clicks-and-mortar’, the trunk (before the merge) was changed to ‘empirical’ in change #5. The branch had no change versus the common ancestor. This merge shows the ‘empirical’ change as a change in #7, which technically it is (was not previously on the branch br1), but it might have been reviewed before as part of a review for #5.
It’d be great if Crucible could show that it was in a previous commit and even reviewed with a different color.
One Change not Apparent
It’d previously happened on the branch, but ‘immediate’ (line 2) had changed to ‘any’. It’s correct that it is not shown.
Crucible With Git
Make a new Git repo inside Crucible (remember the repo_name).
With this script (git_test.sh):
#!/bin/sh rm -rf git_workingcopy git clone $1 git_workingcopy cd git_workingcopy echo "** Create trunk version of file" echo "Efficiently unleash cross-media information without\nimmediate value. Dramatically\nmaintain clicks-and-mortar solutions without\nfunctional aspects." > file.txt git add file.txt git commit -m "initial version on master" echo "** Create Branch .." git branch br1 git checkout br1 if [ "$2" = "mod" ]; then echo "** .. With small Modification during branching" perl -p -i -e "s/aspects/bits and pieces/g" file.txt fi echo "** Modify trunk version of file" git checkout master perl -p -i -e "s/cross-media/sexed-up/g" file.txt perl -p -i -e "s/clicks-and-mortar/empirical/g" file.txt git commit -am "follow-up change on master" git checkout br1 echo "** Modify branch version of file" perl -p -i -e "s/cross-media/socially-awkward/g" file.txt perl -p -i -e "s/immediate/any/g" file.txt git commit -am "divergent and potentially clashing modification on the branch" echo "** Merge trunk to branch at HEAD, preserving 'any' from the br1 ancestor, taking 'empirical' from the master, and choosing neither ancestor with 'tie-dyed'" git merge --no-commit -s ours master echo "Efficiently unleash tie-dyed information without\nany value. Dramatically\nmaintain empirical solutions without\nfunctional aspects." > file.txt if [ "$2" = "mod" ]; then perl -p -i -e "s/aspects/bits and pieces/g" file.txt fi git commit -am "merge from trunk to branch, with resolved conflict (and an auto-merge)" echo "you'll do the push yourself - master AND br1 :)"
Populate the test commits in the repo:
Now go to the Commit graph for that:
- Those lines look like they are crossing. It’s just a display glitch in Cruicible – it’s a rhombus really.
- The modification to trunk, on an implied timeline looks to have happened chronologically before the branch was made, and that was not the case. Again, that’s a display choice in Crucible that’s harmless.
- There’s only four commits. The branch creation isn’t shown as a commit (normal Git behavior), and only the first change on that branch is shown.
- Unlike for Subversion, Crucible shows the merge line from the latest one the trunk to the latest on the branch.
Let’s look at the merge commit:
Arghh, it’s a nightmare, it’s not showing me the diff on the branch. The previous commit on the branch looks like this:
How can something change from ‘immediate’ to ‘any’ in two successive commits on the same branch? The problem is that, in Crucible at least, the parent of the merge was the last version on the trunk, not the last version on the master branch.
Crucible with Perforce
Same as Subversion. I didn’t make a script, although it would have been only 15% longer than the Subversion one due to increased command-line setup complexity. Caveats: the initial branch population in Crucible – clicking on the diff does nothing (can’t see anything). Atlassian note there are limitations in Perforce too.
No pics, as I’m all picced out for now.
With Crucible and any of Subversion/Git/Perforce there were scenarios that would lead to changes being reviewed twice. Really though it’s going to be a very small percentage of cases, and you shouldn’t worry about it. I have not explored, but it could be that there are things that don’t get reviewed at all, in a multi-branch code-review design.
I can’t help but feel that Google’s linear and “Continuous Review” is simply the most efficient way to do reviews. Sure we can’t do roll-up reviews (not otherwise covered in this article) by methodically reviewing every commit soon after they happen, but I’m not sure I’d want to if I’m losing all the context that comes with the commit messages.
Actually, I’m sure a Trunk Based Development workflow (regardless of your local-branching model), with reviews of the commits on the trunk only and ASAP is best. “Continuous Review” isn’t my coin by the way, but I like it.
Published at DZone with permission of Paul Hammant , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.