Every so often a piece of technology comes along and changes everything. Once we experience this new way of doing things, we can no longer understand how we survived without it. After we sent our very first emails, walking to the post office to drop off a letter seemed archaic. And who’d trade an IDE for a plain text editor?
Git[1] didn’t seem to be the answer to my needs. I’ve been using Subversion (SVN) since 2006 and I’ve been a very happy camper indeed. Before that I used CVS and, although I was inexperienced with Version Control Systems (VCS), it was a major improvement over MS Source Safe, which I had used for almost 6 years before that. I use SVN at home and at work. I’ve grown so used to, and dependent on, version control that I use SVN for my documents and other files, not just code. But Git? Why would I need Git?
When Git came onto the scene there were already some Distributed VCSs (DVCS) around (as opposed to centralized VCSs, such as CVS and SVN). But Linus made an impression with his Google Tech Talk. I wanted to try this new piece of technology regardless of my needs. It was just too tasty to pass up. At the first opportunity, I installed the core tools, plus Git Extensions to ease my way in with some visual feedback (I learn by reading and experimenting).
Now that I’ve played around with Git for a while, and I’ve successfully moved some of my projects from SVN to Git, I can share my experience. Here is why I use Git even when not working with a team (where it’s infinitely more useful.)
Commit Often, Commit Many
Commits containing half a dozen unrelated changes are no strangers to us. A developer might add a new function, refactor another and rename an interface member, all in the same change-set. This is counter-productive, because reviewing unrelated changes together is made needlessly difficult. But if the review unit is the commit unit, developers combine multiple changes to reduce their own overhead and push it onto their colleagues instead. This is unfortunate, because code should evolve in the best way possible, uninfluenced by artificial forces such as tooling quirks. Worse than the review burden, combined commits cause headaches and lost productivity when we need to go back in time to find a specific line of code, roll back or merge. But what if the changes are related? What if we need to change a few KLOCs for the code to even build successfully? With a centralized VCS the recommended answer is a branch. But unless the sub-project is long-term, branching is yet another overhead that developers try to avoid.
With Git, these problems are no more, thanks to local commits. With local commits, one can (and should) commit as often as possible. The commit log message need not be anything more than a single sentence. The changes aren’t reflected anywhere else until we decide to push them to the server. There is no longer a distinction between major changes and minor changes; all changes can be subdivided as much as necessary. No longer does one need to make local backups[2], create personal branches or make every change visible company-wide or publicly. Local commits give us full-fledged version control without introducing new or extra work. When we’re done, we update the shared repository with a single push command.
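As a rough sketch of that workflow, the script below makes three small local commits and only then pushes them in one go. The bare repository standing in for "the server", the file name math.h, the user identity and the commit messages are all made up for illustration; only the Git commands themselves are real.

```shell
set -eu
work=$(mktemp -d)

# A bare repository plays the role of the shared server (assumption: any remote would do).
git init -q --bare "$work/server.git"
git clone -q "$work/server.git" "$work/wc" 2>/dev/null
cd "$work/wc"
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Three small, single-purpose commits, each with a one-sentence log message.
echo "int add(int a, int b);" > math.h
git add math.h
git commit -q -m "Add declaration of add()"

echo "int sub(int a, int b);" >> math.h
git commit -q -am "Add declaration of sub()"

echo "/* arithmetic helpers */" >> math.h
git commit -q -am "Document math.h"

# Nothing has reached the server yet: it still has no branches at all.
before=$(git ls-remote origin)

# One push publishes all three commits.
git push -q origin HEAD
after=$(git -C "$work/server.git" rev-list --count --all)
echo "commits on the server after the push: $after"
```

Until the push, the three commits exist only in the local clone, which is what makes committing this often cheap.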
If you need to keep some piece of code around, but do not wish to send it for review and commit it, with a centralized VCS you need to copy it somewhere. With local commits, you can simply commit it, with a relevant commit log. In a subsequent change-set, you can delete it, with the full guarantee that you can retrieve it from Git later. Since this is done locally, no one complains and no one needs to review it. The code will be preserved in the repository once we push it. Later, when we resurrect it, it will be reviewed as it becomes part of the current code. Indeed, with local commits you can experiment with much freedom, with both the advantage of version control and the subsequent preservation of your bits in the repository for posterity.
Notice that all this applies equally well to private projects, single-developer public projects and multi-developer projects. The organizational advantages only become more valuable as the number of participants grows.
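Here is one way the commit, delete and resurrect cycle might look. The file parser_experiment.py and its contents are hypothetical, and recovering a file through the parent of the commit that deleted it is just one of several approaches Git offers.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Commit the code we want to keep around (hypothetical file).
printf 'def old_parser(text):\n    return text.split(",")\n' > parser_experiment.py
git add parser_experiment.py
git commit -q -m "Keep experimental parser for reference"

# Delete it in a subsequent change-set.
git rm -q parser_experiment.py
git commit -q -m "Remove experimental parser"

# Later: find the last commit that touched the file (the deletion)...
deleted_in=$(git rev-list -n 1 HEAD -- parser_experiment.py)

# ...and restore the file as it was just before that commit.
git checkout -q "$deleted_in^" -- parser_experiment.py
cat parser_experiment.py
```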
Even with local commits, sooner or later we’ll need to branch off and work on a parallel line of code. And if our project is useful to anyone, the branches will diverge faster than you can check out. Merging code is the price of branching. Anyone who has tried merging knows that it is painful more often than not. This is typically because what’s being merged are the tips/heads of the branches in question. These two incarnations of our code become more difficult to reconcile the more changes they have experienced in their separate lives.
But any VCS by definition has the full history, which can be leveraged to improve merging. So why is this a Git advantage? Git has two things going for it. First and foremost, it has the full history locally. That’s right: your working copy (WC) is not just a copy of what you checked out; it comes with a clone of the entire repository. So while a centralized VCS can take advantage of the history stored on the server, with Git this information is readily available alongside your WC. Second, with local commits the commit unit is typically very small. This helps merging quite a bit, as Git can have higher confidence about where lines moved and what was changed into what.
Overall, merging with Git is otherworldly. In my experience so far, no centralized VCS I have used matches the accuracy of Git’s merge output.
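A toy illustration of a merge that leans on shared history: two branches start from a common commit, each makes one small change to a different part of the same file, and Git combines them without asking for help. The file notes.txt, its contents and the branch name feature are invented for this sketch, and a real merge will rarely be this tidy.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Common ancestor of both branches.
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\nzeta\neta\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
base_branch=$(git symbolic-ref --short HEAD)

# Small commit on a side branch: touch the first line only.
git checkout -q -b feature
sed 's/^alpha$/ALPHA/' notes.txt > notes.new && mv notes.new notes.txt
git commit -q -am "Capitalize alpha"

# Small commit on the original branch: touch the last line only.
git checkout -q "$base_branch"
sed 's/^eta$/ETA/' notes.txt > notes.new && mv notes.new notes.txt
git commit -q -am "Capitalize eta"

# Git compares both tips against their common ancestor and keeps both edits.
git merge -q --no-edit feature
cat notes.txt
```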
With Source Safe, CVS and SVN it’s not rare to get broken builds because of missing files. After some point in a project’s life, new files are added only sporadically. It’s common to forget to add the new files to the VCS, only to be reminded by colleagues and broken-build emails, to the humiliation of the developer who missed the files, of course. If reviews are mandatory, then fixing this error involves at least one other developer, who needs to sign off on the new patch before it is committed.
This problem arises from the fact that with these traditional, centralized VCSs, files are excluded implicitly (by default) and are opted in when necessary. With Git, the workflow runs the other way: a single command stages everything under the root that isn’t explicitly ignored, so in practice inclusion is the default and exclusion is the exception. This sounds trivial, but the consequences are anything but. Not only does this save time and avoid embarrassing mistakes, it’s also more natural. Virtually always, a file within the project tree is a file necessary for the project. The exceptions are few indeed. If you think about it, most of the exceptions are files generated by tools. These are excluded by file extension and folder name in the ignore file (.gitignore for Git). Rarely do we create files that shouldn’t be stored and tracked by the VCS. If a file is not automatically generated during the build, then it should be in the repository.
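A small sketch of the opt-out model, assuming a made-up project layout (src/main.c, a build/ folder and an editor.tmp scratch file): the ignore file names the exceptions, and everything else is staged in one step.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical project tree: one real source file, two generated files.
mkdir src build
echo 'int main(void) { return 0; }' > src/main.c
echo 'object code' > build/app.o
echo 'scratch' > editor.tmp

# Exclusions are listed by folder name and by file extension.
printf 'build/\n*.tmp\n' > .gitignore

# Stage everything under the root that is not ignored.
git add -A

# List what Git is now tracking.
tracked=$(git ls-files)
echo "$tracked"
```

Only .gitignore and src/main.c end up staged; a new source file dropped anywhere in the tree would be picked up the same way, with no separate "add to version control" step to forget.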
Git is a paradigm shift in version control. It’s not just a set of new features; it’s a different way of organizing change-sets and, by extension, of writing code. Git gives us better automation and tooling, and at the same time it encourages us to adopt healthy and useful practices. In fact, the features outlined above all make good use of Git’s distributed architecture. So it’s no coincidence that Git is so useful even for a single-user project.
If you’re using SVN, consider copying your repository over to Git using git-svn and playing around. Git can stay synchronized with the SVN repository until you decide to abandon one or the other. In addition, GitHub is a great online repository host. As a learning exercise, consider forking any of the countless projects there and playing around.
[1] Git has no monopoly on the advantages discussed here; however, I’m reviewing my experience with Git in particular. Hg, Bazaar and others will have to wait for another time.
[2] Here I’m concerned with backups of code that we don’t want to discard yet, but don’t want to commit either. Data backup is still necessary.