I’ve been meaning to put together an article on Git for quite a while but I’ve always ended up postponing it for various reasons, one of which being that there is already so much material describing Git inside and out that I wasn’t quite sure I could add anything to the subject.
After thinking about this a bit more, I decided to go ahead anyway but to give this article a slightly different spin than what you might have read elsewhere. For this reason, you won’t find in this post any elaborate discussion of Git’s branching model, command line switches or graphical tools. I do include at the end of this article a list of references that covers all the technical aspects of Git that I’m glossing over today.
This article is meant for people who are interested in Git, either personally or as a corporation, but wondering what making the jump will really mean and what to expect.
What you won’t find in this post:
- Fanboy opinions (“Git is so good that it cured my asthma”).
- Hater opinions (“Git’s command line syntax is so arcane that it will make your brain leak through your ears”).
What you will find in this post is the perspective of someone who:
- … was forced to switch to Git
- … has become reasonably comfortable with it (although by no means an expert)
- … but still remembers the pain it took to get there.
State of the union
It’s hard to argue with the fact that Git has gained a lot of momentum these past years and that it’s slowly eclipsing other more popular Version Control Systems (VCS), such as Subversion or Perforce. Just a few days ago, the Subversion team released their roadmap which promptly generated discussions wondering whether Subversion wasn’t dying (I think it’s a bit exaggerated and the Subversion team is making the right decision by focusing on the non distributed aspect of their VCS).
What to expect
Switching to Git can be very easy or absolutely traumatizing, depending on how you approach it. Individual users will usually have a pleasant experience since they can start small and expand their knowledge from there, but corporations have to take a big jump, retrain entire teams of developers, adjust their tools and their expectations and be ready to give up on a few benefits that they don’t think they can live without.
As a user
This is undoubtedly the best way to start with Git. Start by picking a hosting service, which can be either remote (e.g. Github) or local (your own machine), and from that point on, you can pretty much get by with just a few commands:
- git add [file] to tell Git to track a new file.
- git commit -a -m "Commit message" to commit the changes you have made to your local repository.
- git push to push your changes to the “server”.
- Additionally, you will probably want to learn about git clone or git pull to download a repository to your machine.
You can go very far with these simple commands, which mimic almost exactly the work flow of a non-distributed VCS. This is the best way to get comfortable with Git until you get bored, at which point you can start exploring some of its advanced features.
As a team
Switching an entire team or organization from a regular VCS to Git is a complex project that will require managing not just the technical complexity but also the users’ expectations. I could write a lot of pages on this topic alone, but for now, I’ll just give you a few quick tips:
- The transition will work much better if several of your employees are already familiar with Git and can help you evangelize the idea and provide support to reticent users.
- Be very clear on the features that they will have to learn to live without (e.g. monotony, see below) and emphasize the personal benefits (e.g. branch juggling, see below as well). You won’t get a lot of credit by telling them that they can inspect each other’s repositories, but they will most likely appreciate the fact that moving between several lines of code is trivial.
You will find plenty of definitions for what a “Distributed” Version Control System is, but let me just sum it up for you and share a dirty little secret at the same time:
- A DVCS doesn’t have a central server.
- I bet that most people using a DVCS are still using a central server.
Let me rephrase this: I think the DVCS aspect of Git is not its main strength, but it does provide a lot of benefits which are really what get people hooked to Git.
The one thing you need to know when you are using a DVCS such as Git is that whenever you work on a depot on your computer, you are carrying the entire repository with you. I’m simplifying this a little bit, but it’s the general idea. In contrast, centralized VCS will usually allow you to create “views” or “clients” on your local machine, but the only place where the full repository is stored is on the server (and hopefully, its backups).
Carrying a full repository on your machines sounds like a crazy idea but it comes with a lot of benefits (“cheap branching” and “easy offline work” are two benefits that developers will immediately appreciate) and, more importantly, it works, mostly because Git is extremely efficient in the way it stores the representation of that repository (the fact that hard drives are cheap doesn’t hurt, of course).
Since every user carries a full repository on their machine, everyone is a potential server, which means it’s very easy for developer A to “pull” a change from developer B in order to inspect it and then discard it or merge it in their own repository. This sounds very useful, but in practice, I think that most regular Git users will not take advantage of that feature, preferring to push to and pull from a central server. And by “regular Git user”, I mean “everyone except the Linux kernel team”.
Taming the beast
Git’s user interface is probably its greatest weakness. It’s inconsistent, counter-intuitive and it often reappropriates verbs and nouns that have a clear meaning in the VCS world and it makes them mean something else.
I’ve been trying to come up with a lot of ways to rationalize this bizarre and arcane user interface, but in the end, all I can tell you is: “Deal with it”.
I know, it’s sad, but there is really no other way. Here is an analogy that might help, though: to me, learning Git is very similar to learning a foreign language. Natural languages are notoriously hard to learn for adults because their organic growth has resulted in all kinds of inconsistencies and oddities. At the end of the day, the only way to learn a foreign language is to memorize, memorize and memorize. As years go by, practice make memory regurgitation more automatic and the use of that language requires less and less conscious effort, but the learning curve is something that just can’t be avoided.
Git is very similar in this respect. Don’t try to understand why checkout suddenly means something completely different from what you were used to or to explain the various switches of the reset command: memorize the recipes and just apply them. And don’t be shy about asking seasoned Git users, they are used to getting these questions and they will give you instant answers that will save you a lot of time hunting down the documentation.
However, there is hope for those of us who want to do more than regurgitating Git recipes: if you are serious about becoming comfortable with Git, you should definitely spend some time trying to understand its internals. I’m referring to the graphs, objects, commits, heads and branches that form the backbone of Git. As opposed to its ugly user interface, Git’s underlying infrastructure is very solid and extremely well designed. On top of that, it’s actually not that hard to understand, and once you do, a whole new world will open up.
Once you understand these internals, you will be able to solve problems conceptually by simply visualizing the operations that need to be done on that underlying structure (moving the head there, branching here, rebasing this, merging that, etc…). Once you have solved that problem conceptually, applying the awkward Git user interface to actually do the work will look much easier, albeit still not very natural.
The end of monotony
Centralized VCS such as Subversion and Perforce offer a very popular feature: monotonic revisions. Whenever a new change list is committed to the main repository, it receives a unique number that’s guaranteed to be greater than the previous submission. This makes it very easy to compare files and commits chronologically and to determine which one is the most recent.
Unfortunately, you will have to give up this feature with a DVCS. Because of its distributed nature, it’s not possible (or rather, “practical”/”easy”) to offer such a monotonic number. There is no denying that this feels like a huge step backward, especially for large teams.
However, the absence of such a feature reflects the more fundamental reality that these numbers are not always accurate, and that the bigger and more active and branched out a project is, the more meaningless they become. When branch 2.1 receives commit #720 and branch 2.0.4 has #742, what do these numbers actually tell you about these change lists?
The good news is that there are ways to make up for the absence of monotony, and the main difference is that the tagging will be performed by humans, which is what a lot of VCS-based organizations are already doing anyway.
When you are trying to make the case for Git in front of a skeptical crowd, it’s important to provide anecdotes that developers can relate to. I promised that this post would avoid content that can be easily found elsewhere, but I have to talk a little bit about branching because for me, understanding this was the first time that I actually acknowledged a very tangible advantage of Git over other source control systems.
The daily routine of a software developer typically involves a lot of multitasking. As much as you want to focus your work on “feature A” and you don’t want to be interrupted, there are many reasons why you might be forced to put your work on hold and start working on “bug B” or “feature C” before being able to resume work on “feature A”.
Switching back and forth between tasks has always been painful with regular VCS but Git makes it very smooth. That’s all I’m going to say about this topic, but if you want to find out more, most of the links that I supply at the end of this article will show you how to achieve this goal very early on in the tutorial.
Gerrit is a web based application that allows you to perform code reviews for Git repositories. It’s basically a server that acts as a gatekeeper: developers push their Git changes to it, other developers review them and when these changes are approved, Gerrit pushes them to the main repository on behalf of the original submitter. If you’re curious to see Gerrit in action, here is the Android open source Gerrit.
The reason I mention Gerrit (besides the fact that I contribute to it in my 20%) is that it represents an interesting compromise where a DVCS is used as a centralized VCS. You might wonder if using a DVCS with a central server is not a bit paradoxical, and I can’t argue that it is in a certain way, but I see this as having the best of both worlds.
The Gerrit server acts as a canonical repository from where developers can clone their own repositories but all the facilities offered by a DVCS (direct push/pull between developers) are still possible and fully compatible with this model. For example, you could pull a change from a fellow developer, makes a few additional changes and then push that change to Gerrit so it will be committed to the canonical repository. After that, both developers can pull and merge in order to reconcile their local repository with the central one.
First of all, I’d like to apologize for the amorphous nature of this post. I tried to structure it in a way that would show some linearity and consistency, but this is basically a big brain dump of points that have been twirling in my head for a while.
Looking back, it took me a while to warm up to Git (quite a while), but now that I’m here, I really enjoy using it. As I hinted throughout this essay, the benefit will probably not be very apparent for very small teams that mostly commit and push (although you will probably find value in branch juggling even if you’re the only one on your project), but as the team and the number of committers and releases grows, you will find that Git provides an undeniable increase in productivity.
- Pro Git. This is a free PDF book written by Scott Chacon, a Git expert who happens to be also very good at explaining Git concepts very clearly.
- The Git community book. Another book, written by the entire Git community, so not as consistent as Chacon’s book but with quite a few useful tips as well
- Git user’s manual, written by Linus Torvalds and a few other authors.
- Git magic. A small and digestible overview of the main concepts.
- Git ready. A web site dedicated to providing Git recipes to solve common problems. A great time killer when you feel like you’d like to learn something about Git but you’re not sure what.
- Git documentation. A misleading name that might lead you to think this is the authoritative documentation for Git, which it’s not. It’s more like a hub that points to other Git materials.
- Embrace the Git index. A good overview of one of Git’s core components.
- Smacking Git around. Another document by Scott Chacon, this time showing advanced tricks.
- Git from the bottom up. Save this for the end and only if you want to become a Git expert. This book explains how Git stores its objects and dives deep into the Git internals. You don’t really need to know any of this to be a happy Git user.
- Git manuals. UNIX manuals usually represent the worst possible way to document tools, but the Git manuals are a pleasant exception in this ocean of mediocrity because they do something right: they show you examples for every single command. See for example the manual for checkout (scroll to the bottom).