From Subversion to Git in a morning
Ever since I arrived at this new company as a consultant, I have pushed Git as the emerging version control system, knowing its power from my experience in the open source world.
The company I am in had a complex layout for its codebases: a central Subversion server with some application repositories, plus an internal repository which was shared between the application ones via svn:externals. Moreover, other vendors' Subversion repositories were pulled in via svn:externals, such as those of Zend Framework 1 and Doctrine 1.2.
Here is the story of how in a single morning we migrated our repositories (not a large number of codebases) from Subversion to Git. I won't include many configuration commands, since there are already very good howtos that I now know by heart and which you can follow to set up your own Git server. What I'll tell you in this article is our experience in migrating, and the whole story of how we did it from scratch (starting from an additional, empty virtual server).
First: setting up gitosis
The first step was finding a way to host a central Git repository, to keep as a point of reference. Git does not actually need a central server due to its distributed nature, but our workflow does.
We chose gitosis, the leading Git hosting application, and installed it on an Ubuntu Server machine (a virtual one); gitosis is available in the Ubuntu repositories and it creates a gitosis system user.
We gave the machine a name that every /etc/hosts file on the development boxes now contains. The name chosen was Homer, the Greek bard. The boxes were all Linux machines, which greatly simplified the search for client software: the Git command line client was all we needed.
We created RSA keys for each user/machine via ssh-keygen and uploaded the first public one via ssh to initialize gitosis. Once you have a public key in the gitosis configuration, you can clone the gitosis-admin repository, put the other public keys there and manage the configuration. When you're finished, you simply push your changes: gitosis uses Git to version the configuration of its Git repositories, which is very cool and in the spirit of an open source product.
As the final part of the setup, we configured (with git config --global) the user.name and user.email properties for every author, and set color.ui to auto to display colors in git status output.
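As a rough sketch of what this step looks like, the snippet below generates a key pair and lays out a stand-in for the gitosis-admin working copy. The owner name alice@devbox, the group name developers and the scratch directory are assumptions made up for illustration; in the real setup you would clone gitosis-admin from the server, then commit and push these files.

```shell
set -eu

# Hypothetical developer identity; gitosis names each key file after its owner.
key_owner="alice@devbox"

# Work in a scratch directory so the sketch never touches ~/.ssh.
scratch="$(mktemp -d)"

# Generate an RSA key pair with an empty passphrase (use a real one in practice).
ssh-keygen -q -t rsa -b 2048 -N "" -C "$key_owner" -f "$scratch/id_rsa"

# Stand-in for the cloned gitosis-admin repository: public keys live in keydir/
# as <owner>.pub, and gitosis.conf says who may write to which repository.
admin="$scratch/gitosis-admin"
mkdir -p "$admin/keydir"
cp "$scratch/id_rsa.pub" "$admin/keydir/$key_owner.pub"

cat > "$admin/gitosis.conf" <<EOF
[gitosis]

[group developers]
members = $key_owner
writable = repositoryname
EOF

cat "$admin/gitosis.conf"
```

The member name in gitosis.conf has to match the key file name in keydir/ without the .pub extension, which is why both are derived from the same variable here.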
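For reference, the per-author configuration boils down to three commands. The name and email below are placeholders, and HOME is pointed at a scratch directory only so that trying the sketch does not overwrite an existing ~/.gitconfig; on a real development box you would run the three git config lines as they are.

```shell
set -eu

# Assumption for the demo only: use a throwaway HOME so the real ~/.gitconfig is untouched.
HOME="$(mktemp -d)"
export HOME
unset XDG_CONFIG_HOME

# Placeholder identity; every author uses their own name and address.
git config --global user.name "Alice Example"
git config --global user.email "alice@example.com"

# Colorize output such as git status when writing to a terminal.
git config --global color.ui auto

# Show what ended up in the global configuration.
git config --global --list
```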
Second: creating the application repositories
git-svn is the package to install in Ubuntu to make the git svn command available. With this extension, you can download the whole history of a Subversion repository into your own local Git one.
If you want to use the users translation (from Subversion users to Git authors), prepare a users.txt file as described in the related howto. Note that if you configure the porting tool to use this file, the list must include every user who has made even a single commit.
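Since a single missing committer is enough to trip up the import, it can help to generate the list from the Subversion log instead of typing it by hand. The following is a minimal sketch: the log excerpt, the user names and the example.com addresses are invented, and in practice the input would come from svn log -q run against your own repository.

```shell
set -eu
scratch="$(mktemp -d)"

# Stand-in for the output of `svn log -q <repository-url>`; replace with the real command.
cat > "$scratch/svn-log.txt" <<'EOF'
------------------------------------------------------------------------
r3 | alice | 2011-03-01 10:00:00 +0100 (Tue, 01 Mar 2011)
------------------------------------------------------------------------
r2 | bob | 2011-02-28 17:30:00 +0100 (Mon, 28 Feb 2011)
------------------------------------------------------------------------
r1 | alice | 2011-02-27 09:15:00 +0100 (Sun, 27 Feb 2011)
------------------------------------------------------------------------
EOF

# Emit one "svnuser = Name <email>" line per distinct committer.
# The name and address are placeholders to be edited by hand afterwards.
awk -F '|' '/^r[0-9]+ / {
    gsub(/^ +| +$/, "", $2)
    printf "%s = %s <%s@example.com>\n", $2, $2, $2
}' "$scratch/svn-log.txt" | sort -u > "$scratch/users.txt"

cat "$scratch/users.txt"

# With the file in place, the import would look roughly like this (not run here):
#   git svn clone --authors-file="$scratch/users.txt" <repository-url> repositoryname
```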
After the configuration and git svn fetch, we had a full Git repository on our local machine, with a git log equal to our old svn log: all the revisions had been kept and translated.
The next step was git remote add origin gitosis@Homer:repositoryname.git to define a remote to push to. With this configuration, we executed git push origin master the first time, and afterwards a plain git push, which pushed all the matching branches by default (this was the default behaviour in Git versions before 2.0; later versions push only the current branch).
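The first push can be tried out without a server. In this sketch a local bare repository stands in for gitosis@Homer:repositoryname.git, and a one-commit repository stands in for the result of git svn fetch; both, along with the identity passed on the command line, are assumptions for the sake of a runnable example.

```shell
set -eu
scratch="$(mktemp -d)"

# Local bare repository standing in for gitosis@Homer:repositoryname.git.
git init -q --bare "$scratch/repositoryname.git"

# Stand-in for the repository produced by git svn: a single commit on master.
git init -q "$scratch/work"
cd "$scratch/work"
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git -c user.name="Alice Example" -c user.email="alice@example.com" \
    commit -q -m "Imported from Subversion"

# Define the remote, then publish master explicitly the first time.
git remote add origin "$scratch/repositoryname.git"
git push -q origin master

# The central repository now contains the same history.
git --git-dir="$scratch/repositoryname.git" log --oneline master
```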
Third: emulating svn:externals
The problem with porting Subversion repositories via git-svn is that svn:externals properties are ignored (at least by default). So for our internal repository shared between projects, we had to replace the svn:externals property with a Git submodule (after porting that repository too, of course).
Submodules are Git repositories of their own, much like externals are in Subversion. However, they do not update silently, since when you add a submodule you really check out a specific commit. You must do a pull manually in the submodule's folder when you want to update it.
The only information saved in the master repository is the submodule commit we have arrived at. When you edit a submodule within your master repo working copy, you must first execute git checkout master in it to move the working copy onto the master branch, since by default it is frozen on that particular commit.
To save your work within the submodule you commit and push it there, then also commit the submodule folder in the main repo, which changes the commit id (for example from ab5f.. to 67da..) that the current main commit depends on. It's a bit noisy when done the first time, but it is very clean and keeps the submodule effectively separate from the master repository.
Another issue we encountered was that Doctrine 1 and Zend Framework 1 do not have official Git repositories (Doctrine 1 has one, but actually without any tags), only unofficial mirrors. So we now mirror them with git-svn on our server, and keep each local copy as a Git repository to use as a submodule. I'm not a fan of mixing svn:externals into Git repositories, so this solution has an additional layer of indirection, but a cleaner setup.
Doctrine also has sfYaml in its svn:externals definition, so we set up our Doctrine 1 Git mirror with another submodule containing a small mirror of sfYaml.
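The edit cycle described above can be sketched end to end with local repositories. Everything here is a stand-in: shared.git plays the internal repository on the central server, app is a developer's clone of an application, and the protocol.file.allow setting is only there because recent Git versions refuse local-path submodules by default (it would not be needed with ssh URLs).

```shell
set -eu
scratch="$(mktemp -d)"

# Throwaway identity passed per call; protocol.file.allow is only needed
# because the stand-in remotes below are local paths instead of ssh URLs.
g() {
    git -c user.name="Alice Example" -c user.email="alice@example.com" \
        -c protocol.file.allow=always "$@"
}

# Stand-in for the shared internal repository on the central server.
git init -q --bare "$scratch/shared.git"
git --git-dir="$scratch/shared.git" symbolic-ref HEAD refs/heads/master
git init -q "$scratch/seed"
cd "$scratch/seed"
git symbolic-ref HEAD refs/heads/master
echo "v1" > lib.txt
git add lib.txt
g commit -q -m "Shared library v1"
git push -q "$scratch/shared.git" master

# Stand-in for an application repository that already records the submodule.
git init -q "$scratch/app-origin"
cd "$scratch/app-origin"
git symbolic-ref HEAD refs/heads/master
g submodule --quiet add "$scratch/shared.git" shared
g commit -q -m "Add shared submodule"

# A developer's clone: after the update, shared/ sits on a detached commit.
g clone -q "$scratch/app-origin" "$scratch/app"
cd "$scratch/app"
g submodule --quiet update --init
before="$(git rev-parse HEAD:shared)"

# 1. Move the submodule onto master, change it, and push the new commit.
cd shared
git checkout -q master
echo "v2" > lib.txt
g commit -q -a -m "Shared library v2"
g push -q origin master

# 2. Record the new submodule commit in the main repository.
cd ..
git add shared
g commit -q -m "Point shared at the new commit"
after="$(git rev-parse HEAD:shared)"

echo "submodule pointer moved from $before to $after"
```

The order matters: pushing the submodule commit before committing the pointer in the main repository means nobody can pull a main commit that refers to a submodule commit the server does not have yet.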
By the way, the whole solution of using our own Git mirrors to serve submodules is lightning fast since now git clone takes 10 seconds, while svn checkout had always taken some minutes to finish.
If you want to imitate our setup, remember that working with various submodules is tricky and involves many commands in sequence. We added a couple of targets to our phing buildfile:
- submodules-setup, which executes git submodule init and git submodule update in repositories that contain submodules; git submodule add, instead, was executed only once and its results were saved in the .gitmodules file. It's amazing how Git does not have magic metadata features like Subversion properties, but simply versions its own .git* configuration files.
- submodules-pull, which executes git pull in each of the submodule folders in case we need to update them.
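Our actual targets live in the phing buildfile, but the commands they wrap can be approximated with two shell functions. This is a hedged sketch and not the buildfile itself: the function names, the scratch HOME, the local stand-in repositories and the explicit checkout of master before pulling are all assumptions added to make the example runnable on its own.

```shell
set -eu
scratch="$(mktemp -d)"

# Isolate the demo: a scratch HOME holds the identity and allows the local-path
# remotes used here as stand-ins for the ssh URLs on the central server.
HOME="$scratch/home"; mkdir -p "$HOME"; export HOME
unset XDG_CONFIG_HOME
git config --global user.name "Alice Example"
git config --global user.email "alice@example.com"
git config --global protocol.file.allow always

# Hypothetical shell equivalents of the two phing targets.
submodules_setup() {
    git submodule --quiet init
    git submodule --quiet update
}
submodules_pull() {
    # A freshly updated submodule is detached, so switch to master before pulling.
    git submodule --quiet foreach 'git checkout -q master && git pull -q origin master'
}

# Stand-in remotes: a shared library and an application using it as a submodule.
git init -q --bare "$scratch/shared.git"
git --git-dir="$scratch/shared.git" symbolic-ref HEAD refs/heads/master
git init -q "$scratch/seed"
cd "$scratch/seed"
git symbolic-ref HEAD refs/heads/master
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "Shared library v1"
git push -q "$scratch/shared.git" master

git init -q "$scratch/app-origin"
cd "$scratch/app-origin"
git symbolic-ref HEAD refs/heads/master
git submodule --quiet add "$scratch/shared.git" shared
git commit -q -m "Add shared submodule"

# A fresh clone has an empty shared/ directory until the setup function runs.
git clone -q "$scratch/app-origin" "$scratch/app"
cd "$scratch/app"
submodules_setup

# Someone else publishes a new commit in the shared repository...
cd "$scratch/seed"
echo "v2" > lib.txt
git commit -q -a -m "Shared library v2"
git push -q "$scratch/shared.git" master

# ...and the pull function brings it into our submodule working copy.
cd "$scratch/app"
submodules_pull
cat shared/lib.txt
```

After a pull like this, the main repository shows shared as modified; committing that folder records the new submodule commit for everyone else.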
Now we have a Git repository on each of the development machines, plus a central server where we push our branches for collaboration. We make a lot of atomic commits instead of monolithic ones, which have the defect of being difficult to revert. We create branches without fear, and we stash changes, switch branches and reapply them like a charm. Git was really worth the hassle of porting our codebases, and I definitely recommend that you think about switching.