HOWTO: Partially Clone an SVN Repo to Git, and Work With Branches
I've blogged a few times now about Git (which I pronounce with a hard 'g' a la "get", as it's supposed to be named for Linus Torvalds, a self-described git, but which I've also heard pronounced with a soft 'g' like "jet"). Either way, I'm finding it way more efficient and less painful than either CVS or SVN.
So, to continue this series ([1], [2], [3]), here is how (and why) to pull an SVN repo down as a Git repo, but with the omission of old (irrelevant) revisions and branches.
Using SVN for SVN repos
In days of yore when working with the JBoss Tools and JBoss Developer Studio SVN repos, I would keep a copy of everything in trunk on disk, plus the current active branch (most recent milestone or stable branch maintenance). With all the SVN metadata, this would eat up substantial amounts of disk space but still require network access to pull any old history of files. The two repos were about 2G of space on disk, for each branch. Sure, there's tooling to be able to diff and merge between branches w/o having both branches physically checked out, but nothing beats the ability to place two folders side by side OFFLINE for deep comparisons. So, at times, I would burn as much as 6-8G of disk simply to have a few branches of source for comparison and merging. With my painfully slow IDE drive, this would grind my machine to a halt, especially when doing any SVN operation or counting files / disk usage.
Using Git for SVN repos naively
Recently, I started using git-svn to pull the whole JBDS repo into a local Git repo, but it was slow to create and still unwieldy. And the JBoss Tools repo was too large to even create as a Git repo - the operation would run out of memory while processing old revisions of code to play forward.
At this point, I was stuck having individual Git repos for each JBoss Tools component (major source folder) in SVN: archives, as, birt, bpel, build, etc. It worked, but replicating it when I needed to create a matching repo-collection for a branch was painful and time-consuming. As well, all the old revision information was eating even more disk than before:
- jbosstools' trunk as multiple git-svn clones: 6.1G
- devstudio's trunk as single git-svn clone: 1.3G
So, now, instead of a couple of GB per branch, I was at nearly 4x as much disk usage. But at least I could work offline and not deal w/ network-intense activity just to check history or commit a change. Still, far from ideal.
Cloning SVN with standard layout & partial history
This past week, I discovered two ways to make the git-svn experience at least an order of magnitude better:
- Standard layout (-s) - this allows your generated Git repo to contain the usual trunk, branches/* and tags/* layout that's present in the source SVN repo. This is a win because it means your repo will contain the branch information so you can easily switch between branches within the same repo on disk. No more remote network access needed!
- Revision filter (-r) - this allows your generated Git repo to start from a known revision number instead of starting at its birth. Now instead of taking hours to generate, you can get a repo in minutes by excluding irrelevant (ancient) revisions.
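To make the two flags concrete, here is a minimal sketch that only prints the clone command rather than running it. The helper name `partial_clone_cmd` and its arguments are made up for illustration; `--stdlayout` and `--revision` are the long forms of `-s` and `-r`, and `-s` itself is shorthand for `--trunk=trunk --branches=branches --tags=tags`.

```shell
# Hypothetical helper: prints the clone command instead of running it,
# so the flags can be inspected without touching the network.
partial_clone_cmd() {
  svn_url=$1     # root of the SVN repo (the parent of trunk/, branches/, tags/)
  start_rev=$2   # first SVN revision to import, without the leading "r"
  dest=$3        # directory for the new Git repo
  # --stdlayout is the long form of -s; --revision is the long form of -r.
  printf 'git svn clone --stdlayout --revision %s:HEAD %s %s\n' \
    "$start_rev" "$svn_url" "$dest"
}

partial_clone_cmd http://svn.jboss.org/repos/jbosstools 28571 jbosstools_GIT
```

Anything older than the starting revision simply won't exist in the resulting Git history, so pick a revision at or before the point where the branches you care about were created.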
So, why is this cool? Because now, instead of having 2G of source+metadata to copy when I want to do a local comparison between branches, the size on disk is merely:
- jbosstools' trunk as single git-svn clone w/ trunk and single branch: 1.3G
- devstudio's trunk as single git-svn clone w/ trunk and single branch: 0.13G
So, not only is the footprint smaller, but the performance is better and I need never do a full clone (or svn checkout) again - instead, I can just copy the existing Git repo, and rebase it to a different branch. Instead of hours, this operation takes seconds (or minutes) and happens without the need for a network connection.
Okay, enough blather. Show me the code!
Check out the repo, including only the trunk & most recent branch
# Find the revision at which the branch was created (--stop-on-copy halts
# the log there); e.g. for r28571 this sets rev to -r28571:HEAD
rev=$(svn log --stop-on-copy \
http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x \
| grep -E "^r[0-9]+ \|" | tail -n 1 | sed -E "s#^(r[0-9]+) .*#-\1:HEAD#")
# now, fetch repo starting from the branch's initial commit
git svn clone -s $rev http://svn.jboss.org/repos/jbosstools jbosstools_GIT
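If you'd like to check the revision-extraction pipeline without hitting the SVN server, it can be run against a canned log. This is a sketch only: the function name `branch_start_range` is hypothetical, and the authors, dates and commit messages in the sample are invented. It anchors the match to the start of the `rNNN | author | date` header lines, so a revision number mentioned inside a commit message is not picked up by mistake.

```shell
# Reads `svn log --stop-on-copy` output on stdin and prints a git-svn
# revision range starting at the oldest revision listed.
branch_start_range() {
  grep -E '^r[0-9]+ \|' |   # keep only the "rNNN | author | date | ..." header lines
    tail -n 1 |             # svn log lists newest first, so the last header is the oldest
    sed -E 's/^(r[0-9]+) .*/-\1:HEAD/'
}

# Canned sample (authors, dates and messages are made up for illustration).
sample_log='------------------------------------------------------------------------
r28600 | alice | 2011-01-10 12:00:00 -0500 (Mon, 10 Jan 2011) | 1 line

fix regression introduced in r28590
------------------------------------------------------------------------
r28571 | bob | 2011-01-05 09:00:00 -0500 (Wed, 05 Jan 2011) | 1 line

create branch from trunk r28570
------------------------------------------------------------------------'

printf '%s\n' "$sample_log" | branch_start_range   # prints -r28571:HEAD
```

Against the real repo, you would pipe `svn log --stop-on-copy <branch URL>` into the function instead of the sample text.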
Now you have a repo which contains trunk & a single branch
git branch -a # list local (Git) and remote (SVN) branches
* master
remotes/jbosstools-3.2.x
remotes/trunk
Switch to the branch
git checkout -b local/jbosstools-3.2.x jbosstools-3.2.x # connect a new local branch to remote one
Checking out files: 100% (609/609), done.
Switched to a new branch 'local/jbosstools-3.2.x'
git svn info # verify now working in branch
URL: http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x
Repository Root: http://svn.jboss.org/repos/jbosstools
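One caveat: the listing above shows the remote refs directly under `remotes/`, but since Git 2.0, git-svn puts them under `remotes/origin/` by default, so the bare name `jbosstools-3.2.x` may not resolve on a newer install. A small sketch of a lookup that copes with both layouts follows; `find_svn_ref` is a hypothetical helper name, not part of git-svn.

```shell
# Hypothetical helper: print the full name of the git-svn remote ref for an
# SVN branch, whether or not the clone used the "origin/" prefix.
find_svn_ref() {
  git for-each-ref --format='%(refname)' refs/remotes |
    awk -v b="$1" '
      $0 == "refs/remotes/" b || $0 == "refs/remotes/origin/" b { print; exit }'
}

# Usage (inside the git-svn clone):
#   git checkout -b local/jbosstools-3.2.x "$(find_svn_ref jbosstools-3.2.x)"
```

If the function prints nothing, the branch was not imported, for example because it was created before the starting revision passed to `-r`.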
Switch back to trunk
git checkout -b local/trunk trunk # connect a new local branch to remote trunk
Switched to a new branch 'local/trunk'
git svn info # verify now working in trunk
URL: http://svn.jboss.org/repos/jbosstools/trunk
Repository Root: http://svn.jboss.org/repos/jbosstools
Rewind your changes, pull updates from SVN repo, apply your changes; won't work if you have local uncommitted changes
git svn rebase
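Since the rebase refuses to run on a dirty working tree, one common workaround is to stash first and restore afterwards. The wrapper below is a minimal sketch under that assumption; the name `safe_svn_rebase` is hypothetical. It deliberately leaves the stash in place if the rebase fails, so nothing is popped onto a half-finished rebase.

```shell
# Hypothetical wrapper: stash uncommitted changes to tracked files,
# run `git svn rebase`, then restore the changes if the rebase succeeded.
safe_svn_rebase() {
  # Non-empty output means tracked files have uncommitted changes
  # (untracked files are ignored, since plain `git stash` skips them too).
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    git stash || return 1
    if git svn rebase; then
      git stash pop
    else
      echo "git svn rebase failed; your changes are still in 'git stash list'" >&2
      return 1
    fi
  else
    git svn rebase
  fi
}

# Usage (inside the git-svn clone):
#   safe_svn_rebase
```

Popping the stash can itself conflict if SVN changed the same lines you did; in that case Git keeps the stash entry so you can resolve and retry.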
Fetch updates from SVN repo (updates only the remote refs; your local branches and working tree are left untouched)
git svn fetch
Create a new branch (remotely with SVN)
svn copy \
http://svn.jboss.org/repos/jbosstools/branches/jbosstools-3.2.x \
http://svn.jboss.org/repos/jbosstools/branches/some-new-branch
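A branch created this way lives only on the SVN server until the next fetch. Here is a sketch of pulling it into the existing Git repo and starting a local branch from it; the helper name `track_new_svn_branch` and its optional prefix argument are assumptions for illustration. The prefix is empty for ref layouts like the `git branch -a` listing earlier, and would be `origin/` for clones made with the Git 2.0+ defaults.

```shell
# Hypothetical helper: after a branch has been created on the SVN side,
# import it and start a local branch from it.
track_new_svn_branch() {
  name=$1          # SVN branch name, e.g. some-new-branch
  prefix=${2:-}    # remote ref prefix: empty by default, "origin/" on Git 2.0+ default clones
  git svn fetch &&
    git checkout -b "local/$name" "remotes/${prefix}${name}"
}

# Usage (inside the git-svn clone):
#   track_new_svn_branch some-new-branch
#   track_new_svn_branch some-new-branch origin/
```

git-svn also has its own `git svn branch` subcommand, which creates the branch in SVN from within the Git repo; the `svn copy` route above works just as well when you already have the SVN client handy.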
From http://divby0.blogspot.com/2011/01/howto-partially-clone-svn-repo-to-git.html