Bash Script to Convert Subversion to Git
For fun, and for practice with bash scripting, I thought I’d see what it would take to write a script that converts Subversion repos to Git. Mine does a fairly good job of converting the files in a trunk in thirty or so lines of code:
#!/bin/bash

# Run from a directory that is both a git repo and a checkout of the svn trunk
if [ ! -d .git ]; then echo "no .git folder - do 'git init'"; exit 10; fi
if [ ! -d .svn ]; then echo "no .svn folder - checkout the trunk of some subversion repo"; exit 10; fi
[ ! -d svn_to_git_commits ] && mkdir svn_to_git_commits

# Keep svn metadata and this script's working files out of git
echo -e ".svn\nsvn_to_git_commits\nsvn_to_git_revision.txt\nsvn_to_git_revisions.txt" > .gitignore
git add .gitignore > /dev/null

# Revision numbers that touched this checkout, oldest first
svn log | grep '^r[0-9]* ' | cut -d' ' -f 1 | cut -d'r' -f 2 | sort -n > svn_to_git_revisions.txt

# Repository path leading up to /trunk, from the "Relative URL" line of svn info
prefix=$(svn info | grep "^Relative URL:" | sed 's/Relative URL: ^//' | sed 's#/trunk##')

while ((i++)); read -r rev; do
  trap "echo Exited!; exit;" SIGINT SIGTERM

  # Bring the working copy to this revision, hiding the routine status lines
  svn up --force -r $rev | sed '/^At revision/d' | sed '/^Updating /d' | sed '/^ /d' | sed '/^ U/d' | sed '/^Updated to/d'

  # Author and date come from the "rNNN | author | date | N lines" header
  svn log -v -r $rev > svn_to_git_revision.txt
  revisionLine=$(cat svn_to_git_revision.txt | grep '^r[0-9]* ')
  author=$(echo $revisionLine | cut -d'|' -f2 | sed 's/(no author)/none/' | cut -d' ' -f2 | sed "s/^$/none/")
  date=$(echo $revisionLine | cut -d'|' -f3 | cut -d'(' -f1)

  # Commit message text, with double quotes escaped
  messageText=$(cat svn_to_git_revision.txt | awk '/^$/ {do_print=1} do_print==1 {print} NF==3 {do_print=0}' | sed '/------/d' | sed 's/\"/\\\"/g')

  # Reduce the log to "ACTION path" lines for files under trunk, relative to trunk
  cat svn_to_git_revision.txt | sed "s/^ *//" | sed 's/(.*)$//' | sed "s/ *$//" |
    grep "${prefix}/trunk/" | sed "s#${prefix}/trunk/##" | sponge svn_to_git_revision.txt

  # Stage added, modified and replaced paths; remove deleted ones
  grep "^[AMR]" svn_to_git_revision.txt | cut -d' ' -f 2-99 | xargs -I {} git add "{}"
  grep "^D" svn_to_git_revision.txt | cut -d' ' -f 2-99 | xargs -I {} git rm -q --ignore-unmatch -r "{}"

  # Commit with the Subversion author and date; keep git's output per revision
  git commit --author "\"${author} <${author}@unsure>\"" --date "\"${date}\"" -m "Svn Rev: ${rev}.${messageText}" > svn_to_git_commits/"${rev}".txt
  echo "Svn revision ${rev} on $(echo $date | cut -d' ' -f 1,2)."

  # Repack and garbage-collect every 4000 revisions
  if [[ $(( i % 4000 )) == 0 ]]; then time -p sh -c 'git repack; git gc'; fi
done < svn_to_git_revisions.txt

time -p sh -c 'git repack; git gc'
echo "ALL DONE WITH A GIT REPO SIZE OF $(du -h -d 0 .git | cut -f1)."
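To make the parsing in the loop easier to follow, here is a minimal sketch that runs the same kind of pipeline over a made-up `svn log -v` entry, so it can be tried without a repository. The sample entry, the `/project` prefix and the function names (`sample_entry`, `entry_author`, `entry_date`, `trunk_paths`) are all assumptions for illustration. The pipelines are a simplified variation on the script's, not a copy of them.

```shell
#!/bin/sh
# Hypothetical `svn log -v -r 42` output, inlined so the sketch needs no svn.
sample_entry() {
cat <<'EOF'
------------------------------------------------------------------------
r42 | alice | 2015-03-01 10:15:30 +0000 (Sun, 01 Mar 2015) | 1 line
Changed paths:
   A /project/trunk/src/new file.txt
   M /project/trunk/README
   A /project/trunk/copy.txt (from /project/trunk/README:41)
   D /project/trunk/old.txt
   M /project/branches/dev/README

Add a file
------------------------------------------------------------------------
EOF
}

# Author: second "|"-separated field of the header line, trimmed.
# "(no author)" and an empty field both become "none".
entry_author() {
  grep '^r[0-9][0-9]* ' | cut -d'|' -f2 |
    sed 's/(no author)/none/; s/^ *//; s/ *$//; s/^$/none/'
}

# Date: third field of the header line, minus the "(Sun, 01 Mar 2015)" part.
entry_date() {
  grep '^r[0-9][0-9]* ' | cut -d'|' -f3 | cut -d'(' -f1 |
    sed 's/^ *//; s/ *$//'
}

# Trunk-relative paths whose action letter is one of $2 (e.g. AMR or D).
# $1 is the repository path before /trunk; it must not contain "#" or
# regex metacharacters, because it is pasted into the sed expression.
trunk_paths() {
  sed -n "s#^ *\([$2]\) $1/trunk/#\1 #p" |
    sed 's/ (from .*)$//' |
    cut -d' ' -f2-
}

sample_entry | entry_author
sample_entry | entry_date
sample_entry | trunk_paths /project AMR   # paths that would go to git add
sample_entry | trunk_paths /project D     # paths that would go to git rm
```

The change under `/project/branches/` is dropped by the trunk filter, which is why a script built this way only converts trunk history.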
The script uses ack, sed, grep, and sponge (the last from moreutils: https://joeyh.name/code/moreutils). Note: ack is ack-grep on Linux.
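sponge matters because the script filters svn_to_git_revision.txt and writes the result back to the same file. A plain `> file` redirect would truncate the file before the pipeline had read it. If moreutils is not installed, a temporary file does much the same job. The helper name `soak` below is hypothetical, and this is only a sketch of the idea, not a full replacement for sponge.

```shell
#!/bin/sh
# Hypothetical stand-in for `sponge FILE`: buffer all of stdin in a temporary
# file next to FILE, then move it over FILE once the input is fully consumed.
# Note: the result keeps the restrictive permissions that mktemp sets.
soak() {
  target=$1
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  cat > "$tmp" && mv "$tmp" "$target"
}

# Demo on a throwaway file: drop the "D" lines and write back in place.
demo=$(mktemp)
printf 'A one\nD two\nM three\n' > "$demo"
grep -v '^D' "$demo" | soak "$demo"
cat "$demo"
rm -f "$demo"
```

In the script, this would mean ending the filtering pipeline with `| soak svn_to_git_revision.txt` in place of `| sponge svn_to_git_revision.txt`.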
Timings: 12 mins to convert a repository that was ultimately only 4.4MB in size (the .git folder’s disk usage), but over a fairly slow connection.
Compare that to just over 2.75 mins for the same repo with git svn clone, which is over four times faster. The git-svn way probably preserves more metadata on the commits, but the actual files for the final revision are identical for both versions. My script is just for trunks, and would need some tweaks to cover commits happening to branches. It already covers commits merging into trunk from branches.
I don’t think there is anything that can be done to the script that could boost the speed by more than a small percentage. I even tried GNU parallel instead of xargs, but it blew up, as git does not quietly wait for locks to be released during its operations. Besides, 8 mins is spent just doing “svn up” one revision at a time.
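The timings can be put side by side with a little arithmetic. This sketch assumes the 8 minutes of “svn up” are part of the 12-minute run, which the numbers above suggest but do not state outright. The `ratio` helper is a made-up name.

```shell
#!/bin/sh
# Timings quoted above, in minutes.
script_total=12
git_svn_total=2.75
svn_up_only=8

# ratio A B: print A / B to two decimal places.
# awk is used because shell arithmetic is integer-only.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "script vs git-svn:           $(ratio "$script_total" "$git_svn_total")x"
echo "svn up share of script time: $(ratio "$svn_up_only" "$script_total")"
echo "svn up alone vs git-svn:     $(ratio "$svn_up_only" "$git_svn_total")x"
```

Under that assumption, even if everything other than “svn up” cost nothing, the script would still take roughly 2.9 times as long as git-svn did.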
Published at DZone with permission of Paul Hammant, DZone MVB. See the original article here.