Subversion: Merkle Trees and Source Control
Check out this open source project that adds Merkle tree capabilities to Subversion, if you prefer that for your source control.
I announced SvnMerkleizer some days ago on Twitter. It adds a Merkle tree capability to Subversion.
Why Subversion, though, when Git has a history-retaining Merkle tree built in? Well, truth be told, I started it years ago and wanted to finish it. It was also a testbed for Servirtium, which delivers Service Virtualization (SV) to Java clients of remote HTTP services.
There are more reasons though.
In Defense of Subversion for This Merkle Tree Thing
Size of Repo
A Subversion repository can go into terabytes quite easily, whereas Git has a hypothetical top limit. I mean the size of the history here. For Git, you’d use --depth x to clone less history. With Subversion, the client side implicitly has --depth 1 history at all times. The server side keeps all history in both cases, of course. Sure, Git-LFS pushes Git into the place where it can handle video files more easily, but it’s not quite built in.
Subversion can maintain read and write permissions for each directory. It can also group users together to make for terser config for that, even if the “Authz” technology is very confusing and error-prone in the hands of novices.
Partial Repo Checkout
With Subversion, you can ‘svn co’ a subdirectory — and that is all that comes down to your client (no parent directories). Git doesn’t have that. You have to clone the whole repo.
Git and Subversion both have sparse checkout, though the two work slightly differently. Git’s is easier to use, I think. Git does not have a sparse clone, though, which means that there would not be any savings on the client’s storage for the .git/ folder, even if the working copy modified as part of the checkout operation is reduced.
Direct Access to Files
Hypothetically, Subversion can PUT to a single file resource in the repo, without having checked out anything before that.
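As a rough illustration of that idea, here is a minimal Python sketch that builds an HTTP PUT for a single file. The server URL and file path are made up, and it assumes the repository is served over HTTP by mod_dav_svn with SVNAutoversioning switched on so that a plain PUT becomes a commit. The request is only constructed here, not sent.

```python
import urllib.request


def build_put_request(repo_url: str, path: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT for one file in a repository."""
    url = repo_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        data=content,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical server and path, for illustration only.
req = build_put_request("https://svn.example.com/repo", "trunk/README.txt", b"hello\n")
print(req.get_method(), req.full_url)

# Actually sending it would be: urllib.request.urlopen(req)
# (authentication, if the server needs it, is left out of this sketch)
```

Nothing has to be checked out first: the client only needs the URL of the file and the new content.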
A Subversion working copy does not have to be fully up to date before you commit back. Git needs you to pull (and resolve conflicts) before you push changes back.
Arbitrary Branching Models
Subversion allows you to make branches at any point in the directory tree, but that’s a really sharp knife that you can hurt yourself with. Each team placing their source in the same repo could choose a different branching model and at any subdirectory that suits them. Perforce has the same arbitrary branching possibilities as Subversion. PlasticSCM, which has Perforce-scale as a design goal, doesn’t allow arbitrary branching. In that regard, it is the same as Git and Mercurial, in that the branch is created and maintained at the root directory (whole repo). In a Monorepo configuration, nobody misses arbitrary branching.
Git’s Lesser-Known Strengths
Direct Access to SHA1 Representations of the Merkle Tree
If you do ‘git checkout SOME_SHA1’ after cloning, Git takes you back to that moment in time quite quickly, regardless of which branch it may be on.
Even with SvnMerkleizer or similar, Subversion can’t give you direct access to the tree with that SHA1 as the root SHA1 (as it effectively is in Git). You could wind Subversion back to a given revision (a numeric sequence), and then recalculate the whole Merkle tree. With that, you could find the one you’re looking for with trial and error.
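The wind-back-and-recalculate approach can be sketched in a few lines of Python. This is a toy model, not SvnMerkleizer's real format or Git's object format: a file is a bytes value, a directory is a dict of name to child, and the names merkle and find_revision are invented for the example.

```python
import hashlib


def merkle(node):
    """Return the SHA1 hex digest of a file (bytes) or a directory (dict)."""
    if isinstance(node, bytes):
        return hashlib.sha1(node).hexdigest()
    # A directory's digest covers the sorted "name digest" lines of its children,
    # so any change below it bubbles up to the root.
    lines = [f"{name} {merkle(child)}" for name, child in sorted(node.items())]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def find_revision(revisions, wanted_root):
    """Trial and error: recompute each revision's root until one matches."""
    for number, tree in sorted(revisions.items()):
        if merkle(tree) == wanted_root:
            return number
    return None


# Two hypothetical revisions of a tiny repository.
revisions = {
    1: {"trunk": {"a.txt": b"one"}},
    2: {"trunk": {"a.txt": b"two"}},
}
wanted = merkle(revisions[2])
print(find_revision(revisions, wanted))
```

The cost is the point: every candidate revision needs a full recalculation, whereas Git can go straight to the tree from the SHA1.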
Really, though, Subversion would be better if it calculated the whole Merkle tree with every commit and made that accessible long-term. New problem: the tree would be different for each set of permissions for users/groups.
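To see why permissions change the tree, here is a small Python sketch under simplifying assumptions: files are bytes, directories are dicts, and a permission rule is just a function from path to True/False. The helper names (merkle, visible) and the sample paths are hypothetical and are not how Subversion's Authz actually works.

```python
import hashlib


def merkle(node):
    """SHA1 hex digest of a file (bytes) or a directory (dict of children)."""
    if isinstance(node, bytes):
        return hashlib.sha1(node).hexdigest()
    lines = [f"{name} {merkle(child)}" for name, child in sorted(node.items())]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def visible(node, path, can_read):
    """Return a copy of the tree holding only the entries this user may read."""
    if isinstance(node, bytes):
        return node
    result = {}
    for name, child in node.items():
        child_path = f"{path}/{name}"
        if can_read(child_path):
            result[name] = visible(child, child_path, can_read)
    return result


repo = {"public": {"a.txt": b"a"}, "secret": {"b.txt": b"b"}}


def everyone(path):
    return True


def no_secret(path):
    # Hypothetical rule: this user cannot read anything under /secret.
    return not path.startswith("/secret")


root_all = merkle(visible(repo, "", everyone))
root_limited = merkle(visible(repo, "", no_secret))
print(root_all != root_limited)
```

Two users looking at the same revision get different root digests, so a server that stores the tree per commit would have to store one per distinct permission set.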
Portals Have Added Direct Update Access
Gitea, RhodeCode, and GitHub itself (after a fashion) have added the ability for you to effectively PUT a resource to a Git remote repo without first having cloned it. I have a few proofs of concept that utilize that.
Published at DZone with permission of Paul Hammant, DZone MVB. See the original article here.