Git sparse checkouts and partial clones

Tuesday, February 18, 2020, by Pablo Santos

A few weeks ago, Git announced support for sparse checkouts and partial clones in beta. Since reading it, I have wanted to share my thoughts with you, so that you know we don't live in isolation from what Git does, and also that we have an opinion about it. My intention is to share what we think of these new features compared with the Plastic SCM functionality that we already have in production.

In short:

  • The new Git features try to keep the mandatory local clones from growing too big by cloning only parts of the original repo.
  • In Plastic, this has been possible for a few years already but in an even more powerful way. On top of that, the clones are not even required if you decide to work centralized.

Sharing our thoughts

I know that if you are reading this blogpost, chances are you don't really care about what Git does. Most likely you are already a Plastic SCM user, and this means you probably had to quit Git. If that's you, then I bet you won't continue reading, so sorry to disappoint you.

But maybe you're interested in learning more about what the now ubiquitous version control is doing and will enjoy finding out that we keep an eye on what others do and don't live in isolation in our own bubble. In fact, when I have the chance to exchange thoughts with customers, whether on a call or by email, we often talk about our vision, where we are going, and how we compare with this or that feature of this or that other system. That's what I'm doing here.

Why sparse checkouts and partial clones are relevant

Git is distributed by design. It is mandatory that you have your local clone. This is great for small repos, but it is a pain with big ones. Git developers know that and are trying to squeeze the design so that it doesn't collapse with really big repos. This matters especially to game developers, for whom repos exceeding 4 TB are nothing unusual. Repos that big simply won't fit on a developer's laptop.

A Git sparse checkout allows you to check out only part of the repo into the working copy. The problem is that you still need the repo locally, so while the working copy will be lighter, the repo will not.

A partial clone, in turn, is a way to clone only part of the repository, so not everything needs to be downloaded from the central server.

What is really new in Git version 2.25 is the ability to combine both. You can specify which subtree of the repo you want to clone, then check out just that part.
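To make the combination concrete, here is a minimal Python sketch that builds the two Git commands involved. The repository URL, target directory, path, and helper name are made up for illustration. The Git options used (`--filter=blob:none`, `--sparse`, and `git sparse-checkout set`) are the ones available as of Git 2.25, but treat this as one possible recipe, not the only way to do it.

```python
import shlex


def partial_sparse_clone_commands(url, directory, paths):
    """Build the Git commands for a blobless clone limited to `paths`.

    Illustrative helper: it only assembles argument lists and runs nothing.
    """
    # Partial clone: skip file contents (blobs) up front.
    # --sparse starts the working copy with top-level files only.
    clone = ["git", "clone", "--filter=blob:none", "--sparse", url, directory]
    # Sparse checkout: populate only the requested paths in the working copy.
    sparse = ["git", "-C", directory, "sparse-checkout", "set", *paths]
    return [clone, sparse]


if __name__ == "__main__":
    # Placeholder URL and path; replace them with a real repo to try it out.
    commands = partial_sparse_clone_commands(
        "https://example.com/big-repo.git", "big-repo", ["src/engine"]
    )
    for command in commands:
        print(shlex.join(command))
```

To actually run the commands instead of printing them, each list can be passed to `subprocess.run(command, check=True)`.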

It is definitely a good feature and while it is still in beta, no doubt it is already useful.

What are we going to do about it?

Well, the thing is that we already did this with Plastic almost two and a half years ago.

Back then, we released "nodata replicas" to production.

Here is the scenario: you want to have your local repos, which is great when you are a developer (although not if you are an artist, more on this later), but you don't want to download tons of megabytes.

A nodata replica lets you clone the metadata and not the data. So, you can diff, see branches, anything, without downloading a single blob.

Where is the data coming from, then? Easy: from the original server. There is no need to download a single byte of data up front; it is retrieved on demand when needed, and it won't enter the local repository either unless you tell Plastic to do so.
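The idea can be pictured with a toy model in Python. This is only a sketch of the concept, not Plastic's implementation: the class names, the blob ids, and the in-memory dictionaries are all assumptions made for the example. The local replica holds metadata only, can diff changesets without touching any blob, and asks the origin for file contents on demand without storing them.

```python
class Origin:
    """Stands in for the original server, which holds every blob."""

    def __init__(self, blobs):
        self.blobs = dict(blobs)
        self.requests = 0  # counts how many blobs were served

    def fetch(self, blob_id):
        self.requests += 1
        return self.blobs[blob_id]


class NoDataReplica:
    """Toy local repo that keeps metadata only and reads blobs from the origin."""

    def __init__(self, origin, metadata):
        self.origin = origin
        self.metadata = dict(metadata)  # changeset id -> {path: blob id}
        self.local_blobs = {}           # stays empty in this model

    def changed_paths(self, old, new):
        """Diff two changesets using metadata alone: no blob is downloaded."""
        a, b = self.metadata[old], self.metadata[new]
        return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))

    def read(self, changeset, path):
        """Return file contents, asking the origin when the blob is not local."""
        blob_id = self.metadata[changeset][path]
        if blob_id in self.local_blobs:
            return self.local_blobs[blob_id]
        return self.origin.fetch(blob_id)  # served on demand, not stored


origin = Origin({"b1": b"version one", "b2": b"version two"})
replica = NoDataReplica(origin, {1: {"art/logo.png": "b1"}, 2: {"art/logo.png": "b2"}})
print(replica.changed_paths(1, 2))      # answered from metadata only
print(replica.read(2, "art/logo.png"))  # fetched from the origin on demand
```

In this model the diff never increments `origin.requests`, while each `read` of a blob that is not local does, and `local_blobs` stays empty throughout.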

You can mix nodata replicas with regular ones. You can replicate main with nodata, so that it is super-fast and the local repo stays small, and then replicate just the actual changes of a given branch.

In fact, once you clone/replicate with nodata, you will continue working on your repo, and every checkin will obviously add data to your local repo, which you can later push back to central.

What if you need to work fully disconnected?

Ok, nodata sounds great, but it really means that you need to have an Internet connection to the central server. What if you really want to be fully disconnected?

There are several options, but they all have one thing in common: hydrate. Once you replicate a branch without data, you can go and hydrate it, which means actually downloading the data from the original repo and introducing it into the local one, so that the next time you access the data it won't go to the remote anymore.

(Note: When I say retrieve it from central, I mean from the original one you used to replicate, but you can alternatively specify a different server/repo to grab the data from when you perform the hydrate.)

You can also hydrate a single changeset.

What's so good about it? You can replicate main with nodata, and then hydrate only its head. This means you get the data for a single working copy instead of the full repo. Now you can work fully disconnected unless you switch to an earlier changeset. Even if that happens, you'll only need to download the changes to the workspace, normally not a full copy.
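A toy model helps to see why hydrating just the head is enough to work offline. As before, this is a sketch under assumptions, not Plastic's actual code: the `Replica` class, the `online` flag, and the blob ids are invented for the example. Hydrating a changeset copies its blobs into the local repo, and blobs shared with older changesets come along for free.

```python
class Replica:
    """Toy local repo: metadata is always present, blobs only once hydrated."""

    def __init__(self, origin_blobs, metadata):
        self.origin_blobs = origin_blobs  # stands in for the remote server
        self.metadata = metadata          # changeset id -> {path: blob id}
        self.local_blobs = {}
        self.online = True

    def hydrate(self, changeset):
        """Copy every blob of one changeset from the origin into the local repo."""
        if not self.online:
            raise ConnectionError("cannot hydrate while disconnected")
        for blob_id in self.metadata[changeset].values():
            self.local_blobs.setdefault(blob_id, self.origin_blobs[blob_id])

    def read(self, changeset, path):
        """Read a file, using local data first and the origin only if online."""
        blob_id = self.metadata[changeset][path]
        if blob_id in self.local_blobs:
            return self.local_blobs[blob_id]
        if not self.online:
            raise ConnectionError(f"blob {blob_id} was never hydrated")
        return self.origin_blobs[blob_id]


blobs = {"b1": b"old code", "b2": b"new code", "b3": b"unchanged asset"}
history = {
    1: {"main.c": "b1", "logo.png": "b3"},
    2: {"main.c": "b2", "logo.png": "b3"},  # head: only main.c changed
}
replica = Replica(blobs, history)
replica.hydrate(2)      # hydrate only the head of main
replica.online = False  # unplug the network

print(replica.read(2, "main.c"))    # works offline: hydrated
print(replica.read(1, "logo.png"))  # works offline: blob shared with the head
```

Reading `main.c` from changeset 1 while offline raises `ConnectionError` in this model, because that blob was never hydrated. Going back online and fetching only that blob mirrors the point above: switching to an older changeset needs just the differences, not a full copy.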

How are Git and Plastic different when dealing with nodata replicas?

Git needs to hydrate the local repo to use it; Plastic can work without any local repo data, just retrieving the required blobs on demand from the original server (or a designated alternative).

There's one great thing about Git partial clones that we don't have yet, though: they can hydrate a given path inside the repo, while we operate on a changeset basis. This is definitely something we have to add because it is very useful. Instead of hydrating from the root path, you can do it just from a sub path.

There's one remark, though: in Plastic, you can use Xlinks, which are like Git submodules done right, to split projects into multiple repos while still using them in a transparent way. That's why "sub path hydrating" is not as pressing a need as it could be. You can have the code repo replicated with data, and the art in nodata mode. Or alternatively, you can have the code repo replicated, and use the art repo centralized, with an Xlink pointing to the central server.

One last bit: working centralized also solves the problem

Working centralized also solves the problem, but it is out of scope for Git.

The world abandoned Subversion, which was centralized, in favor of Git, which is fully distributed. But that doesn't mean centralized was bad. Subversion was weak with branching and merging, and Git excels in that (and Plastic is even better, but that's not for this post). But Git is not good at branching and merging because it is distributed! Those are separate features.

What I mean is that it is perfectly fine to work with a working copy directly connected to the central repo. Checkin (commit in Git jargon) and done. No need for the extra push/pull.

And that solves the problem because you don't need the local clone, and you can tweak the working copy to download only what you need.

Plastic does both: centralized and distributed, even with nodata; so, we give you a few options to solve the problem with big repos. This has been available for years, not something we just launched in beta last month.

Closing

Obviously, we keep an eye on what the competitors do, which these days means Git and Perforce since most of the others have simply vanished. (I still remember ClearCase, Microsoft TFS, AccuRev, Subversion, but have you heard anything about them recently?)

Git is extraordinarily good, and we don't fear it as an enemy. Yes, it is super big and used all over the world, but it also taught the world about the techniques we built Plastic around, so when a team feels Git doesn't suit them anymore, they find Plastic is a natural way to move forward.

Git's sparse checkouts plus partial clones are a great feature, but similar, and even better, alternatives are available in Plastic today.

Pablo Santos
I'm the CTO and Founder at Códice.
I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
I had the opportunity to see teams from many different industries at work while I helped them improve their version control practices.
I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
And I love simple code. You can reach me at @psluaces.
