The story behind transparent SCM

Tuesday, January 10, 2012

I'd like to tell the story of how Plastic evolved towards "transparent SCM". What you see today in 4.0 is the result of a long evolution since we first set out down the transparent path.

Inspiration

Believe it or not, our original inspiration for transparent SCM was good old ClearCase. ClearCase (CC) implemented MVFS (MultiVersion File System), a dynamic file system that could show you the full file tree of a given version, capture all I/O calls and translate them into SCM operations.

Example: cat foo.c@main@4

That command would be captured by the CC file system driver, which, instead of returning "file not found", would fetch the right version of the file for "cat" to print.

Likewise, it could intercept "move" operations, so you didn't need to run the "cleartool mv" command. What does that mean? You didn't need to tell the SCM that you had modified or moved a file: it intercepted all your I/O actions and performed the right operations at the file system level!

So we always wanted some sort of "move interception", to avoid having to "tell" Plastic that a file or directory had been moved... it would simply know.

First attempts

The first version of what we called "Plastic made transparent" was code-named "glass" (transparent plastic: hence glass).

Glass used an underlying FS layer (third-party file system driver code that simplifies writing a file system) to intercept all operations and:

  • check out a file automatically when you modify it (remember, "checkout" in the Plastic, Perforce or ClearCase sense, not the SVN or Git one: checkout means "create a new version", not "download the code")
  • perform moves for you (you just move the files or directories and glass issues a Plastic mv command on your behalf)
  • delete and add elements
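Glass's actual code isn't described in this post, so the following is only a rough sketch of the kind of translation listed above, under stated assumptions: the GlassTranslator class and its on_write/on_rename/on_create/on_delete hooks are hypothetical, intercepted operations are assumed to arrive as simple callbacks, and the "cm" command lines are illustrative of Plastic's command-line client rather than a record of what glass issued. It only collects command lines; it does not run them.

```python
from dataclasses import dataclass, field


@dataclass
class GlassTranslator:
    """Hypothetical translator from intercepted FS operations to SCM commands.

    It only records command lines; nothing is executed. The "cm" subcommands
    are illustrative, not a record of what glass actually issued.
    """
    controlled: set[str]                                 # paths under version control
    checked_out: set[str] = field(default_factory=set)   # paths already checked out
    commands: list[list[str]] = field(default_factory=list)

    def on_write(self, path: str) -> None:
        # First modification of a controlled file triggers an automatic checkout.
        if path in self.controlled and path not in self.checked_out:
            self.checked_out.add(path)
            self.commands.append(["cm", "co", path])

    def on_rename(self, src: str, dst: str) -> None:
        # A plain rename/move becomes an SCM move, so history follows the file.
        if src in self.controlled:
            self.controlled.remove(src)
            self.controlled.add(dst)
            if src in self.checked_out:
                self.checked_out.remove(src)
                self.checked_out.add(dst)
            self.commands.append(["cm", "mv", src, dst])

    def on_create(self, path: str) -> None:
        # New files are added; a freshly added item needs no separate checkout.
        if path not in self.controlled:
            self.controlled.add(path)
            self.checked_out.add(path)
            self.commands.append(["cm", "add", path])

    def on_delete(self, path: str) -> None:
        # Deleting a controlled file becomes an SCM remove.
        if path in self.controlled:
            self.controlled.discard(path)
            self.checked_out.discard(path)
            self.commands.append(["cm", "rm", path])


t = GlassTranslator(controlled={"a.c", "b.c"})
t.on_write("a.c")
t.on_write("a.c")               # second write: already checked out, no new command
t.on_rename("b.c", "src/b.c")
t.on_create("new.c")
t.on_delete("a.c")
for cmd in t.commands:
    print(" ".join(cmd))
```

The point of the design is that the user never types these commands: each ordinary file operation maps to at most one SCM operation, and repeated writes to an already checked-out file cost nothing extra.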

Well, writing your own FS is always cool, but there were caveats:

  • A bug in a file system driver can bring down the entire system (blue screen of death! :P)
  • Intercepting every I/O operation hurts performance

So, while we used glass internally, we never released it to production.

Xdiff

OK, you might be thinking: what does Xdiff have to do with transparency? Well, Xdiff implements an algorithm that finds moved fragments of text even when they have been modified afterwards. In other words, it can calculate "similarities".

Xdiff opened up a new door to make Plastic transparent.
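Xdiff's own algorithm isn't described here, so this is only an illustration of what a "similarity" score between two versions of a file can mean. It is a minimal sketch that assumes Python's standard difflib.SequenceMatcher as a stand-in for Xdiff, comparing two file contents line by line; the similarity function name is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(old_text: str, new_text: str) -> float:
    """Return a similarity score in [0.0, 1.0] between two file contents.

    Stand-in for Xdiff: difflib compares the files line by line and returns
    2*M/T, where M is the number of matching lines and T is the total number
    of lines in both inputs (1.0 when both are empty).
    """
    return SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines(), autojunk=False
    ).ratio()


original = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
# Same file after a small edit: one comment line inserted.
edited = "def add(a, b):\n    # sum two numbers\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"

print(round(similarity(original, edited), 2))        # 0.91
print(similarity("alpha\nbeta\n", "gamma\ndelta\n"))  # 0.0
```

A lightly edited file still scores close to 1.0, while unrelated content scores near 0.0: that gap is what makes it possible to recognize a file as "the same one" even after it has been modified.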

The current version

When we started developing 4.0, we knew we wanted a new way to deal with local changes:
  • Work in the workspace, modifying, deleting, adding or moving files
  • Have Plastic detect what happened

So we applied the Xdiff technology. How?

  • Finding changed, deleted and added elements is easy (just compare the disk with the loaded tree)
  • Each added/deleted pair is run through a similarity algorithm to check whether it is actually the same file... moved! The same holds true for directories, using a more advanced algorithm.
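The two detection steps above can be sketched roughly as follows. This is a minimal illustration, not Plastic's implementation: it assumes the loaded tree and the disk are both plain dictionaries mapping paths to file contents, uses difflib as a stand-in for Xdiff, and uses a hypothetical SIMILARITY_THRESHOLD of 0.5. Candidate pairs are accepted greedily, best score first, so each deleted file is matched to at most one added file.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.5  # hypothetical cut-off, not Plastic's actual value


def similarity(a: str, b: str) -> float:
    # Line-based score in [0, 1]; a stand-in for Xdiff's similarity metric.
    return SequenceMatcher(None, a.splitlines(), b.splitlines(), autojunk=False).ratio()


def detect_changes(loaded_tree: dict[str, str], disk: dict[str, str]):
    """Classify workspace changes by comparing the loaded tree with the disk.

    Both arguments map a file path to its contents. Returns
    (changed, added, deleted, moved); moved holds (old_path, new_path) pairs.
    """
    # Step 1: plain comparison of disk against the loaded tree.
    changed = sorted(p for p in loaded_tree.keys() & disk.keys()
                     if loaded_tree[p] != disk[p])
    added = set(disk.keys() - loaded_tree.keys())
    deleted = set(loaded_tree.keys() - disk.keys())

    # Step 2: score every deleted/added pair, best matches first.
    candidates = sorted(
        ((similarity(loaded_tree[d], disk[a]), d, a) for d in deleted for a in added),
        reverse=True,
    )
    moved = []
    for score, old_path, new_path in candidates:
        if score < SIMILARITY_THRESHOLD:
            break  # all remaining pairs are even less similar
        if old_path in deleted and new_path in added:
            # Same file, moved: it is no longer a separate delete + add.
            deleted.remove(old_path)
            added.remove(new_path)
            moved.append((old_path, new_path))
    return changed, sorted(added), sorted(deleted), moved


tree = {"src/util.py": "def add(a, b):\n    return a + b\n",
        "README": "hello\n", "old.txt": "bye\n"}
disk = {"lib/util.py": "def add(a, b):\n    return a + b\n",
        "README": "hello world\n", "new.txt": "something else\n"}
print(detect_changes(tree, disk))
# (['README'], ['new.txt'], ['old.txt'], [('src/util.py', 'lib/util.py')])
```

Scoring every pair is quadratic in the number of added and deleted files, which is acceptable for a sketch; a real tool would likely prune candidates first (for example by size or extension).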

This way, the "pending changes" view can now figure out what you did in your workspace, including modified and moved files and directories (remember that DVCSs like Git or Mercurial can't track directory renames or moves).
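To illustrate why directory tracking matters, here is a hypothetical post-processing step. The function name collapse_directory_moves is made up for this sketch, and it is not the "more advanced algorithm" Plastic uses. It collapses individual file moves into a single directory move when every file that lived directly in the old directory reappears, with the same name, in the new one. To stay short it ignores nested subdirectories.

```python
import posixpath
from collections import defaultdict


def collapse_directory_moves(moves, loaded_paths):
    """Collapse file moves into directory moves where possible (hypothetical).

    moves: list of (old_path, new_path) file moves, e.g. from pairing
    added/deleted files by similarity.
    loaded_paths: every file path in the loaded tree.
    Returns (dir_moves, remaining_file_moves).
    """
    # Group moves that keep the file name by (old directory, new directory).
    groups = defaultdict(list)
    for old, new in moves:
        if posixpath.basename(old) == posixpath.basename(new):
            groups[(posixpath.dirname(old), posixpath.dirname(new))].append((old, new))

    dir_moves, collapsed = [], set()
    for (old_dir, new_dir), pairs in groups.items():
        # Files that lived directly in old_dir before the change.
        originals = {p for p in loaded_paths if posixpath.dirname(p) == old_dir}
        if old_dir and {old for old, _ in pairs} == originals:
            # Every file of old_dir reappeared in new_dir: report one move.
            dir_moves.append((old_dir, new_dir))
            collapsed.update(pairs)
    remaining = [m for m in moves if m not in collapsed]
    return dir_moves, remaining


moves = [(f"src/widgets/w{i}.c", f"lib/widgets/w{i}.c") for i in range(1000)]
loaded = [old for old, _ in moves] + ["src/main.c"]
dir_moves, remaining = collapse_directory_moves(moves, loaded)
print(dir_moves, len(remaining))  # [('src/widgets', 'lib/widgets')] 0
```

Instead of 1000 file moves, the result is a single directory move; if even one file had stayed behind in the old directory, the sketch would conservatively keep the individual file moves.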

New way of working

Now it is straightforward to work in your favorite editor, doing refactors (renames and moves), and then simply switch to Plastic to check in once it has detected your changes.

Future

Check out http://www.youtube.com/watch?v=cnJ5UgJJSkU. It shows a dynamic file system based on Dokan.Net that can present a given configuration as a virtual drive.
3 comments:

    1. Meh... I use darcs, mercurial, bazaar, git, monotone (etc?) for that.

      Basically, any SCM worth its salt can detect/track changes. Plastic will just be the only one that requires a filesystem driver to do it. Note that there is FileSystemWatcher (.NET on Windows) and inotify (all others) if you wanted to improve on the performance (which in practice isn't an issue).

      As the saying goes: "Now you've got two problems".

      The implications aren't trivial.

      DokanFS is not portable: it works on Windows only (though it is similar to what FUSE is on Mac/Linux/...).

      Plastic will require the working tree to be on a special mountpoint, which is in actual fact not (just) a filesystem: it is more like an auto-journalling database. This will hurt performance. Expect real trouble with shared fileservers and Access Control Lists (add Active Directory). Visual Studio might not perform very well on such an FS (due to lacking or suboptimal support for memory-mapped files and execute permissions; does anyone work in an environment where launching executables off a non-local drive is prohibited? Right, every big corporation has that, and the mechanics rely in part on the availability of NTFS alternate data streams, which can mark a file as 'safe for execution'). Now, I wouldn't trust that debugging/running on IIS over DokanFS would be problem-free (will it detect changes in Web.config? Will access control be OK, or do we need to run the ASP worker under an administrator account just so it can access our worktree?). I'd expect real trouble with Unicode filenames, file sharing, 64-bit Windows 7/Vista support...

      Even if the filesystem is really only a proxy to an existing NTFS folder, we're not safe: that would allow concurrent edits directly to the backing filesystem. How would changes get detected when the Plastic 'transparent FS' driver wasn't properly loaded/functioning?

      Oh, and if 'renaming'/'moving' _empty_ directories is so important, by all means make stuff explicit in deployment, or consider documenting things. (I spot a rather spicy paradox between the beloved and much-raved-about 'XDiff' (able to detect function moves across files!) and the apparent need for tracking 'empty folder identity'.) Mind you, I concur that XDiff is something of a marvel (though not entirely new) but, if you care about content over location, why would you mourn the loss of directory renames?

      I say, this looks like overkill and costs performance. The number one asset that a developer appreciates about his development workstation is swift operation. Only after that come full automation, user-friendliness and varnish.

      Cheers,
      Seth

    2. Hi Seth,

      Thanks for the comments!

      A few remarks:

      1- The "FS stuff" isn't there now: it is just a prototype. We're able to "track moves" without the help of an FS driver, of course!!!!

      2- Well, "glassfs" will be some sort of "dynamic view", and believe me when I tell you there are huge teams out there, in really demanding environments... demanding it.

      Regarding your comments:

      > Basically, any SCM worth its
      > salt can detect/track changes.

      You know that Git and Hg can't detect directory moves, don't you? It's as simple as it sounds: you move a directory with 1000 files and the poor boys detect 1000 file moves... Sad, but true. Try it with Plastic :)

      > Plastic will just be the only
      > one that requires a filesystem
      > driver to do it.

      As I said, you're missing part of the picture there. We don't need it to detect changes; I'm just talking about a potential new feature.

      > Note that there is FileSystemWatcher (.NET on Windows)

      What makes you think I didn't look into it? :) FSWatcher can't detect file "moves". inotify can, but Windows CAN'T (not even the native C interface; it simply doesn't do it). File moves are the missing piece.

      Also, the "watcher" is not meant for precise monitoring: you can lose changes under heavy I/O load, which is basically what happens during SCM operations...

      Thanks for the remarks,

      pablo

      Thanks for this response. Sadly, I noticed it really late (I think it took /forever/ for the comment to be moderated?)

      Anyways. Solid arguments. Yeah, I had been misreading things a bit.
