GIT vs Plastic performance
Performance! I’m starting to be obsessed with this word. Almost since the beginning of the project one of my main concerns has been performance: being able to run update operations faster and faster and faster.
If I remember correctly, months before our initial release I was already checking whether Plastic was faster than Perforce at updating the workspace, which is one of the most common version control operations.
At a certain point we were able to beat Perforce (which claims to be the fastest SCM out there), so I was extremely happy. I knew Perforce was still faster than Plastic at bulk check-ins and add operations, but we were faster at updating.
Later on I ran a test to compare Plastic to SVN and… doing an “update forced” (which means downloading everything into your workspace again) was about three times faster!! Good old days!
For the last few weeks I’ve been working on a big change to the “solve selector” code. What does that mean? Well, selectors are the key piece of the Plastic system. Whether they are visible or not (it depends on whether users are running the professional edition instead of the standard one, and even then many users don’t feel like bothering with them and prefer the “switch to branch” mechanism), they still play a core role in Plastic. Simply put: revisions are not “directly linked” to their containing directories (as happens in “path space branching” systems like TFS, Perforce, Subversion and so on); instead the system has to “solve” where each revision is located depending on the selector.
Does that mean the same revision of a file could show up in different locations just by changing the selector? Yes! That’s exactly what it means.
The mechanism is extremely powerful, but it comes with a cost: it is not especially simple to implement. In exchange you get true rename and many other possibilities, but the system has to perform some work to calculate where each element is located.
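To give you an idea (this is a simplified sketch; don’t take the exact syntax too literally, it changes a bit between versions), a selector is just a small piece of text telling Plastic which branch or label to load for each path:

repository "default"
  path "/"
    branch "/main"
    checkout "/main"

Point that branch line somewhere else and the workspace gets solved again against the new branch: an item that was renamed or moved there simply shows up in its new location, because locations are calculated from the selector instead of being hard-wired to the revisions.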
Well, the point is that BL064 brings a whole new selector solving mechanism that greatly outperforms the old one. Now not only is “update forced” fast, but so is detecting what has to change when someone commits to the branch you’re working on. The new system is much, much, much faster.
-“BL064? What are you talking about? You released Plastic 1.5 a few weeks ago… what’s that?” – I can hear some of you complaining… :-)
Well, we use an internal build numbering scheme (as everybody else does, nothing new under the sun), which is also visible on external releases: just take a look at the version info of the executables. Our last 1.5 release is actually BL063, but we’re currently using BL065 internally to support our own development. So, yes, I’m afraid the enhancements I’m talking about are not available yet; please be patient.
Well, I was very happy with the new “selector system”, but then I suddenly remembered GIT… Oh! We thought we were the fastest ones in town doing updates but then I tried GIT and… well, it is stupid and ugly :-), but it is damn fast…
So I installed GIT 1.5.2.4 again on our Linux FC4 box, which has a couple of Intel(R) Xeon(TM) 3.00GHz CPUs. It is already more than two years old but still runs quite fast.
$ uname -a
Linux juno 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:08:39 EDT 2005 i686 i686 i386 GNU/Linux
Then I imported one release of our own code into a git repository. I removed all the local files (except the .git subdirectory, of course!) and ran “git checkout .”, which is supposed to recreate all the files. It took about 20s…
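In case you want to reproduce it, the whole thing boils down to something like this (directory names are made up, and the timing figures are of course specific to our box):

$ mkdir /tmp/git-test && cd /tmp/git-test
$ git init
$ cp -r ~/src/plastic-release/* .       # one release of our own code
$ git add . && git commit -m "import one release"
$ rm -rf *                              # wipe the working files; .git survives because it is hidden
$ time git checkout .                   # recreate everything from the repository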
Then I tried exactly the same scenario with Plastic (remember, BL065). It took about 29s. Yes, GIT is faster but… hey! The Plastic server was actually installed on another box, so… all the content was downloaded over our local 100Mbps network!!!
To make it even more dramatic: the test was run against our real server, which doesn’t hold just one copy of the repository (as GIT did) but our whole development history. That said, it wouldn’t really make a big difference if the server contained only one copy, because fortunately repository size doesn’t affect update performance.
The bad news (for me) is that I then repeated the “git checkout .” (after removing everything again) and it took only 8s!!! I don’t know exactly why (disk caches?), but if I run the “rm -rf *” just before running “git checkout .” it takes much longer than if I leave some time between the two operations. Plastic always performed the same, ranging from 28 to 30 seconds, and was not affected by the remove operation being run immediately before.
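If you want to see the effect yourself, just time both variants (the pause length is arbitrary, pick whatever you like):

$ rm -rf * && time git checkout .       # remove immediately followed by the checkout: noticeably slower
$ rm -rf *
$ sleep 60                              # leave some time between the operations
$ time git checkout .                   # much faster this way, presumably thanks to the caches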
“Ok, but what happens when the Plastic server is installed on the same box as the client? Wouldn’t that be a fairer scenario, and wouldn’t it perform even better?” you may ask…
Well, I’m afraid if you want to get the answer you’ll have to wait till the next chapter… :-)
I have a feeling Perforce will need some new marketing messages, besides just being fast, if they're to survive. Most SCM tools I've tried are at least as fast as Perforce in most operations.
Git was fast the next time because the _OS_ used its caches.
Besides that, it is a very _pointless_ benchmark which shows just nothing.
Compare branch switching, listing the history, looking for a merge base, merges of deep and extensive histories, retrieving a complete state from some considerable time back, or applying thousands of patches. Compare the operations people _actually_ perform when _working_ with an SCM, not just one _rare_ operation (how often do you expect people to overwrite their tree completely?!).
And BTW, you can't really compare network performance, except on a very isolated network. And even then, there is no point comparing _time_. You can compare bandwidth used, number of round-trips, latency of the server and client, but absolutely not the _overall_ time.
Too many uncontrollable factors can change the situation dramatically.
> Git was fast the next time because the _OS_ used its caches.
Yes, the same happens to Plastic running on some Windows boxes, but on Linux the timing of the operation is stable.
> Compare branch switching, (...)
Right, branch switching or merging would be better things to compare. In fact branch switching is one of the points we've optimized for the next release, based on the same "solve path" modifications.
What about checkin performance? Checking in a lot of files is not that uncommon (e.g. documentation/help files - or imports of 3rd party stuff)
Also compression - I've never used git, I confess (nor BitKeeper, which makes similar claims), but in theory it compresses data very well.
Yes, GIT seems to be fast doing check-ins (remember, just copying files from one dir into a hidden dir).
Compression is quite good in Plastic too: you can have a huge number of revisions and they are heavily compressed (sample: each baseline is 400MB, you store the work of 40 such baselines, and the repository only grows to 900MB; of course not every baseline is a full copy, but still).
Plastic has lots of interesting features: the branching model (yes, GIT also supports branches, but I still prefer Plastic's), the graphics, the easy installation, the security model, the selectors, the multi-repository capability, and so on.
Usability is more important than speed. I spend more time wrestling with the source control system to get it to do what I need than actually waiting for it to checkout/update/commit.