Version control scalability shoot-out!
Let's get straight to the point: we took two of the biggest mainstream version control systems and put them to work under really heavy load: a single server and 100 concurrent clients. Then we compared them with Plastic SCM 3.0.
Test environment
A mainstream quad-core 64-bit server with 4 GB of RAM. Nothing fancy at all, just what you can buy for about $500.
100 desktop computers like the ones your team probably uses, all of them running Linux. They're quite heterogeneous: from 5-year-old single-core machines to newer quad-cores, and from 1.5 GB of RAM to 4 GB.
Server and clients are connected by a simple 100Mbps network.
Sample repository
We tested with a variety of different repositories, from really small ones to larger ones.
The one I'm describing today is a small one (and I can tell you the results only get worse for the slower SCMs with more data...):
Test automation
To automate all the client machines we used PNUnit, you know, the extension we made to NUnit for "parallel" testing. It's quite useful for load testing.
Test scenario 1 - working on trunk
A really simple scenario every developer is familiar with: just commit changes to the main branch.
Every client will do the following:
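In essence, each client loops over update, modify and commit against trunk. Just to make it concrete, here's a rough sketch of what one of those client bots could look like using the Subversion CLI (the server URL, file names and counts are placeholders, not the actual test parameters):

#!/bin/bash
# Hypothetical scenario-1 client bot: commit changes to trunk in a loop.
# Server URL, file names and counts are placeholders, not the real test data.
REPO=svn://server/repo
BOT=$1                         # unique id (0..99) given to each client

svn checkout "$REPO/trunk" wk$BOT && cd wk$BOT || exit 1

for i in $(seq 1 50); do
    svn update                               # pick up everybody else's commits
    for f in $(seq 1 10); do
        # each bot only ever touches "its own" files, so commits never conflict
        echo "bot $BOT iteration $i" >> "file_${BOT}_${f}.txt"
    done
    svn add -q --force .                     # version the new files on the first pass
    svn commit -m "bot $BOT, iteration $i"
done

The equivalent Perforce and Plastic bots run the same loop with their own CLIs.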
Test scenario 1 - working on trunk - results
Ok, how do our beautiful friends behave under really heavy load? Considering we tested with Subversion and Perforce, two of the most widely used version control systems on the market, we expected high scalability... :)
We used SVN 1.5.7, Perforce 2009.2 64bits and Plastic SCM 3.0.
All results use a Windows server and Linux clients, except for Subversion: we ran the SVN server on Linux (dual-boot server machine) because on Windows it couldn't handle more than 30 concurrent clients without consistently crashing (out of memory: 4 GB, gone!).
We ran the same test described above with 1, 10, 20, 50 and 100 clients. Check the results here:
Surprised????
The two old irons don't scale that well at all, huh? ;-)
Plastic is using a SQL Server backend and it seems to handle the load much better than the others, even for trunk development.
Test scenario 2 - working on branches
The second scenario tries to reproduce a "branch per task" pattern, something we strongly recommend with Plastic.
The scenario is as follows:
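Roughly, each client creates a task branch, switches its workspace to it, makes some changes and checks them in, then moves on to the next task. A sketch of that loop with the Subversion CLI (again, the URL, branch names and counts are just placeholders):

#!/bin/bash
# Hypothetical scenario-2 client bot: "branch per task", one check-in per branch.
REPO=svn://server/repo
BOT=$1

svn checkout "$REPO/trunk" wk$BOT && cd wk$BOT || exit 1

for task in $(seq 1 20); do
    BR="branches/bot${BOT}_task${task}"
    # create the task branch server-side, then point the workspace at it
    svn copy "$REPO/trunk" "$REPO/$BR" -m "bot $BOT opens task $task"
    svn switch "$REPO/$BR"
    # a small, non-colliding change on the branch
    echo "bot $BOT task $task" >> "file_${BOT}_task${task}.txt"
    svn add -q --force .
    svn commit -m "bot $BOT checks in task $task"
done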
Test scenario 2 - working on branches - results
We always say most of the version control systems out there are not ready to handle branching, and we always hear people asking why.
Ok, a picture is worth a thousand words.
If a data point is missing for one of the version control systems compared, it's not because of a mistake: the server simply starts locking too much, rejecting clients and making the test fail (even though the test handles retries when it gets rejection errors).
More data
I'll be sharing the data regarding the Plastic server running on Linux in the coming weeks. We used MySQL on Linux and while it is slightly slower, it still consistently beats all competitors.
You should do a comparison to Git.
Hi @anonymous!
Yes, that would be a very good idea; in fact, we were talking about it today.
But in order to really compare Git under heavy load, what do you think the scenario should be?
We considered something like:
- make changes locally
- push
And then of course have 100 nodes doing the same...
I think it could be fair.
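Per node, something along these lines (the URL and counts are placeholders; the retry loop is what keeps 100 nodes pushing to the same branch from constantly failing):

#!/bin/bash
# Hypothetical per-node loop for a Git version of the test.
REPO=git://server/repo.git
NODE=$1

git clone "$REPO" wk$NODE && cd wk$NODE || exit 1

for i in $(seq 1 50); do
    echo "node $NODE iteration $i" >> "file_${NODE}.txt"
    git add "file_${NODE}.txt"
    git commit -q -m "node $NODE, iteration $i"
    # push; if someone else won the race, rebase on top of their work and retry
    until git push origin master; do
        git pull --rebase origin master
    done
done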
This benchmark does make you look rather good.
However, the repository size simply doesn't reflect something 100 developers would work on. How about trying something closer to 50-100k files? Also, how large was the revision history?
At what frequency were the operations submitted? As fast as possible? Or something more closely resembling human behaviour?
Hi bionic,
The repository size is tiny. As I mentioned in the post, we have more test results, but I started by publishing the tiny one.
We ran exactly the same tests on a repository with 25k files on trunk (LAST) in two modes: without any data (empty files, so the network doesn't affect the results) and with data. Plastic is always faster under heavy load.
The repositories are empty for SVN and Perforce when the test starts; if they're not, they get even slower, something that doesn't affect Plastic either.
Finally, yes, the operations are submitted as fast as possible, so in fact we're simulating many more than 100 users. I couldn't say exactly how many, but certainly more than 100, since obviously a human can't work that fast.
I can tell you that the 100-user test generates, in 30 minutes on a 25k-files repo, the same workload we consistently measure on a real team, with the same size, over 8 hours of work (a regular day): a 16x time compression.
Hope I've answered all your questions.
It is a bit unfair to compare against Subversion 1.5.7 when the latest version is 1.6.12. Also, there's no information on whether the FSFS or BDB backend was used, although that can make quite a difference under such load.
ReplyDeleteHi Filip,
Actually we did run against the latest SVN too, but we didn't find any difference in terms of performance. When we ran the latest test suite last week we just tried 1.5.7.
In fact, we've been testing SVN under heavy load since 2006, and 1.4 was (and still is) the only version that doesn't crash when the server runs on Windows.
About the backend type, we're using the default FSFS. We can easily rerun using BDB and share the results.
We have Subversion implementations with 40,000 users and 18 million transactions per day. That's real life, not a lab. That's why Subversion has 5 million implementations and probably why bogus 'benchmarks' are published that are NOT independent or endorsed by a REAL customer.
I am more than happy to put your product into our lab and push it through its paces with our simulation spec, which is, by definition, a MASSIVE implementation.
http://www.WANdisco.com
Hi Pablo,
I wouldn't expect dramatic differences with the newer SVN version, but it seemed a bit unfair to compare the latest and greatest Plastic SCM version against a year-old version of SVN.
If it's not too much effort I would welcome the comparison to SVN w/ BDB backend since it has very different performance characteristics than the FSFS one. It is faster for certain operations and slower for others. Also I hope the SVN protocol was used for accessing the repository and not the HTTP one, which arguably performs much worse than the native protocol.
That said, I welcome all the improvements to Plastic SCM and applaud all the work on the performance side.
Hi Filip,
Absolutely. I can tell you that what we tried on the Windows server was 1.6.4 (r38063), and on the clients we have 1.6.11 (r934486). But we were never able to avoid crashes with the Windows server; that's why we tried a Linux server and, as expected, it went smoother.
We used the svn protocol, not http.
We will definitely give the other database backend a try and I'll be more than happy to share the results, of course. We've been working on performance for a long time, and Subversion is a widely adopted product, so trying to beat it is a big challenge for a team our size.
Thanks for the remarks and comments, I hope to be sharing more info soon.
I'd also like to add Hg and Git to the comparison, although we're still trying to define a fair scenario, as I mentioned above in a previous comment.
Thanks! :)
pablo
Hi Elaine,
40k users! That's impressive!
I guess you're using a good number of SVN servers to support that, aren't you?
I mean, that's still great, but it is not exactly the same kind of test.
We're just testing a simple $500 server handling as much load as it can, and there I can tell you SVN doesn't do a good job.
Of course the scenario I described may not be fair to SVN (although I doubt we can find anything simpler than the first one, working on trunk; it is rather simple, as I wrote in the post), or maybe our configuration is not the best for SVN and there are some tweaks we could use to improve performance, as Filip mentioned.
I'll be happy to share the commands we're running on Plastic and SVN for the test too, of course.
And, definitely, I'd love for you guys to give Plastic a try. I know you've done an excellent job getting the best out of Subversion, so if you're looking for something more scalable and functional, we'll be more than happy to talk. And if you just want to crunch Plastic through your test suite, of course we'd love to see that too. :)
Only one remark: being smaller than WANdisco doesn't make us 'bogus', it just makes us smaller.
Thanks for your comments.
pablo
Hi Filip,
We're testing *right now* on our own testing cluster (where we have another 100 machines, like at the customer site we used before) with the Berkeley DB backend on Linux, using the tiny repos.
The SVN Linux server goes straight out of memory as soon as the 100 clients start updating:
majestic:~ # svn log svn://localhost:8084 -r head
svn: Berkeley DB error for filesystem '/root/svnrep/db' while opening environment:
svn: Cannot allocate memory
svn: bdb: unable to allocate memory for mutex; resize mutex region
We'll continue testing and trying to correctly configure it.
With only 10 clients, the Subversion Berkeley DB backend is much, much slower than the FSFS one.
So I think we selected the faster one for a fair comparison.
We're also trying with the latest SVN server on Linux and it is slightly slower than the one we used (1.5.x) for testing.
It would be interesting to see more details - e.g. how you provoked or avoided merge conflicts.
Also potentially interesting to reproduce the test :)
Anyway, good to see you making progress!
Robert
Hi Robert,
The test bots (as we call them) are configured so they always modify file ranges that don't collide; that's how we avoid merging (that would be a subject for another test).
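Just to make the idea concrete, the partitioning is as simple as this (the numbers are made up, assuming the 1,400-file repo is split across 100 bots):

#!/bin/bash
# Each bot only ever edits its own slice of the files, so no two bots collide.
BOT=$1
FILES_PER_BOT=14
START=$((BOT * FILES_PER_BOT))

for f in $(seq $START $((START + FILES_PER_BOT - 1))); do
    echo "touched by bot $BOT" >> "file_$f.txt"
done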
We do use PNUnit as the load test coordinator (you know, our humble contribution to NUnit) and yes, I'd be more than happy to share the code and the setup, no prob. I just have to find some time to make it happen ;)
Thanks for the support!
pablo
It's also unfair to judge Windows v. Linux in general. The file systems are so different that the performance is going to be QUITE different between the two. Memory management is also quite different. You might be blowing out your memory cache on Windows way faster than on Linux, which means hitting disk I/O and losing performance.
Basically, this whole study is kind of non-scientific. Take it with a grain of salt. There's no control, no consistency. You're not testing one SCM v. another, because your environments are too different.
Re-do the tests/benchmarks with proper controls in place and maybe I'd take the numbers seriously?
Hi @astorax,
It always amazes me when I find someone commenting on a 4-year-old thread. Lucky for you that I'm here to answer.
Ok, let's see: we are not comparing Windows vs Linux. We're comparing all systems under the same conditions: Windows server and Linux clients. So the conditions are the same for all systems in the test. Is it clear?
We redid the tests only for Subversion because SVN crashed consistently on Windows.
So please, before saying the results are not scientific (which they probably aren't anyway), take into account that we're not comparing different OSs at all. It has nothing to do with the memory cache or anything else you mention. It is just about testing different systems under the same conditions and watching how some of them break.
Anyway, this was written back in 2010. We don't even compare to SVN anymore because we run circles around it in such a way that it is not even worth checking. And we don't find it in the market anymore either, only teams moving away from it.
Ah, I didn't even notice it was 4 years old. :) It's making the rounds on Twitter again; that's where I saw it.
Also, I misread the server configuration (Windows vs. the SVN server) when I went over it; I flipped that. :D Apologies for that.
Still, having said that, a more accurate representation (at least according to most admins I know) would be to run Linux servers, as that's more common since Linux is better at handling massive numbers of requests. Windows has known limits on the number of processes/handles, for example, particularly depending on which version of Windows we're talking about (Windows Server vs. a home edition like Win7 or WinXP).
I know, for example, that Perforce, because of how it forks processes on the back end for incoming requests, can VERY quickly max that out depending on load and appear slower.
Also, I'm curious what clients/setups were used. Are we talking command-line clients all the time across all 3? Were you using the Plastic GUI and/or P4V for Perforce?
I'd also echo the sentiment of an earlier poster about doing tests against much larger (read: more realistic) data sets. 100 clients is reasonable for a mid-sized company, but out in the wild it's a pretty small count.
1,400 files is also just plain unrealistic. On my own side indie project I have thousands of files. Also, 22,4 Mb: is that 22.4, or 22,400 (i.e. 22 GB)? I'm assuming the former? So all code, no binaries then? I'd love to see tests going up into the high-GB range for the sake of breadth of data.
I'd also like to see the flip side: more commits, fewer files. I don't submit 100 files very often in the real world; usually it's under 10. I'd redo the submit tests with, say, 10 files, 50 times. Same number of overall files, but it gives a different data point.
Honestly, I'd love to see more of these types of comparisons out there, done by third-party groups. Nothing against you or Plastic; it's just that anything done by a company is, by its nature, going to contain bias, be it only perceived or real. Even if the tests are done completely 'properly', it's hard to take them at face value when they're done by one of the companies being tested.
Again, apologies for the misread on the Windows v. Linux piece.
Hi @Astorax,
Sure, lots of our customers prefer to run Linux servers, although Plastic performs quite well on Windows too.
For more up-to-date info, please check this: http://plasticscm.com/version-control-for-games.html#performance-scalability-results
We used the CLI for the 3 systems; in fact, all commands are automated, with no human intervention, just an entire distributed test system in place.
Regarding more realistic datasets: sure, nowadays we test with about 300k to 500k files. You can find these results in the link I sent you.
And you can also check this newer one: http://plasticscm.com/under-heavy-load/index.html
The two links I sent will go straight to the point of your questions. As I said, we don't even test against some of these systems anymore because they simply collapse under real heavy load or with really big binaries.
http://plasticscm.com/version-control-for-games.html#performance-performance-results
Regarding comparisons made by third parties: yes, that would be great, but I'm unsure whether anyone will dedicate the time needed to set all this up just for fun. Of course, whether we intend it or not, we tend to be biased, first of all because we know our own system better than the others.
The main reasons our customers move away from P4 are: distributed version control, better branching and merging, much faster data transfer, and the ability to keep up (or simply being faster) under heavy load.