Plastic stress testing
I've been focusing on performance for the last month or so, basically checking how Plastic behaves under heavy load. It is very interesting to see how different the behavior is when a single user accesses the server compared to several users running operations at the same time. I've always been concerned with speeding up Plastic operations (as you've probably seen from my previous posts), and simulating hundreds of clients against a single server has been a good experience.
So, what was the testing scenario and how did we set it up? Well, we have several "independent" testing agents performing the following operations, repeating the whole process 5 times. From the command line, the operations look like:
$ cm mkwk [BOT NAME]-wk /tmp/[BOT NAME]-wk
$ cd /tmp/[BOT NAME]-wk
$ cm update .
$ cm mkbr br:/main/task[ITERATION NUMBER]
$ cm co file01.c
# modify the file
$ cm ci file01.c
$ cm co file01.c
# modify the file again
$ cm ci file01.c
$ cm co file01.c
# modify the file (last time)
# (no check in is done this time)
# go to another file and repeat the process
# check in all the checked out files
$ cm fco --format={4} | cm ci -
$
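Put together, each agent's loop looks roughly like the sketch below. The workspace name, the file list and the dummy modifications are made-up placeholders, and the cm stb call (switching the workspace to the freshly created branch, which the transcript above omits) is my own assumption about how the bot moves onto the task branch:

#!/bin/bash
# Sketch of a single testbot; illustrative names, not the real test-suite code
WKNAME=bot01-wk
WKPATH=/tmp/$WKNAME

cm mkwk $WKNAME $WKPATH
cd $WKPATH
cm update .

for ITERATION in 1 2 3 4 5; do
    cm mkbr br:/main/task$ITERATION
    cm stb br:/main/task$ITERATION      # switch the workspace to the new branch (assumption)

    for FILE in file01.c file02.c file03.c; do
        cm co $FILE
        echo "change $ITERATION.1" >> $FILE
        cm ci $FILE

        cm co $FILE
        echo "change $ITERATION.2" >> $FILE
        cm ci $FILE

        cm co $FILE                     # checked out, not checked in this time
        echo "change $ITERATION.3" >> $FILE
    done

    # check in all the files that are still checked out
    cm fco --format={4} | cm ci -
done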
Well, a really simple "testbot" that lets you measure how well the server scales up. We have another, more complex scenario with a more complete "testbot" that is able to mark a branch as finished (using an attribute) and jump to the "recommended" baseline. Another "bot" plays the integrator role: it waits until at least 10 branches have been finished, then integrates everything into main (of course it doesn't make clever decisions when a manual conflict arises) and moves the "recommended" baseline. This way we can simulate heavy load with very simple and independent test bots.
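The integrator bot is basically a polling loop. A rough skeleton is shown below; the Plastic-specific steps (finding branches marked as finished, merging them into main, moving the recommended baseline) are left as hypothetical helper functions because their exact cm syntax isn't shown here, so only the control flow is real:

#!/bin/bash
# Integrator bot skeleton (control flow only; the three helpers are
# hypothetical stand-ins for the real cm queries and merge commands)
while true; do
    FINISHED=$(list_finished_branches)          # branches carrying the "finished" attribute
    COUNT=$(echo "$FINISHED" | grep -c .)

    if [ "$COUNT" -ge 10 ]; then
        for BRANCH in $FINISHED; do
            merge_branch_into_main "$BRANCH"    # no clever conflict resolution
        done
        move_recommended_baseline               # so the testbots can jump to it
    fi

    sleep 30                                    # arbitrary poll interval
done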
How do we launch the tests and gather the results? Well, using PNUnit, the NUnit extension we developed a long time ago. PNUnit is totally open source and will be integrated into the next NUnit release, so hopefully it will become better known.
So, basically we're using one machine which runs both the Plastic SCM server and the PNUnit launcher. The launcher reads an XML configuration file and tells the agents which tests they should run. This way, using different XML files, we can define different testing scenarios.
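For illustration, such a configuration file has roughly the shape below. This is a sketch from memory of the PNUnit sample configs, so the element names may not match the real schema exactly, and the group, assembly, test and machine names are invented:

<!-- illustrative only; schema recalled from the PNUnit samples, names invented -->
<TestGroup>
  <ParallelTests>
    <ParallelTest>
      <Name>stress-40-bots</Name>
      <Tests>
        <TestConf>
          <Name>testbot01</Name>
          <Assembly>plastic-stress-tests.dll</Assembly>
          <TestToRun>StressTests.TestBot.Run</TestToRun>
          <Machine>agent01:8080</Machine>
        </TestConf>
        <!-- ...one TestConf entry per agent machine... -->
      </Tests>
    </ParallelTest>
  </ParallelTests>
</TestGroup>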
As I mentioned above, it is very interesting to study the server's performance under heavy load. Code that works fast with only a few users tends to be horribly slow when hundreds of clients make requests at the same time. I won't say anything really new here, but we had to change a number of things to keep the server responsive under that kind of load.
And, what about the numbers? Well, we've tried with up to 40 machines (clients) and 1 server so far. All the clients were running Linux and had exactly the same hardware configuration. We have tried up to 200 simultaneous testbots, but the regular test suite runs 1, 20, 35, 40 and 80 testbots against a single server. It is important to note that a testbot is not equivalent to a single user but to a number of them: a regular user doesn't create a branch, modify 30 files and check all the changes back in in less than 6 seconds, which is basically what a testbot does. So in reality we're simulating hundreds (even thousands) of simultaneous users.
How good are the results? And compared to what? Well, my intention is to publish the entire test suite in a few weeks, so Plastic users can set up the testing environment and check how it performs in their own environments before they make the buying decision, and then you'll be able to really check our numbers. What I can say right now is that we've created exactly the same tests for Subversion and some other SCM products (which I won't disclose yet), and right now (using BL079 as the code base; preview 1 is BL081) we're faster than any of them in the specific scenario I described above. How much faster? Well, from 3 to 6 times depending on the load, if you compare us with SVN, for instance. But we still have to run other scenarios to get a more complete view and provide better results.