Comments on Plastic SCM blog: MD5 vs SHA1

Pablo Santos (2009-05-19 21:33):

Hi again nosatalian,

I have to admit I was a little bit shocked by the way you wrote your "colorful" comment but, hey, you don't learn if people don't tell you you're wrong! :-)

Yes, you're right about the language. In fact, when we started developing Plastic SCM (4 years ago already!!!! we're growing up) my choice was C/C++, because it sounded like "the way to do things".

I didn't know a line of C# back then, and being a former C++ developer (ok, maybe a C++ developer wannabe, since you never really master C++, do you? :-P), C# really looked like a toy to me.

But Dave (our CTO) knew C# (also a former C++ developer) and asked me to take a look at it. And then I had to admit that things that are normally difficult with C/C++ (or at least don't compile and run on the first try) simply work in C#: dynamically loading code, remote method calls, memory management and so on.

Then we tried Mono (which was much younger than it is now), we liked it, and we had code up and running on both Linux and Windows from day one (MacOS joined only a few weeks later).

To be honest, sometimes I still miss C/C++ (graphics on MacOS/Linux were not easy), but mostly because of the tools (compilers, profilers, editors everywhere) and third-party libraries (Qt) rather than the language/library itself.

So far I have never found a place in our code where C/C++ would be faster (maybe I'm crazy wrong), and in the end you can always do a good job with C#, even on Linux (very good speed). And if something simply isn't good enough in C# (zlib), we link against native code (which is doable on Mac/Win/Linux).

Maybe MD5 calculation is one of those things...
:-P

The only true downside is tool start-up, which is slower (for a single cm command) than if it were native C, but normally it is not a huge hit.

We outperform SVN by a factor of 10, even more under heavy load, and the same is true for other SCMs. Although, as you pointed out, SVN is no longer the one to watch; GIT is. We can do really fast updates, even compared to GIT (our network access against its local copy), because threading and the like is really easy in C# (doable, of course, in C/C++).

I'm always working on performance, so I'm really pushing to make Plastic faster... every day. GIT is a good benchmark, of course.

nosatalian (2009-05-19 21:11):

Thanks for letting that through, Pablo. Whenever I see that "Comment moderation has been enabled" notice, I assume most of my colorful comments will end up in the ether.

My point about C# is not really about efficiency: clever algorithms usually beat a faster language. It just seems to me that the type of people who really care about SCMs (and hence would be writing them) are probably people with a lot of technical depth and experience. Most of those people tend to eschew C# because its Linux support is a joke, and they would rather work in an open-source environment/language.

But there may be very good reasons for you to choose C#: particularly if most of your value-add is on the Windows UI side, then C# is probably the best bet for a nice UI.

Pablo Santos (2009-05-19 11:43):

nosatalian, really funny comments.
1- If the test is limited by the filesystem, it is limited equally for both methods, so the results are still consistent. Again, reading can be really fast once the data is cached, as any filesystem does.

2- I "appreciate" your business advice; I hope it's based on a strong background... :-) Yes, we primarily use C#, and we still outperform most of the SCMs out there.

nosatalian (2009-05-19 08:38):

MD5 is significantly faster than SHA1; you don't know a thing about benchmarking. You are limited by your filesystem speed. Put the files in a ramdisk and then run it.

And secondly... are you guys seriously writing an SCM in C#? Are you out of your god damn minds? The SCM debate is effectively over: Linus won, as he usually does, on technical merit, and he had the good sense to do it in C. Now, if you guys need something to do/sell, go about making git easier to use, or adding candy (merge tools, history viewers, etc.) which improves workflow.

Alan (2008-07-08 17:41):

Hey,

Well, you can be smart about how you read data into memory. For example, you can read a whole file into memory only if it is less than 256 kB in size. Then you can limit yourself to a maximum of three files in memory at any one time. So, if your CPU is slow, you can buffer 3 files into memory, wait for one to hash, then buffer the next.

If a file is large, you can read it in 256 kB chunks and keep hashing the chunks one by one.
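Alan's chunked scheme can be sketched in a few lines. This is a hypothetical Python illustration, not the C# code under discussion (in .NET, `HashAlgorithm.TransformBlock`/`TransformFinalBlock` play the role that `update()` plays here); the 256 kB chunk size is his figure, and the function name is invented:

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 kB, per Alan's suggestion


def md5_of_file(path):
    """Hash a file incrementally, so only one chunk is in memory at a time."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            md5.update(chunk)  # feed each piece into the running hash
    return md5.hexdigest()
```

The result is identical to hashing the whole file in one call, but peak memory stays at one chunk no matter how large the file is.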
The API allows you to hash files in pieces like that.

You should be able to rig up a proof-of-concept app easily enough that will tell you the potential gains. It could just be a console app with two threads: one reads files into a queue[byte], the other dequeues and hashes.

Anonymous (2008-07-07 14:37):

Indeed, you might want to look at your file stream performance, to see if that is what dominates.

Compare to:
http://botan.randombit.net/bmarks.html
http://www.cryptopp.com/benchmarks.html
http://www.cryptopp.com/benchmarks-amd64.html

Pablo Santos (2008-07-06 21:16):

Hi Alan,

Do you mean passing bytes instead of a stream and reading in parallel?

My concern is that your proposal will eat a big amount of memory, won't it? I mean, I normally have to hash a large number of files...

I'll try to figure out how to apply it anyway. Thanks for the suggestion!

Alan (2008-07-06 02:12):

Would you be able to increase performance by splitting the hashing of the file away from the reading of the file?

i.e.:
Thread1 - reads the files into memory
Thread2 - hashes the files which are in memory

In my own app I was hashing large files in chunks of 16 kB, and by using two threads I improved performance by ~10%. In your case it may not help, as most files are quite small.
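Alan's two-thread concept app could look roughly like this. Again a hypothetical Python sketch rather than the C# under discussion: a reader thread fills a bounded queue of chunks, a hasher thread drains it, and the queue bound caps memory use (echoing his "max of three in memory" idea). The function name and markers are invented:

```python
import hashlib
import queue
import threading


def threaded_md5(paths, chunk_size=256 * 1024, max_chunks=3):
    """Reader thread enqueues file chunks; hasher thread dequeues and hashes.

    The bounded queue means at most `max_chunks` chunks are buffered,
    keeping memory flat even for large files.
    """
    chunks = queue.Queue(maxsize=max_chunks)
    digests = {}

    def reader():
        for path in paths:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    chunks.put((path, chunk))
            chunks.put((path, None))   # end-of-file marker
        chunks.put((None, None))       # end-of-stream marker

    def hasher():
        running = {}
        while True:
            path, chunk = chunks.get()
            if path is None:
                break                  # no more files
            if chunk is None:
                digests[path] = running.pop(path).hexdigest()
            else:
                running.setdefault(path, hashlib.md5()).update(chunk)

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=hasher)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return digests
```

In CPython the two threads can genuinely overlap I/O with hashing, since hashlib releases the GIL while digesting large buffers.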
You'd also have to take care of the case where you have large files, so you don't read a 100 MB file into memory.
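On nosatalian's benchmarking point: hashing a buffer that is already in memory takes the filesystem out of the measurement entirely, which is what the ramdisk suggestion is after. A minimal Python micro-benchmark sketch (the helper name is invented; absolute numbers depend on the machine and the hash implementation, so none are hard-coded here):

```python
import hashlib
import time


def throughput_mb_s(algorithm, data, rounds=20):
    """Hash an in-memory buffer repeatedly and report MB/s,
    so disk speed never enters the measurement."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, data).digest()
    elapsed = time.perf_counter() - start
    return (len(data) * rounds / (1024 * 1024)) / elapsed


data = b"\x00" * (8 * 1024 * 1024)  # 8 MB buffer, already in memory
for algo in ("md5", "sha1"):
    print(f"{algo}: {throughput_mb_s(algo, data):.0f} MB/s")
```

Running both algorithms against the same buffer isolates the hash cost, which is the only fair way to compare MD5 and SHA1 speed.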