Who we are

We are the developers of Plastic SCM, a full version control stack (not a Git variant). We work on the strongest branching and merging you can find, and a core that doesn't choke on huge binaries and repos. We also develop the GUIs, merge tools and everything needed to give you the full version control stack.

If you want to give it a try, download it from here.

We also code SemanticMerge and the gmaster Git client.

The fastest way to insert 100K records

Tuesday, April 08, 2008 Pablo Santos 14 Comments

I’m doing some performance tests today with our three different database backends: MySQL, Firebird and SQL Server.

My goal is to find the fastest way to insert a large number of records into a table, taking into account the different backends.

My test table is the following in the three databases:

CREATE TABLE testtable (
    iobjid BIGINT NOT NULL PRIMARY KEY,
    ifield0 BIGINT,
    ifield1 BIGINT);

CREATE INDEX idx_ifield0 ON testtable (ifield0);
CREATE INDEX idx_ifield1 ON testtable (ifield1);

iobjid is the primary key and the other two fields are also indexed (the index names above are just illustrative). So, let's go with the first loop:
IDbConnection conn = // grab a connection somehow
conn.Open();
try
{
    IDbCommand command = conn.CreateCommand();
    command.CommandText = "delete from testtable";
    command.ExecuteNonQuery();

    int initTime = Environment.TickCount;

    IDbTransaction t = conn.BeginTransaction(
        IsolationLevel.RepeatableRead);
    command.Transaction = t;

    // one INSERT statement per record, with the values
    // embedded directly in the SQL text
    for (int i = 0; i < 100000; i++)
    {
        command.CommandText = string.Format(
            "INSERT INTO testtable " +
            " (iobjid, ifield0, ifield1)" +
            " VALUES ( {0}, {1}, {2} )",
            i, 300000 - i, 50000 + i);
        command.ExecuteNonQuery();
    }
    t.Commit();

    Console.WriteLine("{0} ms", Environment.TickCount - initTime);
}
finally
{
    conn.Close();
}

How long does it take to insert 100K records on my old laptop?
· Firebird 2.0.1 (embedded) -> 38s
· SQL Server 2005 -> 28s
· MySQL 5.0.1 -> 40s

I've repeated the test with all the possible IsolationLevel values and didn't find any difference.

Insert with params

My second test tries to get a better result using parameters on the commands... Here is the code:

IDbConnection conn = // get your connection
conn.Open();
try
{
    IDbCommand command = conn.CreateCommand();
    command.CommandText = "delete from testtable";
    command.ExecuteNonQuery();

    int initTime = Environment.TickCount;

    IDbTransaction t = conn.BeginTransaction(
        IsolationLevel.RepeatableRead);
    command.Transaction = t;

    // sqlserver and firebird use '@' but mysql uses '?'
    string indexParamName =
        GetParametersNamePrefix() + "pk";
    string field0ParamName =
        GetParametersNamePrefix() + "field0";
    string field1ParamName =
        GetParametersNamePrefix() + "field1";

    // the statement is built (and parsed) only once
    command.CommandText = string.Format(
        "INSERT INTO testtable " +
        " (iobjid, ifield0, ifield1) " +
        "VALUES ( {0}, {1}, {2} )",
        indexParamName, field0ParamName, field1ParamName);

    IDbDataParameter paramIndex = command.CreateParameter();
    paramIndex.ParameterName = indexParamName;
    command.Parameters.Add(paramIndex);

    IDbDataParameter paramField0 = command.CreateParameter();
    paramField0.ParameterName = field0ParamName;
    command.Parameters.Add(paramField0);

    IDbDataParameter paramField1 = command.CreateParameter();
    paramField1.ParameterName = field1ParamName;
    command.Parameters.Add(paramField1);

    // only the parameter values change on each iteration
    for (int i = 0; i < 100000; i++)
    {
        paramIndex.Value = i;
        paramField0.Value = 300000 - i;
        paramField1.Value = 50000 + i;
        command.ExecuteNonQuery();
    }
    t.Commit();

    Console.WriteLine("{0} ms",
        Environment.TickCount - initTime);
}
finally
{
    conn.Close();
}
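
By the way, GetParametersNamePrefix() isn't shown in the post; here is a minimal sketch of what such a helper could look like (the Backend enum and the mBackend field are illustrative assumptions, not the original code):

enum Backend { SqlServer, Firebird, MySql }

Backend mBackend; // assumed field holding the configured backend

// SQL Server and Firebird name parameters @name, while the MySQL
// provider expects ?name, as the comment in the code above says
string GetParametersNamePrefix()
{
    return mBackend == Backend.MySql ? "?" : "@";
}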

How long does it take now?
· Firebird -> 19s
· SQL Server -> 20s
· MySQL -> 40s

So, it seems MySQL is not affected by parameters, but the other two really get a performance boost!

One insert to rule them all

Let's now try a last option: what about inserting all the values in a single statement? Unfortunately, neither SQL Server nor Firebird supports multiple rows in the VALUES part of an INSERT. I know they can use some sort of UNION clause to do something similar, but performance is not better. So, let's try with MySQL:

IDbConnection conn = // grab your conn
conn.Open();
try
{
    IDbCommand command = conn.CreateCommand();
    command.CommandText = "delete from testtable";
    command.ExecuteNonQuery();

    int initTime = Environment.TickCount;

    IDbTransaction t = conn.BeginTransaction(
        IsolationLevel.RepeatableRead);
    command.Transaction = t;

    // build one giant multi-value INSERT covering all 100K rows
    StringBuilder builder = new StringBuilder();
    builder.Append(string.Format(
        "INSERT INTO testtable " +
        " (iobjid, ifield0, ifield1) " +
        "VALUES ( {0}, {1}, {2} )",
        0, 300000, 50000));
    for (int i = 1; i < 100000; i++)
    {
        builder.Append(string.Format(
            ", ( {0}, {1}, {2} )",
            i, 300000 - i, 50000 + i));
    }
    command.CommandText = builder.ToString();
    command.ExecuteNonQuery();
    t.Commit();

    Console.WriteLine("{0} ms",
        Environment.TickCount - initTime);
}
finally
{
    conn.Close();
}

And the winner is... MySQL takes only 9 seconds to insert the 100K records... but only using the multi-value insert operation.

Enjoy!
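
PS: a practical caveat if you try the multi-value insert: a single statement carrying 100K value tuples can exceed MySQL's max_allowed_packet limit, so a safer variation flushes in chunks. Here is a minimal sketch reusing the command object from the code above (the 1,000-row batch size is an arbitrary illustrative choice, not something measured here):

// flush the multi-value INSERT every 1,000 rows so each
// statement stays well under MySQL's max_allowed_packet
const int BatchSize = 1000;
StringBuilder builder = new StringBuilder();
int pending = 0;
for (int i = 0; i < 100000; i++)
{
    builder.Append(pending == 0
        ? string.Format(
            "INSERT INTO testtable (iobjid, ifield0, ifield1)" +
            " VALUES ( {0}, {1}, {2} )",
            i, 300000 - i, 50000 + i)
        : string.Format(
            ", ( {0}, {1}, {2} )",
            i, 300000 - i, 50000 + i));
    if (++pending == BatchSize)
    {
        command.CommandText = builder.ToString();
        command.ExecuteNonQuery();
        builder.Length = 0; // reset for the next batch
        pending = 0;
    }
}
if (pending > 0) // flush any remaining rows
{
    command.CommandText = builder.ToString();
    command.ExecuteNonQuery();
}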
Pablo Santos
I'm the CTO and Founder at Códice.
I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
I had the opportunity to see teams from many different industries at work while helping them improve their version control practices.
I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
And I love simple code. You can reach me at @psluaces.

14 comments:

  1. For SQL server you might want to try to pass the rows to OPENXML as a set of records. Some years ago I got good results by doing that. The following example from http://support.microsoft.com/kb/555266 should get you going...
    Hope it helps.


    DECLARE @hDoc int

    --Prepare input values as an XML document
    exec sp_xml_preparedocument @hDoc OUTPUT, @in_values

    --Select data from the table based on values in XML
    SELECT * FROM Orders WHERE CustomerID IN (
    SELECT CustomerID FROM OPENXML (@hdoc, '/NewDataSet/Customers', 1)
    WITH (CustomerID NCHAR(5)))

    EXEC sp_xml_removedocument @hDoc

  2. Interesting, I just wonder why you're only giving the mysql result for the last run.
    Anyway, nice to know :)

  3. Hi guillaume,

    I've written the mysql results for all runs, haven't I?

    I wonder if running IDbDataAdapters and ExecuteBatch will make things go faster.

    Also, the new 5.2 MySql provider introduces the ability to run Bulk Loads... which I guess will make the test go even faster...

  4. Care to retest with PostgreSQL? It would be interesting to see how it compares with MySQL, in particular. :-)

  5. Nah - ignore these lamers with XML voodoo and stuff like that!

    Try this pattern where you don't have multi-value inserts:

    insert foo (a, b)
    select x1, y1
    union select x2, y2
    union select x3, y3
    union ...

    Don't be shy now - see how many unioned rows you can do in one go!

    You will be surprised how many. And at the result.

    Hint: how often does SQLserver consider doing per-statement tasks like trigger exec in this case?

  6. With this you're also testing the communication protocol and provider code. To test *just* the database, use an SP or similar (the multi-value insert in MySQL is close).

  7. try using sql server bulk insert

    http://msdn2.microsoft.com/en-us/library/ms188365.aspx

  8. Using external tables for bulk inserts we achieve performance of 3-4 seconds for 200K records in our POS system software.

  9. you might want to expand this a little. I did some tests where I got 1 MILLION records to insert into sql server in 17 seconds.... using SqlBulkCopy in C#..

    http://blog.stevienova.com/2008/01/16/net-fastest-way-to-load-text-file-to-sql-sqlbulkcopy/

  10. Hi Steve,

    Thanks for the link on BulkCopy.

    One remark: I guess there shouldn't be any difference between your method 1 and your method 2 if you keep the connection open in case 1 (which is what method 2 is actually doing internally) and even use parameters as I mentioned in my post.

    I guess directly using IDbCommand must always be faster than IDbDataAdapter or any DataTable, mustn't it?

    Of course BulkCopy must be even faster, as Andrei pointed out also (there's a minimal SqlBulkCopy sketch after these comments).

    I'd like to check performance using MySQL's bulk-copy-like functionality, but I'm afraid you have to use an intermediate file to load the data; is that correct?

  11. Hi Pablo, the diff between 1 and 2 on my blog is that 2 does batches of 1000, so that is why it is a little faster, where 1 does just 1 record at a time. Not sure on DataAdapter vs DataTable, but I bet they are close.

    Yes, bulk copy; there are two different things though. From "within SQL", like using BULK INSERT or BCP, which you need certain rights on SQL Server to even execute, or within .NET code using SqlBulkCopy, which seems to do the same as BULK INSERT but through the .NET data layer between code and SQL... which is nice, especially on remote web/SQL servers.

  12. One of the new features of SQL Server 2008 is that it allows multiple sets of values in an insert. Not that useful really, but cute.

  13. Hi pablo,

    SQL Server 2008 (http://msdn2.microsoft.com/en-us/library/ms174335(SQL.100).aspx) supports inserting multiple rows of data; it would be interesting to include it in the set of tests.

  14. Hello!
    But I can insert records much faster.
    What processor and memory does the computer where you ran the tests have?
    What are your computer's hardware specs?

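
For reference, here is a minimal SqlBulkCopy sketch along the lines discussed in comments 9-11 (connString is a placeholder for a real SQL Server connection string; the table layout mirrors testtable from the post):

// build the rows in memory, then hand them to SqlBulkCopy
// (uses System.Data and System.Data.SqlClient)
DataTable table = new DataTable("testtable");
table.Columns.Add("iobjid", typeof(long));
table.Columns.Add("ifield0", typeof(long));
table.Columns.Add("ifield1", typeof(long));
for (int i = 0; i < 100000; i++)
    table.Rows.Add(i, 300000 - i, 50000 + i);

using (SqlConnection conn = new SqlConnection(connString))
{
    conn.Open();
    using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "testtable";
        bulk.BatchSize = 1000; // stream the rows in batches
        bulk.WriteToServer(table);
    }
}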