The fastest way to insert 100K records
I’m doing some performance tests today with our three different database backends: MySql, Firebird and SQL Server.
My goal is to find the fastest way to insert a big number of records into a table taking into account the different backends.
My test table is the following in the three databases:
CREATE TABLE testtable (
iobjid BIGINT NOT NULL,
ifield0 BIGINT,
ifield1 BIGINT);
iobjid is a primary key and the other two fields are also indexed.So, let’s go with the first loop:
IDbConnection conn = // grab a connection somehow
conn.Open();
try
{
IDbCommand command = conn.CreateCommand();
command.CommandText = "delete from testtable";
command.ExecuteNonQuery();
int initTime = Environment.TickCount;
IDbTransaction t = conn.BeginTransaction(
IsolationLevel.RepeatableRead);
command.Transaction = t;
for( int i = 1; i < 100000; i++)
{
command.CommandText = string.Format(
"INSERT INTO testtable "+
" (iobjid, ifield0, ifield1)"+
" VALUES ( {0}, {1}, {2} )",
i, 300000-i, 50000 + i);
command.ExecuteNonQuery();
}
t.Commit();
Console.WriteLine("{0} ms", Environment.TickCount - initTime);
}
finally
{
conn.Close();
}
How long does it take to insert 100K records on my old laptop?
· Firebird 2.0.1 (embedded) -> 38s
· SQL Server 2005 -> 28s
· MySql 5.0.1 -> 40sI’ve repeated the test with all the possible IsolationLevel values and didn’t find any difference.Insert with paramsMy second test tries to get a better result using parameters on the commands... Here is the code:
IDbConnection conn = //get your connection
conn.Open();
try
{
IDbCommand command = conn.CreateCommand();
command.CommandText = "delete from testtable";
command.ExecuteNonQuery();
int initTime = Environment.TickCount;
IDbTransaction t = conn.BeginTransaction(
IsolationLevel.RepeatableRead);
command.Transaction = t;
// sqlserver and firebird use ‘@’ but mysql uses ‘?’
string indexParamName =
GetParametersNamePrefix() + "pk";
string field0ParamName =
GetParametersNamePrefix() + "field0";
string field1ParamName =
GetParametersNamePrefix() + "field1";
command.CommandText = string.Format(
"INSERT INTO testtable "+
" (iobjid, ifield0, field1) "+
"VALUES ( {0}, {1}, {2} )",
indexParamName, markerParamName, revisionParamName);
IDbDataParameter paramIndex = command.CreateParameter();
paramIndex.ParameterName = indexParamName;
command.Parameters.Add(paramIndex);
IDbDataParameter paramField0 = command.CreateParameter();
paramField0.ParameterName = field0ParamName;
command.Parameters.Add(paramField0);
IDbDataParameter paramField1 = command.CreateParameter();
paramField1.ParameterName = field1ParamName;
command.Parameters.Add(paramField1);
for( int i = 0; i < 100000; i++)
{
paramIndex.Value = i;
paramField0.Value = 300000 -i;
paramField1.Value = 50000 + i;
command.ExecuteNonQuery();
}
t.Commit();
Console.WriteLine("{0} ms",
Environment.TickCount - initTime);
}
finally
{
conn.Close();
}
How long does it take now?
· Firebird -> 19s
· SQL Server-> 20s
· MySql -> 40sSo, it seems MySql is not affected by parameters, but the other two really get a performance boost!One insert to rule them allLet’s now try a last option: what about inserting all the values in a single operation? Unfortunately neither SQLServer nor Firebird support multiple rows in the values part of an insert. I know they can use some sort of union clause to do something similar, but performance is not better.So, let’s try with MySql:
IDbConnection conn = // grab your conn
conn.Open();
try
{
IDbCommand command = conn.CreateCommand();
command.CommandText = "delete from testtable";
command.ExecuteNonQuery();
int initTime = Environment.TickCount;
IDbTransaction t = conn.BeginTransaction(
IsolationLevel.RepeatableRead);
command.Transaction = t;
StringBuilder builder = new StringBuilder();
builder.Append(string.Format(
"INSERT INTO testtable "+
" (iobjid, ifield0, ifield1) "+
"VALUES ( {0}, {1}, {2} )",
0, 300000, 50000));
for( int i = 1; i < 100000; i++)
{
builder.Append(string.Format(
", ( {0}, {1}, {2} )",
i, 300000-i, 50000 + i));
}
command.CommandText = builder.ToString();
command.ExecuteNonQuery();
t.Commit();
Console.WriteLine("{0} ms",
Environment.TickCount - initTime);
}
finally
{
conn.Close();
}
And the winner is... MySql takes only 9 seconds to insert the 100K records... but only using the multi-value insert operation.Enjoy!
For SQL server you might want to try to pass the rows to OPENXML as a set of records. Some years ago I got good results by doing that. The following example from http://support.microsoft.com/kb/555266 should get you going...
ReplyDeleteHope it helps.
DECLARE @hDoc int
--Prepare input values as an XML documnet
exec sp_xml_preparedocument @hDoc OUTPUT, @in_values
--Select data from the table based on values in XML
SELECT * FROM Orders WHERE CustomerID IN (
SELECT CustomerID FROM OPENXML (@hdoc, '/NewDataSet/Customers', 1)
WITH (CustomerID NCHAR(5)))
EXEC sp_xml_removedocument @hDoc
Interesting, I just wonder why you're only giving the mysql result for the last run.
ReplyDeleteAnyway, nice to know :)
Hi guillaume,
ReplyDeleteI've written the mysql results for all runs, haven't I?
I wonder if running IDbDataAdapters and ExecuteBatch will make things go faster.
Also, the new 5.2 MySql provider introduces the ability to run Bulk Loads... which I guess will make the test go even faster...
Care to retest with PostgreSQL? It would be interesting to see how it compares with MySQL, in particular. :-)
ReplyDeleteNah - ignore these lamers with XML voodoo and stuff like that!
ReplyDeleteTry this pattern where you don't have multi-value inserts:
insert foo (a, b)
select (x1, y1)
union select (x2, y2)
union select (x3, y3)
union ...
Don't be shy now - see how many unioned rows you can do in one go!
You will be surprised how many. And at the result.
Hint: how often does SQLserver consider doing per-statement tasks like trigger exec in this case?
With this you're also testing communication protocol and provider code. To test *just* database use SP or similar (multivalue insert in mysql is close).
ReplyDeletetry using sql server bulk insert
ReplyDeletehttp://msdn2.microsoft.com/en-us/library/ms188365.aspx
Using external tables for bulk inserts we achieve performance of 3-4 seconds for 200K records in our POS system software.
ReplyDeleteyou might want to expand this a little. I did some tests where I got 1 MILLION records to insert into sql server in 17 seconds.... using SqlBulkCopy in C#..
ReplyDeletehttp://blog.stevienova.com/2008/01/16/net-fastest-way-to-load-text-file-to-sql-sqlbulkcopy/
Hi Steve,
ReplyDeleteThanks for link on BulkCopy.
One remark: I guess there shouldn't be any difference between your method 1 and your method 2, if you keep the connection open on case 1 (which is what method 2 is actually doing internally) and even use parameters as I mentioned on my post.
I guess directly using IDbCommand must be always faster than IDbDataAdapter or any datatable, isn't it?
Of course BulkCopy must be even faster, as Andrei pointed out also.
I'd like to check performance using MySql bulk-copy-like functionality, but I'm afraid you've to use an intermediate file to load the data, is that correct?
hi pablo, the diff between 1 and 2 on my blog is that 2 does batches of 1000, so that is why it is a little faster, where 1 does just 1 record at at a time, not sure on DataAdapter vs DataTable, but I bet they are close.
ReplyDeleteYes bulk copy, there are two different things though. From "within sql" like using BULK INSERT or BCP, which you need certain rights on SQL server to even execute, or within .net code using SqlBulkCopy, which seems to do the same as BULK INSERT but through the .NET Data Layer between code and SQL.. which is nice..especially on remote web/sql servers.
One of the new features of SQL Server 2008 is that it allows multiple sets of values in an insert. Not that useful really, but cute.
ReplyDeleteHi pablo,
ReplyDeleteSQL Server 2008 (http://msdn2.microsoft.com/en-us/library/ms174335(SQL.100).aspx) supports the feature to insert multiple rows of data, it would be interesting to include in the set of tests.
hello!
ReplyDeleteBut i very fastest inserting records
What is processor + memory of computer where you making tests?
What parametrs of your computer hardware?