I am trying to optimize one part of my code that inserts data into MySQL. Should I chain INSERTs to make one huge multiple-row INSERT or are multiple separate INSERTs faster?
9 Answers
#1
230
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
- Connecting: (3)
- Sending query to server: (2)
- Parsing query: (2)
- Inserting row: (1 × size of row)
- Inserting indexes: (1 × number of indexes)
- Closing: (1)
From this it should be obvious that sending one large statement will save you an overhead of 7 per INSERT statement. Reading further, the text also says:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
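To make the contrast concrete, here is a minimal JDBC sketch (the table people(name, age) and the open java.sql.Statement stmt are hypothetical): the first form pays the fixed connect/send/parse overhead once per row, the second pays it once for all three rows.
// Three separate statements: the fixed overhead is paid three times.
stmt.executeUpdate("INSERT INTO people (name, age) VALUES ('Ann', 30)");
stmt.executeUpdate("INSERT INTO people (name, age) VALUES ('Bob', 25)");
stmt.executeUpdate("INSERT INTO people (name, age) VALUES ('Cay', 41)");
// One statement with multiple VALUES lists: that overhead is paid once.
stmt.executeUpdate("INSERT INTO people (name, age) VALUES ('Ann', 30), ('Bob', 25), ('Cay', 41)");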
#2
126
I know I'm answering this question almost two and a half years after it was asked, but I just wanted to provide some hard data from a project I'm working on right now that shows that indeed doing multiple VALUE blocks per insert is MUCH faster than sequential single VALUE block INSERT statements.
The code I wrote for this benchmark in C# uses ODBC to read data into memory from an MSSQL data source (~19,000 rows, all are read before any writing commences), and the MySql .NET connector (Mysql.Data.*) stuff to INSERT the data from memory into a table on a MySQL server via prepared statements. It was written in such a way as to allow me to dynamically adjust the number of VALUE blocks per prepared INSERT (i.e., insert n rows at a time, where I could adjust the value of n before a run). I also ran the test multiple times for each n.
Doing single VALUE blocks (e.g., one row at a time) took 5.7 - 5.9 seconds to run. The other values are as follows:
- 2 rows at a time: 3.5 - 3.5 seconds
- 5 rows at a time: 2.2 - 2.2 seconds
- 10 rows at a time: 1.7 - 1.7 seconds
- 50 rows at a time: 1.17 - 1.18 seconds
- 100 rows at a time: 1.1 - 1.4 seconds
- 500 rows at a time: 1.1 - 1.2 seconds
- 1000 rows at a time: 1.17 - 1.17 seconds
So yes, even just bundling 2 or 3 writes together provides a dramatic improvement in speed (runtime cut by a factor of n), until you get to somewhere between n = 5 and n = 10, at which point the improvement drops off markedly, and somewhere in the n = 10 to n = 50 range the improvement becomes negligible.
Hope that helps people decide on (a) whether to use the multiprepare idea, and (b) how many VALUE blocks to create per statement (assuming you want to work with data that may be large enough to push the query past the max query size for MySQL, which I believe is 16MB by default in a lot of places, possibly larger or smaller depending on the value of max_allowed_packet set on the server.)
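For anyone wanting to reproduce the idea without the C# code, here is a rough Java equivalent of the technique (the table t(a, b), the open Connection conn, and the ids/names arrays are hypothetical): build one prepared INSERT with n VALUE blocks, then bind n rows per execute.
int n = 100;                                   // VALUE blocks per statement; tune as in the numbers above
StringBuilder sql = new StringBuilder("INSERT INTO t (a, b) VALUES ");
for (int i = 0; i < n; i++) {
    sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
}
try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
    for (int row = 0; row < n; row++) {        // bind one chunk of n rows
        ps.setInt(2 * row + 1, ids[row]);      // JDBC parameter indexes are 1-based
        ps.setString(2 * row + 2, names[row]);
    }
    ps.executeUpdate();                        // n rows land in a single statement
}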
#3
13
A major factor will be whether you're using a transactional engine and whether you have autocommit on.
Autocommit is on by default and you probably want to leave it on; therefore, each insert that you do does its own transaction. This means that if you do one insert per row, you're going to be committing a transaction for each row.
Assuming a single thread, that means the server needs to sync some data to disk for EVERY ROW. It needs to wait for the data to reach a persistent storage location (hopefully the battery-backed RAM in your RAID controller). This is inherently rather slow and will probably become the limiting factor in these cases.
I'm of course assuming that you're using a transactional engine (usually InnoDB) AND that you haven't tweaked the settings to reduce durability.
I'm also assuming that you're using a single thread to do these inserts. Using multiple threads muddies things a bit, because some versions of MySQL have working group commit in InnoDB - this means that multiple threads doing their own commits can share a single write to the transaction log, which is good because it means fewer syncs to persistent storage.
On the other hand, the upshot is that you REALLY WANT TO USE multi-row inserts.
There is a limit over which it gets counter-productive, but in most cases it's at least 10,000 rows. So if you batch them up to 1,000 rows, you're probably safe.
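As a minimal sketch of that batching in JDBC (the prepared statement ps, the open Connection conn, the rows collection, and the Row type are all hypothetical): turn autocommit off and commit once per 1,000 rows, so the server syncs to disk once per batch instead of once per row.
conn.setAutoCommit(false);           // one transaction per batch, not per row
int count = 0;
for (Row r : rows) {
    ps.setInt(1, r.id);
    ps.setString(2, r.name);
    ps.executeUpdate();
    if (++count % 1000 == 0) {
        conn.commit();               // one sync to persistent storage per 1,000 rows
    }
}
conn.commit();                       // commit the final partial batch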
If you're using MyISAM, there's a whole other load of things, but I'll not bore you with those. Peace.
#4
7
Send as many inserts across the wire at one time as possible. The actual insert speed should be the same, but you will see performance gains from the reduction of network overhead.
#5
5
In general, the fewer calls to the database the better (meaning faster, more efficient), so try to code the inserts in such a way that they minimize database accesses. Remember, unless you're using a connection pool, each database access has to create a connection, execute the SQL, and then tear down the connection. Quite a bit of overhead!
#6
3
You might want to:
- Check that auto-commit is off
- Open a connection
- Send multiple batches of inserts in a single transaction (around 4,000-10,000 rows per batch, say)
- Close the connection
Depending on how well your server scales (it's definitely OK with PostgreSQL, Oracle, and MSSQL), do the above with multiple threads and multiple connections, as sketched below.
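A minimal sketch of that recipe (the JDBC URL, credentials, table t(a), and the placeholder data are all hypothetical): each worker thread opens its own connection, turns autocommit off, inserts its slice in one transaction, and closes the connection.
import java.sql.*;
import java.util.List;
import java.util.concurrent.*;

public class ParallelLoader {
    static final String URL = "jdbc:mysql://localhost:3306/test"; // placeholder

    // Each worker gets its own connection and commits its slice once.
    static void loadSlice(List<String> slice) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, "user", "password")) {
            conn.setAutoCommit(false); // send the whole slice in a single transaction
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO t (a) VALUES (?)")) {
                for (String value : slice) {
                    ps.setString(1, value);
                    ps.executeUpdate();
                }
            }
            conn.commit();
        } // the connection closes here (try-with-resources)
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> slices = List.of(
            List.of("a", "b"), List.of("c", "d")); // placeholder data
        ExecutorService pool = Executors.newFixedThreadPool(slices.size());
        for (List<String> slice : slices) {
            pool.submit(() -> { loadSlice(slice); return null; });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}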
#7
3
MySQL 5.5: one SQL INSERT statement took ~300 to ~450 ms, while the stats below are for inline multiple-insert statements.
(25492 row(s) affected)
Execution Time : 00:00:03:343
Transfer Time : 00:00:00:000
Total Time : 00:00:03:343
I would say inline is the way to go :)
#8
2
In general, multiple inserts will be slower because of the connection overhead. Doing multiple inserts at once will reduce the cost of overhead per insert.
Depending on which language you are using, you can build a batch in your programming/scripting language before going to the DB, adding each insert to the batch. Then you can execute the whole batch using one connect operation. Here's an example in Java.
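For instance, a minimal self-contained JDBC sketch using addBatch/executeBatch (the table users(name, email), URL, and credentials are placeholders). With MySQL Connector/J, adding rewriteBatchedStatements=true to the URL makes the driver rewrite the batch into multi-row INSERTs behind the scenes, which ties this back to the accepted answer.
import java.sql.*;

public class BatchInsertExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO users (name, email) VALUES (?, ?)")) {
            for (int i = 0; i < 10_000; i++) {
                ps.setString(1, "user" + i);
                ps.setString(2, "user" + i + "@example.com");
                ps.addBatch();            // queue the row locally
            }
            ps.executeBatch();            // send the whole batch at once
        }
    }
}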
#9
1
Disabling constraint checks makes inserts much, much faster. It doesn't matter whether your table has them or not. For example, try disabling foreign keys and enjoy the speed:
SET FOREIGN_KEY_CHECKS=0;
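FOREIGN_KEY_CHECKS is a session variable, so remember to turn it back on once the load is done; rows inserted while checks are off are not re-validated afterwards. A minimal sketch (assuming an open java.sql.Connection conn):
try (Statement st = conn.createStatement()) {
    st.execute("SET FOREIGN_KEY_CHECKS=0"); // skip FK validation during the bulk load
    // ... run the bulk INSERTs here ...
    st.execute("SET FOREIGN_KEY_CHECKS=1"); // restore checking afterwards
}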