Fast MySQL bulk loading when the indexes don't fit into the key_buffer

Time: 2022-09-20 13:28:14

I have an issue here with how to configure MySQL (MyISAM) properly so that a bulk insert (LOAD DATA INFILE) is performed fast.

There is a 6 GB text file to be imported: 15 million rows, 16 columns (some int, some varchar(255), one varchar(40), one char(1), some datetime, one mediumtext).

Relevant my.cnf settings:

key_buffer  = 800M
max_allowed_packet = 160M
thread_cache_size = 80
myisam_sort_buffer_size = 400M
bulk_insert_buffer_size = 400M
delay_key_write = ON
delayed_insert_limit = 10000

There are three indexes: one primary (auto-increment int), one unique int, and one unique varchar(40).
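
For reference, the import itself is just a plain LOAD DATA INFILE into that table; a minimal sketch (the file path, table name and field/line terminators below are placeholders and assumptions, not the actual ones):

LOAD DATA INFILE '/path/to/data.txt'
INTO TABLE tbl_name
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';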

The problem is that after executing the LOAD DATA INFILE command, the first 3 GB of data are imported quickly (judging by the growth of table.MYD: 5-8 MB/s), but upon crossing the 3020 MB mark the import speed decreases greatly: table.MYD grows at only 0.5 MB/s. I've noticed that the import process slows down once Key_blocks_unused gets drained to zero. This is the output of mysql> show status like '%key%'; at the beginning of the import:

mysql> show status like '%key%';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| Com_preload_keys       | 0       | 
| Com_show_keys          | 0       | 
| Handler_read_key       | 0       | 
| Key_blocks_not_flushed | 57664   | 
| Key_blocks_unused      | 669364  | 
| Key_blocks_used        | 57672   | 
| Key_read_requests      | 7865321 | 
| Key_reads              | 57672   | 
| Key_write_requests     | 2170158 | 
| Key_writes             | 4       | 
+------------------------+---------+
10 rows in set (0.00 sec)

And this is how it looks after the 3020 MB mark, i.e. when Key_blocks_unused gets down to zero; that's when the bulk insert process gets really slow:

mysql> show status like '%key%';
+------------------------+-----------+
| Variable_name          | Value     |
+------------------------+-----------+
| Com_preload_keys       | 0         | 
| Com_show_keys          | 0         | 
| Handler_read_key       | 0         | 
| Key_blocks_not_flushed | 727031    | 
| Key_blocks_unused      | 0         | 
| Key_blocks_used        | 727036    | 
| Key_read_requests      | 171275179 | 
| Key_reads              | 1163091   | 
| Key_write_requests     | 41181024  | 
| Key_writes             | 436095    | 
+------------------------+-----------+
10 rows in set (0.00 sec)

The problem is pretty clear to my understanding: index blocks are kept in the cache, but once the cache fills up, the index blocks get written to disk one by one, which is slow, so the whole process slows down. If I disable the unique index on the varchar(40) column, so that all the indexes fit into Key_blocks_used (I guess this is the variable directly dependent on key_buffer, isn't it?), the whole bulk import succeeds. So I'm curious: how do I make MySQL flush all the Key_blocks_used data to disk at once and free up the key blocks? I understand that it might be doing some sorting on the fly, but still, it should be possible to do some cached RAM-to-disk synchronization in order to successfully manage the indexes even when they don't all fit into the memory cache. So my question is: how do I configure MySQL so that bulk inserting avoids writing to disk on (almost) every index update, even when all the indexes don't fit into the cache?

Last but not least, delay_key_write is set to 1 for the given table, though it didn't add any speed-up compared to when it was disabled.
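
For completeness, delay_key_write was enabled for the table roughly like this (placeholder table name; as I understand it, DELAY_KEY_WRITE only defers flushing of dirty key blocks until the table is closed, which would be consistent with it not helping once the key cache itself is full):

-- per-table option (placeholder table name)
ALTER TABLE tbl_name DELAY_KEY_WRITE = 1;

-- or, in my.cnf, for all MyISAM tables:
-- delay_key_write = ALL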

Thanks in advance for any thoughts, ideas, explanations and RTMs! (:

One more little question: how would I calculate how many varchar(40) index entries fit into the cache before Key_blocks_unused gets to 0?
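
A rough back-of-envelope estimate, assuming the default 1 KB key cache block size, roughly 40 bytes of key data + 2 length bytes + a 6-byte row pointer per varchar(40) entry, and ignoring B-tree overhead and the space the other two indexes take in the same cache:

SELECT
  @@key_buffer_size DIV @@key_cache_block_size      AS total_key_blocks,
  FLOOR(@@key_cache_block_size / (40 + 2 + 6))      AS entries_per_block,
  (@@key_buffer_size DIV @@key_cache_block_size)
    * FLOOR(@@key_cache_block_size / (40 + 2 + 6))  AS approx_entries_in_cache;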

P.S. Disabling the indexes with $myisamchk --keys-used=0 -rq /path/to/db/tbl_name and then re-enabling them with $myisamchk -rq /path/to/db/tbl_name, as described in the MySQL docs, is a known solution, but it works only when bulk-inserting into an empty table. When there is already some data in the table, index uniqueness checking is necessary, so disabling the indexes is not a solution.
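
For reference, that documented flow looks roughly like this (paths are placeholders; flushing tables first keeps the server from using the table while myisamchk works on it):

mysqladmin flush-tables
myisamchk --keys-used=0 -rq /path/to/db/tbl_name   # disable all indexes
# ... run the LOAD DATA INFILE here ...
myisamchk -rq /path/to/db/tbl_name                 # rebuild indexes by sorting
mysqladmin flush-tables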

2 solutions

#1

When you import data with LOAD DATA INFILE, I think MySQL performs the inserts one by one, and with each insert it tries to update the index file (.MYI) as well. This could slow down your import, as it consumes both I/O and CPU resources for each individual insert.

What you could do is add these four statements to your import file to disable the table's keys before the inserts and re-enable them at the end, and you should see the difference.

LOCK TABLES tableName WRITE;
ALTER TABLE tableName DISABLE KEYS;

-- your LOAD DATA INFILE / insert statements go here

ALTER TABLE tableName ENABLE KEYS;
UNLOCK TABLES;

If you don't want to edit your data file, try using mysqldump to get a proper dump file, and you shouldn't run into this slowness when importing the data.

# Dump the database
mysqldump databaseName > database.sql

# Import the database
mysql databaseName < database.sql

Hope this helps!

#2

I am not sure whether the key_buffer you mention is the same as key_buffer_size.

I had faced a similar problem. My problem was resolved by bumping up the key_buffer_size value to something like 1 GB. Check my question here.
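
If you want to try the same thing, a minimal sketch (the 1 GB value is just an example; adjust it to your available RAM):

-- at runtime (requires the SUPER privilege)
SET GLOBAL key_buffer_size = 1024 * 1024 * 1024;

-- or persistently, in my.cnf under [mysqld]:
-- key_buffer_size = 1G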
