How to improve INSERT performance on a very large MySQL table

Date: 2021-08-20 03:53:57

I am working on a large MySQL database and I need to improve INSERT performance on a specific table. The table contains about 200 million rows, and its structure is as follows:

(A little premise: I am not a database expert, so the code I've written may be based on wrong assumptions. Please help me understand my mistakes. :) )

-- Note: `key` is a reserved word in MySQL, so it must be quoted with backticks.
CREATE TABLE IF NOT EXISTS items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    `key` VARCHAR(10) NOT NULL,
    busy TINYINT(1) NOT NULL DEFAULT 1,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,

    PRIMARY KEY (id, name),
    UNIQUE KEY name_key_unique_key (name, `key`),
    INDEX name_index (name)
) ENGINE=MyISAM
PARTITION BY LINEAR KEY(name)
PARTITIONS 25;

Every day I receive many CSV files in which each line consists of a "name;key" pair, so I have to parse these files (adding the created_at and updated_at values for each row) and insert the values into my table. In this table, the combination of "name" and "key" MUST be UNIQUE, so I implemented the insert procedure as follows:

CREATE TEMPORARY TABLE temp_items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    `key` VARCHAR(10) NOT NULL,
    busy TINYINT(1) NOT NULL DEFAULT 1,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    PRIMARY KEY (id)
)
ENGINE=MyISAM;

LOAD DATA LOCAL INFILE 'file_to_process.csv'
INTO TABLE temp_items
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
(name, `key`, created_at, updated_at);

INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
    SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
    FROM temp_items
)
ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW();

DROP TEMPORARY TABLE temp_items;

The code just shown allows me to reach my goal, but it takes about 48 hours to complete, and this is a problem. I think this poor performance is caused by the fact that, for each insertion, the script must check against a very large table (200 million rows) that the "name;key" pair is unique.

How can I improve the performance of my script?

Thanks to all in advance.

4 Answers

#1


2  

Your LINEAR KEY partitioning on name and the large indexes slow things down.

The LINEAR KEY needs to be calculated on every insert. http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html

Can you show us some example data from file_to_process.csv? Maybe a better schema could be built.

Edit: looking more closely at this part:

INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
    SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
    FROM temp_items
)

This will probably create an on-disk temporary table, which is very slow, so you should avoid it if you want more performance. Alternatively, check some MySQL config settings such as tmp_table_size and max_heap_table_size; maybe they are misconfigured.
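
For reference, here is a minimal sketch of how those two settings could be inspected and raised for the current session; the 256 MB figure is only an illustrative value, not a recommendation tuned to this workload:

-- Show the current limits; an internal in-memory temporary table
-- spills to disk once it grows past the smaller of the two.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Raise both for this session only (illustrative value: 256 MB).
SET SESSION tmp_table_size      = 256 * 1024 * 1024;
SET SESSION max_heap_table_size = 256 * 1024 * 1024;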

#2


1  

You can use the following methods to speed up inserts:

  1. If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster. (A multi-row INSERT is sketched after this list.)

  2. When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

  3. Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.

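For illustration, here is a minimal sketch of point 1 applied to the items table from the question; the row values and the batch size are made up, and the ON DUPLICATE KEY UPDATE clause mirrors the duplicate handling of the original procedure:

-- One multi-row INSERT replaces many single-row INSERTs.
-- In practice the client would batch a few hundred or thousand rows
-- per statement while parsing the CSV file.
INSERT INTO items (name, `key`, busy, created_at, updated_at)
VALUES
    ('example-name-1', 'k1', 1, NOW(), NOW()),
    ('example-name-2', 'k2', 1, NOW(), NOW()),
    ('example-name-3', 'k3', 1, NOW(), NOW())
ON DUPLICATE KEY UPDATE busy = 1, updated_at = NOW();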

#3


0  

There is a piece of documentation I would like to point out, Speed of INSERT Statements.

#4


-2  

You could use

load data local infile ''
REPLACE
into table 

etc...

The REPLACE ensures that any duplicate row is overwritten with the new values. Add SET updated_at = NOW() at the end and you're done.

There is no need for the temporary table.

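Here is a minimal sketch of that approach, assuming the pre-processed CSV layout from the question (name, key, created_at, updated_at); the file's updated_at column is read into a throwaway user variable so the column can be set to NOW() instead, and busy simply falls back to its default of 1. Note that REPLACE deletes and re-inserts a conflicting row (so its auto-increment id changes), which is not quite the same behaviour as ON DUPLICATE KEY UPDATE:

LOAD DATA LOCAL INFILE 'file_to_process.csv'
REPLACE
INTO TABLE items
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
-- Read the file's updated_at into an unused variable, then overwrite
-- the column through the SET clause below.
(name, `key`, created_at, @file_updated_at)
SET updated_at = NOW();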
