MySQL: Filling a table in a stored procedure efficiently

Date: 2021-12-04 02:04:45

I am testing performance on a MySQL server by filling a table with more than 200 million records. The stored procedure is very slow at generating the big SQL string. Any help or comment is really welcome.

System Info:

  • Database: MySQL 5.6.10, InnoDB (database `test`).

  • Processor: AMD Phenom II 1090T X6, 3910MHz per core.

  • RAM: 16GB DDR3 1600MHz CL8.

  • Disk: Windows 7 64-bit SP1 on an SSD, MySQL installed on the SSD, logs written to a mechanical hard disk.

The stored procedure creates an INSERT SQL statement with all the values to be inserted into the table.

DELIMITER $$
USE `test`$$

DROP PROCEDURE IF EXISTS `inputRowsNoRandom`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `inputRowsNoRandom`(IN NumRows BIGINT)
BEGIN
    /* BUILD INSERT STATEMENT WITH A LOT OF ROWS TO INSERT */
    DECLARE i BIGINT;
    DECLARE nMax BIGINT;
    DECLARE squery LONGTEXT;
    DECLARE svalues LONGTEXT;

    SET i = 1;
    SET nMax = NumRows + 1;
    SET squery = 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE) VALUES ';
    SET svalues = '("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45"),';

    WHILE i < nMax DO
        SET squery = CONCAT(squery, svalues);
        SET i = i + 1;
    END WHILE;

    /*SELECT squery;*/
    SET squery = LEFT(squery, CHAR_LENGTH(squery) - 1);
    SET squery = CONCAT(squery, ";");
    SELECT squery;

    /* EXECUTE INSERT SENTENCE */
    /*START TRANSACTION;*/
    /*PREPARE stmt FROM squery;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    */

    /*COMMIT;*/
END$$
DELIMITER ;


Results:

  1. Concatenating 20000 strings takes about 45 seconds to process:

CALL test.inputRowsNoRandom(20000);

  2. Concatenating 100000 strings takes about 5 to 12 minutes O_O:

CALL test.inputRowsNoRandom(100000);

Result (ordered by duration) - state | duration (summed) in sec | percentage

freeing items   0.00005    50.00000
starting        0.00002    20.00000
executing       0.00001    10.00000
init            0.00001    10.00000
cleaning up     0.00001    10.00000
Total           0.00010   100.00000

Change of status variables due to execution of query

variable         value   description
Bytes_received   21      Bytes sent from the client to the server
Bytes_sent       97      Bytes sent from the server to the client
Com_select       1       Number of SELECT statements that have been executed
Questions        1       Number of statements executed by the server

Tests:
I have already tested different MySQL configurations, from 12 to 64 threads, with the cache turned on and off, and with the logs moved to another physical disk...
I also tested using TEXT, INT..
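
For reference, a sketch of the InnoDB settings most often tuned for bulk-insert benchmarks (the values are illustrative, not from the original post; innodb_flush_log_at_trx_commit = 2 trades crash durability for speed):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SET GLOBAL innodb_flush_log_at_trx_commit = 2;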

Additional Information:

Questions:

  • Is something wrong in the code? If I send 100000 strings to build the final SQL string, the result of SELECT squery; is a NULL string. What's happening? (The error must be there, but I don't see it; see the note after this list.)

  • Can I improve the code in any way to speed it up?

  • I have read that some operations in stored procedures can be really slow; should I generate the file in C/Java/PHP.. and send it to mysql? (See the LOAD DATA sketch after this list.)

    mysql -u mysqluser -p databasename < numbers.sql

  • MySQL seems to use only one core for a single SQL query. Would Nginx or other database systems (multithreaded DBs: Cassandra, Redis, MongoDB..) achieve better performance with stored procedures and use more than one CPU for one query? (My single query is using only about 20% of total CPU, with about 150 threads.)
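
A likely explanation for the NULL result (my assumption, not stated in the original post): MySQL string functions such as CONCAT return NULL when the result would be longer than max_allowed_packet, so the giant query string silently becomes NULL once it outgrows that limit. A minimal check-and-raise sketch:

SHOW VARIABLES LIKE 'max_allowed_packet';

-- Example value only (512MB); changing the global requires the SUPER
-- privilege and a reconnect before new sessions pick it up.
SET GLOBAL max_allowed_packet = 512 * 1024 * 1024;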
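
If the rows are generated in an external program, LOAD DATA INFILE is usually much faster than replaying a huge .sql file through the mysql client. A minimal sketch, assuming a hypothetical tab-separated file numbers.txt with one row per line (the file name and format are mine, not from the post):

LOAD DATA LOCAL INFILE 'numbers.txt'
INTO TABLE `entity_versionable`
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(fk_entity, str1, str2, bool1, double1, date);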

UPDATE:

1 Answer

#1 (5 votes)

Don't use loops, especially at that scale, in an RDBMS.

Try to fill your table with 1M rows quickly with a single query:

INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
SELECT 1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'
  FROM
(
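-- six cross-joined digit tables (each 0-9) generate 10^6 = 1,000,000 row numbers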
select a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + f.N * 100000 + 1 N
from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) e
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) f
) t

On my box (MacBook Pro, 16GB RAM, 2.6GHz Intel Core i7) it took ~8 sec to complete:

Query OK, 1000000 rows affected (7.63 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

UPDATE 1: Now a version of the stored procedure that uses a prepared statement:

DELIMITER $$
CREATE PROCEDURE `inputRowsNoRandom`(IN NumRows INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
    END WHILE;

    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Completed in ~3 min:

mysql> CALL inputRowsNoRandom(1000000);
Query OK, 0 rows affected (2 min 51.57 sec)

Feel the difference: 8 sec vs 3 min.

UPDATE 2: To speed things up, we can explicitly use transactions and commit the insertions in batches. So here is an improved version of the SP:

DELIMITER $$
CREATE PROCEDURE inputRowsNoRandom1(IN NumRows BIGINT, IN BatchSize INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    START TRANSACTION;
    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
        IF i % BatchSize = 0 THEN 
            COMMIT;
            START TRANSACTION;
        END IF;
    END WHILE;
    COMMIT;
    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Results with different batch sizes:

mysql> CALL inputRowsNoRandom1(1000000,1000);
Query OK, 0 rows affected (27.25 sec)

mysql> CALL inputRowsNoRandom1(1000000,10000);
Query OK, 0 rows affected (26.76 sec)

mysql> CALL inputRowsNoRandom1(1000000,100000);
Query OK, 0 rows affected (26.43 sec)

You can see the difference for yourself: still more than 3 times slower than the cross join.
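
To get from 1M rows toward the 200 million of the original question, one common follow-up (my addition, not part of this answer) is to let the table feed itself, roughly doubling the row count per statement:

-- Each run copies every existing row, so a 1M-row seed reaches ~256M rows
-- after 8 runs (2^8 = 256); watch disk space and redo log size.
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
SELECT fk_entity, str1, str2, bool1, double1, date
  FROM `entity_versionable`;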
