A simple INSERT query takes too long on InnoDB

Date: 2020-12-18 02:47:39

I have this simple query:

INSERT IGNORE INTO beststat (bestid,period,rawView) VALUES ( 4510724 , 201205 , 1 ) 

On the table:

CREATE TABLE `beststat` (
 `bestid` int(11) unsigned NOT NULL,
 `period` mediumint(8) unsigned NOT NULL,
 `view` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `rawView` mediumint(8) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`bestid`,`period`)
) ENGINE=InnoDB AUTO_INCREMENT=2020577 DEFAULT CHARSET=utf8

And it takes 1 second to complete.

Side note: it doesn't always take 1 second. Sometimes it finishes in as little as 0.05 seconds, but it often takes 1 second.

This table (beststat) currently has ~500,000 records and its size is 40 MB. I have 4 GB of RAM and innodb_buffer_pool_size = 104,857,600 (100 MB), running MySQL 5.1.49-3.

This is the only InnoDB table in my database (the others are MyISAM).

ANALYZE TABLE beststat shows: OK

Maybe something is wrong with my InnoDB settings?

2 answers

#1


So you have two unique indexes on the table. Your primary key is an autonumber. Since it isn't really part of the data (it is added to the data), it is what's called an artificial primary key. You also have a unique index on bestid and period. If bestid and period together are supposed to be unique, they would be a good candidate for the primary key.

InnoDB stores every table as a B-tree (the clustered index) ordered by the primary key. If you don't define a primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL, or failing that a hidden generated row ID, so an InnoDB table is never stored as a heap. In your case the tree is stored on disk ordered by the autonumber key. When you create the second index, InnoDB builds a second tree on disk holding the bestid and period values. That index does not contain the other columns of the table, only bestid, period and your primary key value.
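To make the two-tree layout concrete, here is a toy Python model. It is a sketch under stated assumptions, not how InnoDB is implemented: sorted lists stand in for the on-disk B-trees, the schema is the one this answer describes (an auto-increment `id` primary key plus a unique index on (bestid, period)), and the class name `ToyIndexedTable` is hypothetical. It models only the layout and the lookup path, not the uniqueness check.

```python
from bisect import bisect_left, insort


class ToyIndexedTable:
    """Toy model, not real InnoDB: sorted lists stand in for on-disk B-trees.

    Assumed layout (as described in the answer): an auto-increment `id`
    primary key plus a unique secondary index on (bestid, period).
    """

    def __init__(self) -> None:
        self.clustered = []  # (id, bestid, period, view, rawView), sorted by id
        self.secondary = []  # (bestid, period, id), sorted by (bestid, period)

    def add_row(self, row_id, bestid, period, view=0, raw_view=0):
        # The clustered tree holds the whole row, ordered by the primary key.
        insort(self.clustered, (row_id, bestid, period, view, raw_view))
        # The secondary tree holds only the indexed columns plus the primary key.
        insort(self.secondary, (bestid, period, row_id))

    def find(self, bestid, period):
        # Step 1: search the secondary tree; a hit yields the primary key value.
        i = bisect_left(self.secondary, (bestid, period))
        if i == len(self.secondary) or self.secondary[i][:2] != (bestid, period):
            return None
        row_id = self.secondary[i][2]
        # Step 2: search the clustered tree by primary key to fetch the full row.
        j = bisect_left(self.clustered, (row_id,))
        return self.clustered[j]


if __name__ == "__main__":
    t = ToyIndexedTable()
    t.add_row(1, 4510724, 201205, raw_view=1)
    t.add_row(2, 4510724, 201206)
    print(t.secondary)  # index entries carry no view/rawView columns
    print(t.find(4510724, 201206))
```

Looking a row up by (bestid, period) therefore takes two tree searches: one in the secondary index to get the primary key, and one in the clustered index to get the row itself.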

OK, so now when you insert a row, the first thing MySQL does is make sure the unique index stays unique: it reads the index to check whether you are trying to insert a duplicate value. This is where the slowdown comes in. Only if the row passes that check is the data written. Then MySQL also has to insert the bestid, period and primary key value into the unique index. So the total is 1 index read to check the value, 1 row insert into the table, and 1 insert of bestid and period into the index: three operations in all. If you removed the autonumber and used bestid and period as the primary key, MySQL would read the table to check uniqueness and, if the row is unique, insert it: 1 read to check the values and 1 insert into the table. That is two operations versus three, so you do about 33% less work by removing the redundant autonumber.
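One way to check the operation count is to model each INSERT IGNORE and tally index probes and writes. This is a hedged toy model in Python with hypothetical class names: dicts stand in for the B-trees, and it ignores page I/O, locking, logging and caching, so it reproduces the answer's bookkeeping rather than real timings.

```python
from dataclasses import dataclass, field


@dataclass
class OpCounter:
    reads: int = 0
    writes: int = 0


@dataclass
class SurrogateKeyTable:
    """Layout the answer describes: auto-increment id PK + unique index on (bestid, period)."""
    rows: dict = field(default_factory=dict)        # id -> (bestid, period, rawView)
    unique_idx: dict = field(default_factory=dict)  # (bestid, period) -> id
    next_id: int = 1
    ops: OpCounter = field(default_factory=OpCounter)

    def insert_ignore(self, bestid, period, raw_view):
        self.ops.reads += 1                  # probe the unique index for a duplicate
        if (bestid, period) in self.unique_idx:
            return False                     # IGNORE: skip the duplicate row
        row_id = self.next_id
        self.next_id += 1
        self.rows[row_id] = (bestid, period, raw_view)
        self.ops.writes += 1                 # insert into the clustered index (the table)
        self.unique_idx[(bestid, period)] = row_id
        self.ops.writes += 1                 # insert into the secondary unique index
        return True


@dataclass
class NaturalKeyTable:
    """Alternative: (bestid, period) itself is the primary key; no extra index."""
    rows: dict = field(default_factory=dict)        # (bestid, period) -> rawView
    ops: OpCounter = field(default_factory=OpCounter)

    def insert_ignore(self, bestid, period, raw_view):
        self.ops.reads += 1                  # probe the clustered index for a duplicate
        if (bestid, period) in self.rows:
            return False                     # IGNORE: skip the duplicate row
        self.rows[(bestid, period)] = raw_view
        self.ops.writes += 1                 # single insert into the clustered index
        return True


if __name__ == "__main__":
    surrogate, natural = SurrogateKeyTable(), NaturalKeyTable()
    for table in (surrogate, natural):
        table.insert_ignore(4510724, 201205, 1)
    print("surrogate key:", surrogate.ops)  # OpCounter(reads=1, writes=2)
    print("natural key:  ", natural.ops)    # OpCounter(reads=1, writes=1)
```

With the surrogate key a new row costs one read and two writes; with (bestid, period) as the primary key it costs one read and one write, matching the three-versus-two count. In both layouts a duplicate is rejected after the read alone.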

I hope this is clear; I'm typing on my Android phone and autocorrect keeps changing "innodb" to "inborn". I wish I were at a computer.

#2


I ran some simulations about 3 years ago as part of an evaluation project for a customer. They needed to be able to search a table that data was constantly being added to, and they wanted search results to be no more than a minute out of date.

InnoDB showed much better results at the beginning, but quickly deteriorated (well before 1 million records) until I removed all indexes (including the primary key). At that point InnoDB became superior to MyISAM for inserts and updates. (My hardware is much worse than yours; I ran the tests only on my laptop.)

Conclusion: inserts will always suffer if you have indexes, especially unique ones.

I would suggest the following optimizations:

  1. Remove all indexes from your beststat table and use it as a simple dump.
  2. If you really need these unique indexes, consider a programmatic solution, for example always remembering the current maximum bestid, insisting that each new record is above that number, and immediately increasing it. (But do you really need so many unique fields? They all sound to me like plain indexes.)
  3. Have a background thread move new records from InnoDB to another table (which can be MyISAM) where they are indexed.
  4. Consider dropping the indexes temporarily and re-indexing the table after a bulk update, possibly switching between two tables so that querying is never interrupted.
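The "remember the maximum bestid" suggestion can be sketched as an in-process allocator. This is a minimal Python sketch with hypothetical names, assuming a single application process is the only writer and that the starting maximum is loaded once at startup (for example with `SELECT MAX(bestid)`); the value 4510724 is borrowed from the question purely for illustration.

```python
import threading


class MonotonicIdAllocator:
    """Hypothetical helper, not a MySQL feature: remembers the current maximum
    bestid in memory so new ids are unique by construction."""

    def __init__(self, current_max: int) -> None:
        # Assumed to be loaded once at startup, e.g. from SELECT MAX(bestid).
        self._max = current_max
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Bump the remembered maximum immediately, under a lock, so two
        # concurrent callers can never receive the same id.
        with self._lock:
            self._max += 1
            return self._max

    def try_claim(self, candidate: int) -> bool:
        # An externally supplied id is accepted only if it is above the
        # current maximum; accepting it raises the maximum at once.
        with self._lock:
            if candidate <= self._max:
                return False
            self._max = candidate
            return True


if __name__ == "__main__":
    ids = MonotonicIdAllocator(current_max=4510724)
    print(ids.next_id())           # 4510725
    print(ids.try_claim(4510725))  # False: not above the maximum
    print(ids.try_claim(4510800))  # True
    print(ids.next_id())           # 4510801
```

Because the maximum lives only in memory, it has to be reloaded after a restart, and the guarantee breaks if other clients also write to the table; that is the price of replacing the index check with application logic.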

These are theoretical solutions, I admit, but they are the best I can offer given your question.

Oh, and if your table is expected to grow to many millions of rows, consider a NoSQL solution.
