I'm working with a huge table which has 250+ million rows. The schema is simple.
CREATE TABLE MyTable (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
oid INT NOT NULL,
long1 BIGINT NOT NULL,
str1 VARCHAR(30) DEFAULT NULL,
str2 VARCHAR(30) DEFAULT NULL,
str3 VARCHAR(200) DEFAULT NULL,
str4 VARCHAR(50) DEFAULT NULL,
int1 INT(6) DEFAULT NULL,
str5 VARCHAR(300) DEFAULT NULL,
date1 DATE DEFAULT NULL,
date2 DATE DEFAULT NULL,
lastUpdated TIMESTAMP NOT NULL,
hashcode INT NOT NULL,
active TINYINT(1) DEFAULT 1,
KEY oid(oid),
KEY lastUpdated(lastUpdated),
UNIQUE KEY (hashcode, active),
KEY (active)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 MAX_ROWS=1000000000;
Insert performance has dropped significantly. Up to 150 million rows, it took 5-6 seconds to insert 10,000 rows; now that has gone up by a factor of 2-4. InnoDB's ibdata file has grown to 107 GB. The InnoDB configuration parameters are as follows.
innodb_buffer_pool_size = 36G # Machine has 48G memory
innodb_additional_mem_pool_size = 20M
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_file_size = 50M
innodb_log_buffer_size = 20M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
expire_logs_days = 4
IO wait time has gone up, as seen with top. I have tried changing the flush method to O_DSYNC, but it didn't help. The disk is carved out of a hardware RAID 10 setup. In an earlier setup with a single disk, IO was not a problem.
Is partitioning the table the only option? Can splitting the single 100 GB file into smaller files help? Are there any variables that need to be tuned for RAID?
Update: This is a test system. I have the freedom to make any changes required.
6 Answers
#1
14
You didn't say whether this was a test system or production; I'm assuming it's production.
It is likely that the table has grown to a size where its indexes (or the whole lot) no longer fit in memory.
This means that InnoDB must read pages in during inserts (depending on the distribution of your new rows' index values). Reading pages (random reads) is really slow and needs to be avoided if possible.
Partitioning seems like the most obvious solution, but MySQL's partitioning may not fit your use-case.
You should certainly consider all possible options - get the table on to a test server in your lab to see how it behaves.
Your primary key looks to me as if it's possibly not required (you have another unique index), so eliminating it is one option.
Also consider the InnoDB plugin and compression; this will make your innodb_buffer_pool go further.
You really need to analyse your use-cases to decide whether you actually need to keep all this data, and whether partitioning is a sensible solution.
Making any changes to this application is likely to introduce new performance problems for your users, so you want to be really careful here. If you find a way to improve insert performance, it may reduce search performance or the performance of other operations. You will need to do a thorough performance test on production-grade hardware before releasing such a change.
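If you do experiment with partitioning, keep in mind that MySQL requires every unique key (including the primary key) to contain the partitioning column, so the UNIQUE (hashcode, active) index constrains your choices. A minimal sketch, assuming you partition by hash of hashcode; the partition count and the modified key layout are illustrative, not a drop-in replacement:

```sql
-- Sketch only: the partition key (hashcode) must appear in every unique key,
-- so the AUTO_INCREMENT id can no longer be a standalone PRIMARY KEY here.
CREATE TABLE MyTablePartitioned (
    id BIGINT NOT NULL AUTO_INCREMENT,
    oid INT NOT NULL,
    hashcode INT NOT NULL,
    active TINYINT(1) NOT NULL DEFAULT 1,
    -- ... remaining columns as in the original schema ...
    PRIMARY KEY (id, hashcode),
    UNIQUE KEY (hashcode, active)
) ENGINE=InnoDB
PARTITION BY HASH(hashcode)
PARTITIONS 32;  -- illustrative; size so each partition's indexes fit in the buffer pool
```

The point of hashing on hashcode is that the expensive duplicate check for (hashcode, active) is confined to one partition's (much smaller) index per insert.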
#2
4
In my experience, InnoDB seems to hit a limit on write-intensive systems even with a really well-optimized disk subsystem. I am surprised you managed to get it up to 100 GB.
This is what twitter hit into a while ago and realized it needed to shard - see http://github.com/twitter/gizzard.
It all depends on your use cases, but you could also move from MySQL to Cassandra, as it performs really well for write-intensive applications (http://cassandra.apache.org).
#3
1
As MarkR commented above, insert performance gets worse when indexes can no longer fit in your buffer pool. InnoDB has a random-IO reduction mechanism (called the insert buffer) which prevents some of this problem, but it does not work on your UNIQUE index. The index on (hashcode, active) has to be checked on each insert to make sure no duplicate entries are inserted. If the hashcode does not 'follow' the primary key, this checking can be random IO.
Do you have the possibility to change the schema?
Your best bet is to:
(a) Make hashcode somewhat sequential, or sort by hashcode before bulk inserting (this by itself will help, since random reads will be reduced).
(b) Make (hashcode, active) the primary key, and insert data in sorted order. I am guessing your application probably reads by hashcode, and a primary-key lookup is faster.
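A sketch of option (b), assuming the surrogate id can be demoted to a plain unique secondary index (table and column names are from the question's schema; the index name in DROP KEY is MySQL's default name for the unique key and may differ on your server):

```sql
-- Sketch: rebuild with (hashcode, active) as the clustered index.
-- active must be NOT NULL to take part in a primary key, and on a
-- 250M-row table this rebuild is itself a long, IO-heavy operation.
ALTER TABLE MyTable
    MODIFY active TINYINT(1) NOT NULL DEFAULT 1,
    DROP PRIMARY KEY,
    DROP KEY `hashcode`,               -- the old UNIQUE (hashcode, active) key
    ADD PRIMARY KEY (hashcode, active),
    ADD UNIQUE KEY (id);               -- AUTO_INCREMENT column still needs a key
```

With this layout the uniqueness check and the row insert hit the same clustered-index page, so sorted bulk loads become mostly sequential writes.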
#4
1
You didn't mention what your workload is like, but if there are not too many reads, or you have enough main memory, another option is to use a write-optimized backend for MySQL instead of InnoDB. Tokutek claims 18x faster inserts and a much flatter performance curve as the dataset grows.
tokutek.com
http://tokutek.com/downloads/tokudb-performance-brief.pdf
#5
0
I'll second @MarkR's comments about reducing the indexes. One other thing you should look at is increasing your innodb_log_file_size. It increases the crash recovery time, but should help. Be aware you need to remove the old files before you restart the server.
General InnoDB tuning tips: http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/
You should also be aware of LOAD DATA INFILE for doing inserts. It's much faster.
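A hedged example of LOAD DATA INFILE, assuming a tab-separated dump; the file path and the column subset are illustrative (omitted nullable columns take their defaults), and per answer #3 the file should ideally be pre-sorted by hashcode:

```sql
-- One bulk statement instead of thousands of individual INSERTs.
-- '/tmp/mytable.tsv' and the column list below are illustrative only.
LOAD DATA INFILE '/tmp/mytable.tsv'
INTO TABLE MyTable
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(oid, long1, lastUpdated, hashcode);
```

Note that the server must be able to read the file (secure_file_priv permitting), or use LOAD DATA LOCAL INFILE from the client side.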
#6
0
Increase innodb_log_file_size from 50M to 500M.
And set innodb_flush_log_at_trx_commit to 0 if you can bear 1 second of data loss.
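The suggested settings as a my.cnf fragment (values are from this answer; as answer #5 notes, on older MySQL versions you must shut down cleanly and remove the old ib_logfile* files before restarting with a new log size):

```ini
[mysqld]
# Larger redo log absorbs write bursts; the trade-off is longer crash recovery.
innodb_log_file_size = 500M
# 0 = flush the log to disk roughly once per second instead of at every
# commit; up to ~1 second of transactions can be lost on a crash.
innodb_flush_log_at_trx_commit = 0
```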