I have a fairly simple process running that periodically pulls RSS feeds and updates articles in a MySQL database.
The articles table is filled to about 130k rows right now. For each article found, the processor checks to see if the article already exists. These queries almost always take 300 milliseconds, and about every 10 or 20 tries, they take more than 2 seconds.
SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;
# Query_time: 2.754567 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
I have an index on the guid column but whenever a new article is encountered, it's added to the articles table - invalidating the query cache (right?).
Some of the other entries in the slow query log report 120+ rows examined.
Of course on my development machine, these queries take about 0.2 milliseconds.
The server is a virtual host from Engine Yard Solo (EC2) with 1.7GB of memory and whatever CPU EC2 ships with these days.
Any advice would be greatly appreciated.
Update
As it turns out the problem was between the chair and the keyboard.
I had an index on 'id', but was querying on 'guid'.
Adding an index on 'guid' brought the query time down to 0.2 ms each.
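For anyone hitting the same symptom, a minimal sketch of the check and the fix might look like the following. The index name idx_articles_guid is an assumption chosen for illustration, and the sketch assumes guid is a VARCHAR column; a TEXT or BLOB column would need a prefix length such as guid(255).

```sql
-- List the indexes that actually exist on the table and look for a row
-- whose Column_name is guid.
SHOW INDEX FROM articles;

-- If there is none, add one. idx_articles_guid is a hypothetical name.
ALTER TABLE articles ADD INDEX idx_articles_guid (guid);

-- Re-check the plan: the "key" column should now name the new index
-- instead of showing NULL.
EXPLAIN SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;
```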
Thanks for all the helpful tips everyone!
4 Answers
#1
4
Run:
EXPLAIN SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;
Notice the EXPLAIN in front. That'll tell you what MySQL is doing. It's hard to believe probing one row from an index could ever take 2.7s, unless your machine is seriously overloaded and/or thrashing. Considering the row counts of 0, I'm guessing MySQL did a full table scan to find nothing, which probably means you don't have the index you think you do.
To answer your other question, whenever you make any change to the articles table, all the query cache entries involving that table are invalidated.
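To see how much the query cache is involved, something like the sketch below may help. It assumes a MySQL version that still ships the query cache (it was removed in MySQL 8.0), so these statements will not apply on newer servers.

```sql
-- Is the query cache enabled, and how large is it?
SHOW VARIABLES LIKE 'query_cache%';

-- Hit, insert, and prune counters. Frequent inserts with few hits suggest
-- that entries are invalidated before they can be reused.
SHOW STATUS LIKE 'Qcache%';

-- Bypass the cache for one statement, so the timing reflects the real
-- index (or table) lookup rather than a cached result.
SELECT SQL_NO_CACHE id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;
```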
#2
1
The log says that no rows were returned or even examined, so the problem is most likely not with your query but with your server. EC2's Achilles' heel is its I/O throughput; perhaps MySQL had to load the index from disk while the server's disks were completely saturated.
If your index is small enough to fit in memory (make sure your my.cnf allocates enough memory to key_buffer (MyISAM) or innodb_buffer_pool_size (InnoDB)), you should be able to preload it using:
SELECT guid FROM articles
Check the EXPLAIN output to make sure it says "Using index". If it doesn't, this one should:
SELECT guid FROM articles FORCE INDEX (guid) WHERE LENGTH(guid) > 0
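Before preloading, it may be worth checking whether the index can fit in the configured buffers at all. The statements below are a sketch of that check. LOAD INDEX INTO CACHE is an addition of mine rather than part of the answer above, and it only applies if the table is MyISAM.

```sql
-- Index_length (in bytes) shows how large the table's indexes are.
SHOW TABLE STATUS LIKE 'articles';

-- Compare that against the buffer the storage engine uses for indexes.
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB

-- MyISAM only: explicitly preload the table's indexes into the key cache.
LOAD INDEX INTO CACHE articles;
```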
Alternatively, if guid isn't your PRIMARY KEY or UNIQUE, you may remove its index and create another indexed column used to retrieve records quickly at a fraction of the index size. The column guid_crc32 would be an INT UNSIGNED and would hold the CRC32 of guid:
ALTER TABLE articles ADD COLUMN guid_crc32 INT UNSIGNED, ADD INDEX guid_crc32 (guid_crc32);
UPDATE articles SET guid_crc32 = CRC32(guid);
Your SELECT query would then look like this:
SELECT id FROM articles WHERE guid = 'http://example.com/feed.rss' AND guid_crc32 = CRC32('http://example.com/feed.rss') LIMIT 1;
The optimizer should use the index on guid_crc32, which should be both smaller and faster to search than an index on guid.
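The UPDATE above only backfills existing rows, so newly inserted articles would still need guid_crc32 filled in. One possible way to do that, sketched here with hypothetical trigger names, is a pair of triggers; computing the value in the application at insert time would work just as well.

```sql
-- Populate guid_crc32 for every new row.
-- articles_guid_crc32_bi is a hypothetical trigger name.
CREATE TRIGGER articles_guid_crc32_bi
BEFORE INSERT ON articles
FOR EACH ROW
SET NEW.guid_crc32 = CRC32(NEW.guid);

-- Keep it in sync if guid is ever changed.
-- articles_guid_crc32_bu is a hypothetical trigger name.
CREATE TRIGGER articles_guid_crc32_bu
BEFORE UPDATE ON articles
FOR EACH ROW
SET NEW.guid_crc32 = CRC32(NEW.guid);
```

The SELECT keeps the guid = '...' comparison alongside the CRC32 match because different GUIDs can share the same 32-bit checksum; the checksum narrows the search and the string comparison confirms the match.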
#3
0
If this table gets updated a lot, MySQL may not update the index counts properly. Try "CHECK TABLE articles" to update the index counts and see if your table is fine.
Also check whether running EXPLAIN on your query gives the same results on your dev and prod machines. If the results differ, try OPTIMIZE TABLE.
Are these MyISAM or InnoDB tables?
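The checks above could be run roughly as follows. ANALYZE TABLE is included as an assumption on my part, since it is the statement that refreshes index statistics directly. OPTIMIZE TABLE can lock or rebuild the table, so it is probably best run during a quiet period.

```sql
-- Which storage engine does the table use?
SELECT ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'articles';

-- Verify the table and its indexes for errors.
CHECK TABLE articles;

-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE articles;

-- Rebuild the table to reclaim space and defragment it.
OPTIMIZE TABLE articles;
```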
#4
0
Assuming GUID is indexed and ID is your primary key, something is "wrong." In that scenario, it is an index-only query. Perhaps the index is being bumped from memory and the disks are busy.
Depending on your update / insert / delete pattern, your database may be crying for an "optimize" command.
SQL Commands I'd like to see the output of:
show table status like 'articles';
explain SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;
explain articles;
System commands I'd like to see the output of (assuming Linux):
iostat 5 5
Tell us how much memory you have, because 1.7 MB is wrong, or something really exciting is happening.
Edit: how much memory is available to your SQL server in my.cnf?