So I have a VPS with 512 MB of RAM, and a MySQL table like this:
CREATE TABLE `table1` (
`id` int(20) unsigned NOT NULL auto_increment,
`ts` timestamp NOT NULL default CURRENT_TIMESTAMP,
`value1` char(31) collate utf8_unicode_ci default NULL,
`value2` varchar(100) collate utf8_unicode_ci default NULL,
`value3` varchar(100) collate utf8_unicode_ci default NULL,
`value4` mediumtext collate utf8_unicode_ci,
`type` varchar(30) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `date` (`ts`)
) ENGINE=MyISAM AUTO_INCREMENT=469692 DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci
If I execute a query like this, it takes 2~18 seconds to complete:
SELECT `id`, `ts`, `value1`, `value2`, `value3` FROM table1 WHERE
`type` = 'something' ORDER BY `id` DESC limit 0,10;
EXPLAIN SELECT tells me:
select_type: SIMPLE
type: ref
possible_keys: type
key: type
key_len: 92
ref: const
rows: 7291
Extra: Using where; Using filesort
I thought the 'Using filesort' might be the problem, but it turns out that's not the case: if I remove the ORDER BY and the LIMIT, the query speed is the same. (I turned off the query cache for testing with SET @@query_cache_type=0;.)
mysql> EXPLAIN SELECT `id`,`ts`,`value1`,`value2`, `value3`
FROM table1 WHERE `type` = 'something'\G
select_type: SIMPLE
type: ref
possible_keys: type
key: type
key_len: 92
ref: const
rows: 7291
Extra: Using where
I don't know if it matters, but the rows estimate is inaccurate:
SELECT COUNT(*) FROM table1 WHERE `type` = 'something';
Returns 22.8k rows.
The query already seems optimized; I don't know how I could improve it further. The whole table contains 370k rows and is about 4.6 GiB in size. Could it be that, because the type changes randomly from row to row (rows of any given type are scattered across the whole table), it takes 2~18 seconds just to fetch the data from disk?
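In case it's relevant, the data and index footprint can be checked with SHOW TABLE STATUS (the Data_length and Index_length columns are in bytes):
SHOW TABLE STATUS LIKE 'table1'\G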
The funny thing is that when I use a type that has only a few hundred rows, those queries are slow too. MySQL returns rows at about 100 rows/sec!
|-------+----------+----------|
| count | time (s) | rows/sec |
|-------+----------+----------|
| 22802 |     18.7 |   1219.4 |
|    11 |      0.1 |    110.0 |
|   491 |      4.8 |    102.3 |
|   705 |      5.6 |    125.9 |
|   317 |      2.6 |    121.9 |
|-------+----------+----------|
Why is it so slow? Can I further optimize the query? Should I move the data to smaller tables?
I thought automatic partitioning would be a good idea: dynamically create a new partition for every type. That is not possible, for many reasons, including that the maximum number of partitions is 1024 and there can be any number of types. I could also try application-level partitioning, creating a new table for every new type, but I'd rather not: it introduces a lot of complexity, I don't know how I could keep a unique id across all rows in all tables, and if I reached multiple inserts per second, performance would drop significantly.
Thanks in advance.
4 Answers
#1
4
You need a multi-column index for that query:
KEY `typeid` (`type`, `id`)
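If you're adding it to the existing table, a statement along these lines should do it (same index name as above):
ALTER TABLE `table1` ADD KEY `typeid` (`type`, `id`);
With (type, id), MySQL can read the rows matching a type already ordered by id, so the ORDER BY ... LIMIT no longer needs a filesort.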
Unfortunately, as you stated, it is also slow without the ORDER BY, so it's slow because the records are scattered around the disk and the query has to do a lot of seeks. Once cached, it should be quite fast. (Note: 22.8/370 * 4.6 GiB ≈ 283 MiB, so if you do other activities/queries, those records won't stay in memory for long, or might not even fit.)
Run iostat 1 to verify the I/O bottleneck. Loads of RAM could solve your problem; an SSD could also solve your problem. But RAM is cheaper ;)
#2
0
If you are desperate to optimize, you can try rearranging your table: select every row of one type, in order, and rewrite them into a new table, then append the other types one by one. This amounts to a kind of table defragmentation, but I don't have any experience with it.
#3
0
There are many ways to improve a query. In your case, I see that your index must be quite large because of the indexed Unicode VARCHAR(30) column, which is what produces key_len: 92 (30 characters × 3 bytes per utf8 character, plus 2 length bytes). Here's what you can try: replace the big VARCHAR index with something much smaller. Keep the type column but drop its index, and create a new indexed column typeidx as an INT UNSIGNED (or SMALLINT if possible).
Create a table similar to this:
CREATE TABLE `typetable` (
`typeidx` INT UNSIGNED NOT NULL auto_increment,
`type` varchar(30) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`typeidx`),
UNIQUE KEY `type` (`type`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Which you fill with the existing types:
INSERT INTO typetable (type) SELECT DISTINCT type FROM table1;
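You will also need the typeidx column and its index on table1 before the update below; presumably something like:
ALTER TABLE `table1`
ADD COLUMN `typeidx` INT UNSIGNED DEFAULT NULL,
ADD KEY `typeidx` (`typeidx`);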
Then you have to fill table1.typeidx with something like:
UPDATE table1 t1 JOIN typetable tt USING (type)
SET t1.typeidx = tt.typeidx
Now your old query can become something like this:
SELECT `id`,`ts`,`value1`,`value2`, `value3`
FROM table1 WHERE `typeidx` = (SELECT typeidx FROM typetable WHERE type = 'something')
Of course, you'll also have to maintain typetable, inserting new values of type as they are created.
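One way to keep it in sync at insert time, relying on the UNIQUE key on type defined above ('something' stands in for the new value):
-- no-op if the type already exists, thanks to the UNIQUE key
INSERT IGNORE INTO typetable (`type`) VALUES ('something');
-- then look up the id to store in table1.typeidx
SELECT typeidx FROM typetable WHERE `type` = 'something';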
#4
0
I have no better idea than to implement vertical partitioning. I made an identical table without the mediumtext column, copied the whole table minus that column, and the 18-second query now takes only 100 ms! The new table is only 55 MB.
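For anyone trying the same, the copy could look roughly like this (a sketch; table1_slim is just an illustrative name):
CREATE TABLE `table1_slim` LIKE `table1`;
ALTER TABLE `table1_slim` DROP COLUMN `value4`;
INSERT INTO `table1_slim` (`id`, `ts`, `value1`, `value2`, `value3`, `type`)
SELECT `id`, `ts`, `value1`, `value2`, `value3`, `type` FROM `table1`;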