MySQL performance on a 6 million row table

Time: 2021-07-03 00:49:49

One day I suspect I'll have to learn hadoop and transfer all this data to a non-structured database, but I'm surprised to find the performance degrade so significantly in such a short period of time.

I have a MySQL table with just under 6 million rows. I am running a very simple query on this table, and believe I have all the correct indexes in place.

The query is:

SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date

The EXPLAIN returns:

id  select_type  table        type   possible_keys  key       key_len  ref   rows    Extra
1   SIMPLE       updateshows  range  date_idx       date_idx  7        NULL  648997  Using where

So I am using the correct index as far as I can tell, but this query takes 11 seconds to run.

The database is MyISAM, and phpMyAdmin says the table is 1.0GiB.

Any ideas here?

Edited: The date_idx index covers both the date and venid columns. Should those be two separate indexes?
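
Since column order inside a composite index matters for this query, it may help to check how date_idx is actually defined. A minimal sketch, assuming the table is named events as in the query (the EXPLAIN output shows updateshows, so substitute whichever name is real):

```sql
-- List every index with its columns; Seq_in_index gives each column's
-- position within a composite index (1 = leftmost).
SHOW INDEX FROM events;

-- Alternatively, show the full table definition, including the storage engine.
SHOW CREATE TABLE events;
```

If date turns out to be the leftmost column of date_idx, that would likely be consistent with the large row estimate in the EXPLAIN output, since the range on date is then evaluated ahead of the equality on venid.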

4 solutions

#1


38  

What you want is for the query to use ONLY the index, so make sure that the index covers all the fields you are selecting. Also, since a range condition is involved, venid needs to come first in the index, because it is queried as a constant. I would therefore create an index like so:

ALTER TABLE events ADD INDEX indexNameHere (venid, date, time);

With this index, all the information needed to complete the query is in the index itself. This means that, hopefully, the storage engine can fetch the information without actually seeking inside the table. However, MyISAM might not be able to do this, since it doesn't store the data in the leaves of the indexes, so you might not get the speed increase you want. If that's the case, try creating a copy of the table that uses the InnoDB engine. Repeat the same steps there and see whether you get a significant speed increase. InnoDB does store the field values in the index leaves, and allows covering indexes.
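
The copy-and-compare step could look roughly like the following. This is only a sketch: events_innodb is a hypothetical table name, and the conversion assumes the table has no features that the InnoDB engine in your MySQL version rejects.

```sql
-- Create an empty copy with the same columns and indexes as events.
CREATE TABLE events_innodb LIKE events;

-- Switch the (still empty) copy to InnoDB before loading any data.
ALTER TABLE events_innodb ENGINE = InnoDB;

-- Copy the rows across; with about 6 million rows this may take a while.
INSERT INTO events_innodb SELECT * FROM events;

-- Add the covering index. Skip this statement if the index already existed
-- on events, because CREATE TABLE ... LIKE copies index definitions.
ALTER TABLE events_innodb ADD INDEX indexNameHere (venid, date, time);

-- Compare the plan (and the run time) against the original table.
EXPLAIN SELECT date, time FROM events_innodb
WHERE venid = '47975' AND date >= '2009-07-11' ORDER BY date;
```

Working on a copy leaves the original MyISAM table untouched, so the two can be timed side by side before committing to an engine change.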

Now, hopefully you'll see the following when you EXPLAIN the query:

mysql> EXPLAIN SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date;

id  select_type  table   type   possible_keys            key            [..]  Extra
1   SIMPLE       events  range  date_idx, indexNameHere  indexNameHere        Using index, Using where

#2


2  

Try adding a key that spans venid and date (or the other way around, or both...)
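
As a rough sketch of what that could look like (the index names venid_date_idx and date_venid_idx are made up for illustration, and the question's existing date_idx may already match one of these):

```sql
-- Composite index with the equality column first, then the range column.
ALTER TABLE events ADD INDEX venid_date_idx (venid, date);

-- The reverse column order, for comparison.
ALTER TABLE events ADD INDEX date_venid_idx (date, venid);

-- Check which index the optimizer picks and how many rows it expects to read.
EXPLAIN SELECT date, time FROM events
WHERE venid = '47975' AND date >= '2009-07-11' ORDER BY date;
```

Each extra index costs disk space and slows writes a little, so it is probably worth dropping whichever one the optimizer does not use.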

#3


2  

I would imagine that a 6M row table should be able to be optimised with quite normal techniques.

I assume that you have a dedicated database server, and it has a sensible amount of RAM (say 8G minimum).

You will want to ensure you've tuned MySQL to use your RAM efficiently. If you're running a 32-bit OS, don't. If you are using MyISAM, tune your key buffer to use a significant proportion of your RAM, but not too much.
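
For the MyISAM key buffer, a starting point might be something like this. The 1 GiB figure is purely illustrative, not a recommendation; the right value depends on the server's RAM and on what else runs there.

```sql
-- Current size of the MyISAM key cache, in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';

-- Key_reads counts index blocks read from disk, Key_read_requests counts
-- requests served through the cache; a high Key_reads share suggests the
-- key cache is too small.
SHOW GLOBAL STATUS LIKE 'Key_read%';

-- Illustrative value only (1 GiB). Needs administrative privileges and is
-- lost on restart unless the same setting is also put in the server config.
SET GLOBAL key_buffer_size = 1073741824;
```

Checking the status counters before and after the change gives a rough idea of whether the larger buffer is actually reducing disk reads.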

In any case you want to run repeated performance testing on production-grade hardware.

#4


1  

Try putting an index on the venid column.
