How can I speed up this SQL query on MySQL 4.1?

Time: 2021-10-12 03:56:40

I have a SQL query that takes a very long time to run on MySQL (it takes several minutes). The query is run against a table that has over 100 million rows, so I'm not surprised it's slow. In theory, though, it should be possible to speed it up as I really only want to get back the rows from the large table (let's call it A) that have a reference in another table, B.

So my query is:

SELECT id FROM A, B WHERE A.ref = B.ref;

(A has over 100 million rows; B has just a few thousand).

I've added INDEXes:

alter table A add index(ref);
alter table B add index(ref);

But it's still very slow (several minutes -- I'd be happy with one minute).

Unfortunately, I'm stuck with MySQL 4.1.22, so I can't use views.

I'd rather not copy all of the relevant rows from A into a separate, smaller table, as the rows that I need will change from time to time. On the other hand, at the moment that's the only solution I can think of.

Any suggestions welcome!

EDIT: Here's the output of running EXPLAIN on my query:

+----+-------------+-------+------+---------------+-------+---------+----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref            | rows  | Extra       |
+----+-------------+-------+------+---------------+-------+---------+----------------+-------+-------------+
|  1 | SIMPLE      | B     | ALL  | B_ref,ref     | NULL  |    NULL | NULL           | 16718 | Using where |
|  1 | SIMPLE      | A     | ref  | A_REF,ref     | A_ref |       4 | DATABASE.B.ref |  5655 |             |
+----+-------------+-------+------+---------------+-------+---------+----------------+-------+-------------+

(In redacting my original query example, I chose to use "ref" as my column name, which happens to be the same as one of the types, but hopefully that's not too confusing...)
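
As a rough reading of that plan (an editorial sketch, not part of the original post): multiplying the `rows` estimates of the joined tables gives an approximate count of the rows MySQL expects to examine. The snippet below does that multiplication with the two figures from the EXPLAIN output. The variable names are invented for illustration, and the inputs are optimizer estimates, not exact counts.

```python
# Figures copied from the `rows` column of the EXPLAIN output above.
rows_scanned_in_b = 16718      # full scan of B (type = ALL)
rows_per_lookup_in_a = 5655    # estimated rows of A per B.ref value (type = ref)

# The product of the `rows` values is a rough estimate of rows examined.
estimated_rows_examined = rows_scanned_in_b * rows_per_lookup_in_a

print(f"{estimated_rows_examined:,}")  # 94,540,290
```

If the estimate is anywhere near reality, the join touches roughly 94.5 million rows of A, which is comparable to the size of the whole table. That would suggest the index on A is being used, but that most of A has a matching row in B, so the query is slow mainly because of how much data it returns.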

3 Solutions

#1


The query optimizer is probably already doing the best that it can, but in the unlikely event that it's reading the giant table (A) first, you can explicitly tell it to read B first using the STRAIGHT_JOIN syntax:

SELECT STRAIGHT_JOIN id FROM B, A WHERE B.ref = A.ref;

#2


From the answers, it seems like you're already doing the most efficient thing you can with the SQL. The A table seems to be the big problem. How about splitting it into three individual tables, kind of like a local version of sharding? Alternatively, is it worth denormalising the B table into the A table, assuming B doesn't have too many columns?

Finally, you might just have to buy a faster box to run it on - there's no substitute for horsepower!

Good luck.

#3


SELECT id FROM A JOIN B ON A.ref = B.ref;

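You may be able to optimize further by choosing an appropriate type of join. Note, though, that a LEFT JOIN returns every row of A, including rows with no match in B, so it does not narrow the result the way the inner join above does.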

http://en.wikipedia.org/wiki/Join_(SQL)
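
To see how the join types differ in what they return, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in, not MySQL 4.1, and the table contents are invented toy data. It only illustrates join semantics and says nothing about performance on the real tables.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the rows are invented toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, ref INTEGER);
    CREATE TABLE B (ref INTEGER);
    CREATE INDEX A_ref ON A (ref);
    CREATE INDEX B_ref ON B (ref);
    INSERT INTO A (id, ref) VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO B (ref) VALUES (10), (30);
""")

# Inner join: only rows of A whose ref also appears in B.
inner_ids = [row[0] for row in conn.execute(
    "SELECT A.id FROM A JOIN B ON A.ref = B.ref ORDER BY A.id")]

# Left join: every row of A, whether or not B has a matching ref.
left_ids = [row[0] for row in conn.execute(
    "SELECT A.id FROM A LEFT JOIN B ON A.ref = B.ref ORDER BY A.id")]

print(inner_ids)  # [1, 3]
print(left_ids)   # [1, 2, 3, 4]
conn.close()
```

Since the question asks only for rows of A that have a reference in B, the inner join is the one that matches the stated goal.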
