Slow performance joining MySQL tables with a many-to-many relationship

I have two tables with a joining column having a Many to Many relationship. There are a few hundred thousand records in each table. I'm seeing some very slow query performance and am having trouble singling out the issue.

Table_A:

+----+----------------------+-------------+---------------+
| ID | Name varchar (30)    | Age int(3)  | Status int(1) |
+----+----------------------+-------------+---------------+
| 1  | Tom                  | 23          | 1             |
| 2  | Jerry                | 34          | 2             |
| 3  | Smith                | 21          | 1             |
| 4  | Ben                  | 46          | 5             |
+----+----------------------+-------------+---------------+

Table_B:

+----+----------------------+-------------+---------------+
| ID | Name varchar (30)    | Sign int(3) | Status int(1) |
+----+----------------------+-------------+---------------+
| 1  | Tom                  | 12          | 1             |
| 2  | Smith                | 8           | 1             |
| 3  | Tom                  | 3           | 0             |
| 4  | Tom                  | 10          | 1             |
+----+----------------------+-------------+---------------+

I need to get the Age of each Name in Table A who has at least one row in Table B with a match on Name and a Status (Table B) of 1.

I tried:

SELECT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name
WHERE Table_B.Status=1;

That query takes so long I haven't waited for it to return. I then tried:

SELECT DISTINCT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name AND Table_B.Status=1;

That returned very fast. I tested further and tried:

SELECT DISTINCT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name=Table_B.Name
WHERE Table_B.Status=1;

That again didn't return.

I'm confused as to what's going on here.

In the last query shouldn't the WHERE condition act the same as the previous query's JOIN ON condition (Status=1)?

Why does SELECT DISTINCT return results whereas without using DISTINCT the process takes forever?

4 Answers

#1

For a many-to-many table, do not include an AUTO_INCREMENT. Do have the PRIMARY KEY include both other ids. Do have another index. Do use InnoDB.

See More details, plus rationale.
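
As a rough sketch of the advice above, a mapping table could look like the following. The table name a_b_map, the column names a_id and b_id, and the index name are hypothetical, and the answer does not say which columns the other index should cover; the reversed pair of ids is an assumption made here so that lookups can start from either side.

```sql
-- Hypothetical mapping table linking Table_A rows to Table_B rows by id.
CREATE TABLE a_b_map (
    a_id INT UNSIGNED NOT NULL,
    b_id INT UNSIGNED NOT NULL,
    -- No AUTO_INCREMENT column: the pair of ids is the primary key.
    PRIMARY KEY (a_id, b_id),
    -- Assumed "other index": the same pair reversed, for lookups by b_id.
    INDEX idx_b_a (b_id, a_id)
) ENGINE=InnoDB;
```

Note that the tables in the question are related through a repeated Name value rather than through a mapping table like this, so this layout would only apply after restructuring the data.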

#2

Without seeing an explain plan (or whatever the MySQL equivalent is) it's impossible to say for certain.
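
MySQL's equivalent of an explain plan is the EXPLAIN statement. As a minimal sketch, prefixing the slow query from the question with EXPLAIN shows the join order, the access type for each table, and which index (if any) the optimizer picks:

```sql
-- Ask the optimizer how it would run the slow query, without running it.
EXPLAIN
SELECT Age FROM Table_A
LEFT JOIN Table_B ON Table_A.Name = Table_B.Name
WHERE Table_B.Status = 1;
```

In the output, a row whose type column is ALL and whose key column is NULL means that table is read with a full scan and no index.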

My guess would be that the server knows that your OUTER JOIN to table B is completely irrelevant when you use SELECT DISTINCT, so it just runs against table A and gets the Age values from there without even performing the JOIN. Do you see why the OUTER JOIN is irrelevant?

In the first query the server needs to perform the JOIN to get the right number of rows back.

When you add the additional logic to your WHERE clause in the last query you've effectively turned it into an INNER JOIN, so now the JOIN has to happen again and it takes a long time.
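
The difference between filtering in WHERE and filtering in ON can be checked against the four sample rows from the question. This is only a sketch: the aliases a and b are added here for brevity, the listed results were worked out by hand from the sample data, and row order is not guaranteed.

```sql
-- Filter in WHERE: unmatched Table_A rows carry NULL for b.Status,
-- fail the test b.Status = 1, and are dropped, so this behaves
-- like an INNER JOIN.
SELECT a.Age
FROM Table_A a
LEFT JOIN Table_B b ON a.Name = b.Name
WHERE b.Status = 1;
-- Sample data: 23, 23, 21 (Tom matches two Status = 1 rows, Smith one)

-- Filter in ON: every Table_A row is kept, matched or not.
SELECT DISTINCT a.Age
FROM Table_A a
LEFT JOIN Table_B b ON a.Name = b.Name AND b.Status = 1;
-- Sample data: 23, 34, 21, 46 (all ages, including Jerry and Ben)
```

So the fast query answers a different question from the slow ones: it returns the distinct ages present in Table_A whether or not Table_B has a matching row.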

#3

Make sure you have indexes set on the Table_A.Name, Table_B.Name and Table_B.Status columns.
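
A minimal sketch of those indexes, assuming none of them exist yet (the index names are made up for illustration):

```sql
-- One single-column index per column named in the answer.
CREATE INDEX idx_a_name   ON Table_A (Name);
CREATE INDEX idx_b_name   ON Table_B (Name);
CREATE INDEX idx_b_status ON Table_B (Status);
```

SHOW INDEX FROM Table_A; and SHOW INDEX FROM Table_B; list what is already defined, which is worth checking before adding anything.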

#4

First, you don't need a LEFT JOIN, because you only care about matches:

SELECT a.Age
FROM Table_A a JOIN
     Table_B b
     ON a.Name = b.Name
WHERE b.Status = 1;

This query can take advantage of indexes on Table_B(Status, Name) and Table_A(Name, Age).
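
The two composite indexes could be created as sketched below; the index names are hypothetical. The EXISTS query that follows is not part of the answer. It is one possible variant for the stated goal of "at least one row in Table B", since a plain JOIN repeats the Age once per matching Table_B row.

```sql
-- Hypothetical index names; the column lists come from the answer above.
CREATE INDEX idx_b_status_name ON Table_B (Status, Name);
CREATE INDEX idx_a_name_age    ON Table_A (Name, Age);

-- Optional variant: one result row per Table_A row with at least one
-- matching Table_B row, however many Status = 1 matches exist.
SELECT a.Age
FROM Table_A a
WHERE EXISTS (
    SELECT 1
    FROM Table_B b
    WHERE b.Name = a.Name
      AND b.Status = 1
);
```

Running EXPLAIN on either query afterwards should show whether the optimizer actually chooses the new indexes.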
