Problem with a large query and a subquery

Time: 2022-11-24 15:36:13

I thought I would be clever and use a subquery to get my report in one go. But after running into problems and reading the documentation, I saw that my approach does not work in MySQL. My inner query returns ~100 records and the outer query scans 20000 records. When I restricted the outer query to 20 records, it still ran for 20 seconds, which is really slow.

Is it possible to restructure it somehow so that the inner query isn't run EVERY time for every record in the outer query?

SELECT p1.surname, p1.name, p1.id, r1.start_date, r1.end_date, c1.short_name
FROM ejl_players p1
LEFT JOIN ejl_registration r1 ON (r1.player_id = p1.id)
LEFT JOIN ejl_teams t1 ON (r1.team_id = t1.id)
LEFT JOIN ejl_clubs c1 ON (t1.club_id = c1.id)
WHERE r1.season = 2008
AND p1.id IN
(
    SELECT p.id
    FROM ejl_players p
    LEFT JOIN ejl_registration r ON (r.player_id = p.id)
    LEFT JOIN ejl_teams t ON (r.team_id = t.id)
    LEFT JOIN ejl_clubs c ON (t.club_id = c.id)
    WHERE r.season = 2008
    GROUP BY p.id
    HAVING COUNT(DISTINCT c.id) > 1
)

EXPLAIN output (I restricted the outer query to a maximum of 20 records):

id  select_type         table  type    possible_keys   key      key_len  ref                        rows   Extra
1   PRIMARY             p1     range   PRIMARY         PRIMARY  4        NULL                       19     Using where
1   PRIMARY             r1     ref     team_id,season  season   10       const,d17528sd14898.p1.id  1      Using where
1   PRIMARY             t1     eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.r1.team_id   1
1   PRIMARY             c1     eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.t1.club_id   1
2   DEPENDENT SUBQUERY  p      index   PRIMARY         PRIMARY  5        NULL                       23395  Using index
2   DEPENDENT SUBQUERY  r      ref     team_id,season  season   10       const,d17528sd14898.p.id   1      Using where; Using index
2   DEPENDENT SUBQUERY  t      eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.r.team_id    1
2   DEPENDENT SUBQUERY  c      eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.t.club_id    1      Using index

2 solutions

#1


Try using an INNER JOIN (something like this):


SELECT p1.surname, p1.name, p1.id, r1.start_date, r1.end_date, c1.short_name
FROM ejl_players p1
INNER JOIN (
    SELECT p.id
    FROM ejl_players p 
    LEFT JOIN ejl_registration r ON (r.player_id = p.id) 
    LEFT JOIN ejl_teams t ON (r.team_id = t.id) 
    LEFT JOIN ejl_clubs c ON (t.club_id = c.id)
    WHERE r.season = 2008
    GROUP BY p.id
    HAVING COUNT(DISTINCT c.id)  > 1
) p2 ON p1.id = p2.id
LEFT JOIN ejl_registration r1 ON ( r1.player_id = p1.id )
LEFT JOIN ejl_teams t1 ON ( r1.team_id = t1.id )
LEFT JOIN ejl_clubs c1 ON ( t1.club_id = c1.id )
WHERE r1.season = 2008

Using the subquery as a derived table in this manner is usually more efficient, though not always. It does bypass the issue of having the subquery executed for every record returned by the main query: the subquery is evaluated once, materialized as a temporary table, and then joined to the main query.
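To check that the rewrite keeps the result set unchanged, here is a minimal sketch that runs both forms against a toy copy of the schema. It uses Python's built-in sqlite3 only as a stand-in for MySQL, so it says nothing about performance. The sample rows are invented, and the inner query is trimmed to `ejl_registration` and `ejl_teams` on the assumption that every `ejl_teams.club_id` points at an existing club.

```python
import sqlite3

# Toy schema: column names follow the question, the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ejl_clubs (id INTEGER PRIMARY KEY, short_name TEXT);
CREATE TABLE ejl_teams (id INTEGER PRIMARY KEY, club_id INTEGER);
CREATE TABLE ejl_players (id INTEGER PRIMARY KEY, surname TEXT, name TEXT);
CREATE TABLE ejl_registration (
    player_id INTEGER, team_id INTEGER, season INTEGER,
    start_date TEXT, end_date TEXT);
INSERT INTO ejl_clubs VALUES (1, 'AAA'), (2, 'BBB');
INSERT INTO ejl_teams VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO ejl_players VALUES (1, 'Tamm', 'Ann'), (2, 'Kask', 'Bo'), (3, 'Saar', 'Cy');
INSERT INTO ejl_registration VALUES
    (1, 10, 2008, '2008-01-01', '2008-06-30'),  -- player 1: club 1 ...
    (1, 20, 2008, '2008-07-01', '2008-12-31'),  -- ... then club 2
    (2, 10, 2008, '2008-01-01', '2008-06-30'),  -- player 2: two teams,
    (2, 11, 2008, '2008-07-01', '2008-12-31'),  -- but the same club
    (3, 20, 2007, '2007-01-01', '2007-12-31');  -- player 3: other season
""")

# Players registered with more than one club in 2008.
MULTI_CLUB = """
    SELECT r.player_id
    FROM ejl_registration r
    JOIN ejl_teams t ON t.id = r.team_id
    WHERE r.season = 2008
    GROUP BY r.player_id
    HAVING COUNT(DISTINCT t.club_id) > 1
"""

# Shared outer part: registration details per player.
DETAILS = """
    SELECT p1.surname, p1.name, p1.id, r1.start_date, r1.end_date, c1.short_name
    FROM ejl_players p1
    JOIN ejl_registration r1 ON r1.player_id = p1.id
    JOIN ejl_teams t1 ON t1.id = r1.team_id
    JOIN ejl_clubs c1 ON c1.id = t1.club_id
"""

# Form 1: IN (subquery), as in the question.
in_form = DETAILS + f"WHERE r1.season = 2008 AND p1.id IN ({MULTI_CLUB}) ORDER BY r1.start_date"
# Form 2: join against the subquery as a derived table, as in this answer.
join_form = DETAILS + f"JOIN ({MULTI_CLUB}) p2 ON p2.player_id = p1.id WHERE r1.season = 2008 ORDER BY r1.start_date"

rows_in = conn.execute(in_form).fetchall()
rows_join = conn.execute(join_form).fetchall()
print(rows_in == rows_join)
for row in rows_join:
    print(row)
```

Only player 1 changed clubs within the season in this sample, so both forms should list that player's two registrations and nothing else.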

Edit: I should point out that you'll want to use EXPLAIN in MySQL to verify that this query is indeed performing more efficiently.


#2


Like I commented on your question the other day, you don't need to use a LEFT JOIN in this example. Outer joins often perform slower than inner joins, so you can get some better performance by using a simple inner join.


You would need to use an outer join only if you need to show all players, even those who don't have any registration.

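A small sketch of why the LEFT JOIN buys nothing here: a WHERE condition on a column of the outer-joined table discards the NULL-extended rows, so the result matches an inner join. The season filter has to move into the ON clause before unregistered players show up. The two-player data set is invented, and sqlite3 is used only as a convenient stand-in for MySQL.

```python
import sqlite3

# Toy data (invented): player 2 has no registration at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ejl_players (id INTEGER PRIMARY KEY, surname TEXT);
CREATE TABLE ejl_registration (player_id INTEGER, team_id INTEGER, season INTEGER);
INSERT INTO ejl_players VALUES (1, 'Tamm'), (2, 'Kask');
INSERT INTO ejl_registration VALUES (1, 10, 2008);
""")

def run(join_clause):
    """Run the same SELECT with a different join/filter clause."""
    sql = f"SELECT p.surname, r.team_id FROM ejl_players p {join_clause} ORDER BY p.id"
    return conn.execute(sql).fetchall()

REG = "ejl_registration r ON r.player_id = p.id"

# Filter in WHERE: the NULL row for player 2 fails r.season = 2008 and is dropped.
left_where = run(f"LEFT JOIN {REG} WHERE r.season = 2008")
inner_where = run(f"INNER JOIN {REG} WHERE r.season = 2008")
# Filter in ON: player 2 is kept, with NULL for the registration columns.
left_on = run(f"LEFT JOIN {REG} AND r.season = 2008")

print(left_where)
print(inner_where)
print(left_on)
```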

It seems that your query is looking for players who have been on teams in more than one club this year (like your earlier question), and then outputting some details of their registration and club name. Here's how I would solve this query:


SELECT p.surname, p.name, p.id, r.start_date, r.end_date, c1.short_name
FROM ejl_players p
 INNER JOIN ejl_registration r ON (r.player_id = p.id)
 INNER JOIN ejl_teams t1 ON (r.team_id = t1.id)
 INNER JOIN ejl_clubs c1 ON (t1.club_id = c1.id)
 -- r2/t2/c2: every registration of the same player in that season, used only to count clubs
 INNER JOIN ejl_registration r2 ON (r2.player_id = p.id AND r2.season = r.season)
 INNER JOIN ejl_teams t2 ON (r2.team_id = t2.id)
 INNER JOIN ejl_clubs c2 ON (t2.club_id = c2.id)
WHERE r.season = 2008
GROUP BY r.player_id, r.team_id
HAVING COUNT(DISTINCT c2.id) > 1;

This works in MySQL because MySQL is permissive about the Single-Value Rule. That is, the columns in your GROUP BY clause don't have to be the same as the non-aggregated columns named in your select-list. In most other brands of RDBMS, this query would generate an error. Note that since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so MySQL itself may reject the query unless it can prove the selected columns are functionally dependent on the GROUP BY columns or the mode is turned off.
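If you would rather not rely on that permissiveness, one option is to list every selected column in the GROUP BY. The sketch below is an illustration under assumptions, not a tuned query: the second join to `ejl_registration` (alias `r2`) is introduced here to count clubs per player, the rows are invented, and sqlite3 stands in for MySQL. SQLite does not enforce the Single-Value Rule either, so this only checks that the portable form returns the expected rows.

```python
import sqlite3

# Toy schema: column names follow the question, the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ejl_clubs (id INTEGER PRIMARY KEY, short_name TEXT);
CREATE TABLE ejl_teams (id INTEGER PRIMARY KEY, club_id INTEGER);
CREATE TABLE ejl_players (id INTEGER PRIMARY KEY, surname TEXT, name TEXT);
CREATE TABLE ejl_registration (
    player_id INTEGER, team_id INTEGER, season INTEGER,
    start_date TEXT, end_date TEXT);
INSERT INTO ejl_clubs VALUES (1, 'AAA'), (2, 'BBB');
INSERT INTO ejl_teams VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO ejl_players VALUES (1, 'Tamm', 'Ann'), (2, 'Kask', 'Bo');
INSERT INTO ejl_registration VALUES
    (1, 10, 2008, '2008-01-01', '2008-06-30'),  -- player 1: two clubs
    (1, 20, 2008, '2008-07-01', '2008-12-31'),
    (2, 10, 2008, '2008-01-01', '2008-06-30'),  -- player 2: one club
    (2, 11, 2008, '2008-07-01', '2008-12-31');
""")

# Every non-aggregated column in the select-list also appears in GROUP BY,
# so the statement does not depend on MySQL's relaxed grouping behaviour.
portable_sql = """
    SELECT p.surname, p.name, p.id, r.start_date, r.end_date, c1.short_name
    FROM ejl_players p
    JOIN ejl_registration r ON r.player_id = p.id
    JOIN ejl_teams t1 ON t1.id = r.team_id
    JOIN ejl_clubs c1 ON c1.id = t1.club_id
    JOIN ejl_registration r2 ON r2.player_id = p.id AND r2.season = r.season
    JOIN ejl_teams t2 ON t2.id = r2.team_id
    WHERE r.season = 2008
    GROUP BY p.surname, p.name, p.id, r.start_date, r.end_date, c1.short_name
    HAVING COUNT(DISTINCT t2.club_id) > 1
    ORDER BY r.start_date
"""

portable_rows = conn.execute(portable_sql).fetchall()
for row in portable_rows:
    print(row)
```

The longer GROUP BY list is the price of portability. The derived-table join from the first answer avoids it by doing the grouping in the subquery.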
