Why is a self join faster than an OR?

Time: 2022-06-08 01:12:14

I'm trying to filter a relationship table down to get a subset of the table where two conditions are met (i.e., I want all of the IDs of the entries whose color_ids are 1 or 2). It's a beefy table, so I'm trying to optimize as much as possible.

I was wondering if anyone could explain my finding in this case:

Why is

SELECT DISTINCT a.id 
  FROM RelationshipTable as a 
  JOIN RelationshipTable as b ON b.id = a.id 
 WHERE a.color_id = 1 
   AND b.color_id = 2;

faster than

SELECT DISTINCT id 
  FROM RelationshipTable 
 WHERE color_id = 1 
    OR color_id = 2;

in MySQL 4.1?

2 solutions

#1


2  

The two are not the same query and should not give the same result set. The first query returns the IDs that meet both conditions: there is a record with a color_id of 1 and a record with a color_id of 2 for the same ID. The second query returns the IDs that have both color IDs as well as those that have only one or the other. Of course, since you are asking for a different field to be returned, you might not see this. And the second query is somewhat silly anyway, as it can be expressed as:

select 1 as color_id
union all
select 2

And never hit a table at all. That would make it super fast.
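To make the difference in result sets concrete, here is a minimal sketch that runs both of the asker's queries against a tiny made-up table. It uses Python's built-in sqlite3 module rather than MySQL 4.1, and the sample rows are invented for illustration, so it only shows what the queries mean, not how fast they run.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RelationshipTable (id INTEGER, color_id INTEGER)")

# Hypothetical sample data: id 10 has both colors, 20 and 30 have one each,
# and 40 has neither color 1 nor color 2.
conn.executemany(
    "INSERT INTO RelationshipTable (id, color_id) VALUES (?, ?)",
    [(10, 1), (10, 2), (20, 1), (30, 2), (40, 3)],
)

# The self join from the question: ids that have color 1 AND color 2.
self_join = """
SELECT DISTINCT a.id
  FROM RelationshipTable AS a
  JOIN RelationshipTable AS b ON b.id = a.id
 WHERE a.color_id = 1
   AND b.color_id = 2
"""

# The plain OR from the question: ids that have color 1 OR color 2.
plain_or = """
SELECT DISTINCT id
  FROM RelationshipTable
 WHERE color_id = 1
    OR color_id = 2
"""

both_colors = sorted(row[0] for row in conn.execute(self_join))
either_color = sorted(row[0] for row in conn.execute(plain_or))

print(both_colors)   # [10]
print(either_color)  # [10, 20, 30]
```

The self join returns fewer rows because it answers a narrower question, which is one plausible reason it can finish sooner.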

#2


2  

The first query is impossible and will never return a result set. It's basically saying "Give me all the records in the table where color_id is 1 AND color_id is 2" which can never happen.

If you want to ask the difference between

SELECT DISTINCT a.id 
  FROM RelationshipTable as a 
  JOIN RelationshipTable as b ON b.id = a.id 
 WHERE a.color_id = 1 
   OR b.color_id = 2;

versus

SELECT DISTINCT color_id 
  FROM RelationshipTable 
 WHERE color_id = 1 
    OR color_id = 2;

In this case the first will always be slower than the second for large tables. The first results in a full table scan of table a, while the second one can use the index that should exist on the column in the WHERE clause.
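Rather than taking the claim about scans and indexes on faith, you can ask the database for its plan. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module; in MySQL the equivalent is to prefix the query with `EXPLAIN`. The index name `idx_color`, the helper `query_plan`, and the generated rows are assumptions for illustration, and SQLite's optimizer is not MySQL 4.1's, so the plans it prints will not match what MySQL chooses.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the plans shown are SQLite's own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RelationshipTable (id INTEGER, color_id INTEGER)")
# Hypothetical index on the filtered column.
conn.execute("CREATE INDEX idx_color ON RelationshipTable (color_id)")
conn.executemany(
    "INSERT INTO RelationshipTable (id, color_id) VALUES (?, ?)",
    [(i, i % 5) for i in range(1000)],  # made-up rows
)


def query_plan(sql):
    """Return the human-readable steps of SQLite's plan for a query."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


join_or = """
SELECT DISTINCT a.id
  FROM RelationshipTable AS a
  JOIN RelationshipTable AS b ON b.id = a.id
 WHERE a.color_id = 1
    OR b.color_id = 2
"""

plain_or = """
SELECT DISTINCT color_id
  FROM RelationshipTable
 WHERE color_id = 1
    OR color_id = 2
"""

for name, sql in (("join with OR", join_or), ("plain OR", plain_or)):
    print(name)
    for step in query_plan(sql):
        print("   ", step)
```

Steps that mention a scan of the whole table point to the expensive case described above, while steps that mention the index point to the cheap one.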
