子查询和关联查询谁的效率更高

那个sql效率更高？

SELECT product_type_id, name
FROM product_types outer
WHERE NOT EXISTS
  (SELECT 1
   FROM products inner
   WHERE inner.product_type_id = outer.product_type_id);

SELECT product_type_id, name
FROM product_types
WHERE product_type_id NOT IN
  (SELECT product_type_id
   FROM products);

SELECT outer。product_type_id, outer.name
FROM product_types outer ,products inner
WHERE outer.product_type_id  = inner.product_type_id

--------------------------------------------------------------------------------

很明显，3个SQL完成的是同样的功能,请问哪个效率更高

书上的说法是
Tip  EXISTS typically offers better performance than IN with subqueries. Therefore you should use EXISTS rather than IN wherever possible.

但是没有说子查询和关联查询谁更有效率

可否指点一下迷津呢？

13 个解决方案

#1

第一,第三一样,
第二句慢,因为in函数全部扫描

#2

也并不是绝对的,你要确定建立合适的索引
对于in exist看一下我以前收集的:
You Asked

Tom:

can you give me some example at which situation
IN is better than exist, and vice versa.

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
  from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to
the original table -- typically.

As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then
         OUTPUT THE RECORD
      end if
   end loop

It always results in a full scan of T1 whereas the first query can make use of an index
on T1(x).

So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
    ( select y from T2 )

is "huge" and takes a long time.  But the table T1 is relatively small and executing (
select null from t2 where y = x.x ) is very very fast (nice index on t2(y)).  Then the
exists will be faster as the time to full scan T1 and do the index probe into T2 could be
less then the time to simply full scan T2 to build the subquery we need to distinct on.

Lets say the result of the subquery is small -- then IN is typicaly more appropriate.

If both the subquery and the outer table are huge -- either might work as well as the
other -- depends on the indexes and other factors.

EXISTS、IN、NOT EXISTS、NOT IN的区别：

in适合内外表都很大的情况，exists适合外表结果集很小的情况。
exists 和 in 使用一例
===========================================================
今天市场报告有个sql及慢，运行需要20多分钟，如下：
update p_container_decl cd
set cd.ANNUL_FLAG='0001',ANNUL_DATE = sysdate
where exists(
select 1
from (
select tc.decl_no,tc.goods_no
from p_transfer_cont tc,P_AFFIRM_DO ad
where tc.GOODS_DECL_NO = ad.DECL_NO
and ad.DECL_NO = 'sssssssssssssssss'
) a
where a.decl_no = cd.decl_no
and a.goods_no = cd.goods_no
)
上面涉及的3个表的记录数都不小，均在百万左右。根据这种情况，我想到了前不久看的tom的一篇文章，说的是exists和in的区别，
in 是把外表和那表作hash join，而exists是对外表作loop，每次loop再对那表进行查询。
这样的话，in适合内外表都很大的情况，exists适合外表结果集很小的情况。

而我目前的情况适合用in来作查询，于是我改写了sql，如下：
update p_container_decl cd
set cd.ANNUL_FLAG='0001',ANNUL_DATE = sysdate
where (decl_no,goods_no) in
(

#3

也没有绝对吧,具体看执行计划,

#4

理论上第三条查询效率最高，用连接的效率要高于EXISTS

#5

顶

引用 4 楼 yf520gn 的回复:

理论上第三条查询效率最高，用连接的效率要高于EXISTS

#6

引用楼主 etl2007 的帖子:

那个sql效率更高？

SELECT product_type_id, name
FROM product_types outer
WHERE NOT EXISTS
  (SELECT 1
  FROM products inner
  WHERE inner.product_type_id = outer.product_type_id);

SELECT product_type_id, name
FROM product_types
WHERE product_type_id NOT IN
  (SELECT product_type_id
  FROM products);

SELECT outer。product_type_id, outer.name
FROM product_types out…

理论上：
第3条最高，因为连接走的是聚集索引
第2条次之，因为not exists走的是索引(也有可能是聚集索引)
第1条最慢，因为not in 走的是全部扫描啊！

#7

引用 2 楼 zxf_feng 的回复:

也并不是绝对的,你要确定建立合适的索引
对于in exist看一下我以前收集的:
You Asked

Tom:

can you give me some example at which situation
IN is better than exist, and vice versa.

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
from t1, ( select distinct y from…

说的很有道理，学习了！

#8

该回复于2011-01-14 13:05:17被版主删除

#9

如果有绝对的说法,那么其中一种就没存在的必要了

#10

没有绝对的只看数据量

#11

一般来说 exists会比in 快，因为in要遍历所有的值再判断是否符合而exists是一边查一边判断，找到了就不查了所以，一般会比in快
联合查询和exists，如果两个表数量级差不多的话，效率也应该差不多
如果外表数量少内表数量很多而且两者之间有个一一对应关系那么用联合查询会快很多用exists会很慢

#12

没有绝对的优先，数据量大小、索引都会影响效率的，数据量大的一般说关联查询效率高于子查询。

#13

看数据量, 快慢要看具体情况吧

#1

第一,第三一样,
第二句慢,因为in函数全部扫描

#2

#3

也没有绝对吧,具体看执行计划,

#4

理论上第三条查询效率最高，用连接的效率要高于EXISTS

#5

顶

引用 4 楼 yf520gn 的回复:

理论上第三条查询效率最高，用连接的效率要高于EXISTS

#6

引用楼主 etl2007 的帖子:

那个sql效率更高？

SELECT product_type_id, name
FROM product_types outer
WHERE NOT EXISTS
  (SELECT 1
  FROM products inner
  WHERE inner.product_type_id = outer.product_type_id);

SELECT product_type_id, name
FROM product_types
WHERE product_type_id NOT IN
  (SELECT product_type_id
  FROM products);

SELECT outer。product_type_id, outer.name
FROM product_types out…

理论上：
第3条最高，因为连接走的是聚集索引
第2条次之，因为not exists走的是索引(也有可能是聚集索引)
第1条最慢，因为not in 走的是全部扫描啊！

#7

引用 2 楼 zxf_feng 的回复:

也并不是绝对的,你要确定建立合适的索引
对于in exist看一下我以前收集的:
You Asked

Tom:

can you give me some example at which situation
IN is better than exist, and vice versa.

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
from t1, ( select distinct y from…

说的很有道理，学习了！

#8

该回复于2011-01-14 13:05:17被版主删除

#9

如果有绝对的说法,那么其中一种就没存在的必要了

#10

没有绝对的只看数据量

#11

#12

没有绝对的优先，数据量大小、索引都会影响效率的，数据量大的一般说关联查询效率高于子查询。

#13

看数据量, 快慢要看具体情况吧

秒客网

子查询和关联查询谁的效率更高

13 个解决方案

#1

#2

#3

#4

#5

#6

#7

#8

#9

#10

#11

#12

#13

#1

#2

#3

#4

#5

#6

#7

#8

#9

#10

#11

#12

#13

相关文章