SQL联合查询优化 用union all来代替union

时间:2021-06-09 15:47:23

Optimizing UNION
UNION has an interesting optimization that exists across a few different databases. It's obvious when you think about how it works. UNION gives you the rows from two tables that don't exist in the other. So implicitly, you are removing duplicates. To do this the MySQL database must return distinct rows, and thus must sort the data. Sorting, as we know is expensive, especially for large tables.

UNION ALL can very well be a big speedup for you. What if you already know that your data does not contain duplicates in either row, or what if you don't care about duplicates? In either case, UNION ALL is for you. Further, there may be other ways you can avoid the duplicates in your rows using some application logic, so you know that UNION ALL will provide the results you want, without the heavy overhead of sorting the data.

union和union all的差别就在于union会对数据做一个distanct的动作,而这个distanct动作的速度则取决于现有数据的数量,数量越大则时间也越慢。而对于几个数据集,要确保数据集之间的数据互相不重复,基本是O(n)的算法复杂度。

有了理论依据后,便动手更改SQL的结构,在确保数据逻辑上不会有重复情况出现后,将2个union都改成了union all,query的反应速度从1.7秒变成了300毫秒左右,耗费时间只有以前的17%。

UNION还有一个用处,我们在海量数据的查询中,如果使用select * from c_cons where cons_id in ('691339365','3387785','3387954');这样的查询语句,会引起全表扫描,可以使用UNION ALL来代替,如:

select * from c_cons where cons_id='691339365'
UNION ALL
select * from c_cons where cons_id='3387785'
UNION ALL
select * from c_cons where cons_id='3387954'
这样查询比使用in查询要快很多,它不会去进行全表扫描。


例2:

or语句(部分节选)

SELECT * FROM tablename where (cdp= 300 and inline=301) or (cdp= 301 and inline=301) or (cdp= 302 and inline=301) or (cdp= 303 and inline=301) or (cdp= 304 and inline=301) or (cdp= 305 and inline=301) or (cdp= 306 and inline=301) or (cdp= 307 and inline=301)

union all语句(部分节选)

SELECT * FROM tablename where (inline= 300 and cdp=300) union all SELECT * FROM tablename where (inline= 301 and cdp=300) union all SELECT * FROM tablename where (inline= 302 and cdp=300) union all SELECT * FROM tablename where (inline= 303 and cdp=300)

返回不规则的900条数据,前者用了60多秒,后者用了8秒左右。

------------------------------
用DB2测试,发现还是用IN的效率高于union all