之前我写过一篇博客分析Filter也有性能好的一面,最近看了一下,感觉说的不够浅显易懂,这里再写一篇。
之前博客链接:
https://blog.csdn.net/songjian1104/article/details/98325658
1 数据准备
drop table emp1;
drop table dept1;
create table emp1 as select * from emp;
insert into emp1 select * from emp1;
insert into emp1 select * from emp1;
insert into emp1 select * from emp1;
insert into emp1 select * from emp1;
insert into emp1 select * from emp1;
insert into emp1 select * from emp1;
insert into emp1 select * from emp1;
insert into emp1 select * from emp1;
create table dept1 as select * from dept;
insert into dept1 select * from dept1;
insert into dept1 select * from dept1;
insert into dept1 select * from dept1;
insert into dept1 select * from dept1;
insert into dept1 select * from dept1;
insert into dept1 select * from dept1;
insert into dept1 select * from dept1;
insert into dept1 select * from dept1;
commit;
2 需求分析
需求很简单,大家直接看SQL理解吧。
3 SQL改写
select count(distinct e.EMPNO)
from emp1 e
where e.empno=7788
or e.deptno in (select deptno
from dept1 d
where d.dname='SALES');
虽然有Filter,但是结果还是秒出了。
select count(empno) from
(select *
from emp1 e
where e.empno=7788
union
select *
from emp1 e
where e.deptno in (select deptno
from dept1 d
where d.dname='SALES'));
我们发现,Filter消除后,执行时间大幅度增加。
原因很简单,我们回想Filter连接的算法,第一次连接后,连接的结果会被缓存,内存中会维护一个Table,结构如下
这样,emp1表向dept1表传值进行判定时,先去缓存的表里检索,如果发现结构已经缓存,那么直接从缓存里取值,避免了dept1表的再检索。
但是,消除Filter后,第一,emp1要多扫描一次,第二,看到执行计划中,内存使用也要相应多很多,在连接计算上,成本也要高出很多。
所以,表里面连接列重复值很多时,Filter有时候性能也很好。具体问题具体分析。