【Oracle】删除重复记录

时间:2021-06-06 14:28:54
--复习autotrace:
SET AUTOTRACE OFF --不生成AUTOTRACE 报告,这是缺省模式
SET AUTOTRACE ON EXPLAIN --AUTOTRACE只显示优化器执行路径报告
SET AUTOTRACE ON STATISTICS --只显示执行统计信息
SET AUTOTRACE ON --包含执行计划和统计信息
SET AUTOTRACE TRACEONLY --同set autotrace on,但是不显示查询输出 --案例模拟
--查询原始表:
23:13:36 SQL> select * from emp; EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------ ---------- ---------- ----------
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
7900 JAMES CLERK 7698 03-DEC-81 950 30
7902 FORD ANALYST 7566 03-DEC-81 3000 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 14 rows selected. Elapsed: 00:00:00.09 --模拟,插入重复记录
INSERT INTO emp select * from emp; --检查重复记录
23:13:39 SQL> SELECT COUNT(empno), empno FROM emp GROUP BY empno HAVING COUNT(empno) > 1; COUNT(EMPNO) EMPNO
------------ ----------
2 7782
2 7839
2 7844
2 7698
2 7902
2 7521
2 7788
2 7934
2 7566
2 7654
2 7499
2 7876
2 7900
2 7369 14 rows selected. Elapsed: 00:00:00.04 --方法一(只保留ROWID最小的记录):
DELETE FROM emp
WHERE empno IN
(SELECT empno FROM emp GROUP BY empno HAVING COUNT(empno) > 1)
AND ROWID NOT IN
(SELECT MIN(ROWID) FROM emp GROUP BY empno HAVING COUNT(empno) > 1); [lubinsu@200-168-46-3 shell_monitor]$ sqlplus lubinsu/lubinsu SQL*Plus: Release 10.2.0.1.0 - Production on Sun Aug 18 23:09:34 2013 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options 23:09:46 SQL> set timing on
23:09:51 SQL> set autotrace traceonly
23:09:58 SQL> set linesize 200
23:17:32 SQL> DELETE FROM emp
23:17:38 2 WHERE empno IN
23:17:40 3 (SELECT empno FROM emp GROUP BY empno HAVING COUNT(empno) > 1)
23:17:45 4 AND ROWID NOT IN
23:17:49 5 (SELECT MIN(ROWID) FROM emp GROUP BY empno HAVING COUNT(empno) > 1); 14 rows deleted. Elapsed: 00:00:00.07 Execution Plan
----------------------------------------------------------
Plan hash value: 319293636 ------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 41 | 9 (34)| 00:00:01 |
| 1 | DELETE | EMP | | | | |
|* 2 | HASH JOIN ANTI | | 1 | 41 | 9 (34)| 00:00:01 |
|* 3 | HASH JOIN SEMI | | 1 | 29 | 6 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL | EMP | 14 | 224 | 2 (0)| 00:00:01 |
| 5 | VIEW | VW_NSO_1 | 1 | 13 | 3 (34)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | SORT GROUP BY | | 1 | 4 | 3 (34)| 00:00:01 |
| 8 | TABLE ACCESS FULL| EMP | 14 | 56 | 2 (0)| 00:00:01 |
| 9 | VIEW | VW_NSO_2 | 1 | 12 | 3 (34)| 00:00:01 |
|* 10 | FILTER | | | | | |
| 11 | SORT GROUP BY | | 1 | 16 | 3 (34)| 00:00:01 |
| 12 | TABLE ACCESS FULL | EMP | 14 | 224 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------ Predicate Information (identified by operation id):
--------------------------------------------------- 2 - access(ROWID="$nso_col_1")
3 - access("EMPNO"="$nso_col_1")
6 - filter(COUNT("EMPNO")>1)
10 - filter(COUNT("EMPNO")>1) Statistics
----------------------------------------------------------
1 recursive calls
15 db block gets
16254 consistent gets
0 physical reads
4256 redo size
925 bytes sent via SQL*Net to client
1116 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
14 rows processed --方法二、使用分析函数(只保留ROWID最小的记录):
23:17:57 SQL> rollback; Rollback complete. Elapsed: 00:00:00.02
23:19:11 SQL> DELETE FROM emp
23:22:22 2 WHERE ROWID IN (SELECT ROWID
23:22:28 3 FROM (SELECT ROWID row_id,
23:22:32 4 row_number() over(PARTITION BY empno ORDER BY ROWID ASC) row_num
23:22:37 5 FROM emp e) v
23:22:41 6 WHERE v.row_num >= 2); 14 rows deleted. Elapsed: 00:00:00.04 Execution Plan
----------------------------------------------------------
Plan hash value: 3671588261 -----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 5 (40)| 00:00:01 |
| 1 | DELETE | EMP | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 5 (40)| 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 14 | 168 | 3 (34)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 350 | | |
|* 5 | VIEW | | 14 | 350 | 3 (34)| 00:00:01 |
| 6 | WINDOW SORT | | 14 | 224 | 3 (34)| 00:00:01 |
| 7 | TABLE ACCESS FULL | EMP | 14 | 224 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY USER ROWID| EMP | 1 | 12 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------- Predicate Information (identified by operation id):
--------------------------------------------------- 5 - filter("V"."ROW_NUM">=2) Statistics
----------------------------------------------------------
1 recursive calls
18 db block gets
5434 consistent gets
0 physical reads
4448 redo size
925 bytes sent via SQL*Net to client
1190 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
14 rows processed --方法三、
23:22:45 SQL> DELETE FROM emp p WHERE ROWID NOT IN(SELECT MAX(ROWID) FROM emp i WHERE i.empno = p.empno); 0 rows deleted. Elapsed: 00:00:00.13 Execution Plan
----------------------------------------------------------
Plan hash value: 1987023575 ---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 13 | 533 | 6 (34)| 00:00:01 |
| 1 | DELETE | EMP | | | | |
|* 2 | HASH JOIN | | 13 | 533 | 6 (34)| 00:00:01 |
| 3 | VIEW | VW_SQ_1 | 14 | 350 | 3 (34)| 00:00:01 |
| 4 | SORT GROUP BY | | 14 | 224 | 3 (34)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMP | 14 | 224 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | EMP | 14 | 224 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------- Predicate Information (identified by operation id):
--------------------------------------------------- 2 - access("EMPNO"="P"."EMPNO")
filter(ROWID<>"VW_COL_1") Statistics
----------------------------------------------------------
168 recursive calls
0 db block gets
10954 consistent gets
0 physical reads
0 redo size
925 bytes sent via SQL*Net to client
1008 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
0 rows processed --方法四、和方法三是类似的:
Elapsed: 00:00:00.03
23:28:25 SQL> DELETE FROM emp p WHERE ROWID < (SELECT MAX(ROWID) FROM emp i WHERE i.empno = p.empno); 14 rows deleted. Elapsed: 00:00:00.06 Execution Plan
----------------------------------------------------------
Plan hash value: 1987023575 ---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 41 | 6 (34)| 00:00:01 |
| 1 | DELETE | EMP | | | | |
|* 2 | HASH JOIN | | 1 | 41 | 6 (34)| 00:00:01 |
| 3 | VIEW | VW_SQ_1 | 14 | 350 | 3 (34)| 00:00:01 |
| 4 | SORT GROUP BY | | 14 | 224 | 3 (34)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMP | 14 | 224 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | EMP | 14 | 224 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------- --方法五、使用group by,同样类似三四
23:30:01 SQL> DELETE FROM emp p WHERE ROWID NOT IN (SELECT MAX(ROWID) FROM emp i GROUP BY i.empno); 14 rows deleted. Elapsed: 00:00:00.06 Execution Plan
----------------------------------------------------------
Plan hash value: 3239269824 ----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 6 (34)| 00:00:01 |
| 1 | DELETE | EMP | | | | |
|* 2 | HASH JOIN ANTI | | 1 | 24 | 6 (34)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 14 | 168 | 2 (0)| 00:00:01 |
| 4 | VIEW | VW_NSO_1 | 14 | 168 | 3 (34)| 00:00:01 |
| 5 | SORT GROUP BY | | 14 | 224 | 3 (34)| 00:00:01 |
| 6 | TABLE ACCESS FULL| EMP | 14 | 224 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------- Predicate Information (identified by operation id):
--------------------------------------------------- 2 - access(ROWID="$nso_col_1") Statistics
----------------------------------------------------------
0 recursive calls
17 db block gets
10836 consistent gets
0 physical reads
4324 redo size
926 bytes sent via SQL*Net to client
1002 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed --总结,还是自己想出来的方法二效率高,:)