Comparing the Contents of Two Tables
I have two tables named A and B. They have identical columns, and select count(*) returns the same number of rows for each. However, the content of one of the rows differs, as shown in the following queries:
SQL> select * from A where C1=1;

    C1 C2           C3
------ ------------ --------
     1 AAAAAAAAAAAA 100

SQL> select * from B where C1=1;

    C1 C2           C3
------ ------------ --------
     1 AAAAAAAAAAAB 100
The only difference is the last character in column C2: it is an A in table A and a B in table B. I would like to write SQL that checks whether tables A and B are in sync with respect to their content, not just their row counts, but I don't know how to do it.
OK, we'll do the specific solution to this problem with columns C1, C2, and C3, and then we'll see how to generalize it to any number of columns. The first answer that came to mind was this:
(select 'A', a.* from a
MINUS
select 'A', b.* from b)
UNION ALL
(select 'B', b.* from b
MINUS
select 'B', a.* from a)
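The query above can be tried against the data from the question. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for Oracle. SQLite spells MINUS as EXCEPT and does not accept parenthesized compound-select operands, so each half is wrapped in a subquery. Apart from those substitutions, the logic is the same A-minus-B plus B-minus-A union.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle tables A and B from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER, c2 TEXT, c3 INTEGER);
    CREATE TABLE b (c1 INTEGER, c2 TEXT, c3 INTEGER);
    INSERT INTO a VALUES (1, 'AAAAAAAAAAAA', 100);
    INSERT INTO b VALUES (1, 'AAAAAAAAAAAB', 100);
""")

# EXCEPT plays the role of Oracle's MINUS; the 'A'/'B' literal tags each
# surviving row with the table it came from.
minus_query = """
    SELECT * FROM (SELECT 'A' AS src, a.* FROM a
                   EXCEPT
                   SELECT 'A', b.* FROM b)
    UNION ALL
    SELECT * FROM (SELECT 'B' AS src, b.* FROM b
                   EXCEPT
                   SELECT 'B', a.* FROM a)
"""
diff_rows = sorted(conn.execute(minus_query).fetchall())
for row in diff_rows:
    print(row)
```

Both versions of row 1 come back, one tagged A and one tagged B, which is exactly the mismatch described in the question.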
That is, just take A minus B (which gives us everything in A that's not in B) and add to that (UNION ALL) the result of B minus A. In fact, that is correct, but it has a couple of drawbacks:
The query requires four full table scans.
If a row is duplicated in A, then MINUS will "de-dup" it silently (and do the same with B).
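The second drawback is easy to reproduce. This sketch again uses SQLite's EXCEPT in place of MINUS. The data is hypothetical: both tables have three rows and the same distinct rows, but each table duplicates a different row. Both halves of the set-operator query come back empty, even though the tables are not in sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER, c2 TEXT, c3 TEXT);
    CREATE TABLE b (c1 INTEGER, c2 TEXT, c3 TEXT);
    -- Hypothetical data: A repeats row 1, B repeats row 2.
    INSERT INTO a VALUES (1, 'x', 'y'), (1, 'x', 'y'), (2, 'x', 'y');
    INSERT INTO b VALUES (1, 'x', 'y'), (2, 'x', 'y'), (2, 'x', 'y');
""")

# Row counts match, so select count(*) cannot tell the tables apart.
counts = (
    conn.execute("SELECT count(*) FROM a").fetchone()[0],
    conn.execute("SELECT count(*) FROM b").fetchone()[0],
)

# EXCEPT (like MINUS) works on distinct rows, so the duplicates vanish.
minus_query = """
    SELECT * FROM (SELECT 'A' AS src, a.* FROM a EXCEPT SELECT 'A', b.* FROM b)
    UNION ALL
    SELECT * FROM (SELECT 'B' AS src, b.* FROM b EXCEPT SELECT 'B', a.* FROM a)
"""
diff_rows = conn.execute(minus_query).fetchall()
print(counts, diff_rows)  # prints (3, 3) []
```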
So, this solution would be slow and would also hide information from us. There is a better way, however, that uses just two full scans and GROUP BY. Consider these values in A and B:
SQL> select * from a;

        C1 C2 C3
---------- -- --
         1 x  y
         2 xx y
         3 x  y

SQL> select * from b;

        C1 C2 C3
---------- -- --
         1 x  y
         2 x  y
         3 x  yy
The first rows are the same, but the second and third rows differ. This is how we can find them:
SQL> select c1, c2, c3,
       count(src1) CNT1,
       count(src2) CNT2
  from
( select a.*,
         1 src1,
         to_number(null) src2
    from a
  union all
  select b.*,
         to_number(null) src1,
         2 src2
    from b
)
 group by c1,c2,c3
having count(src1) <> count(src2)
/

 C1 C2 C3 CNT1 CNT2
--- -- -- ---- ----
  2 x  y     0    1
  2 xx y     1    0
  3 x  y     1    0
  3 x  yy    0    1
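The same GROUP BY query can be run outside Oracle to check the result. This is a sketch that assumes SQLite as the engine, so `NULL` stands in for Oracle's `to_number(null)`. An `ORDER BY` is added only to make the output order deterministic. The rows match the Oracle output shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER, c2 TEXT, c3 TEXT);
    CREATE TABLE b (c1 INTEGER, c2 TEXT, c3 TEXT);
    INSERT INTO a VALUES (1, 'x', 'y'), (2, 'xx', 'y'), (3, 'x', 'y');
    INSERT INTO b VALUES (1, 'x', 'y'), (2, 'x', 'y'), (3, 'x', 'yy');
""")

# Tag each row with its source, stack both tables with UNION ALL, group by
# every column, and keep only groups whose per-table counts differ.
compare_query = """
    SELECT c1, c2, c3, count(src1) AS cnt1, count(src2) AS cnt2
    FROM (SELECT a.*, 1 AS src1, NULL AS src2 FROM a
          UNION ALL
          SELECT b.*, NULL AS src1, 2 AS src2 FROM b)
    GROUP BY c1, c2, c3
    HAVING count(src1) <> count(src2)
    ORDER BY c1, c2, c3
"""
diff_rows = conn.execute(compare_query).fetchall()
for row in diff_rows:
    print(row)
```

Each table is read once inside the UNION ALL. That is where the two-scan cost, compared with four for the MINUS version, comes from.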
Now, because COUNT(<expression>) returns a count of the non-null values of <expression>, we expect that after grouping by all of the columns in the table, we would have two equal counts (because COUNT(src1) counts the number of records in table A that have those values and COUNT(src2) does the same for table B). Any group where the two counts differ is a row that is out of sync. This approach also exposes duplicates: had we seen 2 and 3 for CNT1 and CNT2, that would have told us that table A has this row twice but table B has it three times (which is something the MINUS and UNION ALL operators above would not be able to do).
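To generalize the query to any number of columns, the only thing that changes is the column list, which appears in the select list, in both branches of the UNION ALL, and in the GROUP BY. The helper below, `compare_tables_sql`, is a hypothetical name used for illustration. It builds that text from a list of column names, again assuming SQLite as the engine. It is then run on hypothetical data where A holds a row twice and B holds it three times, reproducing the 2-versus-3 case.

```python
import sqlite3

def compare_tables_sql(t1, t2, cols):
    """Hypothetical helper: build the UNION ALL / GROUP BY comparison.

    t1, t2 and cols are assumed to be trusted identifiers; they are pasted
    into the SQL text, not bound as parameters.
    """
    col_list = ", ".join(cols)
    return f"""
        SELECT {col_list}, count(src1) AS cnt1, count(src2) AS cnt2
        FROM (SELECT {col_list}, 1 AS src1, NULL AS src2 FROM {t1}
              UNION ALL
              SELECT {col_list}, NULL AS src1, 2 AS src2 FROM {t2})
        GROUP BY {col_list}
        HAVING count(src1) <> count(src2)
        ORDER BY {col_list}
    """

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER, c2 TEXT, c3 TEXT);
    CREATE TABLE b (c1 INTEGER, c2 TEXT, c3 TEXT);
    -- Hypothetical data: the same row appears twice in A, three times in B.
    INSERT INTO a VALUES (1, 'x', 'y'), (1, 'x', 'y');
    INSERT INTO b VALUES (1, 'x', 'y'), (1, 'x', 'y'), (1, 'x', 'y');
""")

diff_rows = conn.execute(compare_tables_sql("a", "b", ["c1", "c2", "c3"])).fetchall()
print(diff_rows)  # prints [(1, 'x', 'y', 2, 3)]
```

Naming the columns explicitly, rather than using `a.*` and `b.*`, makes both branches of the UNION ALL list the columns in the same order by construction. A `*` gives the same result only when both tables were created with identical column order.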
To give credit where credit is due, you'll want to read the original Ask Tom discussion that got us to this answer: asktom.oracle.com/~tkyte/compare.html. What I found interesting in that thread was the back and forth we had to go through in order to come to the final query. Ultimately, a combination of Marco Stefanetti's technique with a minor addition I made led to this query, but you'll see the genesis of a pretty good idea there.