oracle数据库去重,只保留一条数据

时间:2024-03-07 17:54:24

一、使用 Row_Number() Over(Partition By ... Order By ...) 为每个分组生成一个内部序号

第一步,根据需求查询重复数据,并为每个分组生成一个内部行号

Select t.*,
       Row_Number() Over(Partition By Visit_Id, Presc_Detail_Id, Rule_Code Order By t.Visit_Id) Su
  From Test t;

第二步,筛选出行号大于1的,即重复的多余的数据

Select *
  From (Select t.*,
               Row_Number() Over(Partition By Visit_Id, Presc_Detail_Id, Rule_Code Order By t.Visit_Id) Su
          From Test t)
 Where Su > 1;

第三步,删除重复数据,只保留1条

Delete From Test
 Where Id In (Select Id
                From (Select t.*,
                             Row_Number() Over(Partition By Visit_Id, Presc_Detail_Id, Rule_Code Order By t.Visit_Id) Su
                        From Test t)
               Where Su > 1);

这种方式有个弊端,如果数据本身并没有每行的唯一标识id,删除重复数据还是非常困难的。

 

二、使用 ROWID

第一步,查找重复的记录

Select *
  From Test t
 Where (Visit_Id, Presc_Detail_Id, Rule_Code) In
       (Select Visit_Id, Presc_Detail_Id, Rule_Code
          From Test
         Group By Visit_Id, Presc_Detail_Id, Rule_Code
        Having Count(*) > 1)
 Order By Visit_Id, Presc_Detail_Id, Rule_Code;

第二步,查找表中多余的重复记录,不包含rowid最小的记录

Select *
  From Test t
 Where (Visit_Id, Presc_Detail_Id, Rule_Code) In
       (Select Visit_Id, Presc_Detail_Id, Rule_Code
          From Test
         Group By Visit_Id, Presc_Detail_Id, Rule_Code
        Having Count(*) > 1)
   And Rowid Not In (Select Min(Rowid)
                       From Test
                      Group By Visit_Id, Presc_Detail_Id, Rule_Code
                     Having Count(*) > 1);

第三步,删除表中多余的重复记录,只保留rowid最小的记录

Delete From Test t
 Where (Visit_Id, Presc_Detail_Id, Rule_Code) In
       (Select Visit_Id, Presc_Detail_Id, Rule_Code
          From Test
         Group By Visit_Id, Presc_Detail_Id, Rule_Code
        Having Count(*) > 1)
   And Rowid Not In (Select Min(Rowid)
                       From Test
                      Group By Visit_Id, Presc_Detail_Id, Rule_Code
                     Having Count(*) > 1);