How to get Duplicate records from single table depends on column value?

时间:2021-04-24 22:13:57

I got table like this:

我有这样的表:

declare @t1 table(Pat_Ref int,Fname VARCHAR(20), Sname VARCHAR(20),minor VARCHAR(1),SourceSys VARCHAR(40),regdt datetime)

insert into @t1 
values   (111,'John', 'Wayne','N','ick','2015-06-09 21:31:09.253')
        ,(111,'John', 'Wayne','N','ick','2014-05-09 21:31:09.253')
        ,(111,'John', 'Wayne',null,'hpk','2015-04-09 21:31:09.253')

        ,(112,'Jill', 'Smith','N','ick','2015-01-08 21:31:09.253')
        ,(112,'Jill', 'Smith',null,'hpk','2015-05-01 21:31:09.253')

        ,(113,'Bill', 'Peyton','N','ick','2015-06-09 21:31:09.253')

        ,(114,'Gill', 'Peyton','N','hpk','2015-06-09 21:31:09.253')
        ,(114,'Gill', 'Peyton','N','hpk','2015-06-12 21:31:09.253')
        ,(114,'Gill', 'Peyton','N','ick','2006-10-22 21:31:09.253')

        ,(115,'Billy', 'Peyton','N','hpk','2015-06-09 21:31:09.253')

        ,(116,'William', 'nixon','N','ick','2015-06-09 21:31:09.253')
        ,(116,'William', 'nixon','N','ick','2015-06-09 21:31:09.253')

I want duplicate records based on Pat_Ref and values between ick and hpk on SourceSys column. If duplicate record found row must be recent date regdt

我想要基于Pat_Ref的重复记录以及SourceSys列上的ick和hpk之间的值。如果找到重复记录行必须是最近日期regdt

Note: From the above table Pat_Ref=116, there are two rows but it can't be duplicate because its SourceSys value is same ick. if that value is ick on row and hpk on another row then it can become duplicate row.

注意:从上表Pat_Ref = 116,有两行,但它不能重复,因为它的SourceSys值是相同的ick。如果该值在行上为ick而在另一行上为hpk,则它可以成为重复行。

I want result like this;

我想要这样的结果;

Pat_Ref     Fname   Sname   minor   SourceSys   regdt
111         John    Wayne   N       ick         2015-06-09 21:31:09.253
112         Jill    Smith   NULL    hpk         2015-05-01 21:31:09.253
114         Gill    Peyton  N       hpk         2015-06-12 21:31:09.253

Any help. Thanks

任何帮助。谢谢

1 个解决方案

#1


1  

I hope I understood you right, that's the query I came up with and and it does seem to return expected results:

我希望我理解正确,这是我提出的查询,它确实似乎返回了预期的结果:

SELECT Pat_Ref, Fname, Sname, minor, SourceSys, regdt
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY Pat_ReF ORDER BY regdt DESC) AS RN, *
    FROM @t1
) AS T
CROSS APPLY (
    SELECT 1
    FROM @t1 AS T2
    WHERE T2.Pat_Ref = T.Pat_Ref
        AND T2.SourceSys BETWEEN 'hck' AND 'ick'
    GROUP BY T2.Pat_Ref
    HAVING COUNT(DISTINCT T2.SourceSys) > 1
    ) AS T2(UQ)
WHERE T.RN = 1;

Explanation: I'm using ROW_NUMBER() to get most recent regdt for each Pat_Ref in case it needs to be filtered.

说明:我正在使用ROW_NUMBER()来获取每个Pat_Ref的最新regdt,以防需要进行过滤。

And I'm using CROSS APPLY to get DISTINCT COUNT() for each Pat_Ref SourceSys and if I understood it right, it must match both ICK and HCK.

我正在使用CROSS APPLY为每个Pat_Ref SourceSys获取DISTINCT COUNT(),如果我理解正确,它必须匹配ICK和HCK。

Any questions - let me know.

有任何问题 - 让我知道。

#1


1  

I hope I understood you right, that's the query I came up with and and it does seem to return expected results:

我希望我理解正确,这是我提出的查询,它确实似乎返回了预期的结果:

SELECT Pat_Ref, Fname, Sname, minor, SourceSys, regdt
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY Pat_ReF ORDER BY regdt DESC) AS RN, *
    FROM @t1
) AS T
CROSS APPLY (
    SELECT 1
    FROM @t1 AS T2
    WHERE T2.Pat_Ref = T.Pat_Ref
        AND T2.SourceSys BETWEEN 'hck' AND 'ick'
    GROUP BY T2.Pat_Ref
    HAVING COUNT(DISTINCT T2.SourceSys) > 1
    ) AS T2(UQ)
WHERE T.RN = 1;

Explanation: I'm using ROW_NUMBER() to get most recent regdt for each Pat_Ref in case it needs to be filtered.

说明:我正在使用ROW_NUMBER()来获取每个Pat_Ref的最新regdt,以防需要进行过滤。

And I'm using CROSS APPLY to get DISTINCT COUNT() for each Pat_Ref SourceSys and if I understood it right, it must match both ICK and HCK.

我正在使用CROSS APPLY为每个Pat_Ref SourceSys获取DISTINCT COUNT(),如果我理解正确,它必须匹配ICK和HCK。

Any questions - let me know.

有任何问题 - 让我知道。