SQL Server: update only one of several completely duplicate records

Time: 2021-12-06 16:39:01

This one is bugging me.

I am working in a data warehousing staging table where there can potentially be rows whose column values are 100% duplicates of other rows. There are dozens of columns, but for the sake of argument, let's use the following example:

tblExample (ID Int, Active bit, ModifiedDate DateTime)

Now, at any given time there is only supposed to be one record per ID that has Active set to 1. All others should have active set to 0. There is a process that enforces this during the data loading.

That process can break, and has broken in the past, resulting in data like this:

ID     Active ModifiedDate
123456 0      2016-05-27 12:37:46.111
123456 1      2016-05-27 12:37:46.433
123456 1      2016-05-27 12:37:46.433

In that case there are 2 "Identical" records that have Active set to 1. I need to find a way to make only one of those records active = 1.
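
For reference, a query along these lines lists the IDs that currently break the rule. It is a minimal sketch that assumes only the three columns shown above; the alias ActiveRows is introduced here for illustration.

```sql
-- List every ID that currently has more than one active row.
SELECT ID, COUNT(*) AS ActiveRows
FROM tblExample
WHERE Active = 1
GROUP BY ID
HAVING COUNT(*) > 1;
```

With the three sample rows above, this returns ID 123456 with a count of 2.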

Now, the process I currently use to do this assumes that the date value is unique, and 99.99% of the time that is the case. But there are times when the date is also duplicated, and I can't for the life of me figure out a way to update only one of those records, since I have nothing to latch on to in the WHERE clause.

  • I cannot add to or modify the schema.
  • This is all happening inside SSIS, so I am limited to the restrictions inherent to that platform (some things just don't work well in SSIS).

Ideas?

1 Answer

#1

This should work:

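-- Number the rows within each group of fully identical rows.
WITH a AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID, Active, ModifiedDate
                              ORDER BY ModifiedDate) AS rn
    FROM tblExample
)
-- Updating the CTE updates tblExample; only the extra copies are deactivated.
UPDATE a SET Active = 0 WHERE rn > 1;

SELECT * FROM tblExample;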

Here is an example with your data.
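
A self-contained way to try this on the sample rows from the question is sketched below. The temporary table name #tblExample is a stand-in chosen for this sketch, and the dates are written in the ISO 8601 form with a T so that they parse the same way regardless of session language settings.

```sql
-- Temporary stand-in for the staging table (hypothetical name #tblExample).
CREATE TABLE #tblExample (ID INT, Active BIT, ModifiedDate DATETIME);

INSERT INTO #tblExample (ID, Active, ModifiedDate) VALUES
    (123456, 0, '2016-05-27T12:37:46.111'),
    (123456, 1, '2016-05-27T12:37:46.433'),
    (123456, 1, '2016-05-27T12:37:46.433');

-- Same update as above, pointed at the temporary table.
WITH a AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID, Active, ModifiedDate
                              ORDER BY ModifiedDate) AS rn
    FROM #tblExample
)
UPDATE a SET Active = 0 WHERE rn > 1;

-- Should return no rows: no ID has more than one active row any more.
SELECT ID
FROM #tblExample
WHERE Active = 1
GROUP BY ID
HAVING COUNT(*) > 1;

DROP TABLE #tblExample;
```

Which of the two identical rows gets deactivated is arbitrary, but since they are indistinguishable the result is the same either way.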

This creates a CTE that uses ROW_NUMBER() to number the rows that are duplicates on ModifiedDate (your existing solution already handles rows with distinct ModifiedDate values) and then updates the CTE, which updates the underlying table.
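
If you would rather check what will change before running the update, the same CTE can feed a plain SELECT. This is only a sketch of a dry run, using the table and column names from the question:

```sql
-- Dry run: show the rows the update would switch from active to inactive.
WITH a AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID, Active, ModifiedDate
                              ORDER BY ModifiedDate) AS rn
    FROM tblExample
)
SELECT *
FROM a
WHERE rn > 1
  AND Active = 1;
```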

If you want to replace your process you could use the below:

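WITH a AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID, Active ORDER BY ModifiedDate DESC) AS rn
    FROM tblExample
)
-- Within each (ID, Active) group, only the newest row keeps its Active value.
UPDATE a SET Active = 0 WHERE rn > 1;

-- Inspect the result with the same numbering.
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID, Active ORDER BY ModifiedDate DESC) AS rn
FROM tblExample;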

This leaves at most one active row per ID: the most recent of the rows that were active.
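
A possible variation, which is an assumption on my part rather than something tested in your SSIS package: filter to the active rows inside the CTE, so that rows which are already inactive are never rewritten. It should give the same end state as the query above.

```sql
-- Only active rows are numbered, so inactive rows are never touched.
WITH a AS (
    SELECT Active,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ModifiedDate DESC) AS rn
    FROM tblExample
    WHERE Active = 1
)
-- Keep the newest active row per ID; deactivate every other active row.
UPDATE a SET Active = 0 WHERE rn > 1;
```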
