I am looking to replace NULL values with a mode value based on data within a table.
In the following example I would like to replace the NULL InDate of an EquipmentID with the mode of the InDates for that ProcessID. I have calculated the mode InDate for each ProcessID; I just can't work out how to use that value to replace the NULL InDate for the EquipmentID within the ProcessID.
Here is an example setup:
CREATE TABLE dbo.Table_basic (
    InDate INT,
    EquipmentID INT,
    ProcessID NVARCHAR(50),
    SiteID INT
);
INSERT INTO Table_basic (InDate, EquipmentID, ProcessID, SiteID)
VALUES (2001, 1,'1PAA',1),
(2001,2,'1PAA',1),
(NULL, 3,'1PAA',1),
(2001,4,'1PAA',1),
(1999, 5,'1PAA',1),
(2001,6,'1PAB',1),
(2001,7,'1PAC',1),
(2001, 8,'2AA',2),
(1999,9,'2AB',2),
(NULL, 10,'2AB',2),
(1999,11,'2AB',2),
(1998,12,'2AB',2),
(2001, 13,'2AB',2),
(1999,14,'2AB',2),
(2001, 15,'2AC',2),
(2001,16,'2AC',2),
(1986, 17,'3AA',3),
(1985,18,'3AA',3),
(1985,19,'3AA',3),
(NULL, 20,'3AC',3),
(2005,21,'3AC',3),
(2005, 22,'3AC',3),
(2005,23,'3AC',3);
This is how I find the mode of the InDate for the equipment within each ProcessID.
WITH CTE_CountofEquipment AS
(
    SELECT
        ProcessID
        ,SiteID
        ,cnt = COUNT(1)
        ,rid = ROW_NUMBER() OVER (PARTITION BY ProcessID ORDER BY COUNT(1) DESC)
        ,InDate
    FROM dbo.Table_basic
    GROUP BY SiteID, ProcessID, InDate
)
SELECT
    ProcessID
    ,cnt = cnt
    ,[SiteID]
    ,InDate
FROM CTE_CountofEquipment
WHERE rid = 1
ORDER BY SiteID;
I would like to use these determined modes to fill in the NULL InDate for a given ProcessID.
Desired result example:
(NULL, 3,'1PAA',1),
(2001, 3,'1PAA',1),
(2001, 3,'1PAA',1),
(1999, 3,'1PAA',1),
(2000, 3,'1PAA',1),
(2001, 3,'1PAA',1),
becomes
(2001, 3,'1PAA',1), -- InDate updated to modal value
(2001, 3,'1PAA',1),
(2001, 3,'1PAA',1),
(1999, 3,'1PAA',1),
(2000, 3,'1PAA',1),
(2001, 3,'1PAA',1),
Thanks
2 Solutions
#1
8
I would do the calculation like this:
with modes as (
    select p.*
    from (select tb.processId, tb.indate, count(*) as cnt,
                 row_number() over (partition by tb.processId order by count(*) desc) as seqnum
          from table_basic tb
          where tb.indate is not null  -- do not let NULL itself be counted as a candidate mode
          group by tb.processId, tb.indate
         ) p
    where seqnum = 1
)
update tb
    set indate = m.indate
    from table_basic tb join
         modes m
         on tb.processId = m.processId
    where tb.indate is null;
This answers your question. I have no idea why your calculation of the mode uses the SiteId. That is not part of the question. I don't know what the reference is to NULL values for EquipmentId. That is not part of the question either.
However, you should be able to easily modify this for other groupings for the modes or other columns.
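As one possible modification of that kind, here is a minimal sketch that computes the mode per ProcessID and SiteID rather than per ProcessID alone. Two things in it are assumptions, not part of the original answer: NULL InDates are excluded before counting, and ties between equally frequent InDates are broken in favour of the most recent one.

```sql
-- Sketch only: mode per (ProcessID, SiteID), ignoring NULLs.
-- Assumption: on a tie, the most recent InDate wins (InDate DESC).
WITH modes AS (
    SELECT p.ProcessID, p.SiteID, p.InDate
    FROM (SELECT tb.ProcessID, tb.SiteID, tb.InDate,
                 ROW_NUMBER() OVER (PARTITION BY tb.ProcessID, tb.SiteID
                                    ORDER BY COUNT(*) DESC, tb.InDate DESC) AS seqnum
          FROM dbo.Table_basic tb
          WHERE tb.InDate IS NOT NULL
          GROUP BY tb.ProcessID, tb.SiteID, tb.InDate
         ) p
    WHERE p.seqnum = 1
)
UPDATE tb
SET InDate = m.InDate
FROM dbo.Table_basic tb
JOIN modes m
  ON tb.ProcessID = m.ProcessID
 AND tb.SiteID = m.SiteID
WHERE tb.InDate IS NULL;
```

With the sample data from the question, this should set InDate to 2001 for EquipmentID 3, 1999 for EquipmentID 10, and 2005 for EquipmentID 20. A group whose InDates are all NULL has no row in modes, so its rows are left unchanged.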
#2
0
You can use a query like the following to do the UPDATE:
;WITH CTE_CountofEquipment AS (
    SELECT InDate, ProcessID, SiteID,
           COUNT(*) OVER (PARTITION BY ProcessID, SiteID, InDate) AS cnt
    FROM dbo.Table_basic
), ToUpdate AS (
    SELECT InDate, ProcessID, SiteID,
           FIRST_VALUE(InDate)
               OVER (PARTITION BY ProcessID, SiteID
                     ORDER BY cnt DESC) AS mode
    FROM CTE_CountofEquipment
)
UPDATE ToUpdate
SET InDate = mode
WHERE InDate IS NULL;
The query uses window functions to calculate the mode value:
- COUNT(*) OVER (...) is used to determine the population of each InDate slice within each ProcessID, SiteID partition
- FIRST_VALUE(InDate) is used to select the InDate having the biggest population
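Before running an UPDATE like this, it may help to preview which mode each partition would receive. The following is a read-only sketch built on the same two window functions. The NULL filter, the InDate DESC tie-breaker, and the ModeInDate alias are assumptions added for illustration and are not part of the answer's query.

```sql
-- Sketch only: list the mode that each (ProcessID, SiteID) partition would get.
-- Assumptions: NULL InDates are ignored; ties go to the most recent InDate.
SELECT DISTINCT
       c.ProcessID,
       c.SiteID,
       FIRST_VALUE(c.InDate)
           OVER (PARTITION BY c.ProcessID, c.SiteID
                 ORDER BY c.cnt DESC, c.InDate DESC) AS ModeInDate
FROM (SELECT InDate, ProcessID, SiteID,
             COUNT(*) OVER (PARTITION BY ProcessID, SiteID, InDate) AS cnt
      FROM dbo.Table_basic
      WHERE InDate IS NOT NULL
     ) c
ORDER BY c.SiteID, c.ProcessID;
```

Because DISTINCT is applied after the window functions are evaluated, the result has one row per ProcessID, SiteID pair, which can be compared against the rows where InDate IS NULL before any data is changed.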