使用排名函数查找重复出现的事件

时间:2021-02-01 18:34:37

Please help me generate the following query i've been struggling with for some time now. Lets' say I have a simple table with month number and information whether there were any failed events in this particular month

请帮我生成以下查询,我已经挣扎了一段时间了。让我们说我有一个简单的表格,其中包含月份编号和信息,表明在这个特定月份是否有任何失败事件

Below a script to generate sample data:

在脚本下面生成示例数据:

WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL
    SELECT  2, 0 UNION ALL
    SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL
    SELECT  5, 1 UNION ALL
    SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL
    SELECT  8, 1 UNION ALL
    SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL
    SELECT 11, 0 UNION ALL
    SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL
    SELECT 14, 1 UNION ALL
    SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL
    SELECT 17, 0 UNION ALL
    SELECT 18, 0
)

Given the definition of a "repeated failure ":

鉴于“重复失败”的定义:

When event failure occurs during at least 4 months in any 6 months period then the last month with such failure is a "repeated failure" my query should return the following output

当任何6个月期间至少4个月内发生事件失败时,那么失败的最后一个月是“重复失败”我的查询应返回以下输出

Month   Success RepeatedFailure
1       0   
2       0   
3       0   
4       1   
5       1   
6       0       R1
7       0       R2
8       1   
9       0   
10      1   
11      0       R3
12      1   
13      0   
14      1   
15      0   
16      1   
17      0
18      0       R1

where:

  • R1 -1st repeated failure in month no 6 (4 failures in last 6 months).
  • R1-1在第6个月重复失败(过去6个月失败4次)。

  • R2 -2nd repeated failure in month no 7 (4 failures in last 6 months).
  • R2第二次重复失败,第7个月失败(过去6个月失败4次)。

  • R3 -3rd repeated failure in month no 11 (4 failures in last 6 months).
  • R3第3次在第11个月重复失败(过去6个月失败4次)。

R1 -again 1st repeated failure in month no 18 because Repeated Failures should be again numbered from the beginning when new Repeated Failure occurs for the first time in last 6 reporting periods

R1 - 在第18个月中第一次重复失败,因为在最近6个报告期内第一次出现新的重复失败时,重复失败应该从头开始再次编号

Repeated Failures are numerated consecutively because based on its number i must apply appropriate multiplier:

重复失败是连续计算的,因为根据其编号,我必须应用适当的乘数:

  • 1st repated failure - X2
  • 第一次重复失败 - X2

  • 2nd repeated failure - X4
  • 第二次重复失败 - X4

  • 3rd and more repeated failure -X5.
  • 第3次和更多次重复失败-X5。

2 个解决方案

#1


2  

I'm sure this can be improved, but it works. We essentially do two passes - the first to establish repeated failures, the second to establish what kind of repeated failure each is. Note that Intermediate2 can definitely be done away with, I've only separated it out for clarity. All the code is one statement, my explanation is interleaved:

我确信这可以改进,但它确实有效。我们基本上做两次通过 - 第一次建立重复失败,第二次确定每次重复失败。请注意,Intermediate2绝对可以废除,为了清楚起见,我只是将它分开了。所有代码都是一个语句,我的解释是交错的:

;WITH DATA(Month, Success) AS
-- assuming your data  as defined (with my edit)
,Intermediate AS 
(
SELECT
    Month,
    Success,
    -- next column for illustration only
    (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS SuccessesInLastSixMonths,
    -- next column for illustration only
    6 - (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS FailuresInLastSixMonths,
    CASE WHEN 
            (6 - (SELECT SUM(Success) 
                    FROM DATA hist 
                    WHERE curr.Month - hist.Month BETWEEN 0 AND 5)) 
            >= 4 
            THEN 1
            ELSE 0 
    END AS IsRepeatedFailure
FROM DATA curr 
-- No real data until month 6
WHERE curr.Month > 5
)

At this point we have established, for each month, whether it's a repeated failure, by counting the failures in the six months up to and including it.

在这一点上,我们已经确定了每个月是否是一次重复的失败,通过计算六个月内的失败直至并包括它。

,Intermediate2 AS
(
SELECT 
    Month,
    Success,
    IsRepeatedFailure,
    (SELECT SUM(IsRepeatedFailure) 
        FROM Intermediate hist 
        WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS RepeatedFailuresInLastSixMonths
FROM Intermediate curr
)

Now we have counted the number of repeated failures in the six months leading up to now

现在我们已经计算了到目前为止六个月内重复失败的次数

SELECT
    Month,
    Success,
    CASE IsRepeatedFailure 
        WHEN 1 THEN 'R' + CONVERT(varchar, RepeatedFailuresInLastSixMonths) 
        ELSE '' END
    AS RepeatedFailureText
FROM Intermediate2

so we can say, if this month is a repeated failure, what cardinality of repeated failure it is.

所以我们可以说,如果这个月是一次又一次的失败,那么重复失败的基数是多少。

Result:

Month       Success     RepeatedFailureText
----------- ----------- -------------------------------
6           0           R1
7           0           R2
8           1           
9           0           
10          1           
11          0           R3
12          1           
13          0           
14          1           
15          0           
16          1           
17          0           
18          0           R1

(13 row(s) affected)

Performance considerations will depend on on how much data you actually have.

性能考虑因素取决于您实际拥有的数据量。

#2


2  

;WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL
    SELECT  2, 0 UNION ALL
    SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL
    SELECT  5, 1 UNION ALL
    SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL
    SELECT  8, 1 UNION ALL
    SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL
    SELECT 11, 0 UNION ALL
    SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL
    SELECT 14, 1 UNION ALL
    SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL
    SELECT 17, 0 UNION ALL
    SELECT 18, 0
)

SELECT DATA.Month,DATA.Success,Isnull(convert(Varchar(10),b.result),'') +         
Isnull(CONVERT(varchar(10),b.num),'') RepeatedFailure
FROM (
SELECT *, ROW_NUMBER() over (order by Month) num FROM 
( Select * ,(case  when (select sum(Success) 
from DATA where MONTH>(o.MONTH-6) and MONTH<=(o.MONTH)  ) <= 2 
and o.MONTH>=6 then 'R' else  '' end) result
from DATA o
) a where result='R'
) b 
right join DATA on DATA.Month = b.Month
order by DATA.Month

#1


2  

I'm sure this can be improved, but it works. We essentially do two passes - the first to establish repeated failures, the second to establish what kind of repeated failure each is. Note that Intermediate2 can definitely be done away with, I've only separated it out for clarity. All the code is one statement, my explanation is interleaved:

我确信这可以改进,但它确实有效。我们基本上做两次通过 - 第一次建立重复失败,第二次确定每次重复失败。请注意,Intermediate2绝对可以废除,为了清楚起见,我只是将它分开了。所有代码都是一个语句,我的解释是交错的:

;WITH DATA(Month, Success) AS
-- assuming your data  as defined (with my edit)
,Intermediate AS 
(
SELECT
    Month,
    Success,
    -- next column for illustration only
    (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS SuccessesInLastSixMonths,
    -- next column for illustration only
    6 - (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS FailuresInLastSixMonths,
    CASE WHEN 
            (6 - (SELECT SUM(Success) 
                    FROM DATA hist 
                    WHERE curr.Month - hist.Month BETWEEN 0 AND 5)) 
            >= 4 
            THEN 1
            ELSE 0 
    END AS IsRepeatedFailure
FROM DATA curr 
-- No real data until month 6
WHERE curr.Month > 5
)

At this point we have established, for each month, whether it's a repeated failure, by counting the failures in the six months up to and including it.

在这一点上,我们已经确定了每个月是否是一次重复的失败,通过计算六个月内的失败直至并包括它。

,Intermediate2 AS
(
SELECT 
    Month,
    Success,
    IsRepeatedFailure,
    (SELECT SUM(IsRepeatedFailure) 
        FROM Intermediate hist 
        WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS RepeatedFailuresInLastSixMonths
FROM Intermediate curr
)

Now we have counted the number of repeated failures in the six months leading up to now

现在我们已经计算了到目前为止六个月内重复失败的次数

SELECT
    Month,
    Success,
    CASE IsRepeatedFailure 
        WHEN 1 THEN 'R' + CONVERT(varchar, RepeatedFailuresInLastSixMonths) 
        ELSE '' END
    AS RepeatedFailureText
FROM Intermediate2

so we can say, if this month is a repeated failure, what cardinality of repeated failure it is.

所以我们可以说,如果这个月是一次又一次的失败,那么重复失败的基数是多少。

Result:

Month       Success     RepeatedFailureText
----------- ----------- -------------------------------
6           0           R1
7           0           R2
8           1           
9           0           
10          1           
11          0           R3
12          1           
13          0           
14          1           
15          0           
16          1           
17          0           
18          0           R1

(13 row(s) affected)

Performance considerations will depend on on how much data you actually have.

性能考虑因素取决于您实际拥有的数据量。

#2


2  

;WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL
    SELECT  2, 0 UNION ALL
    SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL
    SELECT  5, 1 UNION ALL
    SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL
    SELECT  8, 1 UNION ALL
    SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL
    SELECT 11, 0 UNION ALL
    SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL
    SELECT 14, 1 UNION ALL
    SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL
    SELECT 17, 0 UNION ALL
    SELECT 18, 0
)

SELECT DATA.Month,DATA.Success,Isnull(convert(Varchar(10),b.result),'') +         
Isnull(CONVERT(varchar(10),b.num),'') RepeatedFailure
FROM (
SELECT *, ROW_NUMBER() over (order by Month) num FROM 
( Select * ,(case  when (select sum(Success) 
from DATA where MONTH>(o.MONTH-6) and MONTH<=(o.MONTH)  ) <= 2 
and o.MONTH>=6 then 'R' else  '' end) result
from DATA o
) a where result='R'
) b 
right join DATA on DATA.Month = b.Month
order by DATA.Month