Find the median for each date in a table in SQL Server

Time: 2021-10-22 12:54:37

I use the query below to find the median for every sector:

SELECT DISTINCT Sector,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value)
        OVER (PARTITION BY Sector) AS Median
FROM TABLE

The table has the following format:

    Sector  Date    Value
    A   2014-08-01  1
    B   2014-08-01  5
    C   2014-08-01  7
    A   2014-08-02  6
    B   2014-08-02  5
    C   2014-08-02  4
    A   2014-08-03  3
    B   2014-08-03  9
    C   2014-08-03  6
    A   2014-08-04  5
    B   2014-08-04  8
    C   2014-08-04  9
    A   2014-08-05  5
    B   2014-08-05  7
    C   2014-08-05  2   
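
To try the queries in this thread, the sample rows above can be loaded into a scratch table. This is only a minimal sketch: the name `Table1` is borrowed from answer #2, and the column types (`char(1)`, `date`, `int`) are assumptions, since the question does not state them.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE Table1 (
    Sector  char(1) NOT NULL,
    [Date]  date    NOT NULL,
    [Value] int     NOT NULL
);

-- Sample rows copied from the table in the question.
INSERT INTO Table1 (Sector, [Date], [Value]) VALUES
    ('A', '2014-08-01', 1), ('B', '2014-08-01', 5), ('C', '2014-08-01', 7),
    ('A', '2014-08-02', 6), ('B', '2014-08-02', 5), ('C', '2014-08-02', 4),
    ('A', '2014-08-03', 3), ('B', '2014-08-03', 9), ('C', '2014-08-03', 6),
    ('A', '2014-08-04', 5), ('B', '2014-08-04', 8), ('C', '2014-08-04', 9),
    ('A', '2014-08-05', 5), ('B', '2014-08-05', 7), ('C', '2014-08-05', 2);
```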

So I get the expected result, shown below:

    Sector  Median
    A   5
    B   7
    C   6

Now I need to change the process so that the medians are calculated using only the records up to the given date. The new result would be:

    Sector  Date    Median
    A   2014-08-01  1
    B   2014-08-01  5
    C   2014-08-01  7 (Only 1 record each was considered for A, B and C)

    A   2014-08-02  3.5
    B   2014-08-02  5
    C   2014-08-02  5.5 (2 records each were considered for A, B and C)

    A   2014-08-03  3
    B   2014-08-03  5
    C   2014-08-03  6 (3 records each were considered for A, B and C)

    A   2014-08-04  4
    B   2014-08-04  6.5
    C   2014-08-04  6.5 (4 records each were considered for A, B and C)

    A   2014-08-05  5
    B   2014-08-05  7
    C   2014-08-05  6 (All 5 records each were considered for A, B and C)

So this will be a sort of cumulative median. Can someone please tell me how to achieve this? My table has about 2.3M records, with about 1100 records each for about 1100 dates.

Please let me know if you need any info.


2 solutions

#1


1  

That makes it harder, because the following does not work (SQL Server does not allow ORDER BY in the OVER clause of PERCENTILE_DISC):

SELECT DISTINCT Sector, Date,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value) OVER (PARTITION BY sector ORDER BY DATE) AS Median
FROM TABLE;

Alas. You can use CROSS APPLY for this purpose:

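-- ORDER BY is not allowed in the OVER clause of PERCENTILE_DISC, so only PARTITION BY is used;
-- the WHERE clause already restricts the rows to the same sector up to the current date.
select t.sector, t.date, t.value, m.median
from table t cross apply
     (select top 1 PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY t2.Value) OVER (PARTITION BY t2.sector) AS Median
      from table t2
      where t2.sector = t.sector and t2.date <= t.date
     ) m;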

#2


2  

Another way is to create a triangular JOIN that collects all the past values for every day, and use that as the data:

;With T AS (
  SELECT t2.Sector, t2.[Date], t1.[Value]
  FROM   Table1 t1
         LEFT  JOIN Table1 t2 ON t1.Sector = t2.Sector and t1.[Date] <= t2.[Date]
)
SELECT DISTINCT Sector
     , [Date]
     , PERCENTILE_CONT(0.5) 
         WITHIN GROUP (ORDER BY [Value]) 
         OVER (PARTITION BY sector, [Date]) AS Median 
FROM   T
ORDER BY [Date], Sector;

SQLFiddle demo

In the query I've replaced PERCENTILE_DISC with PERCENTILE_CONT to get the correct median when the number of values is even, for example on the second day.
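
The difference shows up on sector A's second day, where only the values 1 and 6 are in scope. The following is a minimal sketch that uses an inline `VALUES` list instead of the real table; the alias `v` and the column aliases `MedianDisc` and `MedianCont` are made up for illustration.

```sql
-- Sector A's values up to 2014-08-02, taken from the question's sample data.
SELECT DISTINCT
       -- Picks an actual value from the set.
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v.Value) OVER () AS MedianDisc,
       -- Interpolates between the two middle values.
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v.Value) OVER () AS MedianCont
FROM (VALUES (1), (6)) AS v(Value);
```

PERCENTILE_DISC always returns a value that exists in the set (here 1), whereas PERCENTILE_CONT interpolates between the two middle values and returns 3.5, which matches the expected result in the question.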
