I use the query below to find the median for every sector:
SELECT DISTINCT Sector,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value)
           OVER (PARTITION BY Sector) AS Median
FROM TABLE
The table is in the format below:
Sector Date Value
A 2014-08-01 1
B 2014-08-01 5
C 2014-08-01 7
A 2014-08-02 6
B 2014-08-02 5
C 2014-08-02 4
A 2014-08-03 3
B 2014-08-03 9
C 2014-08-03 6
A 2014-08-04 5
B 2014-08-04 8
C 2014-08-04 9
A 2014-08-05 5
B 2014-08-05 7
C 2014-08-05 2
So I get the expected result, as below:
Sector Median
A 5
B 7
C 6
Now I need to change the process so that the medians are calculated considering only the records up to the given date. So the new result would be:
Sector Date Median
A 2014-08-01 1
B 2014-08-01 5
C 2014-08-01 7 (Only 1 record each was considered for A, B and C)
A 2014-08-02 3.5
B 2014-08-02 5
C 2014-08-02 5.5 (2 records each were considered for A, B and C)
A 2014-08-03 3
B 2014-08-03 5
C 2014-08-03 6 (3 records each were considered for A, B and C)
A 2014-08-04 4
B 2014-08-04 6.5
C 2014-08-04 6.5 (4 records each were considered for A, B and C)
A 2014-08-05 5
B 2014-08-05 7
C 2014-08-05 6 (All 5 records each were considered for A, B and C)
So this would be a sort of cumulative median. Can someone please tell me how to achieve this? My table has about 2.3M records, with about 1100 records each for about 1100 dates.
Please let me know if you need any more info.
2 Answers
#1
1
That makes it harder, because the following does not work (SQL Server does not allow ORDER BY inside the OVER clause of PERCENTILE_DISC):
SELECT DISTINCT Sector, Date,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value) OVER (PARTITION BY sector ORDER BY DATE) AS Median
FROM TABLE;
Alas. You can use CROSS APPLY for this purpose:
-- No ORDER BY in OVER: the WHERE clause already limits the rows to dates up to t.date
select t.sector, t.date, t.value, m.median
from table t cross apply
     (select top 1 PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY t2.Value) OVER (PARTITION BY t2.sector) AS Median
      from table t2
      where t2.sector = t.sector and t2.date <= t.date
     ) m;
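To try the CROSS APPLY idea against the sample rows from the question, here is a minimal sketch. The table variable @T and its column types are assumptions made for illustration, not the asker's real schema, and the OVER clause carries only PARTITION BY, since ORDER BY is not accepted there.

```sql
-- Hypothetical stand-in for the real table, loaded with the question's sample rows.
DECLARE @T TABLE (Sector char(1), [Date] date, [Value] int);

INSERT INTO @T (Sector, [Date], [Value]) VALUES
    ('A', '2014-08-01', 1), ('B', '2014-08-01', 5), ('C', '2014-08-01', 7),
    ('A', '2014-08-02', 6), ('B', '2014-08-02', 5), ('C', '2014-08-02', 4),
    ('A', '2014-08-03', 3), ('B', '2014-08-03', 9), ('C', '2014-08-03', 6),
    ('A', '2014-08-04', 5), ('B', '2014-08-04', 8), ('C', '2014-08-04', 9),
    ('A', '2014-08-05', 5), ('B', '2014-08-05', 7), ('C', '2014-08-05', 2);

-- For each outer row, the inner query sees only that sector's rows up to the
-- outer row's date, so the percentile is computed over the running set.
SELECT t.Sector, t.[Date], t.[Value], m.Median
FROM @T AS t
CROSS APPLY (
    SELECT TOP (1)
           PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY t2.[Value])
               OVER (PARTITION BY t2.Sector) AS Median
    FROM @T AS t2
    WHERE t2.Sector = t.Sector
      AND t2.[Date] <= t.[Date]
) AS m
ORDER BY t.[Date], t.Sector;
```

Every row of the inner query carries the same Median, which is why TOP (1) without an ORDER BY is enough. Note that PERCENTILE_DISC always returns one of the existing values, so for an even number of rows it will not produce the averaged figures (such as 3.5) shown in the question.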
#2
2
Another way is to create a triangular JOIN to get all the past values for every day and use that as the data:
;With T AS (
SELECT t2.Sector, t2.[Date], t1.[Value]
FROM Table1 t1
LEFT JOIN Table1 t2 ON t1.Sector = t2.Sector and t1.[Date] <= t2.[Date]
)
SELECT DISTINCT Sector
, [Date]
, PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY [Value])
OVER (PARTITION BY sector, [Date]) AS Median
FROM T
ORDER BY [Date], Sector;
In the query I've replaced PERCENTILE_DISC with PERCENTILE_CONT to get the right median when the number of values is even, for example on the second day.
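The difference between the two functions can be seen on sector A's first two values from the question (1 and 6). This is only an illustrative sketch; the inline VALUES list and the column aliases are made up for the example.

```sql
-- Two values, so there is no single middle row.
SELECT DISTINCT
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v.[Value]) OVER () AS MedianDisc,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v.[Value]) OVER () AS MedianCont
FROM (VALUES (1), (6)) AS v([Value]);
-- MedianDisc = 1   (an actual value from the set)
-- MedianCont = 3.5 (interpolated between 1 and 6)
```

PERCENTILE_CONT interpolates between the two middle values, which matches the 3.5 the question expects for sector A on 2014-08-02.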