What's the best way to calculate percentile rankings (e.g. the 90th percentile or the median score) in MSSQL 2005?
I'd like to be able to select the 25th, median, and 75th percentiles for a single column of scores (preferably in a single record so I can combine with average, max, and min). So for example, table output of the results might be:
Group  MinScore  MaxScore  AvgScore  pct25  median  pct75
-----  --------  --------  --------  -----  ------  -----
T1     52        96        74        68     76      84
T2     48        98        74        68     75      85
7 solutions
#1
14
I would think that this would be the simplest solution:
SELECT TOP N PERCENT * FROM TheTable ORDER BY TheScore DESC
Where N = (100 - desired percentile). So if you wanted all rows in the 90th percentile, you'd select the top 10%.
I'm not sure what you mean by "preferably in a single record". Do you mean calculating which percentile a given score for a single record would fall into? For example, do you want to be able to make statements like "your score is 83, which puts you in the 91st percentile"?
EDIT: OK, I thought some more about your question and came up with this interpretation. Are you asking how to calculate the cutoff score for a particular percentile? e.g. something like this: to be in the 90th percentile you must have a score greater than 78.
If so, this query works. I dislike sub-queries though, so depending on what it was for, I'd probably try to find a more elegant solution. It does, however, return a single record with a single score.
-- Find the minimum score for all scores in the 90th percentile
SELECT Min(subq.TheScore) FROM
(SELECT TOP 10 PERCENT TheScore FROM TheTable
ORDER BY TheScore DESC) AS subq
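If the goal is one row that also carries the min, max, and average, the same TOP ... PERCENT cutoff can be repeated once per percentile as a scalar subquery. This is only a sketch built on the placeholder names TheTable and TheScore used above; the column aliases are made up for illustration, and the cutoffs follow this answer's definition (lowest score among the top N percent), not an interpolated percentile.

```sql
-- Sketch: one row with min/max/avg plus three percentile cutoffs.
-- Each cutoff is the lowest score among the top (100 - percentile) percent.
SELECT MIN(TheScore) AS MinScore,
       MAX(TheScore) AS MaxScore,
       AVG(TheScore) AS AvgScore,   -- integer average if TheScore is an int column
       (SELECT MIN(q.TheScore)
        FROM (SELECT TOP 75 PERCENT TheScore
              FROM TheTable
              ORDER BY TheScore DESC) AS q) AS pct25,
       (SELECT MIN(q.TheScore)
        FROM (SELECT TOP 50 PERCENT TheScore
              FROM TheTable
              ORDER BY TheScore DESC) AS q) AS median,
       (SELECT MIN(q.TheScore)
        FROM (SELECT TOP 25 PERCENT TheScore
              FROM TheTable
              ORDER BY TheScore DESC) AS q) AS pct75
FROM TheTable;
```

Note that TOP ... PERCENT rounds a fractional row count up, so on small tables the cutoffs can land one row lower than expected.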
#2
9
Check out the NTILE command -- it will give you percentiles pretty easily!
SELECT SalesOrderID,
OrderQty,
RowNum = Row_Number() OVER(Order By OrderQty),
Rnk = RANK() OVER(ORDER BY OrderQty),
DenseRnk = DENSE_RANK() OVER(ORDER BY OrderQty),
NTile4 = NTILE(4) OVER(ORDER BY OrderQty)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43689, 63181)
#3
2
How about this:
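-- Window functions cannot be nested inside aggregates, so compute the tiles first.
-- Group is a reserved word, so it is bracketed.
SELECT [Group],
       MAX(CASE WHEN q4 = 3 THEN score END) AS pct75,
       MAX(CASE WHEN q10 = 9 THEN score END) AS pct90
FROM (SELECT [Group], score,
             NTILE(4) OVER (PARTITION BY [Group] ORDER BY score ASC) AS q4,
             NTILE(10) OVER (PARTITION BY [Group] ORDER BY score ASC) AS q10
      FROM TheScore) AS t
GROUP BY [Group]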
#4
1
I've been working on this a little more, and here's what I've come up with so far:
CREATE PROCEDURE [dbo].[TestGetPercentile]
    @percentile as float,
    @resultval as float output
AS
BEGIN
    WITH scores(score, prev_rank, curr_rank, next_rank) AS (
        SELECT dblScore,
            (ROW_NUMBER() OVER ( ORDER BY dblScore ) - 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [prev_rank],
            (ROW_NUMBER() OVER ( ORDER BY dblScore ) + 0.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [curr_rank],
            (ROW_NUMBER() OVER ( ORDER BY dblScore ) + 1.0) / ((SELECT COUNT(*) FROM TestScores) + 1) [next_rank]
        FROM TestScores
    )
    SELECT @resultval = (
        SELECT TOP 1
            CASE WHEN t1.score = t2.score
                THEN t1.score
            ELSE
                t1.score + (t2.score - t1.score) * ((@percentile - t1.curr_rank) / (t2.curr_rank - t1.curr_rank))
            END
        FROM scores t1, scores t2
        WHERE (t1.curr_rank = @percentile OR (t1.curr_rank < @percentile AND t1.next_rank > @percentile))
          AND (t2.curr_rank = @percentile OR (t2.curr_rank > @percentile AND t2.prev_rank < @percentile))
    )
END
Then in another stored procedure I do this:
DECLARE @pct25 float;
DECLARE @pct50 float;
DECLARE @pct75 float;
exec TestGetPercentile .25, @pct25 output
exec TestGetPercentile .50, @pct50 output
exec TestGetPercentile .75, @pct75 output
Select
min(dblScore) as minScore,
max(dblScore) as maxScore,
avg(dblScore) as avgScore,
@pct25 as percentile25,
@pct50 as percentile50,
@pct75 as percentile75
From TestScores
It still doesn't do quite what I'm looking for. This gets the stats across all tests, whereas I would like to select from a TestScores table that holds multiple different tests and get back the same stats for each test (as in the example table in my question).
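One way to get the per-test breakdown in a single query might be to partition the row numbering by test. The following is only a sketch under stated assumptions: the TestID column is hypothetical (the post never names the column that identifies a test), the median averages the one or two middle rows, and the 25th and 75th percentiles use the simple nearest-rank rule rather than the interpolation in the procedure above.

```sql
-- Sketch only. TestID is a hypothetical column identifying each test.
WITH ranked AS (
    SELECT TestID,
           dblScore,
           ROW_NUMBER() OVER (PARTITION BY TestID ORDER BY dblScore) AS rn,
           COUNT(*) OVER (PARTITION BY TestID) AS cnt
    FROM TestScores
    WHERE dblScore IS NOT NULL
)
SELECT TestID,
       MIN(dblScore) AS minScore,
       MAX(dblScore) AS maxScore,
       AVG(dblScore) AS avgScore,
       -- nearest rank: the row at position CEILING(p * cnt)
       MAX(CASE WHEN rn = CEILING(0.25 * cnt) THEN dblScore END) AS percentile25,
       -- median: average of the one (odd cnt) or two (even cnt) middle rows
       AVG(CASE WHEN rn IN ((cnt + 1) / 2, (cnt + 2) / 2) THEN dblScore END) AS percentile50,
       MAX(CASE WHEN rn = CEILING(0.75 * cnt) THEN dblScore END) AS percentile75
FROM ranked
GROUP BY TestID;
```

Because everything is grouped by TestID, this returns one row per test, which matches the shape of the example table in the question. Both window functions used here are available in SQL Server 2005.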
#5
1
The 50th percentile is the same as the median. To compute another percentile, say the 80th, take the bottom 80 percent of the data sorted in ascending order and the top 20 percent sorted in descending order, then average the two values where those two sets meet.
NB: The median query has been around for a long time, and I cannot remember exactly where I got it from; I have only amended it to compute other percentiles.
DECLARE @Temp TABLE(Id INT IDENTITY(1,1), DATA DECIMAL(10,5))
INSERT INTO @Temp VALUES(0)
INSERT INTO @Temp VALUES(2)
INSERT INTO @Temp VALUES(8)
INSERT INTO @Temp VALUES(4)
INSERT INTO @Temp VALUES(3)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(6)
INSERT INTO @Temp VALUES(7)
INSERT INTO @Temp VALUES(0)
INSERT INTO @Temp VALUES(1)
INSERT INTO @Temp VALUES(NULL)
--50th percentile or median
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 50 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 50 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
--90th percentile
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 90 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 10 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
--75th percentile
SELECT ((
SELECT TOP 1 DATA
FROM (
SELECT TOP 75 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA
) AS A
ORDER BY DATA DESC) +
(
SELECT TOP 1 DATA
FROM (
SELECT TOP 25 PERCENT DATA
FROM @Temp
WHERE DATA IS NOT NULL
ORDER BY DATA DESC
) AS A
ORDER BY DATA ASC)) / 2.0
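Since the three queries differ only in the two percentages, the pattern could probably be parameterized with TOP (expression) PERCENT, which SQL Server 2005 accepts. The sketch below is an illustration, not part of the original query: the @Scores table variable and the @p variable are made-up names, and the sample values simply repeat the ones inserted above.

```sql
-- Sketch: the same "meet in the middle" technique with the percentile as a variable.
DECLARE @Scores TABLE (DATA DECIMAL(10,5))
INSERT INTO @Scores (DATA)
SELECT 0 UNION ALL SELECT 2 UNION ALL SELECT 8 UNION ALL SELECT 4 UNION ALL
SELECT 3 UNION ALL SELECT 6 UNION ALL SELECT 6 UNION ALL SELECT 6 UNION ALL
SELECT 7 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT NULL

DECLARE @p float
SET @p = 80   -- desired percentile, strictly between 0 and 100

SELECT ((SELECT MAX(DATA)            -- highest value of the bottom @p percent
         FROM (SELECT TOP (@p) PERCENT DATA
               FROM @Scores
               WHERE DATA IS NOT NULL
               ORDER BY DATA ASC) AS lower_part)
      + (SELECT MIN(DATA)            -- lowest value of the top (100 - @p) percent
         FROM (SELECT TOP (100 - @p) PERCENT DATA
               FROM @Scores
               WHERE DATA IS NOT NULL
               ORDER BY DATA DESC) AS upper_part)) / 2.0 AS percentile_value
```

MAX over the ascending slice and MIN over the descending slice pick the same rows as the TOP 1 ... ORDER BY pairs in the original queries. With @p set to 0 or 100 one of the slices is empty and the result is NULL, hence the range noted in the comment.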
#6
0
I'd do something like:
declare @n int, @median int, @p75 int, @p90 int
select @n = count(*) from tbl1
select @median = @n / 2
select @p75 = @n * 3 / 4
select @p90 = @n * 9 / 10
select top 1 score from (select top (@median) score from tbl1 order by score asc) as t order by score desc
Is this right?
#7
0
I'd probably use the SQL Server 2005 ROW_NUMBER() function:
row_number() over (order by score) * 1.0 / (select count(*) from scores)
or something along those lines.