T-SQL: Calculating the Nth percentile value from a column

Time: 2021-01-03 08:47:12

I have a column of data, some of which are NULL values, from which I wish to extract the single 90th percentile value:

ColA
-----
NULL
100
200
300
NULL
400
500
600
700
800
900
1000

For the above, I am looking for a technique that returns 900 when searching for the 90th percentile, 800 for the 80th percentile, and so on. An analogous function would be AVG(ColA), which returns 550 for the above data, or MIN(ColA), which returns 100.

Any suggestions?

2 solutions

#1


9  

If you want to get exactly the 90th percentile value, excluding NULLs, I would suggest doing the calculation directly. The following version calculates the row number and number of rows, and selects the appropriate value:

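-- Largest value whose rank falls within the lowest 90% of the non-NULL rows
select max(case when rownum * 1.0 / numrows <= 0.9 then colA end) as percentile_90th
from (select colA,
             row_number() over (order by colA) as rownum,  -- 1-based rank by value
             count(*) over () as numrows                   -- total non-NULL rows
      from t
      where colA is not null
     ) t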

I put the condition in the SELECT clause rather than the WHERE clause, so you can easily get the 50th percentile, 17th, or whatever values you want.
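Because the threshold sits in the SELECT list, several percentiles can come out of one pass over the data. The following is a minimal sketch of that idea, not part of the original answer; the table variable @t and the extra column aliases are assumptions made for illustration, loaded with the sample values from the question:

```sql
-- Hypothetical stand-in for table t, filled with the question's sample data.
DECLARE @t TABLE (colA INT NULL);
INSERT INTO @t (colA)
VALUES (NULL), (100), (200), (300), (NULL), (400),
       (500), (600), (700), (800), (900), (1000);

-- One conditional aggregate per percentile; only the threshold changes.
SELECT MAX(CASE WHEN rownum * 1.0 / numrows <= 0.5 THEN colA END) AS percentile_50th,
       MAX(CASE WHEN rownum * 1.0 / numrows <= 0.8 THEN colA END) AS percentile_80th,
       MAX(CASE WHEN rownum * 1.0 / numrows <= 0.9 THEN colA END) AS percentile_90th
FROM (SELECT colA,
             ROW_NUMBER() OVER (ORDER BY colA) AS rownum,  -- rank among non-NULL rows
             COUNT(*) OVER () AS numrows                   -- 10 for the sample data
      FROM @t
      WHERE colA IS NOT NULL
     ) AS ranked;
-- For the sample data this returns 500, 800, 900.
```

With 10 non-NULL rows, a threshold of 0.9 keeps ranks 1 through 9, so the maximum of those rows is 900, which matches the result the question asks for.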

#2


4  

WITH
  percentiles AS
(
  SELECT
    NTILE(100) OVER (ORDER BY ColA) AS percentile,
    *
  FROM
    data
  WHERE
    ColA IS NOT NULL  -- keep NULLs out of the buckets; they would otherwise sort first
)
SELECT
  *
FROM
  percentiles
WHERE
  percentile = 90


Note: NTILE(100) splits the rows into 100 numbered buckets, so if the data has fewer than 100 observations, not all percentiles will have a value. Equally, if you have more than 100 observations, some percentiles will contain more than one value.
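To make the caveat concrete, here is a small sketch run against the question's sample data; it is not from the original answer. The table variable @data is a hypothetical stand-in for the data table. The second query swaps in a different technique, the built-in PERCENTILE_DISC function (available from SQL Server 2012), which may be a better fit for small sets because it does not depend on bucket numbers:

```sql
-- Hypothetical stand-in for the data table, filled with the question's sample data.
DECLARE @data TABLE (ColA INT NULL);
INSERT INTO @data (ColA)
VALUES (NULL), (100), (200), (300), (NULL), (400),
       (500), (600), (700), (800), (900), (1000);

-- With only 10 non-NULL rows, NTILE(100) hands out buckets 1 through 10,
-- one row each, so a filter on percentile = 90 would match nothing.
SELECT ColA,
       NTILE(100) OVER (ORDER BY ColA) AS percentile
FROM @data
WHERE ColA IS NOT NULL;

-- Alternative technique: PERCENTILE_DISC ignores NULLs and returns a value
-- that actually occurs in the column. As a window function it repeats the
-- result on every row, hence the DISTINCT.
SELECT DISTINCT
       PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY ColA) OVER () AS percentile_90th
FROM @data;
```

PERCENTILE_DISC picks the smallest value whose cumulative distribution reaches the requested fraction, which is the same idea as the row-number calculation in answer #1.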
