I'm trying to calculate the 95th percentile of multiple water quality values, grouped by watershed. For example:
Watershed WQ
50500101 62.370661
50500101 65.505046
50500101 58.741477
50500105 71.220034
50500105 57.917249
I reviewed this question: Percentile for Each Observation w/r/t Grouping Variable. It seems very close to what I want to do, but it computes a percentile for EACH observation. I need one value for each level of the grouping variable. So ideally:
Watershed WQ - 95th
50500101 x
50500105 y
Thanks.
6 Solutions
#1
7
This can be achieved using the plyr library. We specify the grouping variable Watershed and ask for the 95% quantile of WQ.
library(plyr)
#Random seed
set.seed(42)
#Sample data
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
#plyr call
ddply(dat, "Watershed", summarise, WQ95 = quantile(WQ, .95))
And the results:
Watershed WQ95
1 a 1.353993
2 b 1.461711
#2
4
I hope I understand your question correctly. Is this what you're looking for?
my.df <- data.frame(group = gl(3, 5), var = runif(15))
aggregate(my.df$var, by = list(my.df$group), FUN = function(x) quantile(x, probs = 0.95))
Group.1 x
1 1 0.6913747
2 2 0.8067847
3 3 0.9643744
EDIT
Based on Vincent's answer,
aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = 0.95)
also works (there are 1001 ways to skin a cat, I've been told). As a side note, you can specify a vector of desired quantiles, say c(0.1, 0.2, 0.3...) for deciles. Or you can try the summary function for some predefined statistics.
aggregate(my.df$var, by = list(my.df$group), FUN = summary)
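As a rough sketch of the vector-of-quantiles idea mentioned above: passing several probs to quantile makes aggregate return one column per requested quantile. The formula interface, the seed, and the three probabilities chosen here are illustrative assumptions, not part of the original answer.

```r
# Illustrative data with the same shape as my.df above (seed is arbitrary)
set.seed(1)
my.df <- data.frame(group = gl(3, 5), var = runif(15))

# Assumed choice of probabilities: 5th, 50th and 95th percentiles
res <- aggregate(var ~ group, data = my.df, FUN = quantile,
                 probs = c(0.05, 0.5, 0.95))

# res$var is a matrix: one row per group, one column per quantile
print(res)
```

Because the quantiles come back as a matrix column, something like do.call(data.frame, res) may be handy if you want them as ordinary columns.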
#3
4
Use a combination of the tapply and quantile functions. For example, if your dataset looks like this:
DF <- data.frame(watershed = sample(c('a','b','c','d'), 1000, replace = TRUE), wq = rnorm(1000))
Use this:
with(DF, tapply(wq, watershed, quantile, probs=0.95))
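To tie this back to the question, here is a minimal sketch that applies the same tapply call to the five sample rows from the original post. The data frame name wq_df is made up for illustration, the watershed IDs are stored as character strings by assumption, and quantile uses R's default interpolation (type 7).

```r
# The five sample rows from the question (wq_df is a hypothetical name)
wq_df <- data.frame(
  Watershed = c("50500101", "50500101", "50500101", "50500105", "50500105"),
  WQ = c(62.370661, 65.505046, 58.741477, 71.220034, 57.917249)
)

# One 95th percentile per watershed, returned as a named array
res <- with(wq_df, tapply(WQ, Watershed, quantile, probs = 0.95))
print(res)
```

With so few values per group, the result is just an interpolation between the two largest observations, so it is worth checking group sizes before reading much into a 95th percentile.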
#4
3
In Excel, you're going to want to use an array formula to make this easy. I suggest the following:
{=PERCENTILE(IF($A$2:$A$6 = Watershed ID, $B$2:$B$6), 0.95)}
Column A would be the Watershed IDs, and Column B would be the WQ values. Replace Watershed ID with the ID you want, or with a reference to a cell that holds it.
Also, be sure to enter the formula as an array formula. Do so by pressing Ctrl+Shift+Enter when entering the formula.
#5
0
Using the data.table package you can do:
library(data.table)
#Random seed
set.seed(42)
#Sample data
dt <- data.table(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
#data.table call: one 95% quantile of WQ per Watershed
dt[ ,
j = .(WQ95 = quantile(WQ, .95, na.rm = TRUE)),
by = Watershed]
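The na.rm = TRUE argument above matters when WQ contains missing values: without it, quantile stops with an error as soon as it meets an NA. A small base R sketch, using made-up data (wq_na is a hypothetical name), shows the difference.

```r
# Made-up data with one missing value in watershed "a"
wq_na <- data.frame(
  Watershed = c("a", "a", "a", "b", "b"),
  WQ = c(1, NA, 3, 2, 4)
)

# Without na.rm = TRUE, quantile() raises an error on the NA;
# try() captures it so the script keeps running
attempt <- try(
  with(wq_na, tapply(WQ, Watershed, quantile, probs = 0.95)),
  silent = TRUE
)
failed <- inherits(attempt, "try-error")

# With na.rm = TRUE, the NA is dropped before the quantile is computed
res <- with(wq_na, tapply(WQ, Watershed, quantile, probs = 0.95, na.rm = TRUE))
print(failed)
print(res)
```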
#6
-1
Based on Chase's answer, here is a solution using the dplyr package. Which solution to use is largely a matter of preference; I like the relative clarity (for me) of the "piping" (%>%) method used in dplyr:
library(dplyr)
#Random seed
set.seed(42)
#Sample data
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
#dplyr call
dat %>% group_by(Watershed) %>% summarise(WQ95 = quantile(WQ, 0.95))