In "Calculating percentiles by factor using ave() in R", I asked how to calculate percentiles within the ave() function. With that task finished, I'm faced with a more difficult one.
Take the following data:
DistrictName          Building Name    X2.Yr.AVG     Thirty        Seventy
Ionia Public Schools  Emerson          -0.337464323  -0.196387489  -0.046524185
Ionia Public Schools  Jefferson        -0.318673587  -0.196387489  -0.046524185
Ionia Public Schools  Ionia Middle     -0.290854669  -0.196387489  -0.046524185
Ionia Public Schools  Ionia Middle     -0.288202752  -0.196387489  -0.046524185
Ionia Public Schools  Twin Rivers El   -0.23426755   -0.196387489  -0.046524185
Ionia Public Schools  R.B. Boyce El    -0.202319963  -0.196387489  -0.046524185
Ionia Public Schools  Twin Rivers El   -0.142995221  -0.196387489  -0.046524185
Ionia Public Schools  Emerson          -0.141620372  -0.196387489  -0.046524185
Ionia Public Schools  Jefferson        -0.141407078  -0.196387489  -0.046524185
Ionia Public Schools  R.B. Boyce El    -0.115530249  -0.196387489  -0.046524185
Ionia Public Schools  Ionia Middle     -0.111449269  -0.196387489  -0.046524185
Ionia Public Schools  Twin Rivers El   -0.054918339  -0.196387489  -0.046524185
Ionia Public Schools  Jefferson        -0.045591501  -0.196387489  -0.046524185
Ionia Public Schools  A.A. Rather       0.002251298  -0.196387489  -0.046524185
Ionia Public Schools  R.B. Boyce El     0.020669633  -0.196387489  -0.046524185
Ionia Public Schools  Emerson           0.065064968  -0.196387489  -0.046524185
Ionia Public Schools  A.A. Rather       0.182776319  -0.196387489  -0.046524185
What I'm trying to do is something akin to what the AVERAGEIF function does in Excel. In Excel, I can write =AVERAGEIF(C2:C18, "<-.196387489"), which returns the average value -0.278630474. I need something in R that lets me create new variables for the average of:
1) any values of X2.Yr.AVG that are smaller than the value of Thirty
2) any values of X2.Yr.AVG that are larger than the value of Seventy
The catch is that I need to be able to perform this operation in a large data frame with 722 levels for the factor DistrictName. In the step for calculating the percentiles, I used the ave() function to create percentiles according to the desired factor as follows:
MATHgap$Thirty<-ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
FUN= function(x) quantile(x, 0.3))
and
MATHgap$Seventy<-ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName,
FUN= function(x) quantile(x, 0.7))
Is there any way to do something akin to AVERAGEIF within ave() so that the operation is repeated for each value of DistrictName independently of the others? That is, Ionia Public Schools should have one average for the values of X2.Yr.AVG less than -0.196387489 and another for the values greater than -0.046524185, and I want to perform the same calculation for all districts using their respective values of X2.Yr.AVG, Thirty, and Seventy.
If this is confusing, apologies.
1 solution
Here's a solution using dplyr:
library(dplyr)

MATHgap %>%
  group_by(DistrictName) %>%
  # within each district, average the values below Thirty and above Seventy
  mutate(MeanLT30 = mean(X2.Yr.AVG[X2.Yr.AVG < Thirty]),
         MeanGT70 = mean(X2.Yr.AVG[X2.Yr.AVG > Seventy]))
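Since the question asked specifically about ave(), here is a rough base R sketch of the same idea. ave() only passes one vector to FUN, so this version groups the row indices and looks up X2.Yr.AVG, Thirty, and Seventy inside FUN. The Ionia values are copied from the question; "Example District" and its numbers are made up purely to show that each district is handled separately. Treat it as one possible approach rather than the only way to do it.

```r
# X2.Yr.AVG values for Ionia Public Schools, copied from the question
ionia <- c(-0.337464323, -0.318673587, -0.290854669, -0.288202752,
           -0.23426755,  -0.202319963, -0.142995221, -0.141620372,
           -0.141407078, -0.115530249, -0.111449269, -0.054918339,
           -0.045591501,  0.002251298,  0.020669633,  0.065064968,
            0.182776319)

# "Example District" and its values are hypothetical, added for illustration
MATHgap <- data.frame(
  DistrictName = c(rep("Ionia Public Schools", 17), rep("Example District", 4)),
  X2.Yr.AVG    = c(ionia, 1, 2, 3, 4),
  Thirty       = c(rep(-0.196387489, 17), rep(1.9, 4)),
  Seventy      = c(rep(-0.046524185, 17), rep(3.1, 4))
)

# ave() hands FUN a single vector, so split the row indices by district
# and subset the data frame inside FUN
rows <- seq_len(nrow(MATHgap))

MATHgap$MeanLT30 <- ave(rows, MATHgap$DistrictName, FUN = function(i)
  with(MATHgap[i, ], mean(X2.Yr.AVG[X2.Yr.AVG < Thirty])))

MATHgap$MeanGT70 <- ave(rows, MATHgap$DistrictName, FUN = function(i)
  with(MATHgap[i, ], mean(X2.Yr.AVG[X2.Yr.AVG > Seventy])))
```

For the Ionia rows, MeanLT30 should agree with the -0.278630474 that AVERAGEIF returns in Excel. If a district has no values below Thirty (or above Seventy), mean() of the empty selection gives NaN for that district.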