R: create a new column using all rows that have a particular value [duplicate]

This question already has an answer here:

Calculating statistics on subsets of data [duplicate] 3 answers

I want to define a new column of a dataframe as a function of all the other rows that have a particular value in a particular column.

For example:

mtcars

I want the difference between the mpg of each car and the average mpg of all the cars with the same cyl. It's something like the code below, but obviously the second mtcars$cyl would need to be different!

mtcars$dif_mpg = mtcars$mpg - mean(mtcars[mtcars$cyl == mtcars$cyl, ]$mpg)

2 answers

#1

Something like this should do the job (in base R):

transform(mtcars, dif_mpg=mpg-ave(mpg, cyl, FUN=mean))

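ave applies FUN to the subgroups of mpg defined by cyl and returns a vector the same length as mpg, where each element holds the result for that row's group. transform lets you add or modify columns of a data frame, and it evaluates expressions in the context of the data frame (so you don't have to type out mtcars$mpg, etc.). Note that transform returns a new data frame rather than changing mtcars in place. Here are the first 6 rows of the result: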

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb     dif_mpg
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  1.25714286
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  1.25714286
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 -3.86363636
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  1.65714286
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  3.60000000
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 -1.64285714
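
To make the behaviour of ave a bit more concrete, here is a minimal base R sketch that cross-checks it against tapply on the built-in mtcars data. The variable names group_mean, by_cyl and dif_mpg are only illustrative, and the tapply comparison is an added sanity check, not part of the original answer.

```r
# Sketch only: group_mean, by_cyl and dif_mpg are illustrative names.
# ave() returns one value per input row: the mean mpg of that row's cyl group.
group_mean <- ave(mtcars$mpg, mtcars$cyl, FUN = mean)
stopifnot(length(group_mean) == nrow(mtcars))

# Cross-check: compute one mean per cyl group, then look it up for each row.
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
lookup <- as.numeric(by_cyl[as.character(mtcars$cyl)])
stopifnot(isTRUE(all.equal(group_mean, lookup)))

# The comparison in the question, cyl == cyl, is TRUE for every row,
# so that attempt subtracts the overall mean instead of the group mean.
stopifnot(all(mtcars$cyl == mtcars$cyl))

# Difference between each car's mpg and the mean mpg of its cyl group.
dif_mpg <- mtcars$mpg - group_mean
head(dif_mpg)
```

Because ave keeps the original row order, its result can be subtracted from mpg directly, with no merge or reordering step.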

Other alternatives include the dplyr package (as shown by David Robinson) and data.table:

library(data.table)
# := adds the column by reference; the outer parentheses print the result
(data.table(mtcars, keep.rownames=TRUE)[, dif_mpg := mpg - mean(mpg), by=cyl])

And plyr (though you should use dplyr over plyr, as it is much faster):

library(plyr)
ddply(mtcars, "cyl", transform, dif_mpg=mpg-mean(mpg))

#2

This kind of grouping operation is well handled by the dplyr package (which you would need to install first). In this case the solution would be:

library(dplyr)
mtcars <- mtcars %>% group_by(cyl) %>% mutate(dif_mpg=mpg - mean(mpg))
