
Time: 2022-04-11 09:16:09

I have a dataset that has household ID ('id') and fuel economy of vehicles owned by the household ('mpg'). This is in long form, with only the two columns 'id' and 'mpg'.


I am trying to use either the aggregate() function or ddply() to apply the following function to the data:


logratio <- function(data=x, mpg=mpg) {
    if (length(data[mpg])>1) {
        ratio <- log(max(data[mpg])/min(data[mpg]))
    } else {
        return(0)
    }
}

I have tried the following:


mpgdf <- aggregate(mpg~id, FUN=logratio, data=mpgdata)


df <- ddply(mpgdata,~id,logratio)

Neither works.


The key here is that my theoretical wide format would be an 'id' column with one row for each id, and then columns for the mpg of each vehicle up to the maximum number of vehicles (i.e. if the house with the most vehicles has three vehicles, 'mpg1', 'mpg2', 'mpg3'). I would like to find the natural log of the ratio of the highest fuel economy to the lowest, returning 0 (log of 1) if there is only one vehicle.


I'm starting to get a bit frustrated as both plyr and reshape seem to want to set columns as the values of the extant 'mpg' column, whereas I would like them as explained above.


I would like this to be returned as a dataframe with two columns: 'id', with each household ID appearing a single time, set against 'mpglogratio', so that I can then merge it back into a larger dataset I have.


Any help would be greatly appreciated!




1 solution



With plyr you can try this:


library(plyr)

logratio <- function(x) log(max(x) / min(x))

mtcars <- mtcars[,c("cyl", "mpg")]
mtcars <- rbind(mtcars, c(5, 30))

ddply(mtcars, .(cyl), summarise, mpglogratio = logratio(mpg))
##   cyl mpglogratio
## 1   4     0.46002
## 2   5     0.00000
## 3   6     0.18419
## 4   8     0.61310

Just replace cyl with id and mtcars with your actual data to make it work with your data. There is actually no need to test for the length: if your mpg is of length one, then max == min, so max/min == 1 and you end up with log(1), which is 0.
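Since the question also mentions aggregate(), the same one-argument function can be used without plyr. This is only a minimal sketch: the mpgdata values below are made up for illustration, with household 3 owning a single vehicle to show the log(1) case.

```r
# Toy data (hypothetical values): household 3 owns a single vehicle
mpgdata <- data.frame(
  id  = c(1, 1, 2, 2, 2, 3),
  mpg = c(20, 30, 15, 25, 45, 22)
)

# Takes a plain numeric vector, not a data frame
logratio <- function(x) log(max(x) / min(x))

# The formula interface hands each household's mpg vector to FUN
mpgdf <- aggregate(mpg ~ id, data = mpgdata, FUN = logratio)

# aggregate() keeps the name 'mpg' for the result column, so rename it
names(mpgdf)[2] <- "mpglogratio"
print(mpgdf)
```

The single-vehicle household gets 0 without any length check, because its max and min are the same value.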


A final note: if you want to merge it back quickly, use transform instead of summarise, which keeps every original row and adds the new column to it, like this:


ddply(mtcars, .(cyl), transform, mpglogratio = logratio(mpg))
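If the ratios are instead kept as a separate two-column data frame, they can be joined to the larger dataset with base R's merge(). The following is a rough sketch only: households and its income column are invented stand-ins for the larger dataset, and the mpg values are made up.

```r
logratio <- function(x) log(max(x) / min(x))

# Hypothetical long-form vehicle data and a larger household-level dataset
mpgdata    <- data.frame(id = c(1, 1, 2, 3, 3), mpg = c(20, 30, 25, 18, 36))
households <- data.frame(id = c(1, 2, 3), income = c(50, 65, 80))

# One row per household with its log ratio
ratios <- aggregate(mpg ~ id, data = mpgdata, FUN = logratio)
names(ratios)[2] <- "mpglogratio"

# Join on the shared 'id' column
merged <- merge(households, ratios, by = "id")
print(merged)
```

By default merge() keeps only ids present in both data frames; passing all.x = TRUE would keep households that have no vehicle records, with NA for mpglogratio.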


