I need to turn this
我需要转过来
id | amount | day
---------------------
A | 10 | 0
A | 54 | 8
A | 23 | 18
A | 43 | 28
A | 87 | 51
B | 34 | 0
B | 76 | 1
B | 12 | 7
into this
进入这个
id | a1 | a2 | a3 | a4 | a5 | d1 | d2 | d3 | d4 | d5
--------------------------------------------------------
A | 10 | 54 | 23 | 43 | 87 | 0 | 8 | 18 | 28 | 51
B | 34 | 76 | 12 | 0 | 0 | 0 | 1 | 7 | 0 | 0
ie. transpose/cast the rows of the df to an unknown number of columns by id, putting zeros where there would be empty values because of the inequal length.
即。通过id将df的行转置/转换为未知数量的列,将零置于由于长度不等而存在空值的位置。
I've experimented with
我试过了
df <- data.frame(id=c('A','A','A','A','A','B','B','B'), amount=c(10,54,23,43,87,34,76,12), day=c(0,8,18,28,51,0,1,7))
library(reshape2)
x <- dcast(df, id ~ day, mean, value = 'amount')
but it's not quite right. How do I do it?
但这不太对劲。我该怎么做?
2 个解决方案
#1
4
Use base R reshape()
after introducing a "time" variable:
在引入“time”变量后使用base R reshape():
df$time <- ave(as.numeric(as.character(df$id)), df$id, FUN = seq_along)
df
# id amount day time
# 1 A 10 0 1
# 2 A 54 8 2
# 3 A 23 18 3
# 4 A 43 28 4
# 5 A 87 51 5
# 6 B 34 0 1
# 7 B 76 1 2
# 8 B 12 7 3
reshape(df, direction = "wide", idvar="id", timevar="time")
# id amount.1 day.1 amount.2 day.2 amount.3 day.3 amount.4 day.4 amount.5 day.5
# 1 A 10 0 54 8 23 18 43 28 87 51
# 6 B 34 0 76 1 12 7 NA NA NA NA
Optional steps:
可选步骤:
-
Reorganize column order:
重新组织列顺序:
df2 <- df2[c("id", grep("amount", names(df2), value=TRUE), grep("day", names(df2), value = TRUE))]
-
Replace
NA
with0
:将NA替换为0:
df2[is.na(df2)] <- 0 df2 # id amount.1 amount.2 amount.3 amount.4 amount.5 day.1 day.2 day.3 day.4 day.5 # 1 A 10 54 23 43 87 0 8 18 28 51 # 6 B 34 76 12 0 0 0 1 7 0 0
#2
3
I have to create a new variable dd
我必须创建一个新的变量dd
df$dd <-unlist(by(df$id,df$id, FUN= function(x)seq(1,length(x))))
id amount day dd
1 A 10 0 1
2 A 54 8 2
3 A 23 18 3
4 A 43 28 4
5 A 87 51 5
6 B 34 0 1
7 B 76 1 2
8 B 12 7 3
mm <- melt(df,id.vars=c('id','dd'),measure.vars=c('amount','day'))
dcast(mm,id~variable+dd,fun.aggregate=mean)
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1 A 10 54 23 43 87 0 8 18 28 51
2 B 34 76 12 NaN NaN 0 1 7 NaN NaN
EDIT To get a nice ouput with 0 , I replace mean with a custom function
编辑为了得到一个很好的输出0,我用自定义函数替换mean
dcast(mm,id~variable+dd,fun.aggregate=
function(x) ifelse(is.nan(mean(x)),0,mean(x)))
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1 A 10 54 23 43 87 0 8 18 28 51
2 B 34 76 12 0 0 0 1 7 0 0
#1
4
Use base R reshape()
after introducing a "time" variable:
在引入“time”变量后使用base R reshape():
df$time <- ave(as.numeric(as.character(df$id)), df$id, FUN = seq_along)
df
# id amount day time
# 1 A 10 0 1
# 2 A 54 8 2
# 3 A 23 18 3
# 4 A 43 28 4
# 5 A 87 51 5
# 6 B 34 0 1
# 7 B 76 1 2
# 8 B 12 7 3
reshape(df, direction = "wide", idvar="id", timevar="time")
# id amount.1 day.1 amount.2 day.2 amount.3 day.3 amount.4 day.4 amount.5 day.5
# 1 A 10 0 54 8 23 18 43 28 87 51
# 6 B 34 0 76 1 12 7 NA NA NA NA
Optional steps:
可选步骤:
-
Reorganize column order:
重新组织列顺序:
df2 <- df2[c("id", grep("amount", names(df2), value=TRUE), grep("day", names(df2), value = TRUE))]
-
Replace
NA
with0
:将NA替换为0:
df2[is.na(df2)] <- 0 df2 # id amount.1 amount.2 amount.3 amount.4 amount.5 day.1 day.2 day.3 day.4 day.5 # 1 A 10 54 23 43 87 0 8 18 28 51 # 6 B 34 76 12 0 0 0 1 7 0 0
#2
3
I have to create a new variable dd
我必须创建一个新的变量dd
df$dd <-unlist(by(df$id,df$id, FUN= function(x)seq(1,length(x))))
id amount day dd
1 A 10 0 1
2 A 54 8 2
3 A 23 18 3
4 A 43 28 4
5 A 87 51 5
6 B 34 0 1
7 B 76 1 2
8 B 12 7 3
mm <- melt(df,id.vars=c('id','dd'),measure.vars=c('amount','day'))
dcast(mm,id~variable+dd,fun.aggregate=mean)
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1 A 10 54 23 43 87 0 8 18 28 51
2 B 34 76 12 NaN NaN 0 1 7 NaN NaN
EDIT To get a nice ouput with 0 , I replace mean with a custom function
编辑为了得到一个很好的输出0,我用自定义函数替换mean
dcast(mm,id~variable+dd,fun.aggregate=
function(x) ifelse(is.nan(mean(x)),0,mean(x)))
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1 A 10 54 23 43 87 0 8 18 28 51
2 B 34 76 12 0 0 0 1 7 0 0