如何将两个变量转置/转换为一行?

时间:2021-10-13 14:19:51

I need to turn this

我需要转过来

id  |  amount |  day
---------------------
 A  |   10    |  0 
 A  |   54    |  8
 A  |   23    |  18
 A  |   43    |  28
 A  |   87    |  51
 B  |   34    |  0
 B  |   76    |  1 
 B  |   12    |  7

into this

进入这个

id | a1 | a2 | a3 | a4 | a5 | d1 | d2 | d3 | d4 | d5 
--------------------------------------------------------
 A | 10 | 54 | 23 | 43 | 87 | 0  |  8 | 18 | 28 | 51
 B | 34 | 76 | 12 | 0  |  0 | 0  |  1 |  7 | 0  | 0

ie. transpose/cast the rows of the df to an unknown number of columns by id, putting zeros where there would be empty values because of the inequal length.

即。通过id将df的行转置/转换为未知数量的列,将零置于由于长度不等而存在空值的位置。

I've experimented with

我试过了

df <- data.frame(id=c('A','A','A','A','A','B','B','B'),    amount=c(10,54,23,43,87,34,76,12), day=c(0,8,18,28,51,0,1,7))
library(reshape2)
x <- dcast(df, id ~ day, mean, value = 'amount')

but it's not quite right. How do I do it?

但这不太对劲。我该怎么做?

2 个解决方案

#1


4  

Use base R reshape() after introducing a "time" variable:

在引入“time”变量后使用base R reshape():

df$time <- ave(as.numeric(as.character(df$id)), df$id, FUN = seq_along)
df
#   id amount day time
# 1  A     10   0    1
# 2  A     54   8    2
# 3  A     23  18    3
# 4  A     43  28    4
# 5  A     87  51    5
# 6  B     34   0    1
# 7  B     76   1    2
# 8  B     12   7    3
reshape(df, direction = "wide", idvar="id", timevar="time")
#   id amount.1 day.1 amount.2 day.2 amount.3 day.3 amount.4 day.4 amount.5 day.5
# 1  A       10     0       54     8       23    18       43    28       87    51
# 6  B       34     0       76     1       12     7       NA    NA       NA    NA

Optional steps:

可选步骤:

  1. Reorganize column order:

    重新组织列顺序:

    df2 <- df2[c("id", 
                 grep("amount", names(df2), value=TRUE), 
                 grep("day", names(df2), value = TRUE))]
    
  2. Replace NA with 0:

    将NA替换为0:

    df2[is.na(df2)] <- 0
    df2
    #   id amount.1 amount.2 amount.3 amount.4 amount.5 day.1 day.2 day.3 day.4 day.5
    # 1  A       10       54       23       43       87     0     8    18    28    51
    # 6  B       34       76       12        0        0     0     1     7     0     0
    

#2


3  

I have to create a new variable dd

我必须创建一个新的变量dd

df$dd <-unlist(by(df$id,df$id, FUN= function(x)seq(1,length(x))))


id amount day dd
1  A     10   0  1
2  A     54   8  2
3  A     23  18  3
4  A     43  28  4
5  A     87  51  5
6  B     34   0  1
7  B     76   1  2
8  B     12   7  3

mm <- melt(df,id.vars=c('id','dd'),measure.vars=c('amount','day'))
dcast(mm,id~variable+dd,fun.aggregate=mean)
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1  A       10       54       23       43       87     0     8    18    28    51
2  B       34       76       12      NaN      NaN     0     1     7   NaN   NaN

EDIT To get a nice ouput with 0 , I replace mean with a custom function

编辑为了得到一个很好的输出0,我用自定义函数替换mean

dcast(mm,id~variable+dd,fun.aggregate=
                       function(x) ifelse(is.nan(mean(x)),0,mean(x)))
  id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1  A       10       54       23       43       87     0     8    18    28    51
2  B       34       76       12        0        0     0     1     7     0     0

#1


4  

Use base R reshape() after introducing a "time" variable:

在引入“time”变量后使用base R reshape():

df$time <- ave(as.numeric(as.character(df$id)), df$id, FUN = seq_along)
df
#   id amount day time
# 1  A     10   0    1
# 2  A     54   8    2
# 3  A     23  18    3
# 4  A     43  28    4
# 5  A     87  51    5
# 6  B     34   0    1
# 7  B     76   1    2
# 8  B     12   7    3
reshape(df, direction = "wide", idvar="id", timevar="time")
#   id amount.1 day.1 amount.2 day.2 amount.3 day.3 amount.4 day.4 amount.5 day.5
# 1  A       10     0       54     8       23    18       43    28       87    51
# 6  B       34     0       76     1       12     7       NA    NA       NA    NA

Optional steps:

可选步骤:

  1. Reorganize column order:

    重新组织列顺序:

    df2 <- df2[c("id", 
                 grep("amount", names(df2), value=TRUE), 
                 grep("day", names(df2), value = TRUE))]
    
  2. Replace NA with 0:

    将NA替换为0:

    df2[is.na(df2)] <- 0
    df2
    #   id amount.1 amount.2 amount.3 amount.4 amount.5 day.1 day.2 day.3 day.4 day.5
    # 1  A       10       54       23       43       87     0     8    18    28    51
    # 6  B       34       76       12        0        0     0     1     7     0     0
    

#2


3  

I have to create a new variable dd

我必须创建一个新的变量dd

df$dd <-unlist(by(df$id,df$id, FUN= function(x)seq(1,length(x))))


id amount day dd
1  A     10   0  1
2  A     54   8  2
3  A     23  18  3
4  A     43  28  4
5  A     87  51  5
6  B     34   0  1
7  B     76   1  2
8  B     12   7  3

mm <- melt(df,id.vars=c('id','dd'),measure.vars=c('amount','day'))
dcast(mm,id~variable+dd,fun.aggregate=mean)
id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1  A       10       54       23       43       87     0     8    18    28    51
2  B       34       76       12      NaN      NaN     0     1     7   NaN   NaN

EDIT To get a nice ouput with 0 , I replace mean with a custom function

编辑为了得到一个很好的输出0,我用自定义函数替换mean

dcast(mm,id~variable+dd,fun.aggregate=
                       function(x) ifelse(is.nan(mean(x)),0,mean(x)))
  id amount_1 amount_2 amount_3 amount_4 amount_5 day_1 day_2 day_3 day_4 day_5
1  A       10       54       23       43       87     0     8    18    28    51
2  B       34       76       12        0        0     0     1     7     0     0