Long to wide data with multiple columns

Time: 2021-09-06 04:27:08

Suggestions for how to smoothly get from foo to foo2 (preferably with tidyr or reshape2 packages)?

This is similar to this question, but I don't think it's quite the same, because I don't want to auto-number columns, just widen several columns at once. It is also similar to this question, but again, I don't think I want the columns to vary with a row value as in that answer. Alternatively, a valid answer is to convince me that it is exactly like one of the others. The "two dcasts plus a merge" solution from the second question is the most attractive so far, because I can understand it.

foo:

foo = data.frame(group=c('a', 'a', 'b', 'b', 'c', 'c'),
                  times=c('before', 'after', 'before', 'after', 'before', 'after'),
                  action_rate=c(0.1,0.15, 0.2, 0.18,0.3, 0.35),
                  num_users=c(100, 100, 200, 200, 300, 300))
foo <- transform(foo,
                 action_rate_c95 = 1.95 * sqrt(action_rate*(1-action_rate)/num_users))

> foo
  group  times action_rate num_users action_rate_c95
1     a before        0.10       100      0.05850000
2     a  after        0.15       100      0.06962893
3     b before        0.20       200      0.05515433
4     b  after        0.18       200      0.05297400
5     c before        0.30       300      0.05159215
6     c  after        0.35       300      0.05369881

foo2:

foo2 <- data.frame(group=c('a', 'b', 'c'),
                   action_rate_before=c(0.1,0.2, 0.3),
                   action_rate_after=c(0.15, 0.18,0.35),
                   action_rate_c95_before=c(0.0585,0.055, 0.05159),
                   action_rate_c95_after=c(0.069, 0.0530,0.0537),
                   num_users=c(100, 200, 300))

> foo2
  group action_rate_before action_rate_after action_rate_c95_before
1     a                0.1              0.15                0.05850
2     b                0.2              0.18                0.05500
3     c                0.3              0.35                0.05159
  action_rate_c95_after num_users
1                 0.0690       100
2                 0.0530       200
3                 0.0537       300
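For reference, the "two dcasts plus a merge" approach mentioned above might look like the following sketch. It assumes reshape2 is installed, and `cast_one()` is a hypothetical helper introduced here for illustration, not part of reshape2. Note that `dcast()` orders the new columns alphabetically, so each `_after` column comes before its `_before` column.

```r
library(reshape2)

# Rebuild foo from the question so the example runs on its own
foo <- data.frame(group = c("a", "a", "b", "b", "c", "c"),
                  times = c("before", "after", "before", "after", "before", "after"),
                  action_rate = c(0.1, 0.15, 0.2, 0.18, 0.3, 0.35),
                  num_users = c(100, 100, 200, 200, 300, 300))
foo <- transform(foo,
                 action_rate_c95 = 1.95 * sqrt(action_rate * (1 - action_rate) / num_users))

# Hypothetical helper: cast one measure wide, then prefix the new
# "after"/"before" columns with the measure name
cast_one <- function(df, measure) {
  wide <- dcast(df, group + num_users ~ times, value.var = measure)
  is_new <- !(names(wide) %in% c("group", "num_users"))
  names(wide)[is_new] <- paste(measure, names(wide)[is_new], sep = "_")
  wide
}

# Two dcasts (one per measure) plus a merge on the id columns
wide <- merge(cast_one(foo, "action_rate"),
              cast_one(foo, "action_rate_c95"),
              by = c("group", "num_users"))
wide
```

Each extra measure means one more `cast_one()` call and one more `merge()`, which is why the single-call answers below scale better.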

3 solutions

#1

Here's another alternative using tidyr:

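library(tidyr)
foo %>%
  gather(key, value, -group, -times, -num_users) %>%  # stack the measure columns into key/value pairs
  unite(col, key, times) %>%                          # build names such as action_rate_before
  spread(col, value)                                  # one column per key_times combination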

Which gives:

#  group num_users action_rate_after action_rate_before action_rate_c95_after
#1     a       100              0.15                0.1            0.06962893
#2     b       200              0.18                0.2            0.05297400
#3     c       300              0.35                0.3            0.05369881
#  action_rate_c95_before
#1             0.05850000
#2             0.05515433
#3             0.05159215
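`gather()` and `spread()` are superseded in current tidyr. A minimal sketch of the same reshape with `pivot_wider()`, assuming tidyr 1.0.0 or later, which accepts several value columns in one call (`foo` is rebuilt so the block runs on its own):

```r
library(tidyr)  # pivot_wider() assumes tidyr >= 1.0.0

foo <- data.frame(group = c("a", "a", "b", "b", "c", "c"),
                  times = c("before", "after", "before", "after", "before", "after"),
                  action_rate = c(0.1, 0.15, 0.2, 0.18, 0.3, 0.35),
                  num_users = c(100, 100, 200, 200, 300, 300))
foo <- transform(foo,
                 action_rate_c95 = 1.95 * sqrt(action_rate * (1 - action_rate) / num_users))

# Widen both measures at once; new names are <measure>_<times>
# (names_sep defaults to "_")
wide <- pivot_wider(foo,
                    id_cols = c(group, num_users),
                    names_from = times,
                    values_from = c(action_rate, action_rate_c95))
wide
```

Unlike `spread()`, `pivot_wider()` keeps the `times` values in order of first appearance by default, so each `_before` column precedes its `_after` column, as in foo2.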

#2

You can use data.table instead of reshape2: its dcast() accepts several columns in value.var, and it is faster too:

library(data.table)
setDT(foo)  # convert foo to a data.table by reference
dcast(foo, group + num_users ~ times, value.var = c("action_rate", "action_rate_c95"))

   group num_users action_rate_after action_rate_before action_rate_c95_after action_rate_c95_before
1:     a       100              0.15                0.1            0.06962893             0.05850000
2:     b       200              0.18                0.2            0.05297400             0.05515433
3:     c       300              0.35                0.3            0.05369881             0.05159215
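If you want exactly the column order of foo2, one option is to reorder the result in place with data.table's `setcolorder()`. A self-contained sketch; building `foo` directly as a data.table is an assumption for brevity (`setDT(foo)` as above works the same way):

```r
library(data.table)

foo <- data.table(group = c("a", "a", "b", "b", "c", "c"),
                  times = c("before", "after", "before", "after", "before", "after"),
                  action_rate = c(0.1, 0.15, 0.2, 0.18, 0.3, 0.35),
                  num_users = c(100, 100, 200, 200, 300, 300))
foo[, action_rate_c95 := 1.95 * sqrt(action_rate * (1 - action_rate) / num_users)]

# One dcast handles both measures; names are <measure>_<times>
wide <- dcast(foo, group + num_users ~ times,
              value.var = c("action_rate", "action_rate_c95"))

# Reorder the columns (by reference) to match foo2 in the question
setcolorder(wide, c("group",
                    "action_rate_before", "action_rate_after",
                    "action_rate_c95_before", "action_rate_c95_after",
                    "num_users"))
wide
```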

#3

Here is a base R option using reshape():

reshape(foo, idvar=c("group", "num_users"), timevar="times", direction="wide")
#  group num_users action_rate.before action_rate_c95.before action_rate.after
#1     a       100                0.1             0.05850000              0.15
#3     b       200                0.2             0.05515433              0.18
#5     c       300                0.3             0.05159215              0.35
#  action_rate_c95.after
#1            0.06962893
#3            0.05297400
#5            0.05369881
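To get the underscore names used in foo2 instead of `action_rate.before`, `reshape()` accepts a `sep` argument (default `"."`). A sketch that also drops the leftover row names 1, 3, 5; `foo` is rebuilt so it runs on its own:

```r
foo <- data.frame(group = c("a", "a", "b", "b", "c", "c"),
                  times = c("before", "after", "before", "after", "before", "after"),
                  action_rate = c(0.1, 0.15, 0.2, 0.18, 0.3, 0.35),
                  num_users = c(100, 100, 200, 200, 300, 300))
foo <- transform(foo,
                 action_rate_c95 = 1.95 * sqrt(action_rate * (1 - action_rate) / num_users))

# sep = "_" builds names like action_rate_before
wide <- reshape(foo, idvar = c("group", "num_users"), timevar = "times",
                direction = "wide", sep = "_")
rownames(wide) <- NULL  # renumber rows 1..3 instead of 1, 3, 5
wide
```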
