I want to repeat the rows of a data.frame, each N times. The result should be a new data.frame (with nrow(new.df) == nrow(old.df) * N) keeping the data types of the columns.
Example for N = 2:
  A B   C
1 j i 100
2 K P 101

-->

  A B   C
1 j i 100
2 j i 100
3 K P 101
4 K P 101
So, each row is repeated 2 times and characters remain characters, factors remain factors, numerics remain numerics, ...
My first attempt used apply: apply(old.df, 2, function(co) rep(co, each = N)), but this transforms my values to characters and I get:
A B C
[1,] "j" "i" "100"
[2,] "j" "i" "100"
[3,] "K" "P" "101"
[4,] "K" "P" "101"
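A short sketch of why this happens, assuming a data frame shaped like the example above (the column values and classes are taken from the table, the rest is illustrative): apply() first converts a data-frame argument with as.matrix(), and a matrix holds a single type, so a mix of character, factor and numeric columns collapses into a character matrix before rep() ever runs.

```r
# Example data shaped like the question (values assumed from the table above)
old.df <- data.frame(A = c("j", "K"), B = factor(c("i", "P")), C = c(100, 101),
                     stringsAsFactors = FALSE)
N <- 2

# apply() calls as.matrix() on a data frame; because non-numeric columns are
# present, every column (including C) is converted to character
m <- as.matrix(old.df)
typeof(m)  # "character"

# So the result of the original attempt is a character matrix, not a data.frame
res <- apply(old.df, 2, function(co) rep(co, each = N))
is.matrix(res)  # TRUE
```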
9 Solutions
#1 (85 votes)
df <- data.frame(a=1:2, b=letters[1:2])
df[rep(seq_len(nrow(df)), each=2),]
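Applied to data shaped like the question's example (values assumed from the table), the indexing approach keeps every column's class. Subsetting with repeated indices makes the duplicated row names unique ("1", "1.1", ...), so this sketch resets them afterwards; drop = FALSE is added so a one-column data frame is not simplified to a vector.

```r
N <- 2
old.df <- data.frame(A = c("j", "K"), B = factor(c("i", "P")), C = c(100, 101),
                     stringsAsFactors = FALSE)

# Repeat each row index N times: 1 1 2 2
idx <- rep(seq_len(nrow(old.df)), each = N)

# Row subsetting keeps column classes; drop = FALSE protects 1-column frames
new.df <- old.df[idx, , drop = FALSE]
rownames(new.df) <- NULL  # renumber the rows 1..4

stopifnot(nrow(new.df) == nrow(old.df) * N)
sapply(new.df, class)  # A = "character", B = "factor", C = "numeric"
```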
#2 (6 votes)
A clean dplyr solution, taken from here:
library(dplyr)
df <- tibble(x = 1:2, y = c("a", "b"))  # data_frame() is deprecated in favour of tibble()
df %>% slice(rep(1:n(), each = 2))
#3 (4 votes)
If you can repeat the whole thing, or subset it first then repeat that, then this similar question may be helpful. Once again:
library(mefa)
rep(mtcars,10)
or simply
mefa:::rep.data.frame(mtcars)
#4 (4 votes)
The rep.row function seems to sometimes make lists for columns, which leads to bad memory hijinks. I have written the following which seems to work well:
library(plyr)
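rep.row <- function(r, n){
  # colwise() applies rep(x, n) to every column of r and returns a data frame,
  # so the whole data frame r is repeated n times (like rep(times = n))
  colwise(function(x) rep(x, n))(r)
}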
#5 (3 votes)
Adding to what @dardisco mentioned about mefa::rep.data.frame(): it's very flexible.
You can either repeat each row N times:
rep(df, each=N)
or repeat the entire data frame N times (think of how a vectorized argument gets recycled):
rep(df, times=N)
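mefa's each/times arguments mirror those of base rep(). Without mefa, the same two orderings can be built as row indices; a minimal base-R sketch, using the first rows of mtcars only as convenient example data:

```r
N <- 2
small <- head(mtcars, 3)

# each = N: every row repeated in place -> rows 1 1 2 2 3 3
rep(seq_len(nrow(small)), each = N)

# times = N: the whole frame stacked N times -> rows 1 2 3 1 2 3
rep(seq_len(nrow(small)), times = N)

by_row   <- small[rep(seq_len(nrow(small)), each = N), ]
by_block <- small[rep(seq_len(nrow(small)), times = N), ]
```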
Two thumbs up for mefa! I had never heard of it until now and had to write manual code to do this.
#6 (3 votes)
For reference, and adding to the answers citing mefa, it might be worth taking a look at the implementation of mefa::rep.data.frame() in case you don't want to depend on the whole package:
> data <- data.frame(a=letters[1:3], b=letters[4:6])
> data
a b
1 a d
2 b e
3 c f
> as.data.frame(lapply(data, rep, 2))
a b
1 a d
2 b e
3 c f
4 a d
5 b e
6 c f
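Note that lapply(data, rep, 2) passes 2 as rep()'s times argument, so it stacks the whole frame (rows 1 2 3 1 2 3). For the row-by-row repetition the question asks for, pass each instead. A sketch of that variant, with a factor and an integer column added (an assumption for illustration) to show that column classes survive:

```r
data <- data.frame(a = letters[1:3], b = factor(letters[4:6]), c = 1:3,
                   stringsAsFactors = FALSE)

# rep() is applied column by column; rep() on a factor keeps it a factor
each_rows <- as.data.frame(lapply(data, rep, each = 2),
                           stringsAsFactors = FALSE)

sapply(each_rows, class)  # a = "character", b = "factor", c = "integer"
```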
#7 (1 vote)
Try using, for example,
N=2
rep(1:4, each = N)
as a row index, e.g. df[rep(1:4, each = N), ] for a four-row data frame.
#8 (1 vote)
My solution is similar to mefa:::rep.data.frame, but a little faster, and it takes care of row names:
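rep.data.frame <- function(x, times) {
  rnames <- attr(x, "row.names")
  # repeat every column; the whole frame is stacked `times` times
  x <- lapply(x, rep.int, times = times)
  # restore the class directly instead of calling the slower data.frame()
  class(x) <- "data.frame"
  # character row names get ".1", ".2", ... suffixes via make.unique();
  # automatic (integer) row names are rebuilt in base R's compact form
  if (!is.numeric(rnames))
    attr(x, "row.names") <- make.unique(rep.int(rnames, times))
  else
    attr(x, "row.names") <- .set_row_names(length(rnames) * times)
  x
}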
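To see which branch of the row-name logic applies, here is a small base-R sketch; the two row names are borrowed from mtcars purely as an example.

```r
# Automatic row names: attr() returns them as integers 1..n, so the
# function above takes the .set_row_names() branch and just renumbers.
auto_rn <- attr(data.frame(a = 1:3), "row.names")
is.numeric(auto_rn)  # TRUE

# Character row names (as in mtcars) take the make.unique() branch:
# repeated copies get ".1", ".2", ... suffixes so the names stay unique.
rn <- c("Mazda RX4", "Datsun 710")
make.unique(rep.int(rn, 3))
# "Mazda RX4" "Datsun 710" "Mazda RX4.1" "Datsun 710.1" "Mazda RX4.2" "Datsun 710.2"
```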
Compare solutions:
library(Lahman)
library(microbenchmark)
microbenchmark(
mefa:::rep.data.frame(Batting, 10),
rep.data.frame(Batting, 10),
Batting[rep.int(seq_len(nrow(Batting)), 10), ],
times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> mefa:::rep.data.frame(Batting, 10) 127.77786 135.3480 198.0240 148.1749 278.1066 356.3210 10 a
#> rep.data.frame(Batting, 10) 79.70335 82.8165 134.0974 87.2587 191.1713 307.4567 10 a
#> Batting[rep.int(seq_len(nrow(Batting)), 10), ] 895.73750 922.7059 981.8891 956.3463 1018.2411 1127.3927 10 b
#9 (0 votes)
Another way to do this would be to first get row indices, append extra copies of the df, and then order by the indices:
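df$index <- seq_len(nrow(df))   # remember each row's original position
df <- rbind(df, df)             # append a second copy
df <- df[order(df$index), -ncol(df), drop = FALSE]  # sort by position, drop the helper column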
Although the other solutions may be shorter, this method may be more advantageous in certain situations.
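The rbind(df, df) step hard-codes two copies. A sketch generalizing it to N copies with do.call(rbind, rep(list(df), N)); the example data is assumed, and the helper column name index must not clash with an existing column. order() keeps ties in their original order, so the copies of each row stay adjacent.

```r
N <- 3
df <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)

df$index <- seq_len(nrow(df))                  # original row positions
stacked  <- do.call(rbind, rep(list(df), N))   # N copies stacked one after another

# Sort by original position, then drop the helper column
out <- stacked[order(stacked$index), setdiff(names(stacked), "index"), drop = FALSE]
rownames(out) <- NULL
out
```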