I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.
> str(data)
'data.frame': 680420 obs. of 10 variables:
 $ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
 $ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ D : chr "AAA" "AAA" "BCB" "CCC" ...
 $ E : chr "A00001" "A00002" "B00002" "B00001" ...
 $ F : int 9 9 37 37 37 37 191 191 191 191 ...
 $ G : int NA NA NA NA NA NA NA NA NA NA ...
 $ H : int 4 4 4 4 4 4 4 4 4 4 ...
For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:
data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")
And below is the undesired result:
> str(data)
'data.frame': 680420 obs. of 10 variables:
 $ A : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ B : chr "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
 $ C : chr "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ D : chr "AAA" "AAA" "BCB" "CCC" ...
 $ E : chr "A00001" "A00002" "B00002" "B00001" ...
 $ F : int 9 9 37 37 37 37 191 191 191 191 ...
 $ G : int NA NA NA NA NA NA NA NA NA NA ...
 $ H : int 4 4 4 4 4 4 4 4 4 4 ...
 $ id : chr [1:680420, 1:4] "9" "9" "37" "37" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "V1" "V2" "V3" "V4"
Any help would be greatly appreciated.
3 Answers
#1
38
Try
data$id <- paste(data$F, data$E, data$D, data$C, sep="_")
instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.
Edit: Even better is
data <- within(data, id <- paste(F, E, D, C, sep="_"))
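To make the point about vectorization concrete, here is a minimal sketch on a made-up four-row data frame. The values are illustrative only, loosely modelled on the str() output in the question, and the names data2, id2 and m are introduced just for this example. It also shows why the sapply() attempt ended up with a four-column matrix instead of one string per row.

```r
# Toy data frame; the values are assumptions for illustration only.
data <- data.frame(
  C = c("2011-01-26", "2011-01-26", "2011-02-09", "2011-02-09"),
  D = c("AAA", "AAA", "BCB", "CCC"),
  E = c("A00001", "A00002", "B00002", "B00001"),
  F = c(9L, 9L, 37L, 37L),
  stringsAsFactors = FALSE
)

# paste() is vectorized: a single call builds one string per row.
data$id <- paste(data$F, data$E, data$D, data$C, sep = "_")

# Same result with within(), which lets the column names be used directly.
data2 <- within(data, id2 <- paste(F, E, D, C, sep = "_"))

# The attempt from the question calls paste() once per column, so each call
# receives a single vector and has nothing to join it with. sapply() then
# binds the four unchanged columns back into a matrix.
m <- sapply(as.data.frame(cbind(data$F, data$E, data$D, data$C)),
            paste, sep = "_")

print(data$id)
print(dim(m))
```

In other words, sep only matters when paste() is given several vectors at once, which is exactly what the direct call does.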
#2
3
Use unite from the tidyr package:
require(tidyr)
data <- data %>% unite(id, F, E, D, C, sep = '_')
The first argument is the name of the new column; the arguments after it, up to sep, are the columns to concatenate.
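One detail worth knowing: by default unite() drops the input columns from the result. The sketch below, on a made-up two-row data frame (values and the names out_default and out_keep are assumptions for illustration), calls unite() directly rather than through the pipe and uses remove = FALSE to keep the originals.

```r
library(tidyr)

# Toy data frame; the values are assumptions for illustration only.
data <- data.frame(
  C = c("2011-01-26", "2011-02-09"),
  D = c("AAA", "BCB"),
  E = c("A00001", "B00002"),
  F = c(9L, 37L),
  stringsAsFactors = FALSE
)

# Default behaviour: F, E, D and C are replaced by the new id column.
out_default <- unite(data, id, F, E, D, C, sep = "_")

# remove = FALSE keeps the source columns alongside the new one.
out_keep <- unite(data, id, F, E, D, C, sep = "_", remove = FALSE)

print(names(out_default))
print(names(out_keep))
print(out_keep$id)
```

Which form is preferable depends on whether the original columns are still needed after the key has been built.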
#3
2
Either stringr::str_c() or paste() will work.
require(stringr)
data <- within(data, id <- str_c(F, E, D, C, sep="_"))
or else
data <- within(data, id <- paste(F, E, D, C, sep="_"))
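The two functions are not fully interchangeable when missing values are involved, which may matter for a data frame like this one that contains NA columns. A small sketch with made-up vectors (x, y and the result names are assumptions for illustration):

```r
library(stringr)

x <- c("a", NA)
y <- c("b", "c")

# paste() turns NA into the literal string "NA" before joining.
with_paste <- paste(x, y, sep = "_")

# str_c() propagates missingness: any NA input makes that element NA.
with_str_c <- str_c(x, y, sep = "_")

print(with_paste)
print(with_str_c)
```

If a key such as "NA_c" would be misleading, str_c() is the safer choice; if every row must get some string, paste() does that.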