Concatenate row-wise across specific columns of a data frame

Time: 2021-10-03 21:41:08

I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.

    > str(data)
    'data.frame':   680420 obs. of  10 variables:
     $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
     $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
     $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
     $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
     $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
     $ H              : int  4 4 4 4 4 4 4 4 4 4 ...

For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:

    data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")

And below is the undesired result:

    > str(data)
    'data.frame':   680420 obs. of  10 variables:
     $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
     $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
     $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
     $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
     $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
     $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
     $ H              : int  4 4 4 4 4 4 4 4 4 4 ...
     $ id             : chr [1:680420, 1:4] "9" "9" "37" "37" ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : NULL
      .. ..$ : chr  "V1" "V2" "V3" "V4"

Any help would be greatly appreciated.

3 Solutions

#1 (38 votes)

Try

    data$id <- paste(data$F, data$E, data$D, data$C, sep = "_")

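instead. paste() is vectorized: it joins the first elements of all its arguments, then the second elements, and so on. That is the beauty of vectorized code: you do not need row-by-row loops or loop-equivalent *apply functions.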
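To see the difference on a small scale, here is a sketch using a hypothetical four-row data frame `toy` built from the first values shown in the str() output of the question (it is not the real 680,420-row data). sapply() walks over columns, so paste() is called once per column with a single argument and simply returns that column; that is why the attempt produced a 4-column character matrix. A single paste() call instead pairs the columns element by element.

```r
# Hypothetical toy data: first four rows of columns C, D, E, F from str(data)
toy <- data.frame(
  C = c("2011-01-26", "2011-01-26", "2011-02-09", "2011-02-09"),
  D = c("AAA", "AAA", "BCB", "CCC"),
  E = c("A00001", "A00002", "B00002", "B00001"),
  F = c(9L, 9L, 37L, 37L),
  stringsAsFactors = FALSE
)

# The original attempt: sapply() iterates over columns, so paste() sees one
# column at a time and has nothing to join it with
attempt <- sapply(as.data.frame(cbind(toy$F, toy$E, toy$D, toy$C)), paste, sep = "_")
print(dim(attempt))  # 4 rows x 4 columns: a matrix, not one string per row

# Vectorized fix: paste() combines the i-th element of every argument
toy$id <- paste(toy$F, toy$E, toy$D, toy$C, sep = "_")
print(toy$id)
```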

Edit: even better is

    data <- within(data, id <- paste(F, E, D, C, sep = "_"))
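A short sketch of how within() behaves, using a hypothetical two-row `toy` data frame with the question's column names. Inside within(), the columns are visible as plain variables, so F, E, D and C can be written bare; only values that are assigned inside the expression become new columns, and the result is a modified copy that must be assigned back.

```r
# Hypothetical toy data with the same column names as in the question
toy <- data.frame(
  C = c("2011-01-26", "2011-02-09"),
  D = c("AAA", "BCB"),
  E = c("A00001", "B00002"),
  F = c(9L, 37L),
  stringsAsFactors = FALSE
)

# The assignment to `id` inside within() becomes a new column of the copy
with_id <- within(toy, id <- paste(F, E, D, C, sep = "_"))

# Without an assignment, the pasted strings are computed and then discarded
no_id <- within(toy, paste(F, E, D, C, sep = "_"))

print(names(with_id))  # includes "id"
print(names(no_id))    # no "id" column
# `toy` itself is unchanged: within() returns a modified copy
```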

#2 (3 votes)

Use unite() from the tidyr package:

    require(tidyr)
    data <- data %>% unite(id, F, E, D, C, sep = '_')

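The first argument inside unite() (the data frame itself is supplied by the pipe) is the name of the new column; every argument after it, up to sep, is a column to concatenate. Note that unite() drops the source columns by default; add remove = FALSE to keep them.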
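A minimal sketch of the same unite() call on a hypothetical two-row `toy` data frame, written without the pipe and with remove = FALSE so that the source columns stay in place. It assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical toy data with the same column names as in the question
toy <- data.frame(
  C = c("2011-01-26", "2011-02-09"),
  D = c("AAA", "BCB"),
  E = c("A00001", "B00002"),
  F = c(9L, 37L),
  stringsAsFactors = FALSE
)

# unite(data, new_column, columns..., sep): joins the listed columns row-wise.
# remove = FALSE keeps F, E, D and C next to the new column; the default
# (remove = TRUE) drops them.
united <- unite(toy, id, F, E, D, C, sep = "_", remove = FALSE)
print(united)
```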

#3 (2 votes)

Either stringr::str_c() or paste() will work.

    require(stringr)
    data <- within(data, id <- str_c(F, E, D, C, sep = "_"))

or else

    data <- within(data, id <- paste(F, E, D, C, sep = "_"))
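The two functions give the same result as long as none of the joined values are missing; they differ on NA. That would matter if a column such as G, which is all NA in the sample shown in the question, were ever included. A small sketch on a hypothetical two-row data frame, assuming the stringr package is installed:

```r
library(stringr)

# Hypothetical two-row data in which one value of D is missing
toy <- data.frame(
  D = c("AAA", NA),
  E = c("A00001", "B00002"),
  stringsAsFactors = FALSE
)

# paste() converts the missing value into the literal string "NA"
via_paste <- with(toy, paste(D, E, sep = "_"))

# str_c() propagates it: the whole result for that row is NA
via_str_c <- with(toy, str_c(D, E, sep = "_"))

print(via_paste)
print(via_str_c)
```

If literal "NA" fragments in the id are unwanted, the missing values have to be dealt with before calling paste().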
