How do I merge multiple column values into one column?

Date: 2021-09-03 22:58:08

I have a data frame called "stemmoutput" (see below):

     X1      X2       X3      X4      X5      X6      X7     X8     X9    X10     
1  tanaman  cabai                                    
2  banget   hama     sakit   tanaman                            
3  koramil  nogosari melaks  ecek     hama   tanaman padi    ppl    ds   rambun

I want to merge the values of multiple columns into one column, like this:

     TEXT
1  tanaman cabai                                     
2  banget hama sakit tanaman                            
3  koramil nogosari melaks ecek hama tanaman padi ppl ds rambun 

I have tried this code, and it works:

stemmoutput$TEXT <- with(stemmoutput, paste(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10, sep=" "))

But is there a more efficient way that does not require writing out the column names one by one?

I also tried the code below, but it did not work:

for(i in names(stemmoutput)){
     stemmoutput$TEXT <- with(stemmoutput, paste(i, sep=" "))}
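A likely reason the loop fails: `i` holds a column name rather than the column's values, and every iteration overwrites TEXT instead of appending to it. A minimal sketch of a loop that accumulates the values, assuming the blank cells are empty strings (the small data frame here is a hypothetical stand-in for the real one):

```r
# Hypothetical stand-in for the asker's data; blanks are assumed to be "".
stemmoutput <- data.frame(
  X1 = c("tanaman", "banget"),
  X2 = c("cabai", "hama"),
  X3 = c("", "sakit"),
  stringsAsFactors = FALSE
)

cols <- names(stemmoutput)             # capture the names before TEXT is added
text <- character(nrow(stemmoutput))   # one empty string per row

for (i in cols) {
  # stemmoutput[[i]] is the column's values; paste appends them row by row
  text <- paste(text, stemmoutput[[i]], sep = " ")
}

# Remove the leading and trailing spaces left by the empty cells
stemmoutput$TEXT <- trimws(text)
```

This still visits the columns one at a time, so it mainly illustrates what went wrong rather than offering a faster alternative.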

2 answers

#1


2  

Try do.call:

library(stringr)
newdat <- data.frame(TEXT=str_trim(do.call(paste, stemmoutput)),
                     stringsAsFactors=FALSE)

newdat
#                                                         TEXT
#1                                                tanaman cabai
#2                                    banget hama sakit tanaman
#3 koramil nogosari melaks ecek hama tanaman padi ppl ds rambun
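The same idea can be sketched in base R alone, in case stringr is not available. This works because a data frame is a list of columns, so do.call hands each column to paste as a separate argument. The sample data is a hypothetical reduction of the question's data frame, and blanks are assumed to be empty strings:

```r
# Hypothetical reduction of the question's data; blanks are assumed to be "".
stemmoutput <- data.frame(
  X1 = c("tanaman", "banget"),
  X2 = c("cabai", "hama"),
  X3 = c("", "sakit"),
  X4 = c("", "tanaman"),
  stringsAsFactors = FALSE
)

# c(stemmoutput, sep = " ") builds the argument list: every column plus sep.
# trimws (base R) strips the trailing spaces produced by the empty cells.
TEXT <- trimws(do.call(paste, c(stemmoutput, sep = " ")))

newdat <- data.frame(TEXT, stringsAsFactors = FALSE)
```

Note that trimws only cleans the ends of each string; an empty cell between two filled cells would still leave a double space.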

If a column can contain multi-word values, it may be better to use ", " as the delimiter:

 TEXT <- gsub(', [^A-Za-z]+', '', do.call(paste, c(stemmoutput, sep=', ')))

 newdat <- data.frame(TEXT, stringsAsFactors=FALSE)
 newdat
 #                                                                  TEXT
 #1                                                        tanaman, cabai
 #2                                          banget, hama, sakit, tanaman
 #3 koramil, nogosari, melaks, ecek, hama, tanaman, padi, ppl, ds, rambun
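The regular expression above assumes that every non-blank value begins with a letter. As an alternative sketch, the blank cells can be dropped row by row before joining, which avoids the regex entirely. `join_nonblank` is a hypothetical helper name, and the data is again a made-up reduction with one multi-word value:

```r
# Hypothetical sample with a multi-word value; blanks are assumed to be "".
stemmoutput <- data.frame(
  X1 = c("tanaman", "banget"),
  X2 = c("cabai merah", "hama"),
  X3 = c("", "sakit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only non-missing, non-blank cells, then join them.
join_nonblank <- function(row, sep = ", ") {
  keep <- !is.na(row) & nzchar(trimws(row))
  paste(row[keep], collapse = sep)
}

# apply() walks the rows; unname() drops any row names it carries over.
TEXT <- unname(apply(stemmoutput, 1, join_nonblank))

newdat <- data.frame(TEXT, stringsAsFactors = FALSE)
```

Row-wise apply is usually slower than the vectorised do.call(paste, ...) on large data, so this trades speed for robustness.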

#2


1  

Here's another idea using tidyr

If you want to unite only columns from X1 to X10 you could do:

library(tidyr)
unite(stemmoutput, TEXT, num_range("X", 1:10), sep = " ")

If you want to unite all columns, do:

unite(stemmoutput, TEXT, everything(), sep = " ")
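With empty cells, unite leaves trailing separators in the result. One possible way around that, assuming a tidyr version recent enough for unite to accept the na.rm argument, is to convert the blanks to NA first. The sample data is a hypothetical reduction of the question's data frame:

```r
library(tidyr)

# Hypothetical reduction of the question's data; blanks are assumed to be "".
stemmoutput <- data.frame(
  X1 = c("tanaman", "banget"),
  X2 = c("cabai", "hama"),
  X3 = c("", "sakit"),
  X4 = c("", "tanaman"),
  stringsAsFactors = FALSE
)

# Turn empty strings into NA so that na.rm = TRUE can skip them
stemmoutput[stemmoutput == ""] <- NA

# na.rm = TRUE drops missing values before joining, so no stray spaces remain
out <- unite(stemmoutput, TEXT, everything(), sep = " ", na.rm = TRUE)
```

By default unite removes the input columns; passing remove = FALSE keeps X1 to X4 alongside TEXT.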

Benchmarks

I benchmarked the two approaches because I suspected unite would be much faster than do.call. On a data frame with 1e6 rows and 10 columns, unite was about 18% faster by median time (roughly 928 ms versus 1136 ms), so the two ended up fairly close:

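library(microbenchmark)
library(stringr)
library(tidyr)

# 1e6 rows x 10 columns; each column repeats one random 10-letter string
df <- data.frame(replicate(10, sample(paste0(
  sample(LETTERS[1:10]), collapse = ""), 10e5, replace = TRUE)))

mbm <- microbenchmark(
  akrun = data.frame(TEXT = str_trim(do.call(paste, df)), stringsAsFactors = FALSE),
  steven = unite(df, TEXT, everything(), sep = " "),
  times = 50
)
mbm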

# Unit: milliseconds
#    expr       min        lq      mean    median       uq       max neval cld
#   akrun 1117.1350 1132.3861 1146.3943 1136.3094 1145.076 1232.5633    50   b
#  steven  910.7432  924.0386  927.8614  927.7224  929.649  995.3584    50  a
