Split a data.frame column when the number of pieces varies

I want to split column y of df below on the '_' character, but my data is incomplete. (df is just a representative portion of a bigger data.frame.)

df <- data.frame(x = 1:10,
                 y = c("vuh_ftu_yefq", "sos_nvtspb", "pfymm_ucms",
                       "tucbexcqzh", "n_zndbhoun", "wdetzaolvn",
                       "lvohrpdqns", "wso_bsqwvr", "wx_gbkbxjl",
                       "t_dbxkkvge"))

I have tried using:

df$z <- strsplit(df$y,'_')

But I get an error because the number of pieces in each list element is different.

How can I do this?

2 Answers

#1

Assumptions:

The df definition in your example needed a closing ) to run.

"Incomplete data" means the pieces are filled in from the left, so a value with no '_' in it is the first datum.

`tidyr`'s `separate()`:

library(tidyr)
result <- separate(df, y, into = c("z1", "z2", "z3"), sep = "_", extra = "drop")

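The key here is extra = "drop", which according to the docs always returns length(into) pieces by dropping or expanding as necessary. In more recent tidyr versions, extra only governs surplus pieces, and missing pieces are controlled by the separate fill argument (for example fill = "right"). Without it, short rows are still padded with NA on the right, but a warning is emitted.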
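To make the padding behaviour concrete, here is a minimal sketch on three of the values from the question. It assumes a tidyr version in which separate() has the fill argument; df_small and res are names made up for this illustration.

```r
library(tidyr)

# A small stand-in for the question's df (hypothetical name: df_small)
df_small <- data.frame(x = 1:3,
                       y = c("vuh_ftu_yefq", "sos_nvtspb", "tucbexcqzh"),
                       stringsAsFactors = FALSE)

# extra = "drop" discards surplus pieces;
# fill = "right" pads short rows with NA on the right without a warning
res <- separate(df_small, y, into = c("z1", "z2", "z3"), sep = "_",
                extra = "drop", fill = "right")
print(res)
```

Rows with fewer than three pieces keep their values in the leftmost columns and get NA in the rest.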

`data.table`'s `tstrsplit()`

library(data.table)
DT <- as.data.table(df)
# add the three new columns by reference; the trailing [] prints the result
result <- DT[, c("z1", "z2", "z3") := tstrsplit(y, "_", fixed = TRUE)][]

The default behaviour of tstrsplit() does what you need: shorter splits are padded with NA. fixed = TRUE is passed through to strsplit() underneath to keep things fast.
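As a quick check of what tstrsplit() returns on its own, the sketch below calls it directly on a plain character vector taken from the question's data. The variable names y_small and pieces are only for illustration.

```r
library(data.table)

# Three values from the question with 3, 2 and 1 pieces respectively
y_small <- c("vuh_ftu_yefq", "sos_nvtspb", "tucbexcqzh")

# tstrsplit() splits each string and transposes the result:
# one list element per piece position, padded with NA (the default fill)
pieces <- tstrsplit(y_small, "_", fixed = TRUE)
str(pieces)
```

Each list element is a vector as long as the input, which is why it can be assigned straight to new columns with :=.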

Note: if your incomplete data is actually filled from the right, the pieces will end up in the wrong columns and you will need to realign your variables!
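If the data did turn out to be filled from the right, one possible way to realign it is to pad each split on the left instead. The following is only a base R sketch under that assumption, not part of the original answer; y_small, parts and right_aligned are hypothetical names.

```r
# Three values from the question with 3, 2 and 1 pieces respectively
y_small <- c("vuh_ftu_yefq", "sos_nvtspb", "tucbexcqzh")

parts <- strsplit(y_small, "_", fixed = TRUE)
n <- max(lengths(parts))

# Pad each split with NA on the left so the last piece always
# lands in the last column; vapply builds one column per input string,
# so transpose to get one row per input string
right_aligned <- t(vapply(parts,
                          function(p) c(rep(NA_character_, n - length(p)), p),
                          character(n)))
colnames(right_aligned) <- paste0("z", seq_len(n))
print(right_aligned)
```

With tidyr, passing fill = "left" to separate() should give the same alignment.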

#2

You could use the separate function from tidyr.

# required package
library(tidyr)
# separate (removing the y column)
separate(df, y, paste0("z", 1:3), sep = "_", extra = "merge")
# separate without removing the y column
separate(df, y, paste0("z", 1:3), sep = "_", extra = "merge", remove = FALSE)
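With three target columns and at most three pieces per value, extra = "merge" and extra = "drop" give the same result on this data. The difference only shows when a value has more pieces than columns, as in this small sketch with two target columns (df_small, dropped and merged are names invented for the example):

```r
library(tidyr)

df_small <- data.frame(x = 1:2,
                       y = c("vuh_ftu_yefq", "sos_nvtspb"),
                       stringsAsFactors = FALSE)

# "drop" throws away the surplus third piece of the first row
dropped <- separate(df_small, y, into = c("z1", "z2"), sep = "_", extra = "drop")

# "merge" splits at most length(into) times and keeps the remainder in the last column
merged <- separate(df_small, y, into = c("z1", "z2"), sep = "_", extra = "merge")

print(dropped)
print(merged)
```

Note that extra does not affect rows with too few pieces; on the full df those rows are padded with NA, and recent tidyr versions warn unless fill is set.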
