Splitting a data.frame column when the number of pieces varies

Time: 2021-10-26 19:35:20

I want to split column y of df below on the '_' character, but my data is incomplete. (df is just a representative portion of a bigger data.frame.)

df <- data.frame(x = 1:10,
                 y = c("vuh_ftu_yefq", "sos_nvtspb", "pfymm_ucms",
                       "tucbexcqzh", "n_zndbhoun", "wdetzaolvn",
                       "lvohrpdqns", "wso_bsqwvr", "wx_gbkbxjl",
                       "t_dbxkkvge"))

I have tried using:

df$z <- strsplit(df$y,'_')

But I get an error because the number of pieces differs from one list element to the next.

How can I do this?

2 solutions

#1


Assumptions:

  • A closing ) was needed to complete the df definition in your example.

  • "Incomplete data" means the values are filled in from the left, so a value with no '_' at all is the first piece.

tidyr's separate():

library(tidyr)
result <- separate(df, y, into = c("z1", "z2", "z3"), sep = "_", extra = "drop", fill = "right")
  • The key here is extra = "drop", which silently drops any pieces beyond length(into). In current tidyr, rows with too few pieces are padded with NA on the right; fill = "right" does this without the warning that the default would emit.

data.table's tstrsplit()

library(data.table)
DT <- as.data.table(df)
result <- DT[, c("z1", "z2", "z3") := tstrsplit(y, "_", fixed = TRUE)][]
  • The default behaviour of tstrsplit() does what you need (short rows are padded with NA), and fixed = TRUE is passed through to strsplit() underneath to keep things fast.
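
If you would rather not add a package, the same "always return a fixed number of pieces" idea can be sketched in base R. This is only an illustration: split_pad() is a hypothetical helper name, and it assumes three target columns called z1 to z3 as in the calls above.

```r
# Sample data from the question
df <- data.frame(x = 1:10,
                 y = c("vuh_ftu_yefq", "sos_nvtspb", "pfymm_ucms",
                       "tucbexcqzh", "n_zndbhoun", "wdetzaolvn",
                       "lvohrpdqns", "wso_bsqwvr", "wx_gbkbxjl",
                       "t_dbxkkvge"),
                 stringsAsFactors = FALSE)

# split_pad() is a hypothetical helper, not part of any package
split_pad <- function(x, n, sep = "_") {
  # as.character() guards against y being a factor
  pieces <- strsplit(as.character(x), sep, fixed = TRUE)
  # `length<-` pads a short vector with NA and truncates a long one
  padded <- lapply(pieces, `length<-`, n)
  out <- do.call(rbind, padded)
  colnames(out) <- paste0("z", seq_len(n))
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Original columns plus z1, z2, z3 (NA where a piece is missing)
result <- cbind(df, split_pad(df$y, 3))
```

Like the two package solutions, this pads on the right, so a value with no '_' ends up in z1.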

Note: if your incomplete data is actually filled from the right (that is, the missing pieces are the leading ones), the pieces will land in the wrong columns and you will need to realign them.
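
For that right-aligned case, tidyr's separate() accepts fill = "left". A rough base-R sketch of the same realignment follows; split_pad_left() is a hypothetical helper name, and the three input strings are taken from the question's data purely for illustration.

```r
# split_pad_left() is a hypothetical helper, not part of any package
split_pad_left <- function(x, n, sep = "_") {
  pieces <- strsplit(as.character(x), sep, fixed = TRUE)
  padded <- lapply(pieces, function(p) {
    p <- utils::tail(p, n)                    # keep at most the last n pieces
    c(rep(NA_character_, n - length(p)), p)   # NA for the missing leading pieces
  })
  out <- do.call(rbind, padded)
  colnames(out) <- paste0("z", seq_len(n))
  as.data.frame(out, stringsAsFactors = FALSE)
}

aligned <- split_pad_left(c("vuh_ftu_yefq", "sos_nvtspb", "tucbexcqzh"), 3)
```

Here a value with no '_' ends up in z3 rather than z1, and the leading columns are NA.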

#2


You could use the separate function from tidyr.

# required package
require(tidyr)
# separate (removing the y column)
separate(df, y, paste0("z", 1:3), sep = "_", extra = "merge")
# separate without removing the y column
separate(df, y, paste0("z", 1:3), sep = "_", extra = "merge", remove = FALSE)
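
On the sample data no value has more than three pieces, so extra = "merge" and extra = "drop" give the same columns. The two only differ when a value has extra separators. A base-R sketch of that difference for a single string, using hypothetical helper names and a made-up value "a_b_c_d" that is not in the question's data:

```r
# Both helpers are hypothetical and only mimic the two tidyr options
# for a single string.
split_drop <- function(s, n, sep = "_") {
  p <- strsplit(s, sep, fixed = TRUE)[[1]]
  p[seq_len(min(n, length(p)))]            # discard pieces beyond the n-th
}

split_merge <- function(s, n, sep = "_") {
  p <- strsplit(s, sep, fixed = TRUE)[[1]]
  if (length(p) <= n) return(p)
  # keep the first n - 1 pieces, glue the remainder back together
  c(p[seq_len(n - 1)], paste(p[n:length(p)], collapse = sep))
}

split_drop("a_b_c_d", 3)    # "a" "b" "c"
split_merge("a_b_c_d", 3)   # "a" "b" "c_d"
```

So "merge" is the safer choice if the last field may itself contain '_'.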
