I want to split column y of df below on '_', but my data is incomplete: not every value contains the same number of pieces. (df is just a representative portion of a bigger data.frame.)
df <- data.frame(x = 1:10,
y = c("vuh_ftu_yefq", "sos_nvtspb", "pfymm_ucms",
"tucbexcqzh", "n_zndbhoun", "wdetzaolvn",
"lvohrpdqns", "wso_bsqwvr", "wx_gbkbxjl",
"t_dbxkkvge"))
I have tried using:
df$z <- strsplit(df$y,'_')
But I get an error, because the number of pieces differs from one list element to the next.
How can I do this?
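A base R sketch (not part of the original thread) makes the problem visible: strsplit() returns a ragged list, one element per value, each with however many pieces that value had. One package-free workaround is to pad every element with NA on the right to a common length and then bind the pieces into columns. The column names z1 to z3 are illustrative, and stringsAsFactors = FALSE is only there so the code also behaves on R versions before 4.0, where data.frame() turned strings into factors.

```r
df <- data.frame(x = 1:10,
                 y = c("vuh_ftu_yefq", "sos_nvtspb", "pfymm_ucms",
                       "tucbexcqzh", "n_zndbhoun", "wdetzaolvn",
                       "lvohrpdqns", "wso_bsqwvr", "wx_gbkbxjl",
                       "t_dbxkkvge"),
                 stringsAsFactors = FALSE)

# strsplit() returns a list; each element holds a different number of pieces
parts <- strsplit(df$y, "_", fixed = TRUE)
lengths(parts)
# [1] 3 2 2 1 2 1 1 2 2 2

# Pad every element with NA on the right up to the longest length
n <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))

# Bind the padded pieces into a character matrix and attach it as new columns
z <- do.call(rbind, padded)
colnames(z) <- paste0("z", seq_len(n))
df <- cbind(df, z, stringsAsFactors = FALSE)
df
```

This assumes the data is filled from the left, i.e. a value with no '_' belongs in the first column.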
2 answers
#1
Assumptions:
- A closing ) was needed to close out df in your example.
- "Incomplete data" means each value is filled in from the left, so a value without any intervening '_' is the first datum.
tidyr's separate():
library(tidyr)
result <- separate(df, y, into = c("z1", "z2", "z3"), sep = "_", extra = "drop")
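The key here is extra = "drop", which according to the docs always returns length(into) pieces by dropping or expanding as necessary. In current tidyr versions, extra only governs surplus pieces; values with too few pieces are handled by fill, which pads them with NA on the right (with a warning unless you set fill = "right").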
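A sketch assuming a recent tidyr release, where separate() still exists (it is superseded but not removed): adding fill = "right" keeps the same result while silencing the warning for the values that have fewer than three pieces.

```r
library(tidyr)

df <- data.frame(x = 1:10,
                 y = c("vuh_ftu_yefq", "sos_nvtspb", "pfymm_ucms",
                       "tucbexcqzh", "n_zndbhoun", "wdetzaolvn",
                       "lvohrpdqns", "wso_bsqwvr", "wx_gbkbxjl",
                       "t_dbxkkvge"),
                 stringsAsFactors = FALSE)

# extra = "drop":  silently discard pieces beyond the third
# fill  = "right": silently pad values with fewer than three pieces with NA on the right
result <- separate(df, y, into = c("z1", "z2", "z3"), sep = "_",
                   extra = "drop", fill = "right")

# z2 is NA for the single-piece values in rows 4, 6 and 7
result$z2
```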
data.table's tstrsplit():
library(data.table)
DT <- as.data.table(df)
result <- DT[, c("z1", "z2", "z3") := tstrsplit(y, "_", fixed = TRUE)][]
The default behaviour of tstrsplit() does what you need (values with fewer pieces are padded with NA through its fill argument), and fixed = TRUE is passed on to the underlying strsplit() to keep things fast.
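A small sketch of that padding on three of the example values: tstrsplit() splits each string and transposes the result, so piece k of every string lands in column k, and missing pieces become NA (the default fill).

```r
library(data.table)

DT <- data.table(x = 1:3, y = c("vuh_ftu_yefq", "sos_nvtspb", "tucbexcqzh"))

# Split on a literal '_' and assign one new column per piece position;
# shorter strings are padded with NA in the trailing columns
DT[, c("z1", "z2", "z3") := tstrsplit(y, "_", fixed = TRUE)]
DT
```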
Note: if your incomplete data is instead filled from the right, these approaches will put values in the wrong columns, so you will need to realign your variables!
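For right-filled data, tidyr's separate() can do the realignment itself: fill = "left" pads missing pieces with NA on the left instead of the right. The data frame below is hypothetical, built so that the last piece is always present.

```r
library(tidyr)

# Hypothetical right-aligned data: shorter values are missing their leading pieces
right_filled <- data.frame(y = c("a_b_c", "b_c", "c"), stringsAsFactors = FALSE)

# fill = "left" puts the NA padding in the first columns
res <- separate(right_filled, y, into = c("z1", "z2", "z3"), sep = "_",
                fill = "left")
res
```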
#2
You could use the separate function from tidyr.
# required package
library(tidyr)
# separate (removing the y column)
separate(df, y, paste0("z", 1:3), sep = "_", extra = "merge")
# separate without removing the y column
separate(df, y, paste0("z", 1:3), sep = "_", extra = "merge", remove = FALSE)
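The two answers differ only in extra = "drop" versus extra = "merge". With the example data no value has more than three pieces, so both give the same result; the difference only shows up on longer values, as this sketch with a hypothetical four-piece string illustrates.

```r
library(tidyr)

# Hypothetical value with more pieces than target columns
four <- data.frame(y = "a_b_c_d", stringsAsFactors = FALSE)

# extra = "merge": split at most length(into) times, so the remainder stays together
merged <- separate(four, y, into = c("z1", "z2", "z3"), sep = "_", extra = "merge")

# extra = "drop": split everywhere and discard the surplus fourth piece
dropped <- separate(four, y, into = c("z1", "z2", "z3"), sep = "_", extra = "drop")

merged$z3   # [1] "c_d"
dropped$z3  # [1] "c"
```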