This question already has an answer here:
- R: Convert delimited string into variables (3 answers)
I have the following dataset:
#dataset
id attributes value
1 a,b,c 1
2 c,d 0
3 b,e 1
I want to build a pivot table from it that assigns a binary value to each attribute: 1 if the attribute is present for that id, 0 otherwise. My ideal output is the following:
#output
id a b c d e value
1 1 1 1 0 0 1
2 0 0 1 1 0 0
3 0 1 0 0 1 1
Any tip is really appreciated.
2 solutions
#1
We split the 'attributes' column by ',', get the frequency of each attribute with mtabulate from qdapTools, and cbind the result with the first and third columns.
library(qdapTools)
# as.character() guards against 'attributes' being stored as a factor
cbind(df1[1], mtabulate(strsplit(as.character(df1$attributes), ",")), df1[3])
# id a b c d e value
#1 1 1 1 1 0 0 1
#2 2 0 0 1 1 0 0
#3 3 0 1 0 0 1 1
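As a minimal sketch of the splitting step on its own, assuming the question's table has been read into a data frame named df1 (the construction below is an assumption; the original post does not show how the data was loaded):

```r
# Assumed reconstruction of the question's data
df1 <- data.frame(id = 1:3,
                  attributes = c("a,b,c", "c,d", "b,e"),
                  value = c(1, 0, 1),
                  stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row
parts <- strsplit(df1$attributes, ",", fixed = TRUE)

# Number of attributes in each row, and the distinct attributes overall
n_attrs <- lengths(parts)
all_attrs <- sort(unique(unlist(parts)))
```

The list in parts is what gets passed to mtabulate in the answer above; all_attrs corresponds to the column names a to e in the result.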
#2
With base R:
attributes <- sort(unique(unlist(strsplit(as.character(df$attributes), split=','))))
cols <- as.data.frame(matrix(rep(0, nrow(df)*length(attributes)), ncol=length(attributes)))
names(cols) <- attributes
df <- cbind.data.frame(df, cols)
df <- as.data.frame(t(apply(df, 1, function(x) {
  # attributes present in this row; set their indicator columns to 1
  present <- unlist(strsplit(x['attributes'], split=','))
  x[present] <- 1
  x
})))[c('id', attributes, 'value')]
df
id a b c d e value
1 1 1 1 1 0 0 1
2 2 0 0 1 1 0 0
3 3 0 1 0 0 1 1
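A possible variation on the base R approach, sketched here under the assumption that the data lives in a data frame named df1 built as below (the names parts, lev, ind and out are illustrative, not from the answers). It fills an integer indicator matrix row by row instead of going through apply, which keeps the indicator columns as integers:

```r
# Assumed reconstruction of the question's data
df1 <- data.frame(id = 1:3,
                  attributes = c("a,b,c", "c,d", "b,e"),
                  value = c(1, 0, 1),
                  stringsAsFactors = FALSE)

# Split each row's attribute string and collect the distinct attributes
parts <- strsplit(as.character(df1$attributes), ",", fixed = TRUE)
lev <- sort(unique(unlist(parts)))

# Start from an all-zero matrix with one column per attribute
ind <- matrix(0L, nrow = nrow(df1), ncol = length(lev),
              dimnames = list(NULL, lev))

# Mark the attributes present in each row
for (i in seq_along(parts)) {
  ind[i, parts[[i]]] <- 1L
}

# Put id first, then the indicator columns, then value
out <- cbind(df1["id"], ind, df1["value"])
out
```

Because each cell is set to 1 instead of counted, an attribute repeated within one row would still give 1.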