I have a large data frame marking occurrences of trigrams in strings, where the strings are the rows, the trigrams are the columns, and the values mark whether a trigram occurs in a string.
Something like this:
strs <- c('this', 'that', 'chat', 'chin')
thi <- c(1, 0, 0, 0)
tha <- c(0, 1, 0, 0)
hin <- c(0, 0, 0, 1)
hat <- c(0, 1, 1, 0)
df <- data.frame(strs, thi, tha, hin, hat)
df
#  strs thi tha hin hat
#1 this   1   0   0   0
#2 that   0   1   0   1
#3 chat   0   0   0   1
#4 chin   0   0   1   0
I want to get all of the columns/trigrams that have a 1 for a given row or a given string.
So for row 2, the string 'that', the result would be a data frame that looks like this:
  strs tha hat
1 this   0   0
2 that   1   1
3 chat   0   1
4 chin   0   0
How could I do this?
2 Solutions
#1
This will give you the desired output df.
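givenStr <- "that"
# Pick out the row for the given string
row <- df[df$strs == givenStr, ]
# Keep the strs column plus every trigram column that is 1 in that row
df[, c(1, 1 + which(row[, -1] == 1))]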
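If you need this for more than one string, the same idea can be wrapped in a small function. This is only a sketch: the name `select_trigram_cols` and the check for exactly one matching row are my own additions, not part of the answer above, and it assumes the string column is called `strs` and comes first.

```r
# Example data from the question
df <- data.frame(
  strs = c("this", "that", "chat", "chin"),
  thi  = c(1, 0, 0, 0),
  tha  = c(0, 1, 0, 0),
  hin  = c(0, 0, 0, 1),
  hat  = c(0, 1, 1, 0)
)

# Hypothetical helper: keep the strs column plus every trigram column
# that is 1 in the row for `given_str`
select_trigram_cols <- function(data, given_str) {
  target <- data[data$strs == given_str, , drop = FALSE]
  if (nrow(target) != 1) {
    stop("expected exactly one row for '", given_str, "', found ", nrow(target))
  }
  # Positions (among the trigram columns) where the target row has a 1
  hits <- unname(which(unlist(target[, -1, drop = FALSE]) == 1))
  # Shift by 1 because column 1 is strs
  data[, c(1, 1 + hits), drop = FALSE]
}

result <- select_trigram_cols(df, "that")
result
```

The `drop = FALSE` arguments keep the result a data frame even when the given string has no trigram columns set to 1.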
#2
As a one-liner:
df[as.logical(df[df$strs=='that',])]
#  strs tha hat
#1 this   0   0
#2 that   1   1
#3 chat   0   1
#4 chin   0   0
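One caveat, as far as I can tell: the one-liner also coerces the `strs` value to logical, so it seems to depend on `strs` being a factor (the `data.frame()` default before R 4.0.0). With a character column, `as.logical("that")` is `NA`, and I would expect the column indexing to fail. A variant that only coerces the trigram columns and always keeps `strs` might look like this; the name `keep` is just for illustration.

```r
# Example data from the question
df <- data.frame(
  strs = c("this", "that", "chat", "chin"),
  thi  = c(1, 0, 0, 0),
  tha  = c(0, 1, 0, 0),
  hin  = c(0, 0, 0, 1),
  hat  = c(0, 1, 1, 0)
)

# TRUE for strs, then TRUE/FALSE for each trigram column in the 'that' row
keep <- c(TRUE, unname(unlist(df[df$strs == "that", -1]) == 1))

result <- df[keep]
result
```

This does not care whether `strs` is stored as a factor or as character.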