How do I restrict the results to only the data frame rows and columns that contain a 0?

Date: 2021-10-12 09:10:44

I am doing approximate string matching in R. I am rather inexperienced with this technique, but because I want to find instances where my x strings match parts of my y strings exactly, I am only interested in Levenshtein scores of 0 (is this the correct approach?).

What's the most convenient way to subset the results? Because I have about 10k columns and 1k rows, I'm not sure there's any way to efficiently visualize the results either. I apologize for the lack of tact in this question. I just lack experience with this.

2 solutions

#1


Using Mark's data (my.data, constructed in answer #2 below), here's a way to build the row and column indices with apply:

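# TRUE for each row / column that contains at least one zero
rows <- apply(my.data, 1, function(x) any(x == 0))
cols <- apply(my.data, 2, function(x) any(x == 0))

# Keep only those rows and columns
my.data[rows, cols]
##   V2 V3 V4
## 1  0  2  1
## 3  1  1  0
## 5  0  0  0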
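
For a matrix of the size mentioned in the question (about 1k rows by 10k columns), a vectorized variant may be quicker than calling apply twice. The following is only a sketch, not part of the original answer. It hard-codes the same values as the my.data printed in answer #2 so that it runs on its own; the names is.zero and result are made up for this illustration.

```r
# Same values as the my.data printed in answer #2, typed in by hand
my.data <- data.frame(
  V1 = c(2, 2, 2, 2, 1),
  V2 = c(0, 2, 1, 2, 0),
  V3 = c(2, 1, 1, 2, 0),
  V4 = c(1, 2, 0, 1, 0)
)

# Logical matrix: TRUE wherever a cell equals zero
is.zero <- my.data == 0

# A row or column is kept if it has at least one TRUE
rows <- rowSums(is.zero) > 0
cols <- colSums(is.zero) > 0

result <- my.data[rows, cols]
result
```

This should select the same rows and columns as the apply version, since both test each cell against zero; it only builds the comparison matrix once.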

#2


This will retain all rows and columns that contain a zero.

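set.seed(2234)

# 5 x 4 data frame of random values drawn from 0, 1, 2
my.data <- as.data.frame(matrix(sample(0:2, 20, replace = TRUE), nrow = 5))
my.data

# Row and column positions of every zero, computed once
zeros <- which(my.data == 0, arr.ind = TRUE)
aa <- unique(zeros[, 1])  # rows that contain a zero
bb <- unique(zeros[, 2])  # columns that contain a zero

my.data2 <- my.data[sort(aa), sort(bb)]
my.data2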

> my.data
  V1 V2 V3 V4
1  2  0  2  1
2  2  2  1  2
3  2  1  1  0
4  2  2  2  1
5  1  0  0  0

> my.data2
  V2 V3 V4
1  0  2  1
3  1  1  0
5  0  0  0
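
With roughly 10k columns and 1k rows, as in the question, even the subsetted data frame may be too wide to read. One possible alternative, sketched here under the assumption that the distance matrix is shaped like my.data above, is to list each zero cell as a row/column pair instead of printing the whole table. The values are typed in by hand from the printed my.data, and the name hits is made up for this illustration.

```r
# Same values as the my.data printed above, typed in by hand
my.data <- data.frame(
  V1 = c(2, 2, 2, 2, 1),
  V2 = c(0, 2, 1, 2, 0),
  V3 = c(2, 1, 1, 2, 0),
  V4 = c(1, 2, 0, 1, 0)
)

# One line per zero cell: columns "row" and "col" hold its position
zeros <- which(my.data == 0, arr.ind = TRUE)

# Translate the positions into row and column names
hits <- data.frame(
  row = rownames(my.data)[zeros[, "row"]],
  col = colnames(my.data)[zeros[, "col"]],
  stringsAsFactors = FALSE
)
hits
```

If the real matrix carries the x and y strings as its row and column names, the same code would list the matching pairs by name, which may be easier to scan than a plot.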
