Compare two data.frames and delete rows

Time: 2022-11-09 22:51:04

I have two data.frames, x1 and x2. I want to remove a row from x2 if it shares a gene with x1.

x1 <- chr   start   end         Genes   
      1      8401    8410      Mndal,Mnda,Ifi203,Ifi202b    
      2      8001    8020      Cyb5r1,Adipor1,Klhl12    
      3      4001    4020      Alyref2,Itln1,Cd244  

x2 <- chr   start   end         Genes
      1      8861   8868       Olfr1193 
      1      8405    8420      Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b   
      2      8501    8520      Chia,Chi3l3,Chi3l4   
      3      4321    4670      Tdpoz4,Tdpoz3,Tdpoz5 



Expected output:

x2 <- chr   start   end         Genes
      1      8861   8868       Olfr1193
      2      8501    8520      Chia,Chi3l3,Chi3l4   
      3      4321    4670      Tdpoz4,Tdpoz3,Tdpoz5 

1 solution

#1


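You could try the following. It compares row i of x1 with row i of x2, so it assumes both data frames have the same number of rows in matching order: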

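# Split each Genes string on commas, then keep the x2 rows whose genes
# do not overlap with the genes in the same-numbered row of x1
x2[mapply(function(x,y) !any(x %in% y), 
        strsplit(x1$Genes, ','), strsplit(x2$Genes, ',')),]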
#  chr start  end                Genes
#2   2  8501 8520   Chia,Chi3l3,Chi3l4
#3   3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5

Or replace !any(x %in% y) with length(intersect(x,y))==0.
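
To illustrate that the two tests agree, here is a minimal sketch on toy gene vectors taken from the example rows. The helper names no_overlap_any and no_overlap_intersect are made up for this illustration and are not part of the answer.

```r
# Gene vectors as they would look after strsplit() on single rows
x <- c("Mndal", "Mnda", "Ifi203", "Ifi202b")
y <- c("Mrgprx3-ps", "Mrgpra1", "Mrgpra2a", "Mndal", "Mrgpra2b")
z <- c("Chia", "Chi3l3", "Chi3l4")

# Hypothetical helpers wrapping the two interchangeable tests
no_overlap_any <- function(a, b) !any(a %in% b)
no_overlap_intersect <- function(a, b) length(intersect(a, b)) == 0

no_overlap_any(x, y)        # FALSE: "Mndal" occurs in both
no_overlap_intersect(x, y)  # FALSE
no_overlap_any(x, z)        # TRUE: nothing in common
no_overlap_intersect(x, z)  # TRUE
```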

NOTE: If the "Genes" column is a factor, convert it to character first, because strsplit does not accept a factor, i.e. strsplit(as.character(x1$Genes), ',')
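
A small sketch of the factor issue, using a made-up two-element factor rather than the real data. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so this mainly matters for older R versions or data read with stringsAsFactors = TRUE.

```r
# Hypothetical factor column standing in for x1$Genes
genes_factor <- factor(c("Mndal,Mnda", "Chia,Chi3l3"))

# strsplit() on a factor raises an error; capture it instead of aborting
res <- tryCatch(strsplit(genes_factor, ","), error = function(e) "error")

# Convert first; the closing parenthesis of as.character() comes before the separator
parts <- strsplit(as.character(genes_factor), ",")
parts[[1]]  # "Mndal" "Mnda"
```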

Update

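Based on the new dataset for 'x2', we can merge the two datasets by the 'chr' column, split 'Genes.x' and 'Genes.y' in the merged dataset ('xNew') with strsplit, build a logical index that is TRUE wherever any element of 'Genes.x' occurs in 'Genes.y', and use the negated index to subset 'x2'. This relies on the rows of 'xNew' coming out in the same order as the rows of 'x2', which is the case for this data ('x2' is sorted by 'chr', and each 'chr' occurs once in 'x1').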

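 # Pair every x2 row with the x1 row on the same chromosome
 xNew <- merge(x1, x2[,c(1,4)], by='chr')
 # TRUE where the paired rows share at least one gene
 indx <- mapply(function(x,y) any(x %in% y), 
       strsplit(xNew$Genes.x, ','), strsplit(xNew$Genes.y, ','))
 # Keep the x2 rows without a shared gene
 x2[!indx,]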
 # chr start  end                Genes
 #1   1  8861 8868             Olfr1193
 #3   2  8501 8520   Chia,Chi3l3,Chi3l4
 #4   3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5
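
If the row order of 'x2' cannot be guaranteed, one possible alternative is to look up the x1 genes per chromosome instead of merging. This is only a sketch and not the answerer's method; the names x1_by_chr, shares_gene and x2_filtered are introduced here for illustration, and the data frames are rebuilt from the tables in the question.

```r
x1 <- data.frame(
  chr = c(1, 2, 3),
  start = c(8401, 8001, 4001),
  end = c(8410, 8020, 4020),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b", "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)
x2 <- data.frame(
  chr = c(1, 1, 2, 3),
  start = c(8861, 8405, 8501, 4321),
  end = c(8868, 8420, 8520, 4670),
  Genes = c("Olfr1193", "Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4", "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)

# Named list: chromosome -> character vector of all x1 genes on it
x1_by_chr <- lapply(split(x1$Genes, x1$chr),
                    function(g) unlist(strsplit(g, ",")))

# For each x2 row, check its genes against the x1 genes on the same chromosome;
# a chromosome missing from x1 yields NULL, so the test is FALSE
shares_gene <- mapply(
  function(genes, chr) any(genes %in% x1_by_chr[[as.character(chr)]]),
  strsplit(x2$Genes, ","), x2$chr
)

x2_filtered <- x2[!shares_gene, ]
x2_filtered
```

Because each x2 row is tested on its own, the result does not depend on how 'x2' is sorted. To match genes regardless of chromosome, the lookup could be replaced by a single vector such as unlist(strsplit(x1$Genes, ",")).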
