I have two data frames, x1 and x2. I want to remove a row from x2 if any of its genes also appears in x1.
x1:
chr start  end Genes
  1  8401 8410 Mndal,Mnda,Ifi203,Ifi202b
  2  8001 8020 Cyb5r1,Adipor1,Klhl12
  3  4001 4020 Alyref2,Itln1,Cd244

x2:
chr start  end Genes
  1  8861 8868 Olfr1193
  1  8405 8420 Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b
  2  8501 8520 Chia,Chi3l3,Chi3l4
  3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5
Desired result (the second row of x2 is removed because it shares Mndal with the first row of x1):
chr start  end Genes
  1  8861 8868 Olfr1193
  2  8501 8520 Chia,Chi3l3,Chi3l4
  3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5
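To make the example reproducible, the two tables above can be rebuilt as data frames. This is a sketch: the column types (numeric chr, start and end; character Genes) are assumptions, since the question only shows the printed tables.

```r
# Rebuild x1 and x2 from the printed tables.
# stringsAsFactors = FALSE keeps Genes as character, which strsplit() needs
# (this is already the default from R 4.0.0 on).
x1 <- data.frame(
  chr   = c(1, 2, 3),
  start = c(8401, 8001, 4001),
  end   = c(8410, 8020, 4020),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b",
            "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)

x2 <- data.frame(
  chr   = c(1, 1, 2, 3),
  start = c(8861, 8405, 8501, 4321),
  end   = c(8868, 8420, 8520, 4670),
  Genes = c("Olfr1193",
            "Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4",
            "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)
```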
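You could try the following. mapply() compares the rows pairwise (row i of x1 with row i of x2), so this assumes x1 and x2 have the same number of rows in matching order: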
# keep the rows of x2 that share no gene with the same-position row of x1
x2[mapply(function(x, y) !any(x %in% y),
          strsplit(x1$Genes, ','), strsplit(x2$Genes, ',')), ]
# chr start end Genes
#2 2 8501 8520 Chia,Chi3l3,Chi3l4
#3 3 4321 4670 Tdpoz4,Tdpoz3,Tdpoz5
Or replace !any(x %in% y) with length(intersect(x, y)) == 0.
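If the goal is simply to drop any x2 row that shares a gene with any row of x1, regardless of chr or row order, a set-based check avoids the pairwise matching. This is a swapped-in variant, not the answer's method; x1_genes and shared are illustrative names, and only the chr and Genes columns are rebuilt for brevity.

```r
# Variant (not from the original answer): compare each x2 row against ALL
# genes of x1, ignoring chr and row order.
x1 <- data.frame(
  chr   = c(1, 2, 3),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b",
            "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)
x2 <- data.frame(
  chr   = c(1, 1, 2, 3),
  Genes = c("Olfr1193",
            "Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4",
            "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)

# every gene named anywhere in x1
x1_genes <- unique(unlist(strsplit(x1$Genes, ",")))

# TRUE for x2 rows with at least one gene in x1_genes
shared <- vapply(strsplit(x2$Genes, ","),
                 function(g) any(g %in% x1_genes),
                 logical(1))

x2[!shared, ]
```

With the four-row x2 from the question this keeps rows 1, 3 and 4, matching the desired result, because Mndal is the only shared gene.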
NOTE: If the "Genes" column is a factor, convert it to character first, because strsplit does not accept the 'factor' class, i.e. strsplit(as.character(x1$Genes), ',').
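A quick check of the note above, assuming a factor column like the one data.frame() created by default before R 4.0.0: strsplit() fails on the factor and works once it is converted with as.character(). The names genes, res and split_genes are illustrative.

```r
# a factor version of a Genes column
genes <- factor(c("Mndal,Mnda,Ifi203,Ifi202b", "Cyb5r1,Adipor1,Klhl12"))

# strsplit() rejects factors; capture the error message instead of stopping
res <- tryCatch(strsplit(genes, ","), error = function(e) conditionMessage(e))
print(res)

# converting to character first works, keeping the original row order
split_genes <- strsplit(as.character(genes), ",")
print(split_genes)
```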
Update
Based on the new dataset for 'x2', we can merge the two datasets by the 'chr' column, strsplit the 'Genes.x' and 'Genes.y' columns of the merged dataset ('xNew'), get a logical index that is TRUE when any element of 'Genes.x' occurs in 'Genes.y', and use that index to subset 'x2'. Because the index is computed on the rows of 'xNew' but applied to 'x2', this relies on 'xNew' having one row per row of 'x2', in the same order.
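# pair each x2 row with the x1 row on the same chr (Genes.x from x1, Genes.y from x2)
xNew <- merge(x1, x2[, c(1, 4)], by = 'chr')
# TRUE where any gene of the x1 row also occurs in the paired x2 row
indx <- mapply(function(x, y) any(x %in% y),
               strsplit(xNew$Genes.x, ','), strsplit(xNew$Genes.y, ','))
# drop the x2 rows that share a gene
x2[!indx, ]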
# chr start end Genes
#1 1 8861 8868 Olfr1193
#3 2 8501 8520 Chia,Chi3l3,Chi3l4
#4 3 4321 4670 Tdpoz4,Tdpoz3,Tdpoz5
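A sketch of an alternative that does not depend on xNew and x2 lining up row by row, not part of the original answer: group the x1 genes by chr with split(), then check each x2 row against the genes on its own chr. x1_by_chr and shared are illustrative names, and only the chr and Genes columns are rebuilt.

```r
# Variant (not from the original answer): same-chr check without merge().
x1 <- data.frame(
  chr   = c(1, 2, 3),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b",
            "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)
x2 <- data.frame(
  chr   = c(1, 1, 2, 3),
  Genes = c("Olfr1193",
            "Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4",
            "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)

# genes of x1 grouped by chr: a list with names "1", "2", "3"
x1_by_chr <- lapply(split(x1$Genes, x1$chr),
                    function(g) unique(unlist(strsplit(g, ","))))

# for each x2 row: does any of its genes occur in x1 on the same chr?
# (x1_by_chr[[...]] is NULL for a chr absent from x1, which gives FALSE)
shared <- mapply(function(chr, genes) {
                   any(strsplit(genes, ",")[[1]] %in% x1_by_chr[[as.character(chr)]])
                 },
                 x2$chr, x2$Genes, USE.NAMES = FALSE)

x2[!shared, ]
```

With the question's data this drops only the second x2 row (Mndal), giving the same rows as x2[!indx,] above.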