This question already has an answer here:
- How to join (merge) data frames (inner, outer, left, right)? 12 answers
I have a data.frame1:
BIN CHR BP1 BP2 Score Value
12 chr1 29123222 29454711 -5.7648 599
116 chr13 45799118 45986770 -4.8403 473
117 chr5 46327104 46490961 -5.3036 536
121 chr6 50780759 51008404 -4.4165 415
133 chr18 63634657 63864734 -4.8096 469
147 chr1 77825305 78062178 -5.4671 559
I have a second data.frame2 like:
CHR SNP A1 A2 BP INFO OR SE P NGT
chr1 rs10900604 A G 29204555 0.774 1.01582 0.0143 0.2723 0
chr3 rs12132517 A G 79880711 0.604 0.98334 0.0253 0.5071 2
chr14 rs11240777 A G 79895429 0.818 0.98817 0.0139 0.3907 27
chr18 rs147634896 T C 63789900 0.623 1.02634 0.0259 0.3161 0
chr6 rs143609865 A T 77934001 0.617 1.01562 0.0317 0.6254 0
I am interested in keeping all the rows in data.frame2 that match the following criteria: they must have the same CHR and a BP value between BP1 and BP2 of any of the rows in data.frame1.
For example, row 1 of data.frame2 has "chr1" and a BP that falls within one of the BP ranges from data.frame1. Notice it doesn't fall into row 6's range, but it does fall into row 1's range. I would thus like to keep this row in data.frame2.
Another example: row 4 of data.frame2 has "chr18" and a BP of 63789900, which falls within the BP range (between BP1 and BP2) of row 5 in data.frame1. I would thus like to keep this row in data.frame2.
Final example: notice that row 5 in data.frame2 has a BP of 77934001, which falls within the BP1 to BP2 range of row 6 in data.frame1. Yet the CHR in data.frame2, "chr6", does not match "chr1". I would like to delete this row.
I would also like to delete all the other rows that don't match both CHR and BP range at the same time.
I was thinking maybe if loop that had CHR1=CHR2, and BP>BP1 and BP
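The row-by-row check described above can be sketched in base R roughly as follows. This is only an illustration under a few assumptions: the names `df1`, `df2`, `in_any_range` and `df2_kept` are made up for the example, the data are the sample rows shown in the question, and "between" is taken to include the end points.

```r
# Sample rows from the question; df1 and df2 are names assumed for this sketch
df1 <- read.table(text = "
BIN CHR BP1 BP2 Score Value
12 chr1 29123222 29454711 -5.7648 599
116 chr13 45799118 45986770 -4.8403 473
117 chr5 46327104 46490961 -5.3036 536
121 chr6 50780759 51008404 -4.4165 415
133 chr18 63634657 63864734 -4.8096 469
147 chr1 77825305 78062178 -5.4671 559
", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "
CHR SNP A1 A2 BP INFO OR SE P NGT
chr1 rs10900604 A G 29204555 0.774 1.01582 0.0143 0.2723 0
chr3 rs12132517 A G 79880711 0.604 0.98334 0.0253 0.5071 2
chr14 rs11240777 A G 79895429 0.818 0.98817 0.0139 0.3907 27
chr18 rs147634896 T C 63789900 0.623 1.02634 0.0259 0.3161 0
chr6 rs143609865 A T 77934001 0.617 1.01562 0.0317 0.6254 0
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if (chr, bp) lies inside at least one interval
# of `ranges` on the same chromosome (end points included by assumption)
in_any_range <- function(chr, bp, ranges) {
  any(ranges$CHR == chr & bp >= ranges$BP1 & bp <= ranges$BP2)
}

# Apply the check to every row of df2, then keep only the matching rows
keep <- mapply(in_any_range, df2$CHR, df2$BP,
               MoreArgs = list(ranges = df1), USE.NAMES = FALSE)
df2_kept <- df2[keep, ]
```

Because the result is a logical index into `df2`, each row is kept at most once and the original columns and row order are preserved. For large tables this row-wise scan may be slow compared with a join.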
2 solutions
#1
4
This ought to work using base R:
# merge the relevant data
dfmerge = merge(df1[c("CHR", "BP1", "BP2")], df2, by = "CHR")
# delete unwanted rows
dfmerge = dfmerge[(dfmerge$BP > dfmerge$BP1 & dfmerge$BP < dfmerge$BP2),]
# clean up columns
dfmerge[c("BP1", "BP2")] = list(NULL)
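One thing that may be worth checking with the merge approach: if a row of df2 falls inside more than one range in df1, it comes back once per matching range. The sketch below illustrates this on toy data that is made up for the example (the SNP names and coordinates are not from the question), and uses `unique()` as one possible way to collapse the repeats.

```r
# Toy data, made up for this illustration (not from the question):
# the two chr1 intervals overlap, so BP = 150 falls inside both
df1 <- data.frame(CHR = c("chr1", "chr1", "chr2"),
                  BP1 = c(100, 120, 100),
                  BP2 = c(200, 300, 200))
df2 <- data.frame(CHR = c("chr1", "chr2", "chr3"),
                  SNP = c("rsA", "rsB", "rsC"),
                  BP  = c(150, 500, 150))

# Same merge-then-filter steps as above
dfmerge <- merge(df1[c("CHR", "BP1", "BP2")], df2, by = "CHR")
dfmerge <- dfmerge[dfmerge$BP > dfmerge$BP1 & dfmerge$BP < dfmerge$BP2, ]
dfmerge[c("BP1", "BP2")] <- list(NULL)

# rsA appears once per matching interval at this point
n_before <- nrow(dfmerge)

# Collapse identical rows so each df2 row is kept only once
dfunique <- unique(dfmerge)
```

Note also that the filter uses strict inequalities, so a BP exactly equal to BP1 or BP2 is dropped; switch to `>=` and `<=` if the end points should count.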
In general, SQL can do this nicely and concisely:
library(sqldf)
sqldf("select df2.*
from df2 inner join df1
on df2.CHR = df1.CHR
and df2.BP between df1.BP1 and df1.BP2")
#2
3
Here is a dplyr approach:
library(dplyr)
# assuming d is data.frame1 and d2 is data.frame2
d %>%
  left_join(d2, by = "CHR") %>%
  filter(BP >= BP1 & BP <= BP2)
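If only the rows of the second table are wanted, without the extra columns from the first, a filtering join may be a closer fit. The sketch below assumes dplyr 1.1.0 or later, where `join_by()` accepts a `between()` condition; the toy data and the name `kept` are made up for the illustration.

```r
library(dplyr)  # join_by() requires dplyr 1.1.0 or later

# Toy data made up for this illustration; d and d2 play the roles of
# data.frame1 and data.frame2
d <- data.frame(CHR = c("chr1", "chr18", "chr1"),
                BP1 = c(100, 100, 500),
                BP2 = c(200, 200, 600))
d2 <- data.frame(CHR = c("chr1", "chr6", "chr18"),
                 SNP = c("rsA", "rsB", "rsC"),
                 BP  = c(150, 150, 900))

# Keep rows of d2 that share CHR with some row of d and have
# BP1 <= BP <= BP2 for that row; each d2 row is returned at most once
kept <- semi_join(d2, d, by = join_by(CHR, between(BP, BP1, BP2)))
```

Since `semi_join()` only filters `d2`, the result has the same columns as `d2` and no duplicated rows even when several ranges match.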