I have a translation table (trans_df):
trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G C G A C C T CTT T C C
G C G A C C del CTT T C C
A C G A C T T CTT T C C
del del del del del del del del del del del
G C G del C C T CTT T C C
G C G A C C T CTT G C C
G C G A C C T del T C C
A C G A C C T CTT T C C
G C A A C C T CTT T C C
G C G A C C T CTT T C T
G C G A C C T CTT T T C",header=TRUE, stringsAsFactors = FALSE, colClasses = "character")
and an input:
input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A C G|A A C T|C T CTT T C|T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")
I want to find the rows of trans_df that match the input row, using regular expressions. I have achieved it column by column, by position:
Reduce(intersect,lapply(seq(1, ncol(trans_df)),
function(i) {grep(pattern = input[, i],
trans_df[, i])}))
Is there a way to do this by passing the whole input as the pattern (pattern = input)? Please advise.
2 solutions
#1
You can use Map to achieve that, i.e.
Map(grep, input, trans_df)
However, that assumes your columns correspond one-to-one, in the same order. If that does not hold, you can use match to align them, i.e.
# reorder input to follow the column order of trans_df
Map(grep, input[match(names(trans_df), names(input))], trans_df)
# or, in the same sense and to keep input intact, reorder trans_df instead
Map(grep, input, trans_df[match(names(input), names(trans_df))])
However, I think that would defeat your purpose.
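Map() on its own returns one vector of matching row indices per column, so a final step is still needed to get the rows that match in every column. The following is a minimal sketch of that step using the data from the question. The anchored variant is an added assumption, not part of the answer: grep() matches substrings, so an unanchored "T" would also match "CTT".

```r
# Data copied from the question.
trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G C G A C C T CTT T C C
G C G A C C del CTT T C C
A C G A C T T CTT T C C
del del del del del del del del del del del
G C G del C C T CTT T C C
G C G A C C T CTT G C C
G C G A C C T del T C C
A C G A C C T CTT T C C
G C A A C C T CTT T C C
G C G A C C T CTT T C T
G C G A C C T CTT T T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A C G|A A C T|C T CTT T C|T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

# Align input to trans_df by column name (a no-op when the order already matches).
input <- input[names(trans_df)]

# One grep() call per column pair: a list of matching row indices per column.
hits <- Map(grep, input, trans_df)

# Keep only the rows that matched in every column.
rows <- Reduce(intersect, hits)

# Assumed variant: anchor each pattern so it must match the whole cell.
anchored <- lapply(input, function(p) paste0("^(", p, ")$"))
rows_anchored <- Reduce(intersect, Map(grep, anchored, trans_df))

trans_df[rows_anchored, ]
```

On the question's data both variants select the same rows, because every cell holds a single allele. Anchoring only starts to matter when one allele string is contained in another within the same column.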
#2
I would just use subset() here and pass it the criteria for a matching row. In this case, the criteria involve checking each column in the data frame against a set of known values. Assuming that input is a named vector, we can try the following code:
subset(trans_df, rs1065852 == input["rs1065852"] & rs201377835 == input["rs201377835"] &
... & rs59421388 == input["rs59421388"])
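Note that == compares whole strings, so an input cell such as "G|A" would only match a cell containing that literal text. A possible way to keep the "set of known values" idea without a regular expression is to split each input cell on "|" and test membership with %in%. The sketch below uses a small made-up table (trans_small and input_small are hypothetical names, with three of the question's columns) rather than the full data:

```r
# Hypothetical reduced table: three columns, three rows.
trans_small <- data.frame(
  rs1065852  = c("G", "A", "del"),
  rs3892097  = c("C", "T", "del"),
  rs35742686 = c("T", "del", "T"),
  stringsAsFactors = FALSE
)

# Input as a named character vector, as this answer assumes.
input_small <- c(rs1065852 = "G|A", rs3892097 = "T|C", rs35742686 = "T")

# Split each input cell into its set of allowed values.
allowed <- lapply(input_small, function(cell) strsplit(cell, "|", fixed = TRUE)[[1]])

# For every column, test whether each cell is one of the allowed values,
# then combine the per-column results with a logical AND.
per_column <- Map(function(col, ok) col %in% ok, trans_small[names(allowed)], allowed)
keep <- Reduce(`&`, per_column)

trans_small[keep, ]
```

Selecting columns by names(allowed) means the column order of the table and the input does not have to agree.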