R: Indexing the results of a regular expression

I am trying to use the indexes returned from searching through a string for every instance of a character. For example, when I call gregexpr(pattern, text) like this:

lookfor <- "n"
string <- "ATTnGGCnATTn"
gregexpr(pattern = lookfor, text = string)

I get the following:

[[1]]
[1]  4  8 12
attr(,"match.length")
[1] 1 1 1
attr(,"useBytes")
[1] TRUE

How do I extract the positions from the first element of this result so that I can use those locations? Thank you in advance for your help!

4 answers

#1

Addition (2): After thinking about this for a while, I concluded that you could simply have used unlist() on your original gregexpr() call:

> unlist(gregexpr("n", string))
# [1]  4  8 12
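unlist() is convenient for a single string, but with several input strings it flattens all of the positions into one vector, so you can no longer tell which position came from which string. A minimal sketch, assuming you want one plain vector of positions per string (the strings vector is hypothetical example data):

```r
# Hypothetical example data: two strings to search
strings <- c("ATTnGGCnATTn", "nAAn")

# unlist() merges the positions from both strings into one vector
flat <- unlist(gregexpr("n", strings))
flat          # 4 8 12 1 4

# lapply() + as.vector() keeps one attribute-free integer vector per string
per_string <- lapply(gregexpr("n", strings), as.vector)
per_string    # list(c(4, 8, 12), c(1, 4))
```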

From your comment, "I am looking for the position of each letter n", it follows that you could use any of these:

> which(strsplit(string, "")[[1]] == "n")
# [1]  4  8 12
> cumsum(nchar(strsplit(string, "n")[[1]])+1)
# [1]  4  8 12
> nc <- 1:nchar(string)
> which(substring(string, nc, nc) == "n")
# [1]  4  8 12
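Two edge cases are worth checking before relying on these. When the character does not occur at all, gregexpr() reports -1 rather than an empty result (the which() version returns integer(0)), and the cumsum() version is only correct when the string ends with the character, because strsplit() keeps the piece after the last match. A sketch that normalises the no-match case; find_char_positions is a hypothetical helper name, and it assumes a single input string:

```r
# Hypothetical helper: positions of a literal character in one string,
# returning integer(0) instead of -1 when there is no match
find_char_positions <- function(x, ch) {
  pos <- gregexpr(ch, x, fixed = TRUE)[[1]]
  if (pos[1] == -1L) integer(0) else as.vector(pos)
}

hits <- find_char_positions("ATTnGGCnATTn", "n")   # 4 8 12
none <- find_char_positions("ATTTGGCCATTG", "n")   # integer(0)

# gregexpr() itself signals "no match" with -1
raw_none <- gregexpr("n", "ATTTGGCCATTG")[[1]][1]  # -1

# The cumsum() approach over-counts when the string does not end with "n":
# strsplit() gives "", "AA", "G", so the last value (6) is spurious
bad <- cumsum(nchar(strsplit("nAAnG", "n")[[1]]) + 1)  # 1 4 6
```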

Addition (1), regarding the similar strings (see the comment on another answer): you could use strsplit() again and index the second string with the positions found by one of the methods above:

> string2 <- "ATTTGGCCATTG"
> w <- which(strsplit(string, "")[[1]] == "n")
> strsplit(string2, "")[[1]][w]
[1] "T" "C" "G"
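The same lookup can be done without splitting the second string, since substring() accepts vectors of start and stop positions. A sketch, assuming every similar string is at least as long as the original (the similar vector is hypothetical example data):

```r
string <- "ATTnGGCnATTn"
w <- which(strsplit(string, "")[[1]] == "n")   # 4 8 12

# substring() with vector first/last extracts one character per position
one <- substring("ATTTGGCCATTG", w, w)         # "T" "C" "G"

# Apply the same positions to several similar strings at once;
# each list element corresponds to one input string
similar <- c("ATTTGGCCATTG", "GCTAGGCTATTC")
chars <- lapply(similar, function(s) substring(s, w, w))
chars   # list(c("T", "C", "G"), c("A", "T", "C"))
```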

#2

If you want to extract all the matches, you can use the built-in function regmatches():

m <- gregexpr(lookfor, string)
regmatches(string, m)

This returns a list of character vectors, because string can have more than one element. If you pass in only one string, you can get the vector of matches directly, bypassing the list, with:

regmatches(string,m)[[1]]
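With more than one input string, regmatches() returns one character vector per string (character(0) where nothing matched), and lengths() then gives the number of matches in each. Note that regmatches() returns the matched text, not the positions; the positions are still in m. A short sketch (the strings vector is hypothetical example data):

```r
strings <- c("ATTnGGCnATTn", "nAAn", "ATTG")
m <- gregexpr("n", strings)

# One character vector of matched text per input string
matches <- regmatches(strings, m)
# Number of matches per string: 3 2 0
counts <- lengths(matches)
```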

#3

Here is a step-by-step method to find the indices. I suspect there are more efficient ways to achieve the same result. The argument fixed = TRUE tells R to look for the literal lower case "n" rather than treat it as a regular expression.

The [[1]] at the end extracts the first (and here only) element of the returned list, which is the vector of indices.

To get all of the indices as a plain vector, subset the result using its length:

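string <- "ATTnGGCnATTn"
# fixed = TRUE matches the literal "n"; [[1]] takes the vector out of the list
index <- gregexpr(pattern = "n", text = string, fixed = TRUE)[[1]]
# subsetting keeps all positions (4 8 12) but drops the match.length attributes
first.index <- index[1:length(index)]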
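For a plain letter such as "n", fixed = TRUE gives the same positions as the default regular-expression matching; it matters when the character is a regex metacharacter. A sketch using "." as an illustrative example of such a character, which also checks that subsetting drops the attributes (seq_along(idx) is equivalent to 1:length(idx) here):

```r
s <- "A.TG.C"
# As a regular expression, "." matches any character, so every position matches
all_pos <- as.vector(gregexpr(".", s)[[1]])               # 1 2 3 4 5 6
# With fixed = TRUE, only the literal dots are found
lit_pos <- as.vector(gregexpr(".", s, fixed = TRUE)[[1]]) # 2 5

# Subsetting with [ keeps the values but drops the match.length attribute
idx <- gregexpr("n", "ATTnGGCnATTn", fixed = TRUE)[[1]]
attr(idx, "match.length")                    # 1 1 1
plain <- idx[seq_along(idx)]
attr(plain, "match.length")                  # NULL
```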

#4

I think I figured out my own answer! Using the Biostrings package in Bioconductor:

library(Biostrings)  # Bioconductor package providing matchPattern()
string <- "ATTnGGCnATTn"
matches <- matchPattern(pattern = "n", subject = string)
m <- as.vector(ranges(matches))

Now I can look up each position of "n" in all similar strings. Thank you to those who took the time to answer.
