This question already has an answer here:
- Finding multiple occurrences of a string within a string in Python (18 answers)
I'm working on an assignment in Python and have a question. I want to write a function that returns a list with the positions of the first nucleotide of every occurrence of "ATG" in a sequence. For example, say our DNA sequence is AATGCATGC. ATG starts at index 1, and again at index 5. This is what I tried:
dna = "AATGCATGC"
starting_offset = dna.index("ATG")
print(starting_offset)
The result I get is 1, but I want [1, 5].
So how should I write this function for all occurrences?
Thanks for helping me :)
2 solutions
#1
2
Using regular expressions, you can use re.finditer to find all occurrences:
You can try this function:
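import re
text = 'AATGCATGC'
pattern = 'ATG'
def getIndexes(text, pattern):
    # search for the pattern argument rather than a hard-coded 'ATG',
    # and avoid shadowing the built-in name list
    indexes = [match.start() for match in re.finditer(pattern, text)]
    return indexes
getIndexes(text, pattern)
>>[1, 5]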
It will give you the list you're looking for. Hope that helps!
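One caveat, as far as I can tell: re.finditer only reports non-overlapping matches, and the search term is interpreted as a regular expression, so special characters in it would need escaping. Here is a minimal sketch of a variant that handles both by wrapping the escaped term in a zero-width lookahead. The name find_all_overlapping is made up for illustration and is not part of the answer above.

```python
import re

def find_all_overlapping(text, pattern):
    # Hypothetical helper (not from the original answer).
    # An empty search term would match at every position, so bail out early.
    if not pattern:
        return []
    # re.escape makes the term match literally; the lookahead (?=...)
    # consumes no characters, so overlapping occurrences are all reported.
    lookahead = "(?=" + re.escape(pattern) + ")"
    return [m.start() for m in re.finditer(lookahead, text)]

print(find_all_overlapping("AATGCATGC", "ATG"))  # [1, 5]
print(find_all_overlapping("AAAA", "AA"))        # [0, 1, 2]
```

For "ATG" the result is the same as with the plain pattern, since "ATG" cannot overlap with itself; the difference only shows up for terms like "AA".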
#2
0
If you want something to think about, analyse this:
def GetMultipleInString(dna, term):
    # computing end condition 0
    if term not in dna:
        print(dna + " does not contain the term " + term)
        return []
    # start of list of lists of 2 elements: index, rest
    result = [[None, dna]]
    # we look for the index in the rest, need to keep track how much we
    # shortened the string in total so far to get index in complete string
    totalIdx = 0
    # we look at the last element of the list until its length is shorter
    # than the term we look for (end of computing condition 1)
    termLen = len(term)
    while len(result[-1][1]) >= termLen:
        # get the last element
        last = result[-1][1]
        try:
            # find our term, if not found -> ValueError
            idx = last.index(term)
            # partition "abcdefg" with "c" -> ("ab", "c", "defg")
            # we take only the remaining
            rest = last.partition(term)[2]
            # we compute the total index, and put it in our result
            result.append([idx + totalIdx, rest])
            totalIdx += idx + termLen
        except ValueError:
            result.append([None, last])
            break
    # any results found that are not None?
    if any(x[0] is not None for x in result):
        print(dna + " contains the term " + term + " at positions:", end=" ")
        # get only indexes from our results (kept as ints, so the
        # function returns [1, 5] rather than ['1', '5'])
        rv = [x[0] for x in result if x[0] is not None]
        print(' '.join(str(i) for i in rv))
        return rv
    else:
        print(dna + " does not contain the term " + term)
        return []

print("_----------------------------------_")
myDna = "AATGCATGC"
res1 = GetMultipleInString(myDna, "ATG")
print(res1)
res2 = GetMultipleInString(myDna, "A")
print(res2)
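For comparison, the same idea can probably be written more compactly with str.find and its optional start argument, which avoids slicing the string and tracking a running offset. This is only a sketch under my own assumptions, not what the answer above does internally; find_all and the overlapping flag are names I made up.

```python
def find_all(dna, term, overlapping=False):
    # Hypothetical helper: collect every start index of term in dna.
    positions = []
    if not term:
        # An empty term has no meaningful positions; return nothing.
        return positions
    # Step past the whole match for non-overlapping results,
    # or by a single character to allow overlapping ones.
    step = 1 if overlapping else len(term)
    idx = dna.find(term)
    while idx != -1:  # str.find returns -1 when nothing more is found
        positions.append(idx)
        idx = dna.find(term, idx + step)
    return positions

print(find_all("AATGCATGC", "ATG"))  # [1, 5]
print(find_all("AATGCATGC", "A"))    # [0, 1, 5]
```

Unlike str.index, str.find signals "not found" with -1 instead of raising ValueError, so no try/except is needed in the loop.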