Find all occurrences of a substring in Python

Time: 2022-04-25 19:19:50

Python has string.find() and string.rfind() to get the index of a substring in string.

I wonder whether there is something like string.find_all() that returns all found indexes (not only the first from the beginning or the first from the end)?

For example:

string = "test test test test"

print(string.find('test'))   # 0
print(string.rfind('test'))  # 15

# that's the goal (str has no find_all method; this is the wished-for call)
print(string.find_all('test'))  # [0, 5, 10, 15]

13 answers

#1


356  

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

>>> import re
>>> [m.start() for m in re.finditer('test', 'test test test test')]
[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

>>> [m.start() for m in re.finditer('(?=tt)', 'ttt')]
[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

>>> search = 'tt'
>>> [m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.
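
One caveat with the regex approach: if the search string contains characters that are special in regular expressions (such as . or parentheses), it would need escaping first. Below is a minimal sketch, assuming a hypothetical helper named find_all_regex (it is not part of the re module) that combines re.escape with re.finditer and the lookahead trick shown above:

```python
import re

def find_all_regex(text, sub, overlapping=False):
    """Return start indexes of sub in text (hypothetical helper, not a re API)."""
    if not sub:
        raise ValueError("sub must be non-empty")
    pattern = re.escape(sub)          # treat sub as a literal, not as a pattern
    if overlapping:
        pattern = "(?=%s)" % pattern  # zero-width lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_regex("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_regex("ttt", "tt", overlapping=True))  # [0, 1]
print(find_all_regex("a.b.c", "."))                   # [1, 3]
```

Without re.escape, the pattern '.' would match every character of 'a.b.c' rather than only the two dots.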

#2


76  

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
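
One edge case worth noting for the generator above: with an empty sub, len(sub) is 0, so start never advances and the loop would not terminate. A possible variant is sketched here, assuming that rejecting an empty substring is acceptable; the name find_all_safe and the overlapping parameter are illustrative and not from the answer:

```python
def find_all_safe(a_str, sub, overlapping=False):
    """Yield start indexes of sub in a_str (illustrative variant)."""
    if not sub:
        # An empty substring would match at every position without advancing.
        raise ValueError("sub must be non-empty")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all_safe('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all_safe('ababa', 'aba')))                  # [0]
print(list(find_all_safe('ababa', 'aba', overlapping=True)))  # [0, 2]
```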

#3


33  

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

#4


17  

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> [(a.start(), a.end()) for a in re.finditer('is', aString)]
[(2, 4), (5, 7), (38, 40), (42, 44)]

But it won't find overlapping matches, for example:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
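
For the overlapping case, a zero-width lookahead is one way to recover both matches. This is only a sketch: because a lookahead match has zero width, m.end() equals m.start(), so the end of each span is computed from the substring length instead. The variable names are illustrative.

```python
import re

a_string = "ababa"
sub = "aba"

# Each lookahead match is zero-width, so derive the end from len(sub).
spans = [(m.start(), m.start() + len(sub))
         for m in re.finditer("(?=%s)" % re.escape(sub), a_string)]
print(spans)  # [(0, 3), (2, 5)]
```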

#5


15  

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

#6


12  

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

#7


7  

If you're just looking for a single character, this would work:

from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially the second one) is terribly performant.
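
If only the number of occurrences is needed rather than their positions, the built-in str.count may be a simpler alternative to both snippets. It counts non-overlapping occurrences, as this short sketch shows (the variable names are illustrative):

```python
single_char_count = "dooobiedoobiedoobie".count("o")
substring_count = "test test test test".count("test")
overlap_count = "ttt".count("tt")  # overlapping occurrences are not counted

print(single_char_count, substring_count, overlap_count)  # 7 4 1
```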

#8


7  

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

#9


2  

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once there are no more matches
        found = numberString.index(testString, marker)
        print(found)
        marker = found + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)

#10


0  

The solutions provided by others all rely on the built-in find() method or other built-in methods.

What is the core basic algorithm to find all the occurrences of a substring in a string?

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also subclass str and add this function to the new class, as below.

class newstr(str):
    def find_all(string,substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes

Calling the method

newstr.find_all('Do you find this answer helpful? then upvote this!','this')
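
Since the function's first parameter plays the role of self, it could also be called on an instance rather than through the class. A minimal sketch, assuming the method is defined inside the class body; the name NewStr and the list-comprehension form are illustrative and not the answer's own code:

```python
class NewStr(str):
    """str subclass with an illustrative find_all method."""

    def find_all(self, substring):
        """Return a list of all (possibly overlapping) start indexes."""
        length = len(substring)
        # Compare every slice of the right length against the substring.
        return [c for c in range(len(self) - length + 1)
                if self[c:c + length] == substring]

text = NewStr('Do you find this answer helpful? then upvote this!')
print(text.find_all('this'))  # [12, 45]
```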

#11


0  

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+(len("test"))] == "test":
...         print(index)
...
0
5
10
15

#12


-1  

The pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s is the character to search for (this only works for a single character)
# c is the string to search in

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]

#13


-2  

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
