Find out how many times a regex matches a string in Python

Time: 2021-04-25 23:59:07

Is there a way that I can find out how many matches of a regex are in a string in Python? For example, if I have the string "It actually happened when it acted out of turn."

I want to know how many times "t a" appears in the string. In that string, "t a" appears twice. I want my function to tell me it appeared twice. Is this possible?

7 solutions

#1


18  

The existing solutions based on findall are fine for non-overlapping matches (and no doubt optimal except maybe for HUGE number of matches), although alternatives such as sum(1 for m in re.finditer(thepattern, thestring)) (to avoid ever materializing the list when all you care about is the count) are also quite possible. Somewhat idiosyncratic would be using subn and ignoring the resulting string...:

import re

def countnonoverlappingrematches(pattern, thestring):
  return re.subn(pattern, '', thestring)[1]

The only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then, re.subn(pattern, '', thestring, count=100)[1] might be practical (returning 100 whether there are 100 matches, or 1000, or even larger numbers).
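As a rough illustration of the three non-overlapping approaches discussed here, the sketch below uses the sentence from the question; the variable names are made up for the example and are not part of any answer.

```python
import re

text = "It actually happened when it acted out of turn."
pattern = r"t a"

# findall builds the full list of matches, then we take its length.
count_findall = len(re.findall(pattern, text))

# finditer yields match objects lazily, so no list is ever materialized.
count_finditer = sum(1 for _ in re.finditer(pattern, text))

# subn returns (new_string, number_of_substitutions); count= caps the work,
# so the result can never exceed the cap (1 here, purely for illustration).
count_capped = re.subn(pattern, "", text, count=1)[1]

print(count_findall, count_finditer, count_capped)  # 2 2 1
```

Passing the cap as the keyword argument count= keeps the call unambiguous, since the same positional slot is easy to confuse with flags.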

Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition: e.g., with pattern being 'a+' and thestring being 'aa', would you consider this to be just one match, or three (the first a, the second one, both of them), or...?

Assuming for example that you want possibly-overlapping matches starting at distinct spots in the string (which then would give TWO matches for the example in the previous paragraph):

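import re

def countoverlappingdistinct(pattern, thestring):
  total = 0
  start = 0
  there = re.compile(pattern)
  # Bounding start also guards against patterns that can match the empty string.
  while start <= len(thestring):
    mo = there.search(thestring, start)
    if mo is None:
      break
    total += 1
    # Resume one character past where this match began, so overlaps count.
    start = 1 + mo.start()
  return total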

Note that you do have to compile the pattern into a RE object in this case: the function re.search does not accept a start argument (the starting position for the search) the way the search method of a compiled pattern does, so you would have to slice thestring as you go. That is definitely more effort than just having the next search start at the next possible distinct starting point, which is what I'm doing in this function.
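A minimal sketch of that idea, for trying it out: the function name count_overlapping_starts and the sample inputs are assumptions made for this example, not part of the original answer.

```python
import re

def count_overlapping_starts(pattern, text):
    """Count matches that begin at distinct positions (overlaps allowed)."""
    compiled = re.compile(pattern)
    total = 0
    start = 0
    while start <= len(text):
        # The second argument of the compiled pattern's search method
        # is the position at which the search begins.
        mo = compiled.search(text, start)
        if mo is None:
            break
        total += 1
        start = mo.start() + 1
    return total

print(count_overlapping_starts(r"a+", "aa"))      # 2
print(count_overlapping_starts(r"aba", "ababa"))  # 2
print(len(re.findall(r"aba", "ababa")))           # 1
```

Passing a start position is also not quite the same as slicing: with a start position, '^' still refers to the real beginning of the string, whereas a slice would make every new starting point look like the beginning.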

#2


33  

import re
len(re.findall(pattern, string_to_search))
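One thing worth keeping in mind with this one-liner: if the text being counted is a literal string rather than a regex, metacharacters in it may need escaping. The sketch below uses made-up sample strings to show the difference; re.escape is the standard-library helper for this.

```python
import re

string_to_search = "a.b axb"
literal = "a.b"

# Unescaped, the dot matches any character, so "axb" is counted too.
count_as_regex = len(re.findall(literal, string_to_search))

# Escaped, the dot only matches a literal dot.
count_as_literal = len(re.findall(re.escape(literal), string_to_search))

print(count_as_regex, count_as_literal)  # 2 1
```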

#3


10  

I know this is a question about regex. I just thought I'd mention the count method for future reference if someone wants a non-regex solution.

>>> s = "It actually happened when it acted out of turn."
>>> s.count('t a')
2

This returns the number of non-overlapping occurrences of the substring.
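To see what "non-overlapping" means for str.count in practice, here is a small sketch with made-up sample strings:

```python
# str.count scans left to right and skips past each occurrence it finds,
# so occurrences that share characters are only counted once.
count_aa = "aaaa".count("aa")      # positions 0 and 2; the one at 1 is skipped
count_aba = "ababa".count("aba")   # position 0 only; the one at 2 overlaps it

print(count_aa, count_aba)  # 2 1
```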

#4


7  

Have you tried this?

len(pattern.findall(source))

#5


7  

You can find overlapping matches by using a lookahead assertion, (?=...), which matches without consuming any characters:

import re

def count_overlapping(pattern, string):
    return len(re.findall("(?=%s)" % pattern, string))
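A quick way to check how this behaves, assuming the function exactly as given in this answer (the sample inputs are made up for illustration):

```python
import re

def count_overlapping(pattern, string):
    # (?=...) is zero-width: it succeeds at every position where pattern
    # could start, and consumes nothing, so later positions are still tried.
    return len(re.findall("(?=%s)" % pattern, string))

print(count_overlapping(r"aba", "ababababa"))  # 4
print(len(re.findall(r"aba", "ababababa")))    # 2
print(count_overlapping(r"a+", "aa"))          # 2
```

Like the loop-based approach, this counts one match per distinct starting position, so 'a+' against 'aa' gives two.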

#6


1  

import re
print(len(re.findall(r'ab', 'ababababa')))

#7


1  

To avoid creating a list of matches, one may also use re.sub with a callable as the replacement. It will be called on each match, incrementing an internal counter.

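import re

class Counter(object):
    def __init__(self):
        self.matched = 0
    def __call__(self, matchobj):
        self.matched += 1
        # Return the matched text so the substitution leaves it unchanged.
        return matchobj.group(0)

counter = Counter()
re.sub(some_pattern, counter, text)

print(counter.matched)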
