
Time: 2022-09-13 16:28:19

I'm using Python and regex (new to both) to find a sequence of characters in a string as follows: grab the first instance of p followed by a number (it will always be in the form p_ _, where each _ is a digit). Then find either an 's' or a 'go', then all integers up to the end of the string. For example:



ascjksdcvyp12nbvnzxcmgonbmbh12hjg23

should yield p12 go 12 23.



ascjksdcvyp12nbvnzxcmsnbmbh12hjg23

should yield p12 s 12 23.


I've only managed to get the p12 part of the string and this is what I've tried so far to extract the 'go' or 's':


decoded = re.findall(r'([p][0-9]*)', myStr)
print(decoded)  # prints ['p12']

I know by doing something like



will give me all occurrences of s, g and o, but that is not what I'm looking for, and I'm not sure how to combine these regexes to get the desired output.


2 solutions



Use re.findall with pattern grouping:


>>> string = 'ascjksdcvyp12nbvnzxcmgonbmbh12hjg23'
>>> re.findall(r'(p\d{2}).*(s|go)\D*(\d+)(?:\D*(\d+))*', string)
[('p12', 'go', '12', '23')]

>>> string = 'ascjksdcvyp12nbvnzxcmsnbmbh12hjg23'
>>> re.findall(r'(p\d{2}).*(s|go)\D*(\d+)(?:\D*(\d+))*', string)
[('p12', 's', '12', '23')]

  • When the pattern contains capturing groups (), re.findall returns only the text matched by those groups

  • p\d{2} matches p followed by exactly two digits

  • After that, .* greedily matches any characters (except a newline)

  • Then, s|go matches either s or go

  • \D* matches any number of non-digits (including none)

  • \d+ matches one or more digits

  • (?:) is a non-capturing group, i.e. the match inside won't show up in the output; it is only there to group tokens
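
To make the breakdown above easier to follow, here is a sketch of the same pattern written with re.VERBOSE, so each token carries its own comment. The name PATTERN is only chosen for this illustration; the two input strings are the ones used above.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each token can be commented.
PATTERN = re.compile(r"""
    (p\d{2})        # group 1: 'p' followed by exactly two digits
    .*              # anything (greedy), so the LAST s/go that works is used
    (s|go)          # group 2: the keyword
    \D*             # non-digits between the keyword and the first number
    (\d+)           # group 3: first number
    (?:\D*(\d+))*   # group 4: keeps only the last repetition's number
""", re.VERBOSE)

for text in ('ascjksdcvyp12nbvnzxcmgonbmbh12hjg23',
             'ascjksdcvyp12nbvnzxcmsnbmbh12hjg23'):
    print(PATTERN.findall(text))
```

In verbose mode whitespace and # comments inside the pattern are ignored, so it behaves the same as the one-line version.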



>>> re.findall(r'(p\d{2}).*(s|go)(?:\D*(\d+))+?', string)
[('p12', 's', '12')]

>>> re.findall(r'(p\d{2}).*(s|go)(?:\D*(\d+))+', string)
[('p12', 's', '23')]

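It would be tempting to use one of the two shorter patterns above, since matching the later digits is a repeated task, but neither works: a repeated capturing group keeps only its last match, so the non-greedy version stops after the first number and the greedy version keeps only the last one. Hence the digits after s or go have to be matched more or less explicitly.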
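
The same limitation still applies to the explicit pattern once there are more than two numbers. A small sketch, assuming a made-up input with three trailing numbers (the string and the variable names are hypothetical, not from the question):

```python
import re

# Hypothetical input with three trailing numbers instead of two.
text = 'xp12ago1b2c3'

explicit = r'(p\d{2}).*(s|go)\D*(\d+)(?:\D*(\d+))*'
greedy = r'(p\d{2}).*(s|go)(?:\D*(\d+))+'

print(re.findall(explicit, text))  # the middle number 2 is lost
print(re.findall(greedy, text))    # only the last number survives
```

So the explicit pattern is a fit for exactly two numbers; for a variable count, the trailing numbers are better collected in a separate step.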




First, try to match your line with a minimal pattern, as a test. Use capturing (...) and non-capturing (?:...) parentheses to capture the interesting parts and skip the uninteresting ones. Store away what you care about, then chop off the remainder of the string and search it for numbers as a second step.


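import re

line = 'ascjksdcvyp12nbvnzxcmgonbmbh12hjg23'  # sample input from the question

# Lazy .*? finds the first p##; group 1 is its two digits and
# group 2 is the first number after s/go.
simple_test = r'^.*?p(\d{2}).*?(?:s|go).*?(\d+)'
m = re.match(simple_test, line)
if m is not None:
    p_num = m.group(1)
    trailing_numbers = [m.group(2)]        # first number after s/go

    remainder = line[m.end():]             # everything after that number
    trailing_numbers.extend(               # extend list by appending
        map(                               # list from applying
            lambda n: n.group(1),          # get group(1) from match
            re.finditer(r"(\d+)", remainder)))  # of each number in string

    print("P:", p_num, "Numbers:", trailing_numbers)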
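
The two-step idea can also be wrapped in a small helper that produces the exact output format asked for in the question. This is only a sketch: the function name decode is made up here, and it assumes the first s or go after the p-number is the one that counts (unlike the greedy .* in the other answer, which picks the last one).

```python
import re

def decode(line):
    """Return e.g. 'p12 go 12 23', or None if the line does not match."""
    # First p followed by two digits, then the first s/go after it.
    head = re.search(r'(p\d{2}).*?(s|go)', line)
    if head is None:
        return None
    # Every run of digits after the keyword, however many there are.
    numbers = re.findall(r'\d+', line[head.end():])
    return ' '.join([head.group(1), head.group(2), *numbers])

print(decode('ascjksdcvyp12nbvnzxcmgonbmbh12hjg23'))  # p12 go 12 23
print(decode('ascjksdcvyp12nbvnzxcmsnbmbh12hjg23'))   # p12 s 12 23
```

Because the numbers are collected with a separate re.findall, this works for any count of trailing integers, not just two.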


