How to extract a required pattern from a string

I am trying to match a pattern against a given string (eventually I will read lines from a file, but for now I use an explicit string to see how it works), but the script does not behave as I want for the given line.

import re

regex = '.+0+[0-9]+.'
string = "Your order number is 0000122995"

print (re.match(regex,string))

What I am trying to achieve is to find this 0000* number and assign it to a variable (which I would like to put into Excel later), but the given regex matches the whole line, which is not what I want (I know that is because of the syntax). Any tips on how to overcome this?

2 Solutions

#1

If you want to locate a match anywhere in a string, use re.search() instead of re.match(). re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string.

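import re

# Four literal zeros followed by one or more digits.
regex = r'(0{4}\d+)'
string = "Your order number is 0000122995"

# re.search() scans the whole string; group(0) is the full matched text.
print(re.search(regex, string).group(0))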

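re.search() and re.match() return a match object if there is a match, and None otherwise. Calling match.group() with no argument (or with 0) returns the whole matched text; passing one or more group numbers returns the corresponding subgroups.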
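To make the difference concrete, here is a minimal sketch using the same sample string. The names `pattern`, `line`, `found`, and `order_number` are only illustrative. It also guards against the no-match case, since calling .group() on None raises an AttributeError.

```python
import re

pattern = re.compile(r'0{4}\d+')
line = "Your order number is 0000122995"

# re.match() only tries position 0, where the text starts with "Your",
# so it finds nothing and returns None.
print(pattern.match(line))  # None

# re.search() scans forward; it returns a match object, or None if
# the pattern does not occur anywhere in the string.
found = pattern.search(line)
order_number = found.group(0) if found else None
print(order_number)  # 0000122995
```

The value in `order_number` stays a string, which keeps the leading zeros intact.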

See the re.search() documentation for more information.

#2

In your case, if you expect your input to be as consistent as what you've shown, the following will work. It skips "Your order number is " and captures everything after it until it hits whitespace or the end of the string:

import re

def findOrder():
    string = "Your order number is 0000122995"
    # Capture the run of non-whitespace characters after the fixed prefix.
    arrayAnswer = re.findall(r'Your order number is (\S+)', string)
    print('Your number in an Array is:')
    print(arrayAnswer)
    print('')
    print('Your number(s) output as a "string(s)" is/are:')
    for order in arrayAnswer:
        print(order)

Run this by calling findOrder(). If you want to get a little more "regexy", noting that what you want consists only of digits, the version below skips letters and spaces and returns the digits:

import re

def findOrder():
    string = "Your order number is 0000122995"
    # Skip a run of letters and whitespace, then capture the digits that follow.
    arrayAnswer = re.findall(r'[a-zA-Z\s]+(\d+)', string)
    print('Your number in an Array is:')
    print(arrayAnswer)
    print('')
    print('Your number(s) output as a "string(s)" is/are:')
    for order in arrayAnswer:
        print(order)

Again, run this by making sure to call findOrder().

The output for both should be:

>>> findOrder()
Your number in an Array is:
['0000122995']

Your number(s) output as a "string(s)" is/are:
0000122995
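The two patterns agree on the sample string, but they can differ on slightly messier input. As a small illustration, assume the sentence ends with a period (this variation is not from the original question):

```python
import re

# Hypothetical variation of the sample: a trailing period after the number.
text = "Your order number is 0000122995."

# \S+ captures every non-whitespace character, so the period is included.
loose = re.findall(r'Your order number is (\S+)', text)
# \d+ captures digits only, so the period is left out.
strict = re.findall(r'[a-zA-Z\s]+(\d+)', text)

print(loose)   # ['0000122995.']
print(strict)  # ['0000122995']
```

If the number may be followed by punctuation, the digits-only pattern is probably the safer choice.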

I suspect, though, that you might want to work with input longer than the string you posted. Post that if you need anything further.
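In the meantime, here is a rough sketch of how the same idea could extend to several lines, such as lines read from a file. The helper `extract_order_numbers`, the constant `ORDER_RE`, and the sample lines are all made up for illustration. It also assumes every order number starts with four zeros, as in the posted example.

```python
import re

# Assumption: order numbers are four zeros followed by more digits.
# \b keeps the match from starting or ending inside a longer word.
ORDER_RE = re.compile(r'\b0{4}\d+\b')

def extract_order_numbers(lines):
    """Return every order number found in an iterable of text lines."""
    numbers = []
    for line in lines:
        # findall() returns a (possibly empty) list of matches for each line.
        numbers.extend(ORDER_RE.findall(line))
    return numbers

# Made-up sample data; an open file object could be passed in the same way.
sample_lines = [
    "Your order number is 0000122995",
    "No order on this line",
    "Orders 0000123001 and 0000123002 shipped",
]
print(extract_order_numbers(sample_lines))
# ['0000122995', '0000123001', '0000123002']
```

The results are kept as strings on purpose, because converting them to integers would drop the leading zeros.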
