Alternative to the match = re.match(); if match: ... idiom?

Time: 2021-11-01 22:37:56

If you want to check whether something matches a regex and, if so, print the first group, you do:

import re
match = re.match(r"(\d+)g", "123g")
if match is not None:
    print(match.group(1))

This is completely pedantic, but the intermediate match variable is a bit annoying.

Languages like Perl handle this by creating new $1..$9 variables for match groups, like this:

if($blah =~ /(\d+)g/){
    print $1;
}

From this reddit comment,

with re_context.match('^blah', s) as match:
    if match:
        ...
    else:
        ...

...which I thought was an interesting idea, so I wrote a simple implementation of it:

#!/usr/bin/env python2.6
import re

class SRE_Match_Wrapper(object):
    def __init__(self, match):
        self.match = match

    def __enter__(self):
        # Hand the raw match object (or None) to the "as" target.
        return self.match

    def __exit__(self, type, value, tb):
        # Returning a false value lets exceptions propagate normally.
        pass

    def __getattr__(self, name):
        # Delegate everything else to the wrapped match object.
        return getattr(self.match, name)

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    return SRE_Match_Wrapper(matcher.match(inp))

if __name__ == '__main__':
    # Example:
    with rematch(r"(\d+)g", "123g") as m:
        if m:
            print(m.group(1))

    with rematch(r"(\d+)g", "123") as m:
        if m:
            print(m.group(1))

(This functionality could theoretically be patched into the _sre.SRE_Match object)

It would be nice if you could skip the execution of the with statement's code block when there was no match, which would simplify this to:

with rematch(r"(\d+)g", "123") as m:
    print(m.group(1)) # only executed if the match occurred

...but this seems impossible based on what I can deduce from PEP 343.

Any ideas? As I said, this is a really trivial annoyance, almost to the point of being code golf.

9 answers

#1


12  

I don't think it's trivial. I don't want to have to sprinkle a redundant conditional around my code if I'm writing code like that often.

This is slightly odd, but you can do this with an iterator:

import re

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    matches = matcher.match(inp)
    if matches:
        yield matches

if __name__ == '__main__':
    for m in rematch(r"(\d+)g", "123g"):
        print(m.group(1))

The odd thing is that it's using an iterator for something that isn't iterating--it's closer to a conditional, and at first glance it might look like it's going to yield multiple results for each match.

It does seem odd that a context manager can't cause its managed function to be skipped entirely; while that's not explicitly one of the use cases of "with", it seems like a natural extension.

#2


4  

Another nice syntax would be something like this:

header = re.compile('(.*?) = (.*?)$')
footer = re.compile('(.*?): (.*?)$')

if header.match(line) as m:
    key, value = m.group(1,2)
elif footer.match(line) as m:
    key, value = m.group(1,2)
else:
    key, value = None, None
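
For reference, Python 3.8 later added assignment expressions (PEP 572), which give nearly this shape with := in place of the proposed "as". A minimal sketch, assuming Python 3.8 or newer; the parse_line helper and the sample lines are made up for illustration:

```python
import re

header = re.compile(r'(.*?) = (.*?)$')
footer = re.compile(r'(.*?): (.*?)$')

def parse_line(line):
    # ":=" binds the match object and tests it in the same expression.
    if m := header.match(line):
        key, value = m.group(1, 2)
    elif m := footer.match(line):
        key, value = m.group(1, 2)
    else:
        key, value = None, None
    return key, value

print(parse_line("name = value"))   # ('name', 'value')
print(parse_line("name: value"))    # ('name', 'value')
print(parse_line("nothing here"))   # (None, None)
```

Unlike the module-level tricks discussed elsewhere on this page, the name bound by := is an ordinary local variable, so this also works inside functions.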

#3


1  

I have another way of doing this, based on Glen Maynard's solution:

for match in [m for m in [re.match(pattern,key)] if m]:
    print("It matched: %s" % match)

Similar to Glen's solution, this iterates either 0 times (if there is no match) or 1 time (if there is a match).

No helper function needed, but less tidy as a result.

#4


0  

If you're doing a lot of these in one place, here's an alternative answer:

import re
class Matcher(object):
    def __init__(self):
        self.matches = None
    def set(self, matches):
        self.matches = matches
    def __getattr__(self, name):
        return getattr(self.matches, name)

class re2(object):
    def __init__(self, expr):
        self.re = re.compile(expr)

    def match(self, matcher, s):
        matches = self.re.match(s)
        matcher.set(matches)
        return matches

pattern = re2(r"(\d+)g")
m = Matcher()
if pattern.match(m, "123g"):
    print(m.group(1))
if not pattern.match(m, "x123g"):
    print("no match")

You can compile the regex once with the same thread safety as re, create a single reusable Matcher object for the whole function, and then you can use it very concisely. This also has the benefit that you can reverse it in the obvious way--to do that with an iterator, you'd need to pass a flag to tell it to invert its result.

It's not much help if you're only doing a single match per function, though; you don't want to keep Matcher objects in a broader context than that; it'd cause the same issues as Blixt's solution.

#5


0  

I don't think using with is the solution in this case. You'd have to raise an exception in the BLOCK part (which is specified by the user) and have the __exit__ method return True to "swallow" the exception. So it would never look good.
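
To make that concrete, here is one possible reading of the exception-based approach, as a sketch rather than a recommendation. The NoMatch exception and this rematch class are hypothetical names, not part of the re module; the exception is raised from inside the block when the block first touches the missing match:

```python
import re

class NoMatch(Exception):
    """Raised inside the block when the pattern did not match."""

class rematch(object):
    def __init__(self, pattern, inp):
        self.match = re.match(pattern, inp)

    def __enter__(self):
        return self

    def group(self, *args):
        # The block itself has to trigger the exception; __enter__ cannot skip it.
        if self.match is None:
            raise NoMatch()
        return self.match.group(*args)

    def __exit__(self, exc_type, exc_value, tb):
        # Returning True swallows NoMatch; any other exception propagates.
        return exc_type is not None and issubclass(exc_type, NoMatch)

results = []
with rematch(r"(\d+)g", "123g") as m:
    results.append(m.group(1))
with rematch(r"(\d+)g", "123") as m:
    results.append(m.group(1))      # raises NoMatch, so nothing is appended
    results.append("unreachable")
print(results)                      # ['123']
```

The rest of the block is only skipped from the point where the match is first used, which is part of why this never looks quite right.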

I'd suggest going for a syntax similar to the Perl syntax. Make your own extended re module (I'll call it rex) and have it set variables in its module namespace:

if rex.match(r'(\d+)g', '123g'):
    print(rex._1)

As you can see in the comments below, this method is neither scope- nor thread-safe. You would only use this if you were completely certain that your application wouldn't become multi-threaded in the future and that any functions called from the scope that you're using this in will also use the same method.
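
A rough sketch of what such a rex helper might look like, and of the scoping problem described above. Everything here is hypothetical: a small object stands in for the rex module so the example fits in one file, and the _0, _1, ... attribute names simply follow the rex._1 usage shown earlier:

```python
import re

class _Rex(object):
    """Stand-in for a hypothetical rex module; its state is shared by all callers."""

    def match(self, pattern, string):
        m = re.match(pattern, string)
        # Drop the groups left over from the previous call.
        for name in list(vars(self)):
            delattr(self, name)
        if m:
            # Expose the groups as _0, _1, ... in the spirit of Perl's $1..$9.
            for i in range(m.re.groups + 1):
                setattr(self, '_%d' % i, m.group(i))
        return m is not None

rex = _Rex()

def helper():
    # Any nested call overwrites the shared groups.
    rex.match(r'(\w+)', 'abc')

if rex.match(r'(\d+)g', '123g'):
    print(rex._1)   # 123
    helper()
    print(rex._1)   # abc: the caller's group has been clobbered
```

The second print shows why the approach is not scope-safe: the caller's groups survive only as long as nothing it calls performs another match.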

#6


0  

This is not really pretty-looking, but you can take advantage of the getattr(object, name[, default]) built-in function, using it like this:

>>> getattr(re.match(r"(\d+)g", "123g"), 'group', lambda n:'')(1)
'123'
>>> getattr(re.match(r"(\d+)g", "X23g"), 'group', lambda n:'')(1)
''

To mimic the "if match: print group" flow, you can (ab)use the for statement this way:

>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "123g"), 'group', None)]):
...     print(group(1))
...
123
>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "X23g"), 'group', None)]):
...     print(group(1))
...
>>> 

Of course you can define a little function to do the dirty work:

>>> matchgroup = lambda p,s: filter(None, [getattr(re.match(p, s), 'group', None)])
>>> for group in matchgroup(r"(\d+)g", "123g"):
...     print(group(1))
...
123
>>> for group in matchgroup(r"(\d+)g", "X23g"):
...     print(group(1))
...
>>> 

#7


0  

Not the perfect solution, but it does allow you to chain several match options for the same string:

import re

class MatchWrapper(object):
  def __init__(self):
    self._matcher = None

  def wrap(self, matcher):
    self._matcher = matcher

  def __getattr__(self, attr):
    # Forward attribute access (group, span, ...) to the last successful match.
    return getattr(self._matcher, attr)

def _match(pattern, s, matcher):
  m = re.match(pattern, s)
  if m:
    matcher.wrap(m)
    return True
  else:
    return False

matcher = MatchWrapper()
s = "123g"
if _match(r"(\d+)g", s, matcher):
  print(matcher.group(1))
elif _match(r"(\w+)g", s, matcher):
  print(matcher.group(1))
else:
  print("no match")

#8


0  

Here's my solution:

import re

s = 'hello world'

match = []
if match.append(re.match(r'w\w+', s)) or any(match):
    print('W:', match.pop().group(0))
elif match.append(re.match(r'h\w+', s)) or any(match):
    print('H:', match.pop().group(0))
else:
    print('No match found')

You can use as many elif clauses as needed.

Even better:

import re

s = 'hello world'

if vars().update(match=re.match(r'w\w+', s)) or match:
    print('W:', match.group(0))
elif vars().update(match=re.match(r'h\w+', s)) or match:
    print('H:', match.group(0))
else:
    print('No match found')

Both append and update return None. So you have to actually check the result of your expression by using the or part in every case.

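Unfortunately, this only works as long as the code resides at top level, i.e. not in a function: inside a function, vars() returns a dictionary whose updates do not rebind the actual local variables.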
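
A small demonstration of that limitation, as a sketch; the function name inside_function and the variable name found are made up for illustration:

```python
import re

def inside_function(s):
    # vars() with no argument behaves like locals(); writing to the returned
    # dict does not create a real local variable inside a function.
    vars().update(found=re.match(r'h\w+', s))
    try:
        return found    # looked up as a global name, which does not exist
    except NameError:
        return 'NameError'

print(inside_function('hello world'))   # NameError
```

Because found is never assigned by a normal statement in the function body, the compiler treats it as a global name, and the lookup fails even though the regex itself matched.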

#9


0  

This is what I do:

def re_match_cond (match_ref, regex, text):
    match = regex.match (text)
    del match_ref[:]
    match_ref.append (match)
    return match

if __name__ == '__main__':
    match_ref = []
    if re_match_cond (match_ref, regex_1, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond (match_ref, regex_2, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond (match_ref, regex_3, text):
        match = match_ref[0]
        ### ...
    else:
        ### no match
        ### ...
        pass

That is, I pass a list to the function to emulate pass-by-reference.
