Regular expression in Python doesn't seem to work the way I expect

Time: 2022-03-19 00:06:16

My code doesn't seem to be working like it's supposed to:


import re

x = "engniu4nwi5u"
print(re.sub(r"\D(\d)\D", r"\1abc", x))

My desired output is: engniuabcnwiabcu
But the output actually given is: engni4abcw5abc


4 solutions

#1


You are grouping the wrong characters. It should be written as:

>>> x = "engniu4nwi5u"
>>> re.sub(r"(\D)\d(\D)", r"\1abc\2", x)
'engniuabcnwiabcu'
  • (\D) matches a non-digit and captures it in \1
  • \d matches the digit
  • (\D) matches the following non-digit and captures it in \2

How does it match?

engniu4nwi5u
     |
    \D => \1

engniu4nwi5u
      |
     \d

engniu4nwi5u
       |
      \D => \2

Another Solution

You can also use lookarounds to do the same:

>>> x = "engniu4nwi5u"
>>> re.sub(r"(?<=\D)\d(?=\D)", r"abc", x)
'engniuabcnwiabcu'
  • (?<=\D) Lookbehind assertion. Checks that the digit is preceded by a non-digit, without capturing it.
  • \d Matches the digit.
  • (?=\D) Lookahead assertion. Checks that the digit is followed by a non-digit. Also not captured.
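One practical difference between the two approaches: the capturing version consumes the neighbouring non-digits, so a single non-digit cannot be both the right neighbour of one digit and the left neighbour of the next. Lookarounds consume nothing. A small sketch, assuming a made-up input string a1b2c that is not from the question:

```python
import re

s = "a1b2c"  # hypothetical input: two digits separated by a single letter

# The capturing groups consume "a1b", so "b" is no longer available
# as the \D that must precede "2".
consuming = re.sub(r"(\D)\d(\D)", r"\1abc\2", s)

# Lookarounds only assert, so "b" can be checked on both sides.
lookaround = re.sub(r"(?<=\D)\d(?=\D)", "abc", s)

print(consuming)   # aabcb2c
print(lookaround)  # aabcbabcc
```

For the string in the question both patterns give the same result, because the digits there are separated by more than one letter.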

#2


This is because you replaced the wrong part:


Let's consider the first match. \D\d\D matches the following:


engniu4nwi5u
     ^^^

4 is captured as \1. Then you replace the whole match with: \1abc, which becomes 4abc.

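To see this for yourself, you can list what the original pattern matches and what ends up in group 1. This is only an illustrative sketch; the variable name matches is mine:

```python
import re

x = "engniu4nwi5u"

# Collect the whole match, the captured digit, and the span for each hit.
matches = [(m.group(0), m.group(1), m.span())
           for m in re.finditer(r"\D(\d)\D", x)]

for whole, digit, span in matches:
    print(whole, digit, span)
# u4n 4 (5, 8)
# i5u 5 (9, 12)
```

Each whole match (three characters) is replaced, while only the digit is kept in \1, which is why the surrounding letters disappear.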

You have a couple of solutions here:

  • Capture what you want to keep: (\D)\d(\D) and replace it with \1abc\2
  • Use lookarounds: (?<=\D)\d(?=\D) and replace this with abc

#3


Based on your regexp:


>>> re.sub(r"(\D)\d", r"\1abc", x)
'engniuabcnwiabcu'

Although I would do this instead:


>>> re.sub(r"\d", "abc", x)
'engniuabcnwiabcu'
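Note that these patterns are not equivalent to the stricter ones once a digit sits at the start or end of the string. A quick comparison on a made-up input, 7up4 (an assumption for illustration, not from the question):

```python
import re

s = "7up4"  # hypothetical input with a digit at each end

both_sides = re.sub(r"(\D)\d(\D)", r"\1abc\2", s)  # needs a non-digit on both sides
left_only = re.sub(r"(\D)\d", r"\1abc", s)         # needs a non-digit on the left only
any_digit = re.sub(r"\d", "abc", s)                # replaces every digit

print(both_sides)  # 7up4
print(left_only)   # 7upabc
print(any_digit)   # abcupabc
```

Which one is right depends on whether leading and trailing digits should be replaced.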

#4


If you also want to handle digits at the beginning and end of the string, add ^ and $ to the regex:

(\D|^)\d(?=$|\D)

And replace with \1abc.



Sample code on IDEONE:


import re

p = re.compile(r'(\D|^)\d(?=$|\D)')
test_str = "1engniu4nwi5u"
subst = r"\1abc"  # raw string, so \1 is a backreference and not chr(1)
print(p.sub(subst, test_str))
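One thing to watch with the replacement string: in a non-raw Python string literal, "\1" is an octal escape for the character chr(1), not a backreference. A short sketch of the difference (the names plain, raw and result are only for illustration):

```python
import re

plain = "\1abc"  # "\1" is an octal escape here: the single character chr(1)
raw = r"\1abc"   # backslash kept, so re.sub sees a backreference to group 1

print(plain == "\x01abc")    # True
print(len(plain), len(raw))  # 4 5

result = re.sub(r"(\D|^)\d(?=$|\D)", raw, "1engniu4nwi5u")
print(result)  # abcengniuabcnwiabcu
```

The leading 1 is replaced as well, because ^ lets group 1 match the empty string at the start of the input.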
