Bug with the re.split function and the re.DOTALL flag in the re module of Python 2.7.1

Time: 2021-10-23 02:57:22

I have a Mac running Lion and Python 2.7.1. I am noticing something very strange from the re module. If I run the following line:

import re

print re.split(r'\s*,\s*', 'a, b,\nc, d, e, f, g, h, i, j, k,\nl, m, n, o, p, q, r')

I get this result:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r']

But if I run it with the re.DOTALL flag like this:

print re.split(r'\s*,\s*', 'a, b,\nc, d, e, f, g, h, i, j, k,\nl, m, n, o, p, q, r', re.DOTALL)

Then I get this result:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q, r']

Note that 'q, r' comes back as a single element instead of being split into 'q' and 'r'.

Why is this happening? I don't see why the re.DOTALL flag would make a difference if I am not using dots in my pattern. Am I doing something wrong or is there some sort of bug?

1 Solution

#1


>>> import re
>>> s = 'a, b,\nc, d, e, f, g, h, i, j, k,\nl, m, n, o, p, q, r'
>>> re.split(r'\s*,\s*', s)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r']
>>> re.split(r'\s*,\s*', s, maxsplit=16)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q, r']
>>> re.split(r'\s*,\s*', s, flags=re.DOTALL)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r']

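The problem is that you are passing re.DOTALL positionally, so it is bound to the maxsplit parameter rather than to flags: the signature is re.split(pattern, string, maxsplit=0, flags=0). re.DOTALL happens to be the integer constant 16, so the call stops after 16 splits and leaves the remainder, 'q, r', as the final element. Pass the flag by keyword, flags=re.DOTALL, and the string is split at every comma.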
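To make the mix-up easy to check, here is a minimal sketch. It is written for Python 3 (so print is a function, unlike in the question), and the variable names are only illustrative. It passes the integer value of re.DOTALL as maxsplit by keyword, which reproduces the truncated result, and compares that with passing the flag by keyword and with a precompiled pattern, where the flag cannot land in the wrong slot.

```python
import re

s = 'a, b,\nc, d, e, f, g, h, i, j, k,\nl, m, n, o, p, q, r'
pattern = r'\s*,\s*'

# The flag is an integer underneath; this is the value that ended up
# being used as maxsplit in the question.
dotall_value = int(re.DOTALL)

# Same effect as the positional call: the flag's value limits the splits.
as_maxsplit = re.split(pattern, s, maxsplit=dotall_value)

# Intended call: the flag is passed by keyword.
as_flags = re.split(pattern, s, flags=re.DOTALL)

# Alternative: attach the flag when compiling, then split with no limit.
compiled = re.compile(pattern, re.DOTALL)
as_compiled = compiled.split(s)

print(dotall_value)
print(as_maxsplit[-1])
print(len(as_maxsplit), len(as_flags), len(as_compiled))
```

Passing flags by keyword, or compiling the pattern first, keeps the flag from being mistaken for a split count.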

