>>> src = ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '
>>> re.search(r'\s*(\w+\.)+', src).groups()
('submod.',)
This regex seems to put everything that is not whitespace into the group, so nothing should be lost before the match ends.
Why does the group contain only the last repetition of "+" here, and not ('pkg.subpkg.submod.',)?
Or ('pkg.',), stopping early because nothing was really repeated, which would be "no loss of information" in another sense?
(I had to add a non-capturing group (?:...) inside an outer capturing group, like r'\s((?:\w+\.)+)'.)
Even stranger:
>>> src = '  pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '
>>> re.search(r'\s(\w+\.)*', src).groups()
(None,)
Edit: the "more strange" case is actually "less strange", as @Avinash Raj pointed out: contrary to what I intended, the match simply ends before the group is reached. So
>>> re.search(r'\s+(\w+\.)*', ' pkg.subpkg.submod.thing').groups()
('submod.',)
.. then shows the same behavior I asked about for "+": only the last repetition is captured, and the earlier ones seem to be lost.
3 Solutions
#1
1
I'll explain the "even more strange" part.
src = '  pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '
re.search stops once it finds the first match. So r'\s(\w+\.)*' matches only the first whitespace character: * repeats the preceding group zero or more times, and (\w+\.)* cannot match anything directly after that first space because the next character is another space. The group therefore never takes part in the match, so groups() on the match object returns (None,), while group() returns the single space that was matched.
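The difference between \s and \s+ can be checked directly. This is a minimal sketch, assuming the sample string starts with two spaces (otherwise a single \s would already reach "pkg"); the variable names are only illustrative.

```python
import re

# Assumption: two leading spaces, so a single \s cannot reach "pkg".
src = '  pkg.subpkg.submod.thing'

# \s consumes one space; (\w+\.)* then matches zero times at the second space.
m_one = re.search(r'\s(\w+\.)*', src)
print(repr(m_one.group()))   # the overall match: just one space
print(m_one.groups())        # the group never participated

# \s+ consumes both spaces, so the group can repeat over the dotted names.
m_many = re.search(r'\s+(\w+\.)*', src)
print(repr(m_many.group()))
print(m_many.groups())       # only the last repetition is kept
```

With \s+ the group does take part, and the question is back to the original one: the group holds only its last repetition.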
#2
0
I don't see why this seems strange to you. What did you expect?
In the documentation you find the following:
re.search(pattern, string, flags=0) Scan through string looking for the first location where the regular expression pattern ...
re.search(r'\s*(\w+\.)+', src).groups()
Your pattern has only one capturing group: (\w+\.). Because + is greedy, the group is repeated for pkg., subpkg. and submod. in turn, and each repetition overwrites what the group captured before. submod. is the last repetition that matches, so that is what the group holds at the end.
In your second attempt the group does not take part in the match at all: * allows zero repetitions, so the pattern can succeed without the group capturing anything, and groups() reports None for it.
Are you looking for this?
re.search(r'\s*((\w+\.)+)', src).groups()[0]
Try out the following to understand it better:
re.search(r'\s*((\w+\.)*)(\w+\.)*', 'a.b.c.d.e.f.g.h.i').groups()
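To see how a repeated group keeps only its last repetition, the patterns from this thread can be run side by side. This is a small sketch using the question's sample string; the variable names are made up for illustration.

```python
import re

src = ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '

# One repeated group: each repetition overwrites the previous capture.
last_only = re.search(r'\s*(\w+\.)+', src).groups()

# An outer group around the repetition captures the whole run;
# the inner group still holds only its last repetition.
whole_run = re.search(r'\s*((\w+\.)+)', src).groups()

# Making the inner group non-capturing leaves just the whole run.
non_capturing = re.search(r'\s*((?:\w+\.)+)', src).groups()

# The experiment suggested above: the third group matches zero times.
experiment = re.search(r'\s*((\w+\.)*)(\w+\.)*', 'a.b.c.d.e.f.g.h.i').groups()

print(last_only)      # ('submod.',)
print(whole_run)      # ('pkg.subpkg.submod.', 'submod.')
print(non_capturing)  # ('pkg.subpkg.submod.',)
print(experiment)     # ('a.b.c.d.e.f.g.h.', 'h.', None)
```

In the last case the first group greedily takes every dotted part, so nothing is left for the third group, which is reported as None.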
#3
-1
This should work fine to match the complete string ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '
(\s*(\w+[.\s])+)+
In case you want the output ' pkg.subpkg.submod.thing ' then use this
\s*(\w+[.\s])+
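It may be worth checking what these two patterns actually match under re.search. In the sketch below, which assumes the sample string with single spaces as shown, both end up matching the whole string, because [.\s] lets the repetition continue across the space between the two names. The `first_only` pattern is my own assumption of one way to stop at the first name, not something taken from this answer.

```python
import re

src = ' pkg.subpkg.submod.thing pkg2.subpkg.submod.thing '

nested = re.search(r'(\s*(\w+[.\s])+)+', src)
flat = re.search(r'\s*(\w+[.\s])+', src)

print(nested.group(0) == src)  # the nested pattern covers the full string
print(flat.group(0) == src)    # so does the flat one: + runs through the space
print(flat.groups())           # the group again keeps only its last repetition

# Hypothetical variant: allow dots only between parts and end at the
# first whitespace, so the match stops after the first dotted name.
first_only = re.search(r'\s*(?:\w+\.)*\w+\s', src).group(0)
print(repr(first_only))
```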