Say I have the string 'ad>ad>ad>>ad' and I want to split it on '>' but not on '>>'. I've just picked up regex and was wondering whether there is a way (a special character) to split on a specific part of the matched expression rather than on the whole match. For example, the regex could be:
re.split('[^>]>[^>]', 'ad>ad>ad>>ad')
Can you get it to split only on the character in parentheses, as in [^>](>)[^>]?
2 solutions
#1
1
You need to use lookarounds:
re.split(r'(?<!>)>(?!>)', 'ad>ad>ad>>ad')
The (?<!>)>(?!>) pattern only matches a > that is not preceded by a > (due to the negative lookbehind (?<!>)) and that is not followed by a > (due to the negative lookahead (?!>)).
Since lookarounds do not consume characters (unlike character classes, negated or positive, such as [^>]), we match and split on the > symbol alone, without "touching" the symbols around it.
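To make the difference concrete, here is a small sketch that runs both the pattern from the question and the lookaround pattern on the same string. The variable names are only illustrative.

```python
import re

s = 'ad>ad>ad>>ad'

# Character classes consume the characters next to '>', so those characters
# are removed from the result and neighbouring matches get in each other's way.
consuming = re.split(r'[^>]>[^>]', s)

# Lookarounds only assert what is (not) next to '>', so only the '>' itself
# is removed when splitting.
lookaround = re.split(r'(?<!>)>(?!>)', s)

print(consuming)   # ['a', '', 'd>>ad']
print(lookaround)  # ['ad', 'ad', 'ad>>ad']
```

The first result loses the 'd' and 'a' around each matched '>', which is why the character-class approach cannot be used directly with re.split.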
#2
1
Try with \b>\b
This matches a single > that has a word character (a letter, digit, or underscore) directly on each side. Since the string in the question is a continuous run of word characters separated by >, checking for word boundaries with \b is the simplest method.
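As a quick check, the sketch below runs \b>\b on the string from the question and on a second, made-up string that contains a space after one of the > characters. The second input is an assumption added purely to show where the two approaches differ.

```python
import re

boundary = r'\b>\b'
lookaround = r'(?<!>)>(?!>)'

# On the question's string both patterns give the same split.
same_input = re.split(boundary, 'ad>ad>ad>>ad')

# Hypothetical input with a space after the first '>': there is no word
# boundary between '>' and ' ', so \b>\b skips that '>'.
with_space_boundary = re.split(boundary, 'ad> ad>ad')
with_space_lookaround = re.split(lookaround, 'ad> ad>ad')

print(same_input)             # ['ad', 'ad', 'ad>>ad']
print(with_space_boundary)    # ['ad> ad', 'ad']
print(with_space_lookaround)  # ['ad', ' ad', 'ad']
```

So \b>\b is fine when every single > sits between word characters, while the lookaround version also handles a > that is next to whitespace or punctuation.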