Split a string on part of a matched regular expression (Python)

Date: 2021-09-25 21:38:16

Say I have a string 'ad>ad>ad>>ad' and I want to split it on the '>' (not on the '>>' chars). I just picked up regex and was wondering if there is a way (a special character) to split on a specific part of the matched expression, rather than on the whole match. For example, the regex could be:

re.split('[^>]>[^>]', 'ad>ad>ad>>ad')

Can you get it to split on the char in parentheses, [^>](>)[^>]?
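For reference, here is a small sketch of what the two patterns above actually return. The variable names are only illustrative. It shows that the character classes swallow the neighbouring letters, and that adding a capturing group does not narrow the split point: re.split still splits on the whole match and additionally returns the captured text.

```python
import re

s = 'ad>ad>ad>>ad'

# [^>] on each side consumes one neighbouring character,
# so those characters vanish from the pieces.
consumed = re.split(r'[^>]>[^>]', s)

# A capturing group is returned as an extra list item;
# the split still happens on the full 'd>a' match.
captured = re.split(r'[^>](>)[^>]', s)

print(consumed)  # ['a', '', 'd>>ad']
print(captured)  # ['a', '>', '', '>', 'd>>ad']
```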

2 Solutions

#1


You need to use lookarounds:

re.split(r'(?<!>)>(?!>)', 'ad>ad>ad>>ad')

The (?<!>)>(?!>) pattern only matches a > that is not preceded by a > (due to the negative lookbehind (?<!>)) and that is not followed by a > (due to the negative lookahead (?!>)).
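A minimal sketch of that pattern in use. The names SINGLE_GT and split_on_single_gt are made up for this example and are not part of the re module.

```python
import re

# Hypothetical helper name; the pattern is the one from this answer.
SINGLE_GT = re.compile(r'(?<!>)>(?!>)')

def split_on_single_gt(text):
    """Split text on every '>' that has no '>' directly before or after it."""
    return SINGLE_GT.split(text)

print(split_on_single_gt('ad>ad>ad>>ad'))  # ['ad', 'ad', 'ad>>ad']
print(split_on_single_gt('a>>>b'))         # ['a>>>b']
```

Runs of two or more > are left untouched, because every > inside such a run has another > next to it.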

Since lookarounds do not consume characters (unlike negated and positive character classes, such as [^>]), we only match and split on the > symbol itself, without "touching" the symbols around it.
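One way to see the difference is to compare the match spans of the two patterns. This is only an illustrative sketch; class_spans and look_spans are names chosen for the example.

```python
import re

s = 'ad>ad>ad>>ad'

# Character classes: each match is three characters wide ('d>a').
class_spans = [m.span() for m in re.finditer(r'[^>]>[^>]', s)]

# Lookarounds: each match is exactly one character wide (the '>').
look_spans = [m.span() for m in re.finditer(r'(?<!>)>(?!>)', s)]

print(class_spans)  # [(1, 4), (4, 7)]
print(look_spans)   # [(2, 3), (5, 6)]
```

re.split removes whatever lies inside each span, which is why the character-class version loses the letters next to each >.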

#2


Try with \b>\b (in Python, write it as a raw string, r'\b>\b', so that \b is not read as a backspace character).

This matches a single > that has a word character (a letter, digit, or underscore) immediately on each side. Since the string in the question is a continuous run of such characters, checking word boundaries with \b is the simplest approach.
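A short sketch of the \b variant, with one caveat worth checking against your own data: \b is a word boundary, so a > that sits next to a space or punctuation is not split on. The second input string is an assumption added for illustration and does not come from the question.

```python
import re

# Raw string, so that \b means "word boundary" and not a backspace.
WORD_BOUNDED_GT = re.compile(r'\b>\b')

question_result = WORD_BOUNDED_GT.split('ad>ad>ad>>ad')

# Illustrative input: the first '>' has spaces around it, so \b>\b skips it.
spaced_result = WORD_BOUNDED_GT.split('ad > ad>ad')

print(question_result)  # ['ad', 'ad', 'ad>>ad']
print(spaced_result)    # ['ad > ad', 'ad']
```

For the string in the question both approaches give the same pieces; they differ only when a single > is not directly surrounded by word characters.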
