Regular expression to match outer brackets

I need a regular expression to select all the text between two outer brackets.

Example: some text(text here(possible text)text(possible text(more text)))end text

Result: (text here(possible text)text(possible text(more text)))

I've been trying for hours. Mind you, my regular expression knowledge isn't what I'd like it to be :-) so any help will be gratefully received.

15 solutions

#1

Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

But there is a simple algorithm to do this, which I described in this answer to a previous question.
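
The linked answer isn't reproduced here, so the following is only a guess at the kind of algorithm meant: a minimal Python sketch that walks the string once and tracks the nesting depth with a counter. The function name `outer_groups` and its parameters are made up for illustration.

```python
def outer_groups(text, open_ch="(", close_ch=")"):
    """Return every top-level balanced group in text, delimiters included."""
    groups = []
    depth = 0    # current nesting depth
    start = -1   # index of the delimiter that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
        # a closing delimiter seen at depth 0 is stray and simply ignored
    return groups


sample = "some text(text here(possible text)text(possible text(more text)))end text"
print(outer_groups(sample))
```

An unclosed group at the end of the input is silently dropped in this sketch; raising an error there instead would be an equally reasonable choice.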

#2

You can use regex recursion:

\(([^()]|(?R))*\)

#3

I want to add this answer for quick reference. Feel free to update.

.NET Regex using balancing groups.

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

Where c is used as the depth counter.

Demo at Regexstorm.com

Stack Overflow: Using RegEx to balance match parenthesis

Wes' Puzzling Blog: Matching Balanced Constructs with .NET Regular Expressions

Greg Reinacker's Weblog: Nested Constructs in Regular Expressions

PCRE using a recursive pattern.

\((?>[^)(]+|(?R))*+\)

Demo at regex101; Or without alternation:

\((?>[^)(]*(?R)?)*+\)

Demo at regex101; Or unrolled for performance:

\([^)(]*(?:(?R)[^)(]*)*+\)

Demo at regex101; The pattern is pasted at (?R) which represents (?0).

Perl, PHP, Notepad++, R: perl=TRUE, Python: Regex package with (?V1) for Perl behaviour.

Ruby using subexpression calls.

Since Ruby 2.0, \g<0> can be used to call the full pattern.

\((?>[^)(]+|\g<0>)*\)

Demo at Rubular; Ruby 1.9 only supports capturing group recursion:

(\((?>[^)(]+|\g<1>)*\))

Demo at Rubular (atomic grouping since Ruby 1.9.3)

JavaScript API :: XRegExp.matchRecursive

XRegExp.matchRecursive(str, '\\(', '\\)', 'g');

JS, Java and other regex flavors without recursion up to 2 levels of nesting:

\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

Demo at regex101. Deeper nesting needs to be added to the pattern.
To fail faster on unbalanced parentheses, drop the + quantifier.
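
Python's built-in re module has no recursion either, so it falls into the "other flavors" group. As a rough sketch, here is the same two-level pattern applied to the question's example; the names `TWO_LEVELS`, `sample` and `outer` are only illustrative.

```python
import re

# Same pattern as above: the outer pair plus at most two nested levels.
TWO_LEVELS = re.compile(r"\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)")

sample = "some text(text here(possible text)text(possible text(more text)))end text"

match = TWO_LEVELS.search(sample)
outer = match.group(0) if match else None
print(outer)
```

Note that when the input nests deeper than the pattern allows, search() does not fail; it finds the first inner group that is shallow enough. For "(a(b(c(d))))" it returns "(b(c(d)))", so the depth limit has to be checked separately if it matters.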

Java: An interesting idea using forward references by @jaytea.

Reference - What does this regex mean?

rexegg.com - Recursive Regular Expressions

Regular-Expressions.info - Regular Expression Recursion

#4

[^\(]*(\(.*\))[^\)]*

[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring enclosed in brackets, and [^\)]* matches everything that isn't a closing bracket at the end of the string. Note that this expression does not attempt to match brackets; a simple parser (see dehmann's answer) would be more suitable for that.
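
To make the "does not attempt to match brackets" caveat concrete, here is a small Python sketch using the expression above. The helper name `first_to_last` is made up for this example.

```python
import re

# The expression from this answer, unchanged.
GREEDY = re.compile(r"[^\(]*(\(.*\))[^\)]*")


def first_to_last(text):
    """Return the captured group: first '(' through the last ')' it can reach."""
    m = GREEDY.match(text)
    return m.group(1) if m else None


sample = "some text(text here(possible text)text(possible text(more text)))end text"
print(first_to_last(sample))
print(first_to_last("a(b)c)d"))
```

For the question's example this gives the wanted result, but for the unbalanced input "a(b)c)d" it captures "(b)c)", because the greedy .* simply runs to the last closing bracket.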

#5

(?<=\().*(?=\))

If you want to select text between two matching parentheses, you are out of luck with regular expressions. This is impossible (*).

This regex just returns the text between the first opening and the last closing parentheses in your string.

(*) Unless your regex engine has features like balancing groups or recursion. The number of engines that support such features is slowly growing, but they are still not commonly available.
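
A quick way to see what the lookaround regex in this answer does is to run it in Python, whose re module supports these lookarounds. This is just a sketch; the variable names are illustrative.

```python
import re

# Lookbehind for '(' and lookahead for ')', so the brackets themselves are excluded.
INNER = re.compile(r"(?<=\().*(?=\))")

sample = "some text(text here(possible text)text(possible text(more text)))end text"

m = INNER.search(sample)
inner = m.group(0) if m else None
print(inner)

# Two separate groups are not told apart:
m2 = INNER.search("a(b)c(d)e")
print(m2.group(0) if m2 else None)
```

The second input shows the limitation: the match is "b)c(d", i.e. everything from the first opening to the last closing parenthesis, regardless of balance.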

#6

It is actually possible to do it using .NET regular expressions, but it is not trivial, so read carefully.

You can read a nice article here. You also may need to read up on .NET regular expressions. You can start reading here.

Angle brackets <> were used because they do not require escaping.

The regular expression looks like this:

<
[^<>]*
(
    (
        (?<Open><)
        [^<>]*
    )+
    (
        (?<Close-Open>>)
        [^<>]*
    )+
)*
(?(Open)(?!))
>

#7

This answer explains the theoretical limitation of why regular expressions are not the right tool for this task.

Regular expressions cannot do this.

Regular expressions are based on a computing model known as Finite State Automata (FSA). As the name indicates, an FSA can remember only its current state; it has no information about previous states.

Consider an FSA with two states, S1 and S2, where S1 is both the start state and the final state. If we try the string 0110, the transitions go as follows:

      0     1     1     0
-> S1 -> S2 -> S2 -> S2 -> S1

In the above steps, when we are at the second S2, i.e. after parsing 01 of 0110, the FSA has no information about the previous 0 in 01, as it can only remember the current state and the next input symbol.

In the above problem, we need to know the number of opening parentheses; this means it has to be stored somewhere. But since FSAs cannot do that, a regular expression cannot be written.

However, an algorithm can be written to achieve the goal. Such algorithms generally fall under Pushdown Automata (PDA). A PDA is one level above an FSA: it has an additional stack to store information. PDAs can be used to solve the above problem, because we can 'push' each opening parenthesis onto the stack and 'pop' one whenever we encounter a closing parenthesis. If the stack is empty at the end, the opening and closing parentheses match; otherwise they do not.
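
The push/pop idea can be sketched in a few lines of Python. This is a minimal illustration rather than a formal PDA; the function name `is_balanced` and the `pairs` parameter are assumptions made for the example.

```python
def is_balanced(text, pairs=None):
    """Return True if every opening delimiter in text is closed in the right order."""
    pairs = pairs or {"(": ")"}       # opening delimiter -> expected closing delimiter
    closers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in pairs:
            stack.append(pairs[ch])   # push: remember which closer we now expect
        elif ch in closers:
            # pop: the closer must match the most recently opened delimiter
            if not stack or stack.pop() != ch:
                return False
    return not stack                  # balanced only if nothing is left open


print(is_balanced("some text(text here(possible text))end"))
print(is_balanced("abc ( 123 ( foobar ) def ) xyz ) ghij"))
```

With a single kind of bracket a plain counter would do the same job; the stack only becomes necessary once several delimiter types are mixed, as in `is_balanced("([)]", {"(": ")", "[": "]"})`.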

A detailed discussion can be found here.

#8

This is the definitive regex:

\(
(?<arguments>
(
    ([^\(\)']*) |
    (\([^\(\)']*\)) |
    '(.*?)'
)*
)
\)

Example:

input: ( arg1, arg2, arg3, (arg4), '(pip' )
output: arg1, arg2, arg3, (arg4), '(pip'

Note that '(pip' is correctly handled as a string. (Tried in Regulator: http://sourceforge.net/projects/regulator/)

#9

I have written a little JavaScript library called balanced to help with this task. You can accomplish this by doing:

balanced.matches({
    source: source,
    open: '(',
    close: ')'
});

You can even do replacements:

balanced.replacements({
    source: source,
    open: '(',
    close: ')',
    replace: function (source, head, tail) {
        return head + source + tail;
    }
});

Here's a more complex, interactive example: JSFiddle

#10

The regular expression using Ruby (version 1.9.3 or above):

/(?<match>\((?:\g<match>|[^()]++)*\))/

Demo on Rubular

#11

So you need the first and last parentheses. Use something like this:

str.indexOf('('); - it will give you the first occurrence
str.lastIndexOf(')'); - the last one

So you need the string between them:

String searchedString = str.substring(str.indexOf('('), str.lastIndexOf(')') + 1);

(The + 1 keeps the closing parenthesis, matching the expected result in the question.)

#12

Here is a customizable solution allowing single character literal delimiters in Java:

import java.util.ArrayList;
import java.util.List;

public static List<String> getBalancedSubstrings(String s, Character markStart,
                                 Character markEnd, Boolean includeMarkers) {
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;                  // current nesting depth
    int lastOpenDelimiter = -1;     // start index of the current top-level group
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            if (level == 1) {
                lastOpenDelimiter = (includeMarkers ? i : i + 1);
            }
        }
        else if (c == markEnd) {
            if (level == 1) {
                // closing the outermost group: record it
                subTreeList.add(s.substring(lastOpenDelimiter, (includeMarkers ? i + 1 : i)));
            }
            if (level > 0) level--;
        }
    }
    return subTreeList;
}

Sample usage:

String s = "some text(text here(possible text)text(possible text(more text)))end text";
List<String> balanced = getBalancedSubstrings(s, '(', ')', true);
System.out.println("Balanced substrings:\n" + balanced);
// => [(text here(possible text)text(possible text(more text)))]

#13

The answer depends on whether you need to match balanced sets of brackets, or merely everything from the first opening bracket to the last closing bracket in the input text.

If you need to match balanced nested brackets, then you need something more than regular expressions; see @dehmann's answer.

If it's just the first opening bracket to the last closing bracket, see @Zach's answer.

Decide what you want to happen with:

abc ( 123 ( foobar ) def ) xyz ) ghij

You need to decide what your code needs to match in this case.
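
As a rough illustration of the two choices, here is a Python sketch that applies both readings to that input. Both function names are invented for this example, and neither is taken from the answers referred to above.

```python
def first_open_to_last_close(text):
    """Everything from the first '(' to the last ')', balanced or not."""
    start, end = text.find("("), text.rfind(")")
    return text[start:end + 1] if 0 <= start < end else None


def first_balanced_group(text):
    """The first '(' together with the ')' that actually closes it."""
    start = text.find("(")
    if start < 0:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # the first '(' is never closed


sample = "abc ( 123 ( foobar ) def ) xyz ) ghij"
print(first_open_to_last_close(sample))
print(first_balanced_group(sample))
```

The first function returns "( 123 ( foobar ) def ) xyz )", swallowing the stray closing bracket, while the second stops at "( 123 ( foobar ) def )".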

#14

"""
Here is a simple python program showing how to use regular
expressions to write a paren-matching recursive parser.

This parser recognises items enclosed by parens, brackets,
braces and <> symbols, but is adaptable to any set of
open/close patterns.  This is where the re package greatly
assists in parsing.
"""

import re

# The pattern below recognises a sequence consisting of:
#    1. Any characters not in the set of open/close strings.
#    2. One of the open/close strings.
#    3. The remainder of the string.
#
# There is no reason the opening pattern can't be the
# same as the closing pattern, so quoted strings can
# be included.  However quotes are not ignored inside
# quotes.  More logic is needed for that....

pat = re.compile(r"""
    ( .*? )
    ( \( | \) | \[ | \] | \{ | \} | \< | \> |
                           \' | \" | BEGIN | END | $ )
    ( .* )
    """, re.X)

# The keys to the dictionary below are the opening strings,
# and the values are the corresponding closing strings.
# For example "(" is an opening string and ")" is its
# closing string.

matching = { "(" : ")",
             "[" : "]",
             "{" : "}",
             "<" : ">",
             '"' : '"',
             "'" : "'",
             "BEGIN" : "END" }

# The procedure below matches string s and returns a
# recursive list matching the nesting of the open/close
# patterns in s.

def matchnested(s, term=""):
    lst = []
    while True:
        m = pat.match(s)

        if m.group(1) != "":
            lst.append(m.group(1))

        if m.group(2) == term:
            return lst, m.group(3)

        if m.group(2) in matching:
            item, s = matchnested(m.group(3), matching[m.group(2)])
            lst.append(m.group(2))
            lst.append(item)
            lst.append(matching[m.group(2)])
        else:
            raise ValueError("After <<%s %s>> expected %s not %s" %
                             (lst, s, term, m.group(2)))

# Unit test (print calls updated to Python 3 syntax).

if __name__ == "__main__":
    for s in ("simple string",
              """ "double quote" """,
              """ 'single quote' """,
              "one'two'three'four'five'six'seven",
              "one(two(three(four)five)six)seven",
              "one(two(three)four)five(six(seven)eight)nine",
              "one(two)three[four]five{six}seven<eight>nine",
              "one(two[three{four<five>six}seven]eight)nine",
              "oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
              "ERROR testing ((( mismatched ))] parens"):
        print("\ninput", s)
        try:
            lst, s = matchnested(s)
            print("output", lst)
        except ValueError as e:
            print(e)

    print("done")

#15

This one also worked

re.findall(r'\(.+\)', s)
