Using the star in a RegExp to extract data enclosed by a specific pattern

Time: 2022-09-13 13:17:55

I have a text that consists of information enclosed by a certain pattern. The only thing I know is the pattern: "${template.start}" and "${template.end}". To keep it simple, I will substitute "a" for both ${template.start} and ${template.end} in the example.

So one entry in the text would be:

aINFORMATIONHEREa

I do not know how many of these entries are concatenated in the text. So the following is correct too:

aFOOOOOOaaASDADaaASDSDADa

I want to write a regular expression to extract the information enclosed by the "a"s.

My first attempt was to do:

a(.*)a

which works as long as there is only one entry in the text. As soon as there is more than one entry it fails, because .* is greedy and matches as much as it can. So using a(.*)a on aFOOOOOOaaASDADaaASDSDADa results in only one capturing group, containing everything between the first and the last character of the text (both of which are "a"):

FOOOOOOaaASDADaaASDSDAD
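
The greedy behavior described above can be reproduced outside Qt. The following is a minimal sketch that uses Python's `re` module as a stand-in for QRegExp (an assumption made only so the example is self-contained; the variable names are illustrative):

```python
import re

text = "aFOOOOOOaaASDADaaASDSDADa"

# Greedy: .* first consumes the rest of the string, then backtracks only
# as far as needed to find a closing "a", which is the very last character.
match = re.search(r"a(.*)a", text)
greedy_capture = match.group(1)
print(greedy_capture)  # FOOOOOOaaASDADaaASDSDAD
```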

What I want to get is something like

captureGroup(0):  aFOOOOOOaaASDADaaASDSDADa
captureGroup(1): FOOOOOO
captureGroup(2): ASDAD
captureGroup(3): ASDSDAD

It would be great to be able to extract each entry from the text and, from each entry, the information enclosed between the "a"s. By the way, I am using the QRegExp class of Qt4.

Any hints? Thanks! Markus


Multiple variations of this question have been seen before. Various related discussions:

and probably others...

3 Solutions

#1


Simply use non-greedy expressions, namely:

a(.*?)a
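
To see what the non-greedy form does on the sample input, here is a minimal sketch using Python's `re` module as a stand-in (an assumption; the question is about QRegExp). One caveat as far as I know: Qt4's QRegExp does not accept the `*?` syntax for individual quantifiers, and the usual equivalent there is to keep `a(.*)a` and call `setMinimal(true)` on the QRegExp object, then loop over the text with `indexIn()` to collect each match.

```python
import re

text = "aFOOOOOOaaASDADaaASDSDADa"

# Non-greedy: .*? stops at the first "a" that lets the match succeed,
# so each entry is matched on its own instead of the whole string.
lazy_entries = re.findall(r"a(.*?)a", text)
print(lazy_entries)  # ['FOOOOOO', 'ASDAD', 'ASDSDAD']
```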

#2


You need to match something like:

a[^a]*a
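
Adding a capturing group around the character class gives both the whole entry and the enclosed information. A minimal sketch, again using Python's `re` module as a stand-in for QRegExp (an assumption; names are illustrative):

```python
import re

text = "aFOOOOOOaaASDADaaASDSDADa"

# [^a]* can never run past a delimiter, so no backtracking is needed
# and greedy matching is safe here.
pattern = re.compile(r"a([^a]*)a")

# Each tuple is (whole entry, enclosed information).
entries = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]
print(entries)
```

Note that a negated character class only works when the delimiter is a single character. If the real ${template.start} and ${template.end} are longer strings, this pattern does not carry over directly.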

#3


You have a couple of working answers already, but I'll add a little gratuitous advice:

Using regular expressions for parsing is a road fraught with danger

Edit: To be less cryptic: for all their power, flexibility and elegance, regular expressions are not sufficiently expressive to describe any but the simplest grammars. They are adequate for the problem asked here, but are not a suitable replacement for state machines or recursive descent parsers if the input language becomes more complicated.

So, choosing to use regular expressions for parsing input streams is a decision that should be made with care and with an eye towards the future.
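
For comparison, a hand-written scanner for this particular input is short. The following is only a sketch of that alternative, not a full parser: `extract_entries` is a hypothetical helper, written in Python for illustration, and it assumes entries are never nested and that the payload never contains the end marker.

```python
def extract_entries(text, start, end):
    """Return the payloads found between start and end markers, left to right."""
    entries = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break  # no further entries
        payload_start = begin + len(start)
        finish = text.find(end, payload_start)
        if finish == -1:
            raise ValueError(f"unterminated entry starting at index {begin}")
        entries.append(text[payload_start:finish])
        pos = finish + len(end)  # continue after the closing marker
    return entries


print(extract_entries("aFOOOOOOaaASDADaaASDSDADa", "a", "a"))
print(extract_entries("<<one>><<two>>", "<<", ">>"))
```

Unlike the negated character class, this version handles multi-character markers and reports an entry that is never closed, which is the kind of requirement that tends to appear as the input format grows.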
