How can I loop through delimited tokens with a regular expression?

Date: 2023-02-01 21:47:26

How can I create a regular expression that will grab delimited text from a string? For example, given a string like

text ###token1### text text ###token2### text text 

I want a regex that will pull out ###token1###. Yes, I do want the delimiter as well. By adding another group, I can get both:

(###(.+?)###)
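
As an illustration of what the two-group pattern returns, here is a minimal sketch in Python using only the standard re module (the language is an assumption for illustration; the question does not name one). When a pattern has more than one group, re.findall returns one tuple per match, so looping over the result gives each delimited token together with its inner text.

```python
import re

text = "text ###token1### text text ###token2### text text"

# Outer group: the token including its ### delimiters.
# Inner group: just the text between the delimiters.
pairs = re.findall(r"(###(.+?)###)", text)

for with_delimiters, inner in pairs:
    print(with_delimiters, inner)
```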

7 answers

#1 (score: 4)

/###(.+?)###/

If you want the ### delimiters as well, then you need:

/(###.+?###)/

The ? makes .+ non-greedy (lazy); without the ?, it would grab too much.

For example, '###token1### text text ###token2###' would all be grabbed as a single match.
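
To make the greedy/lazy difference concrete, here is a small Python sketch (again assuming Python's re module purely for illustration) that runs both forms against the sample string from the question:

```python
import re

text = "text ###token1### text text ###token2### text text"

# Greedy: .+ runs as far as it can, then backs off only to the LAST ###,
# so a single match swallows both tokens and the text between them.
greedy = re.findall(r"###.+###", text)

# Lazy: .+? stops at the FIRST ### that lets the match succeed.
lazy = re.findall(r"(###.+?###)", text)

print(greedy)  # ['###token1### text text ###token2###']
print(lazy)    # ['###token1###', '###token2###']
```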

My initial answer had * instead of +. * means zero or more; + means one or more. * was wrong because it would accept ###### (an empty token) as a valid match.
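
A quick Python check of that point (a sketch using the standard re module, not code from the answer): with * an empty token between two ### delimiters counts as a match, while + requires at least one character.

```python
import re

empty_token = "######"

# .*? may match zero characters, so "######" is accepted as an empty token.
star = re.findall(r"###(.*?)###", empty_token)

# .+? needs at least one character, so there is nothing to find.
plus = re.findall(r"###(.+?)###", empty_token)

print(star)  # ['']
print(plus)  # []
```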

For experimenting with regular expressions, I highly recommend The Regex Coach (http://www.weitz.de/regex-coach/) for Windows. You can type in your string and your regular expression and see what the expression is actually doing.

The captured text will be stored in \1 or $1, depending on where you are using the regular expression.
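
How you read the group depends on the tool. As one example (Python's re module, chosen here only for illustration), program code reads the group from a match object with .group(1), while a replacement string spells it \1:

```python
import re

text = "text ###token1### text text ###token2### text text"

# In program code: ask the match object for group 1.
first = re.search(r"###(.+?)###", text).group(1)
print(first)  # token1

# In a replacement string: refer to group 1 as \1 (\g<1> also works).
wrapped = re.sub(r"###(.+?)###", r"<\1>", text)
print(wrapped)  # text <token1> text text <token2> text text
```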

#2 (score: 1)

In Perl, you actually want something like this:

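my $text = 'text ###token1### text text ###token2### text text';

# In scalar context, m//g resumes from where the previous match ended,
# so each loop iteration finds the next token
while ($text =~ m/###(.+?)###/g) {
  print $1, "\n";
}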

This gives you each token in turn within the while loop. The (.+?) ensures that you get the shortest stretch between the delimiters, preventing the regex from deciding that the token is 'token1### text text ###token2'.

Or, if you just want to save the tokens rather than loop over them immediately:

my @tokens = $text =~ m/###(.+?)###/g;
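
For readers not using Perl, here is a rough Python counterpart, offered as a sketch rather than a line-for-line equivalent and using only the standard re module: re.finditer plays the role of the while loop over m//g, and re.findall the role of the list assignment.

```python
import re

text = "text ###token1### text text ###token2### text text"
token_re = re.compile(r"###(.+?)###")

# Like Perl's while ($text =~ m/.../g): handle one match per iteration.
for match in token_re.finditer(text):
    print(match.group(1))

# Like @tokens = $text =~ m/.../g: collect every captured token at once.
tokens = token_re.findall(text)
print(tokens)  # ['token1', 'token2']
```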

#3 (score: 0)

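Assuming you want to match ###token2### as well (note that the greedy .+ makes this one single match, '###token1### text text ###token2###', rather than two separate tokens):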

/###.+###/

#4 (score: 0)

Check out RegexBuddy; Jeff has recommended it several times: http://www.codinghorror.com/blog/archives/000027.html

#5 (score: 0)

Here is another good site; you can work through all the tutorials and get yourself well versed in regular expressions:

http://www.regular-expressions.info/

#6 (score: 0)

Use () to capture text and \x (\1, \2, ...) to refer back to it. A naive example, which assumes the tokens are always delimited by #:

text (#+.+#+) text text (#+.+#+) text text

The text matched by each () can then be referenced with \1 and \2 in the replacement expression (\1 for the first group, \2 for the second), assuming you're doing a search/replace in an editor. For example, the replacement expression could be:

token1: \1, token2: \2

For the above example, that should produce:

token1: ###token1###, token2: ###token2###

If you're using a regexp library in a program, you'd presumably call a function to get at the contents of the first and second tokens, which you've marked with the ()s around them.
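
As a sketch of both cases, Python's re module (an illustrative choice, not something the answer specifies) can run the same naive pattern as a search/replace with \1 and \2, or expose the groups through the match object:

```python
import re

text = "text ###token1### text text ###token2### text text"

# The naive pattern from this answer: one () group per token.
pattern = r"text (#+.+#+) text text (#+.+#+) text text"

# Editor-style search/replace: \1 and \2 refer to the two groups.
replaced = re.sub(pattern, r"token1: \1, token2: \2", text)
print(replaced)  # token1: ###token1###, token2: ###token2###

# In a program: read the groups from the match object instead.
m = re.search(pattern, text)
print(m.group(1), m.group(2))  # ###token1### ###token2###
```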

#7 (score: 0)

When you are using delimiters like these, you basically match the opening delimiter, then anything that does not match the closing delimiter, followed by the closing delimiter. One special caution: in a case like the example above, [^#] does not work as the check that the closing delimiter hasn't been reached, because a single # inside the token (e.g. "###foo#bar###") would cause the regex to fail. For the case above, the regex to parse it would be the following, assuming empty tokens are allowed (if not, change * to +):

###([^#]|#[^#]|##[^#])*###
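
To see why the alternation matters, here is a small Python comparison (a sketch using the standard re module; the sample string is made up for illustration) of a naive ###[^#]*### against the pattern above on a token that contains a single #. Because the pattern's group repeats, m.group(0) is used to get each whole match.

```python
import re

sample = "text ###foo#bar### text ###baz### text"

# Naive: [^#]* forbids any # inside the token.
naive = re.compile(r"###[^#]*###")

# From the answer: each step consumes a non-#, or one or two #'s followed
# by a non-#, so only a run of three #'s can end the token.
robust = re.compile(r"###([^#]|#[^#]|##[^#])*###")

def whole_matches(pattern, text):
    # group(0) is the full match; group(1) would only hold the last repetition.
    return [m.group(0) for m in pattern.finditer(text)]

print(whole_matches(naive, sample))   # ['### text ###'] (wrong span)
print(whole_matches(robust, sample))  # ['###foo#bar###', '###baz###']
```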
