Parsing square brackets with a regular expression

I've always had a difficult time with regular expressions. I've searched for help with this, but I can't quite find what I'm looking for.

I have blocks of text that follow this pattern:

[php] ... any type of code sample here [/php]

I need to:

Check for the square-bracket tags, which can contain any one of 20-30 programming language names (php, ruby, etc.).

Grab all the code between the opening and closing tags.

I have worked out the following regular expression:

#\[([a-z]+)\]([^\[/]*)\[/([a-z]+)\]#i

This matches most inputs well. However, it breaks when the code sample itself contains square brackets. How do I modify it so that any character between the opening and closing tags is matched for later use?
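
To make the failure concrete, here is a minimal sketch that runs the pattern from the question against three made-up inputs (the sample strings are hypothetical, not taken from the question). The character class [^\[/] excludes both "[" and "/", so a body containing either one cannot be consumed.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '#\[([a-z]+)\]([^\[/]*)\[/([a-z]+)\]#i';

// Hypothetical sample inputs, made up for illustration.
$plain    = '[php]echo 1;[/php]';
$brackets = '[php]$a = $b[0];[/php]';
$slash    = '[php]$a = 4 / 2;[/php]';

var_dump(preg_match($pattern, $plain));    // int(1)
var_dump(preg_match($pattern, $brackets)); // int(0): "[" is excluded by the class
var_dump(preg_match($pattern, $slash));    // int(0): "/" is excluded too
```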

5 Solutions

#1

This is the regex you want. It also requires the opening and closing tags to agree, so a [php] block can only be closed by [/php].

/\[(\w+)\](.*?)\[\/\1\]/s

Or, if you want to match only an explicit list of tags, you could use:

$langs = array('php', 'python', ...); 

$langs = implode('|', array_map('preg_quote', $langs));

preg_match_all('/\[(' . $langs . ')\](.*?)\[\/\1\]/s', $str, $matches);
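
As a rough, runnable sketch of the whitelist approach: the two-entry $langs list, the sample $str, and the $blocks array are assumptions made up for illustration. It also passes the "/" delimiter to preg_quote (a small addition to the snippet above) and uses PREG_SET_ORDER so that each match arrives as one array. The arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical whitelist; the real one would hold the 20-30 names.
$langs = ['php', 'ruby'];

// Passing '/' as the delimiter also escapes it if it appears in a name.
$alternation = implode('|', array_map(
    fn (string $lang): string => preg_quote($lang, '/'),
    $langs
));

$pattern = '/\[(' . $alternation . ')\](.*?)\[\/\1\]/s';

// Made-up input: both code samples contain square brackets.
$str = "[php]\$a = \$b[0];[/php]\ntext\n[ruby]puts [1, 2].sum[/ruby]";

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

$blocks = [];
foreach ($matches as $m) {
    $blocks[] = ['lang' => $m[1], 'code' => $m[2]];
}
print_r($blocks);
```

Each entry of $blocks then carries the language name and the raw code between the tags, ready for later use.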

#2

The following will work:

\[([a-z]+)\].*\[/\1\]

The .* above is greedy, so with several blocks in one string it runs to the last closing tag. If you don't want that greediness, use the lazy form:

\[([a-z]+)\].*?\[/\1\]

All you have to do is check that the opening and closing tags have the same text (in this case, the same programming language). You do that with \1, which matches whatever group number 1, ([a-z]+), matched earlier.
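
The difference between the two forms shows up as soon as a string holds more than one block. A small sketch in PHP, assuming a made-up one-line input; the extra capture group around the body is added here only so the result can be inspected.

```php
<?php
// Hypothetical input with two tagged blocks on one line.
$str = '[php]a[/php] and [php]b[/php]';

// Greedy: .* runs to the last [/php], so both blocks merge into one match.
preg_match_all('#\[([a-z]+)\](.*)\[/\1\]#', $str, $greedy);

// Lazy: .*? stops at the first closing tag that fits.
preg_match_all('#\[([a-z]+)\](.*?)\[/\1\]#', $str, $lazy);

print_r($greedy[2]); // one element: "a[/php] and [php]b"
print_r($lazy[2]);   // two elements: "a" and "b"
```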

#3

Why don't you use something like the following:

\[php\].*?\[/php\]

I don't understand why you want to use [a-z]+ for the tags; there should only be php or a limited number of other tags. Just keep it simple.

Actually you can use:

\[(php)\].*?\[/(\1)\]

so that the closing tag has to match the opening one. Otherwise you will be pairing arbitrary opening and closing tags. Add other languages, such as js, as alternatives: (php|js) and so on.

#4

Use a backreference to refer to a match already made in the regular expression:

\[(\w+)\].*?\[/\1\]

#5

Not sure which language you are using, but the following non-greedy regex should work for you:

#\[([a-z]+)\](.*?)\[/(\1)\]#i

Rather than matching only characters that are not an opening square bracket, match everything up to the closing tag by using the non-greedy quantifier .*?
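
One caveat, sketched below in PHP: code samples usually span several lines, and without the s modifier the dot does not match a newline. The example adds s to the pattern above as an assumption about the input; the sample string is made up. It also shows that under the i modifier the backreference compares case-insensitively, so [PHP] can be closed by [/php].

```php
<?php
// The pattern from this answer with the s modifier added (an assumption
// about the input: code samples usually span several lines).
$pattern = '#\[([a-z]+)\](.*?)\[/(\1)\]#is';

// Hypothetical input: multi-line body, differently cased tags.
$str = "[PHP]\n\$x = [1, 2];\n[/php]";

preg_match($pattern, $str, $m);
var_dump($m[1]); // string(3) "PHP"
var_dump($m[2]); // the body, including both newlines

// Without s, the dot stops at the first newline and nothing matches.
var_dump(preg_match('#\[([a-z]+)\](.*?)\[/(\1)\]#i', $str)); // int(0)
```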
