Regular expression needed to match anything inside p tags

Time: 2022-03-20 20:13:40

I need a regular expression to match anything that is within <p> tags. For example, if I had some text:

<p>Hello world</p>

The regex would match the Hello world part

3 answers

#1


7  

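In JavaScript:

var str = "<p>Hello world</p>";
// match() exposes the capture group; search() would only return the match index
var m = str.match(/<\s*p[^>]*>([^<]*)<\s*\/\s*p\s*>/);
var text = m ? m[1] : null; // "Hello world"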

In PHP:

$str = "<p>Hello world</p>";
// Single quotes keep the backslashes literal; $matches[1] holds the captured text
preg_match_all('/<\s*p[^>]*>([^<]*)<\s*\/\s*p\s*>/', $str, $matches);

These will match something as complex as this:

< p style=  "font-weight: bold;" >Hello world  <  /  p >
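For comparison, here is a minimal Python sketch of the same pattern, showing what the capture group returns for the inputs above. The names P_TAG and p_contents are made up for this illustration. Because the group is written as [^<]*, the pattern gives up as soon as the paragraph contains another tag, which the third call is meant to show.

```python
import re

# Same pattern as in the answer above, written as a Python raw string.
P_TAG = re.compile(r"<\s*p[^>]*>([^<]*)<\s*/\s*p\s*>")

def p_contents(html):
    """Return the text captured between each matched <p>...</p> pair."""
    return P_TAG.findall(html)

simple = p_contents("<p>Hello world</p>")
spaced = p_contents('< p style=  "font-weight: bold;" >Hello world  <  /  p >')
nested = p_contents("<p>Hello <b>world</b></p>")  # inner <b> tag defeats [^<]*

print(simple)
print(spaced)
print(nested)
```

Note that the spaced example keeps the trailing whitespace inside the capture, so a strip() on the result may be wanted.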

#2


5  

EDIT: Don't do it. Just don't.

See this question

If you insist, use <p>(.+?)</p>; the result will be in the first group. It is not perfect, but no regexp solution to the HTML parsing problem ever will be.

E.g. (in Python):

>>> import re
>>> r = re.compile('<p>(.+?)</p>')
>>> r.findall("<p>fo o</p><p>ba adr</p>")
['fo o', 'ba adr']
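If the advice "don't do it" is taken, one possible alternative is an actual HTML parser. The following is only a sketch using the standard library's html.parser module; the class ParagraphText and the helper paragraph_texts are hypothetical names, and the sketch assumes that only the text of each <p> element is wanted, with inner markup dropped.

```python
from html.parser import HTMLParser

class ParagraphText(HTMLParser):
    """Collect the text found inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of <p> elements currently open
        self.paragraphs = []  # finished paragraph texts
        self._chunks = []     # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.paragraphs.append("".join(self._chunks))
                self._chunks = []

    def handle_data(self, data):
        # Text is kept only while we are inside a <p> element.
        if self.depth > 0:
            self._chunks.append(data)

def paragraph_texts(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs

print(paragraph_texts("<p>fo o</p><p>ba adr</p>"))
print(paragraph_texts('<p class="foo">Hello <b>world</b></p>'))
```

Unlike the regex, this handles attributes and nested inline tags without extra work, though it is still a simplification: html.parser does not repair malformed markup such as unclosed <p> tags.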

#3


1  

Regex:

<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>

This will work for any pair of tags.

E.g. <p class="foo">hello<br/></p>

The \1 makes sure that the opening tag matches the closing tag.

The content between the tags is captured in \2.
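To see the two groups in action, here is a small Python sketch using the same regex without any flags. The names PAIR and tag_pairs are invented for the example. It returns the tag name (group 1) and the inner content (group 2) for each match.

```python
import re

# The pattern from this answer as a raw string; \1 refers back to group 1.
PAIR = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>")

def tag_pairs(html):
    """Return (tag name, inner content) for each matched pair of tags."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(html)]

pairs = tag_pairs('<p class="foo">hello<br/></p>')
mismatched = tag_pairs("<p>hello</div>")           # closing tag differs: no match
nested = tag_pairs("<div><div>a</div></div>")      # stops at the first </div>

print(pairs)
print(mismatched)
print(nested)
```

The last call illustrates a limit of this approach: with nested tags of the same name, the lazy (.*?) stops at the first closing tag, so the captured content is cut short.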

