How do I parse HTML tags with regular expressions?

I want to use regular expressions to parse the contents of the following HTML tag, which I retrieved with curl:

<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>

so that the output is "IND - 203/9 (49.4 Ovs)".

I have written the following code, but it is not working. Please help.

$one="<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";
$five="~(?<=<span class='ui-allscores'>)[.]*(?=</br></span>)~";
preg_match_all($five,$one,$ui);
print_r($ui);

3 solutions

#1

Try this one:

$string = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

Any span tag, whatever its attributes:

preg_match('/<span[^>]*>(.*?)<\/span>/si', $string, $matches);

Only the span tag with class 'ui-allscores':

preg_match("/<span class='ui-allscores'>(.*?)<\/span>/si", $string, $matches);

// Output
array (size=2)
  0 => string '<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>' (length=56)
  1 => string 'IND - 203/9 (49.4 Ovs)' (length=22)
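If the page contains more than one such span, preg_match_all can collect every captured group in one call. The following is only a minimal sketch: the helper name extract_scores and the second span in the test input are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the inner text of every ui-allscores span.
function extract_scores(string $html): array
{
    $pattern = "/<span class='ui-allscores'>(.*?)<\/span>/si";

    // With the default PREG_PATTERN_ORDER flag, $matches[1] holds
    // every capture of group 1, in document order.
    if (preg_match_all($pattern, $html, $matches) === false) {
        return []; // the regex engine reported an error
    }

    return $matches[1];
}

// First span is the sample from the question; the second is placeholder data.
$html = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>"
      . "<span class='ui-allscores'>TEAM - 0/0 (0.0 Ovs)</span>";

print_r(extract_scores($html));
```

The lazy quantifier .*? matters here: it stops each match at the nearest closing tag instead of running on to the last one.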

#2

If you simply want to remove the HTML tags, use PHP's built-in function strip_tags.

See also another answer on removing HTML tags: "Strip all HTML tags, except allowed".
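For reference, a minimal sketch of strip_tags applied to the string from the question. The second call uses the optional allowed-tags argument; its input string is invented purely to show that argument.

```php
<?php
// Sample input from the question.
$one = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

// strip_tags removes all HTML tags and keeps the text between them.
$text = trim(strip_tags($one));
echo $text, PHP_EOL; // IND - 203/9 (49.4 Ovs)

// Optional second argument: tags to keep. The input here is illustrative only.
$kept = strip_tags("<p>Score: <b>203/9</b></p>", '<b>');
echo $kept, PHP_EOL; // Score: <b>203/9</b>
```

Note that strip_tags removes every tag in the input, so it only fits when the string you pass in is already just the fragment you care about.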

#3

The problem with your regex is the [.] part. It matches only a literal ., because the dot is written inside a character class. So just remove the square brackets.

$five="~(?<=<span class='ui-allscores'>).*(?=</br></span>)~";

The next problem is the greediness of *. You can change this matching behaviour by putting a ? after it.

$five="~(?<=<span class='ui-allscores'>).*?(?=</br></span>)~";
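Both points can be checked with a short script. This is a sketch under one assumption: the sample string in the question contains no </br>, so the lookahead below looks only for </span>. If the real page does have </br> before the closing tag, keep it in the pattern.

```php
<?php
// Sample input from the question; note that it contains no </br>.
$one = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

// [.]* matches only a run of literal dots, so the lookahead never succeeds.
$broken = "~(?<=<span class='ui-allscores'>)[.]*(?=</span>)~";
// An unbracketed dot matches any character except a newline.
$fixed  = "~(?<=<span class='ui-allscores'>).*?(?=</span>)~";

echo preg_match($broken, $one), PHP_EOL;       // 0 (no match)
echo preg_match($fixed, $one, $m), PHP_EOL;    // 1 (match)
echo $m[0], PHP_EOL;                           // IND - 203/9 (49.4 Ovs)

// Greedy vs. lazy only differs when the input holds more than one span.
$twice = $one . $one;
preg_match("~(?<=<span class='ui-allscores'>).*(?=</span>)~", $twice, $greedy);
preg_match("~(?<=<span class='ui-allscores'>).*?(?=</span>)~", $twice, $lazy);

echo $greedy[0], PHP_EOL; // runs on to the last </span>, swallowing the tags in between
echo $lazy[0], PHP_EOL;   // IND - 203/9 (49.4 Ovs)
```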

But the overall point is: you should most probably use an HTML parser for this job!

See How do you parse and process HTML/XML in PHP?
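As one possible illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). It assumes the extension is enabled and that the class attribute is exactly ui-allscores; the linked question covers other parsers as well.

```php
<?php
// Sample input from the question; in practice this would be the curl response body.
$html = "<span class='ui-allscores'>IND - 203/9 (49.4 Ovs)</span>";

$doc = new DOMDocument();

// Collect parse warnings internally instead of printing them;
// real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select span elements whose class attribute equals ui-allscores.
$nodes = $xpath->query("//span[@class='ui-allscores']");

foreach ($nodes as $node) {
    echo trim($node->textContent), PHP_EOL; // IND - 203/9 (49.4 Ovs)
}
```

Unlike the regex versions, this keeps working if the attribute quoting, attribute order, or whitespace inside the tag changes.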
