Get text between HTML tags [duplicate]

Time: 2022-04-22 21:47:08

This question already has an answer here:

OK, this is a pretty basic question, I'm sure, but I'm new to PHP and haven't been able to figure it out. The input string is $data, and I'm trying to pull out only the first match and keep working with it. Is the code below incorrect? This may not even be the best way to do it: I'm just trying to extract the contents between two HTML tags (the first pair found) and discard the rest of the data. I know there are similar questions and I've read them all, but mine is a mix: is there a better way to do this, and how can I make the match the new input for the rest of the code? If I change $matches to $data2 and use that from then on, it returns errors.

preg_match('/<h2>(.*?)<\/h2>/s', $data, $matches);
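Part of the error is likely that $matches is an array, not a string, so using it wholesale as $data2 breaks any string function it is passed to. A minimal sketch, assuming a made-up sample $data and reusing the question's variable name $data2, that keeps only the text captured by the first group:

```php
<?php
// Hypothetical sample input standing in for the asker's $data.
$data = "<div><h2>First heading</h2><p>Body</p><h2>Second heading</h2></div>";

// preg_match() stops at the first match and returns 1 on a match, 0 if none.
// $matches[0] holds the whole match, $matches[1] the first (...) group.
$data2 = '';
if (preg_match('/<h2>(.*?)<\/h2>/s', $data, $matches) === 1) {
    $data2 = $matches[1]; // a plain string, safe to use in the rest of the code
}

echo $data2, "\n";
```

Here $data2 should end up as "First heading"; the second <h2> is ignored because preg_match() only looks for one match.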

3 answers

#1


12  

Using regular expressions is generally a good idea for your problem.

When you look at http://php.net/preg_match you see that $matches will be an array: index 0 holds the full match, and each parenthesized group in the pattern gets its own index after that. Try

print_r($matches);

to see what the result looks like, and then pick the right index.
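To make the print_r() suggestion concrete, here is a small sketch with a made-up input string; it shows which index holds what:

```php
<?php
// Hypothetical sample input; any string containing an <h2> works.
$data = "<p>intro</p><h2>Hello</h2><h2>World</h2>";

preg_match('/<h2>(.*?)<\/h2>/s', $data, $matches);

// Index 0: the entire matched text; index 1: the first capture group.
// Only the first <h2> appears, because preg_match() stops after one match.
print_r($matches);
```

This should print an array with two entries: [0] is `<h2>Hello</h2>` and [1] is `Hello`. To collect every <h2> instead, preg_match_all() is the function to look at.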

EDIT:

编辑:

If there is a match, you can get the text captured by the parenthesized group with

print($matches[1]);

If you had more than one parenthesized group, they would be numbered 2, 3, and so on. You should also handle the case where there is no match: preg_match() then returns 0 and $matches is an empty array.
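A short sketch of both points, numbered groups and the no-match check, assuming a made-up <h2> with a class attribute:

```php
<?php
// Hypothetical sample input with an attribute we also want to capture.
$data = '<h2 class="intro">Welcome</h2>';

// Two parenthesized groups: group 1 = class value, group 2 = heading text.
if (preg_match('/<h2 class="([^"]*)">(.*?)<\/h2>/s', $data, $matches) === 1) {
    $class = $matches[1];
    $text  = $matches[2];
    echo "$class: $text\n";
} else {
    // preg_match() returned 0 (no match) or false (an error).
    echo "no <h2> found\n";
}

// The no-match case: preg_match() returns 0 and $none comes back empty.
$found = preg_match('/<h2>(.*?)<\/h2>/s', '<p>no heading here</p>', $none);
```

Checking the return value (rather than indexing $matches blindly) avoids "Undefined array key" warnings when the tag is missing.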

#2


22  

Don't parse HTML with preg_match; use this PHP class instead:

The DOMDocument class

Example:

<?php

$html = "<p>hi</p>
<h1>H1 title</h1>
<h2>H2 title</h2>
<h3>H3 title</h3>";
// a new DOM object
$dom = new DOMDocument('1.0', 'utf-8');
// discard whitespace-only text nodes (must be set before loading to take effect)
$dom->preserveWhiteSpace = false;
// load the HTML into the object
$dom->loadHTML($html);
$hTwo = $dom->getElementsByTagName('h2'); // use your desired tag here
if ($hTwo->length > 0) {
    echo $hTwo->item(0)->nodeValue; // prints "H2 title"
}
?>
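Real-world HTML is often malformed, which makes loadHTML() emit warnings, and the tag may be missing entirely. A minimal sketch under those assumptions, wrapping the approach above in a hypothetical helper named firstElementText() that returns null when nothing is found:

```php
<?php
// Hypothetical helper: text of the first <$tag> element in $html, or null.
function firstElementText(string $html, string $tag): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $dom->getElementsByTagName($tag);
    if ($nodes->length === 0) {
        return null; // tag not present
    }
    // textContent includes text from nested elements, e.g. <em> inside the <h2>.
    return $nodes->item(0)->textContent;
}

$data = "<div><h2>First <em>one</em></h2><h2>Second</h2></div>";
$data2 = firstElementText($data, 'h2');
```

With this input, $data2 should be "First one": the nested <em> tags are dropped but their text is kept, which a simple `(.*?)` regex capture would not do.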


#3


0  

You could do it this way:

$h1 = preg_replace('/<h1[^>]*?>([\\s\\S]*?)<\/h1>/',
'\\1', $h1);

This strips off the <h1></h1> tags and leaves only the text they wrapped.
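Note that preg_replace() returns the whole input string with the tags removed, so any markup outside the <h1> stays. A minimal sketch contrasting that with extraction, assuming a made-up $h1 string (`$1` is the same backreference as `'\\1'`, and `.*?` with the `s` flag matches the same text as `[\s\S]*?`):

```php
<?php
// Hypothetical sample input: an <h1> with an attribute, followed by other markup.
$h1 = '<h1 class="title">Main title</h1><p>Body text</p>';

// preg_replace() rewrites every match in place: each <h1 ...>...</h1> is
// replaced by capture group 1, so the tags vanish but the <p> is kept.
$unwrapped = preg_replace('/<h1[^>]*>(.*?)<\/h1>/s', '$1', $h1);

// To keep ONLY the heading text, extract the first match instead.
$onlyText = preg_match('/<h1[^>]*>(.*?)<\/h1>/s', $h1, $m) === 1 ? $m[1] : null;
```

Here $unwrapped should be `Main title<p>Body text</p>` while $onlyText is just `Main title`.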