I have a div that contains other HTML tags along with text.
I want to extract only the text from this div, including the text inside all of its nested HTML tags.
<div class="rpr-help m-chm">
<div class="header">
<h2 class="h6">Repair Help</h2>
</div><!-- /end .header -->
<div class="inner m-bsc">
<ul>
<li><a href="#videol">Repair Video</a></li>
<li><a href="#qa1">Repair Q&A</a></li>
</ul>
</div>
<div>
<br>
<span class="h4">Cross Reference Information</span><br>
<p>Part Number 285753A (AP3963893) replaces 1195967, 280152, 285140, 285743, 285753, 3352470, 3363664, 3364002, 3364003, 62672, 62693, 661560, 80008, 8559748, AH1485646, EA1485646, PS1485646.
<br>
</p>
</div>
</div>
Here is my regex:
preg_match_all("/<div class=\"rpr-help m-chm\">(.*)<\/.*>/s", $urlcontent, $description);
It works fine when I assign this complete div to the $urlcontent variable.
But when I fetch the data from a real URL, e.g. $urlcontent = "www.test.com/test.html";, it returns the markup of the complete web page.
How can I get the content inside <div class="rpr-help m-chm">?
Does my regex need any correction?
Any help would be appreciated. Thanks.
2 Solutions
#1
1
It's not possible to parse HTML/XHTML with regex. Source
You can't parse [X]HTML with regex, because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.
Depending on the language you use, please consider using a third-party library for HTML parsing.
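Since the question uses PHP, one option is PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than a third-party package. The following is only a minimal sketch under a few assumptions: extractTextByClass is a hypothetical helper name, the class attribute is matched as an exact string, and the sample markup is a shortened copy of the div from the question.

```php
<?php
// Hypothetical helper (not from the original post): returns the text inside
// the first <div> whose class attribute is exactly $class, or null if there
// is no such div. Assumes $class contains no double quotes.
function extractTextByClass(string $html, string $class): ?string
{
    $doc = new DOMDocument();

    // Real pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="' . $class . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // textContent joins the text of all descendants and drops the tags.
    $text = $nodes->item(0)->textContent;

    // Collapse runs of whitespace left over from the markup layout.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Shortened sample based on the markup in the question.
$sample = <<<HTML
<html><body>
<p>Unrelated page content</p>
<div class="rpr-help m-chm">
<div class="header"><h2 class="h6">Repair Help</h2></div>
<div><span class="h4">Cross Reference Information</span><br>
<p>Part Number 285753A (AP3963893) replaces 1195967, 280152.</p>
</div>
</div>
</body></html>
HTML;

$text = extractTextByClass($sample, 'rpr-help m-chm');
echo $text, PHP_EOL;
```

For a live page, the string passed in would be the fetched markup, for example the result of file_get_contents() called with a full URL including the scheme. Because the parser tracks nesting, the inner divs do not cut the result short the way a pattern match on the first closing tag would.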
#2
0
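Use this function:
// Returns the text between the first $tagStart and the next $tagEnd,
// or null if $tagStart does not occur in $content.
function GetclassContent($tagStart, $tagEnd, $content)
{
    $first_step = explode($tagStart, $content, 2);
    if (!isset($first_step[1])) {
        return null; // start tag not found
    }
    $second_step = explode($tagEnd, $first_step[1], 2);
    return $second_step[0];
}
Steps to use the above function:
// file_get_contents() needs the scheme; without "http://" the string is
// treated as a local file path.
$website = "http://www.test.com/test.html";
$content = file_get_contents($website);
$tagStart = '<div class="rpr-help m-chm">';
// No space inside the closing tag, otherwise it never matches. The result
// stops at the first "</div>", so nested divs cut it short.
$tagEnd = "</div>";
$RequiredContent = GetclassContent($tagStart, $tagEnd, $content);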