Parsing HTML code with regex (selecting a unique class)

Date: 2022-10-29 16:30:33

I am trying to run a regex search through an HTML file with PHP (it is my last resort). I want to select only the element with the Content class. The basic code is:

$theData = '<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>';
$pattern = '/^.+(<div class="Content">.+<\/div>).+$/im';
preg_match($pattern, $theData, $result);
$output = htmlentities($result[1]);
echo $output;

The Output.

<div class="Content">the content</div>

Issues:

The inline style is removed from the output.

If I run this on my real HTML page, nothing gets returned, even though the page contains an element with the Content class.
I think the issue lies in my regex pattern.
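
A likely cause of the second issue, sketched under assumptions: in PCRE, `.` does not match a newline unless the `s` modifier is set, and with the `m` modifier `^` and `$` anchor to individual lines. The pattern above therefore only matches when the whole div, plus at least one character before and after it, sits on a single line. The multi-line `$page` string and the relaxed pattern below are illustrative, not taken from the real page.

```php
<?php
// Hypothetical page in which the div spans several lines (assumption for illustration).
$page = "<h1 style=\"color:#4A8CF6\">heading</h1>\n"
      . "<div class=\"Content\">\n  the content\n</div>\n"
      . "<h2>heading 2</h2>";

// Pattern from the question: "." stops at newlines, so no single line can satisfy it.
$original = '/^.+(<div class="Content">.+<\/div>).+$/im';
$originalMatched = preg_match($original, $page, $m1);

// Relaxed variant: "s" lets "." match newlines, ".*?" stops at the first closing tag.
// This still breaks on nested divs, which is why a DOM parser is the safer choice.
$relaxed = '/<div class="Content">.*?<\/div>/is';
$relaxedMatched = preg_match($relaxed, $page, $m2);

var_dump($originalMatched); // int(0)
var_dump($relaxedMatched);  // int(1)
```

When nothing matches, `$result` stays empty, so reading `$result[1]` yields nothing to print.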

2 Answers

#1


2  

As you've probably noticed, RegEx is the wrong tool for this job. I suggest using PHP's DOMDocument class.

$theData = '<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>';
$dom = new DOMDocument;
$dom->loadHTML($theData);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[contains(@class,"Content")]');
echo $div->item(0)->nodeValue;

DEMO: http://codepad.viper-7.com/Df34ve
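
If the goal is the div's markup rather than only its text, a minimal variation along the same lines might look like the sketch below. It assumes the same sample string. The class-token XPath expression and the error suppression are additions for illustration, not part of the snippet above: `contains(@class, "Content")` would also match a class such as `MainContent`, and real pages often trigger parser warnings.

```php
<?php
$theData = '<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>';

// Collect parser warnings internally instead of printing them (useful for imperfect real-world HTML).
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($theData);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "Content" as a whole class token rather than as a substring.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " Content ")]';
$nodes = $xpath->query($query);

$markup = null;
if ($nodes !== false && $nodes->length > 0) {
    // Passing a node to saveHTML() serializes that element, tags and attributes included.
    $markup = $dom->saveHTML($nodes->item(0));
    echo htmlentities($markup);
}
```

Here `$markup` holds `<div class="Content">the content</div>`, and the length check avoids calling a method on `null` when the page has no such div.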

#2


1  

You can use SimpleHTMLDom:

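// Assumes the library file simple_html_dom.php is on the include path
include 'simple_html_dom.php';

$html = new simple_html_dom();

// Load from a string
$html->load('<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>');

// Load a file
//$html->load_file('http://net.tutsplus.com/');

// Get the first div whose class is "Content" (the class value is spelled as in the markup).
// Passing index 0 makes find() return a single element instead of an array.
$element = $html->find('div[class=Content]', 0);

// Print the element's inner HTML
echo $element->innertext;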
