I am trying to run a regex search through an HTML file with PHP (it is my last resort). I want to select only the element with the Content class. The basic code is:
$theData = '<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>';
$pattern = '/^.+(<div class="Content">.+<\/div>).+$/im';
preg_match($pattern, $theData, $result);
$output = htmlentities($result[1]);
echo $output;
The output:
<div class="Content">the content</div>
Issues:
The inline style is removed from the output.
If I run this on my HTML page, nothing gets returned, even though there is an element with the Content class.
I think the issue lies in my regex pattern.
2 answers
#1
2
As you've probably noticed, regex is the wrong tool for this job. I suggest using PHP's DOMDocument class.
$theData = '<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>';
$dom = new DOMDocument;
$dom->loadHTML($theData);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[contains(@class,"Content")]');
echo $div->item(0)->nodeValue;
DEMO: http://codepad.viper-7.com/Df34ve
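Note that nodeValue returns only the text inside the div. If you want the div's markup back, as the regex in the question was trying to capture, something like the following may work. It is only a sketch: extractContentDiv is a hypothetical helper name, not part of DOMDocument, and it assumes you want the first matching div. The XPath expression matches Content as a whole class token, and parser warnings from imperfect real-world HTML are suppressed rather than printed.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup of
// the first <div> whose class list contains the token "Content", or null.
function extractContentDiv(string $html): ?string
{
    $dom = new DOMDocument();

    // Real-world pages are rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "Content" as a whole class token, so "ContentWide" is not selected.
    $nodes = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " Content ")]'
    );

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // saveHTML($node) serializes the element with its tags and attributes;
    // nodeValue would return only the text inside it.
    return $dom->saveHTML($nodes->item(0));
}

$theData = '<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>';
echo htmlentities(extractContentDiv($theData) ?? 'no match');
```

Returning null when nothing matches avoids calling a method on a missing node, which is what happens if you use item(0) directly on a page without a Content div.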
#2
1
You can use SimpleHTMLDom:
// Assumes the Simple HTML DOM library has already been included (typically simple_html_dom.php)
$html = new simple_html_dom();
// Load from a string
$html->load('<h1 style="color:#4A8CF6">heading</h1><div class="Content">the content</div><h2>heading 2</h2>');
// Or load from a file or URL
//$html->load_file('http://net.tutsplus.com/');
// find() returns an array of matches unless an index is given; 0 selects the first matching div
$element = $html->find('div[class=Content]', 0);
// Print the element's inner HTML
echo $element->innertext;