Get the first image in a string with PHP

Date: 2021-10-28 08:59:27

I'm trying to get the first image from each of my posts. The code below works well when a post has only one image, but when there is more than one it returns an image that is not always the first.

I really only want the first image. Often the second image is a "next" button.

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);
$first_img = $matches [1] [0];

Now I can take $first_img and put it in front of the short description:

<img alt="Sara" title="Sara" src="<?php echo $first_img;?>"/>

4 solutions

#1


41  

If you only need the src of the first image, preg_match should do instead of preg_match_all. Does this work for you?

<?php
    $texthtml = 'Who is Sara Bareilles on Sing Off<br>
    <img alt="Sara" title="Sara" src="475993565.jpg"/><br>
    <img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
    preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i', $texthtml, $image);
    echo $image['src'];
?>
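
One caveat with the pattern above: `.+` after `<img` is greedy, so when two image tags sit on the same line it runs on to the last `src=` on that line rather than the first. The following is a minimal sketch of a tighter pattern, not a replacement for a real HTML parser. The function name `first_img_src_regex` and the file names `first.jpg` and `next.png` are made up for illustration.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function first_img_src_regex(string $html): ?string
{
    // [^>]*? keeps the search inside a single <img ...> tag, so the match
    // cannot run on into a later tag on the same line.
    // (["\']) captures the opening quote and \1 requires the same closing quote.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';

    if (preg_match($pattern, $html, $m) === 1) {
        return $m[2];
    }

    return null; // no <img> with a quoted src attribute was found
}

// Two tags on one line: the case where a greedy ".+" picks the wrong image.
$oneLine = '<img alt="a" src="first.jpg"/><img alt="next" src="next.png"/>';
echo first_img_src_regex($oneLine), PHP_EOL; // prints "first.jpg"
```

This sketch still ignores unquoted attribute values and `<img>` text inside comments or scripts, which is where a DOM parser becomes the safer choice.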

#2


5  

Don't use regex to parse HTML. Use an HTML-parsing library or class, such as phpQuery:

require 'phpQuery-onefile.php';

$texthtml = 'Who is Sara Bareilles on Sing Off<br> 
<img alt="Sarahehe" title="Saraxd" src="475993565.jpg"/><br> 
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>'; 
$pq = phpQuery::newDocumentHTML($texthtml);
$img = $pq->find('img:first');
$src = $img->attr('src');
echo "<img alt='foo' title='baa' src='{$src}'>";

Download: http://code.google.com/p/phpquery/

#3


3  

After testing an answer from the question "Using regular expressions to extract the first image source from html codes?", I got better results, with fewer broken image links, than with the answer provided here.

While regular expressions are good for a large variety of tasks, I find they usually fall short when parsing an HTML DOM. The problem with HTML is that the structure of a document is so variable that it is hard to extract a tag accurately (and by accurately I mean a 100% success rate with no false positives).

For more consistent results, use this library: http://simplehtmldom.sourceforge.net/, which lets you manipulate HTML. An example is provided in the answer at the first link I posted.

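function get_first_image($html){
    require_once('SimpleHTML.class.php');

    // Parse the post HTML into a Simple HTML DOM object.
    $post_html = str_get_html($html);

    // Nothing to search if the HTML could not be parsed.
    if(!$post_html) {
        return null;
    }

    // Index 0 asks for the first <img> element only.
    $first_img = $post_html->find('img', 0);

    if($first_img !== null) {
        return $first_img->src;
    }

    return null;
}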
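
If downloading a third-party class is not an option, the same idea can be sketched with PHP's bundled DOM extension. This is a different technique from the Simple HTML DOM code above, shown only as an assumption-laden alternative: it assumes the dom extension is enabled, and the function name `first_img_src_dom` is made up for illustration.

```php
<?php
// Hypothetical helper; uses PHP's bundled DOM extension instead of SimpleHTML.
function first_img_src_dom(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();

    // Post HTML is rarely valid markup; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns nodes in document order,
    // so item(0) is the first <img> in the markup.
    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null; // no <img>, or the first one has no src
    }

    return $img->getAttribute('src');
}

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

echo first_img_src_dom($texthtml), PHP_EOL; // prints "475993565.jpg"
```

Note that this sketch returns null when the first `<img>` has no src attribute rather than looking at later images; whether that is the right behavior depends on your posts.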

Enjoy

#4


1  

Are you sure the regex is always matching the first one? Try printing the array each time you call it to see:

error_log(var_export($matches, true));
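
To make that check concrete, here is a small sketch that runs the question's pattern on an input where both tags share one line. That layout is an assumption (the sample in the question has one tag per line), and `next.png` is a made-up file name.

```php
<?php
// Assumed input: two <img> tags on ONE line, unlike the question's sample.
$texthtml = '<img alt="Sara" src="475993565.jpg"/><img alt="next" src="next.png"/>';

// The pattern from the question, unchanged.
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);

// Dump the whole array to the error log, as suggested above.
error_log(var_export($matches, true));

// The greedy ".+" swallows the first tag, so only the last src is captured.
echo count($matches[1]), ' match(es); first capture: ', $matches[1][0], PHP_EOL;
// prints "1 match(es); first capture: next.png"
```

If your log shows a single match whose capture is the later image, the pattern is matching across tags rather than picking the wrong array index.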
