I am trying to cut off text after 236 characters without splitting words in half and while preserving HTML tags. This is what I am using right now:
$shortdesc = $_helper->productAttribute($_product, $_product->getShortDescription(), 'short_description');
$length = 236;
echo substr($shortdesc, 0, strrpos(substr($shortdesc, 0, $length), " "));
While this works in most cases, it does not respect HTML tags. For example, this text:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong>
will get cut off with the tag still open. Is there a way to cut off text after 236 characters while respecting HTML tags?
7 solutions
#1
14
This should do it:
class Html
{
    protected
        $reachedLimit = false,
        $totalLen = 0,
        $maxLen = 25,
        $toRemove = array();
    public static function trim($html, $maxLen = 25)
    {
        $dom = new DomDocument();
        if (version_compare(PHP_VERSION, '5.4.0') < 0) {
            $dom->loadHTML($html);
        } else {
            $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        }
        $instance = new static();
        $toRemove = $instance->walk($dom, $maxLen);
        // remove any nodes that exceed limit
        foreach ($toRemove as $child) {
            $child->parentNode->removeChild($child);
        }
        // remove wrapper tags added by DD (doctype, html...)
        if (version_compare(PHP_VERSION, '5.4.0') < 0) {
            // http://*.com/a/6953808/1058140
            $dom->removeChild($dom->firstChild);
            $dom->replaceChild($dom->firstChild->firstChild->firstChild, $dom->firstChild);
            return $dom->saveHTML();
        }
        return $dom->saveHTML();
    }
    protected function walk(DomNode $node, $maxLen)
    {
        if ($this->reachedLimit) {
            $this->toRemove[] = $node;
        } else {
            // only text nodes should have text,
            // so do the splitting here
            if ($node instanceof DomText) {
                $this->totalLen += $nodeLen = strlen($node->nodeValue);
                // use mb_strlen / mb_substr for UTF-8 support
                if ($this->totalLen > $maxLen) {
                    $node->nodeValue = substr($node->nodeValue, 0, $nodeLen - ($this->totalLen - $maxLen)) . '...';
                    $this->reachedLimit = true;
                }
            }
            // if node has children, walk its child elements
            if (isset($node->childNodes)) {
                foreach ($node->childNodes as $child) {
                    $this->walk($child, $maxLen);
                }
            }
        }
        return $this->toRemove;
    }
}
Use it like this: $str = Html::trim($str, 236);
Some performance comparisons between this and CakePHP's regex solution:
There's very little difference, and at very large string sizes, DomDocument is actually faster. Reliability is more important than saving a few microseconds in my opinion.
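The walk() method above counts bytes with strlen and cuts at the exact limit, so it can still split a word, and its own comment suggests mb_strlen / mb_substr for UTF-8. A minimal sketch of a replacement for that substr call follows. The name cutAtWordBoundary is hypothetical (not part of the answer), and it assumes the mbstring extension is loaded and the text is UTF-8.

```php
<?php
/**
 * Hypothetical helper: cut $text to at most $limit characters (not bytes),
 * backing up to the last space so that no word is split.
 * Assumes UTF-8 input and the mbstring extension.
 */
function cutAtWordBoundary($text, $limit)
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // already short enough
    }
    // Take one extra character so a word ending exactly at the limit is kept whole.
    $slice = mb_substr($text, 0, $limit + 1, 'UTF-8');
    $lastSpace = mb_strrpos($slice, ' ', 0, 'UTF-8');
    if ($lastSpace === false) {
        // No space inside the limit: fall back to a hard cut.
        return mb_substr($text, 0, $limit, 'UTF-8');
    }
    return rtrim(mb_substr($slice, 0, $lastSpace, 'UTF-8'));
}
```

Inside walk(), the remaining budget for the current text node would be $maxLen minus the length counted before that node, so the call could look like cutAtWordBoundary($node->nodeValue, $remaining), with $remaining computed by the caller.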
#2
13
The best solution I have come across for this is from the CakePHP framework's TextHelper class.
Here is the method:
/**
 * Truncates text.
 *
 * Cuts a string to the length of $length and replaces the last characters
 * with the ending if the text is longer than length.
 *
 * ### Options:
 *
 * - `ending` Will be used as Ending and appended to the trimmed string
 * - `exact` If false, $text will not be cut mid-word
 * - `html` If true, HTML tags would be handled correctly
 *
 * @param string $text String to truncate.
 * @param integer $length Length of returned string, including ellipsis.
 * @param array $options An array of html attributes and options.
 * @return string Trimmed string.
 * @access public
 * @link http://book.cakephp.org/view/1469/Text#truncate-1625
 */
function truncate($text, $length = 100, $options = array()) {
    $default = array(
        'ending' => '...', 'exact' => true, 'html' => false
    );
    $options = array_merge($default, $options);
    extract($options);
    if ($html) {
        if (mb_strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
            return $text;
        }
        $totalLength = mb_strlen(strip_tags($ending));
        $openTags = array();
        $truncate = '';
        preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);
        foreach ($tags as $tag) {
            if (!preg_match('/img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param/s', $tag[2])) {
                if (preg_match('/<[\w]+[^>]*>/s', $tag[0])) {
                    array_unshift($openTags, $tag[2]);
                } else if (preg_match('/<\/([\w]+)[^>]*>/s', $tag[0], $closeTag)) {
                    $pos = array_search($closeTag[1], $openTags);
                    if ($pos !== false) {
                        array_splice($openTags, $pos, 1);
                    }
                }
            }
            $truncate .= $tag[1];
            $contentLength = mb_strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $tag[3]));
            if ($contentLength + $totalLength > $length) {
                $left = $length - $totalLength;
                $entitiesLength = 0;
                if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $tag[3], $entities, PREG_OFFSET_CAPTURE)) {
                    foreach ($entities[0] as $entity) {
                        if ($entity[1] + 1 - $entitiesLength <= $left) {
                            $left--;
                            $entitiesLength += mb_strlen($entity[0]);
                        } else {
                            break;
                        }
                    }
                }
                $truncate .= mb_substr($tag[3], 0 , $left + $entitiesLength);
                break;
            } else {
                $truncate .= $tag[3];
                $totalLength += $contentLength;
            }
            if ($totalLength >= $length) {
                break;
            }
        }
    } else {
        if (mb_strlen($text) <= $length) {
            return $text;
        } else {
            $truncate = mb_substr($text, 0, $length - mb_strlen($ending));
        }
    }
    if (!$exact) {
        $spacepos = mb_strrpos($truncate, ' ');
        if (isset($spacepos)) {
            if ($html) {
                $bits = mb_substr($truncate, $spacepos);
                preg_match_all('/<\/([a-z]+)>/', $bits, $droppedTags, PREG_SET_ORDER);
                if (!empty($droppedTags)) {
                    foreach ($droppedTags as $closingTag) {
                        if (!in_array($closingTag[1], $openTags)) {
                            array_unshift($openTags, $closingTag[1]);
                        }
                    }
                }
            }
            $truncate = mb_substr($truncate, 0, $spacepos);
        }
    }
    $truncate .= $ending;
    if ($html) {
        foreach ($openTags as $tag) {
            $truncate .= '</'.$tag.'>';
        }
    }
    return $truncate;
}
Other frameworks may have similar (or different) solutions to this problem, so you could take a look at them too. My familiarity with Cake is what prompted me to link to their solution.
Edit:
I just tested this method in an app I'm working on, using the OP's text:
<?php
echo truncate(
'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong>',
236,
array('html' => true, 'ending' => ''));
?>
Output:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. <strong>Stet clita kasd gubegre</strong>
Notice that the output stops just short of completing the last word, but includes the complete strong tags. According to the docblock above, passing 'exact' => false should keep the cut from landing mid-word.
#3
1
Can I offer a thought?
Sample text:
Lorem ipsum dolor sit amet, <i class="red">magna aliquyam erat</i>, duo dolores et ea rebum. <strong>Stet clita kasd gubergren</strong> hello
First, parse it into:
array(
    '0' => array(
        'tag' => '',
        'text' => 'Lorem ipsum dolor sit amet, '
    ),
    '1' => array(
        'tag' => '<i class="red">',
        'text' => 'magna aliquyam erat',
    ),
    '2' => ......
    '3' => ......
)
Then cut the text segments one by one, wrap each one with its tag after cutting, and join them.
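The parsing step described above could be sketched in PHP roughly as follows. The function name splitIntoSegments is hypothetical, and the regex is a simplification that assumes no ">" appears inside attribute values; a real parser would be more robust.

```php
<?php
/**
 * Hypothetical helper: split an HTML fragment into (tag, text) pairs,
 * where 'tag' is the tag immediately preceding 'text' ('' if there is none).
 * Assumes attribute values never contain a literal ">".
 */
function splitIntoSegments($html)
{
    // With a capturing group and PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // text, tag, text, tag, ..., text (text pieces may be empty strings).
    $parts = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $segments = array();
    if ($parts[0] !== '') {
        // Leading text that is not preceded by any tag.
        $segments[] = array('tag' => '', 'text' => $parts[0]);
    }
    for ($i = 1; $i < count($parts); $i += 2) {
        $segments[] = array('tag' => $parts[$i], 'text' => $parts[$i + 1]);
    }
    return $segments;
}
```

Concatenating every 'tag' and 'text' in order gives back the original string, so the cutting step only has to shorten the 'text' entries and stop once the character budget is used up. Closing tags appear here as segments of their own (for example '</i>' followed by the text after it), which differs slightly from wrapping each text in its tag.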
#4
0
// $skip_html gets a default value here: a required parameter after
// optional ones is deprecated since PHP 8.0.
function limitStrlen($input, $length, $ellipses = true, $strip_html = true, $skip_html = false)
{
    // strip tags, if desired
    if ($strip_html || !$skip_html)
    {
        $input = strip_tags($input);
        // no need to trim, already shorter than trim length
        if (strlen($input) <= $length)
        {
            return $input;
        }
        // find last space within length
        $last_space = strrpos(substr($input, 0, $length), ' ');
        if ($last_space !== false)
        {
            $trimmed_text = substr($input, 0, $last_space);
        }
        else
        {
            $trimmed_text = substr($input, 0, $length);
        }
    }
    else
    {
        if (strlen(strip_tags($input)) <= $length)
        {
            return $input;
        }
        $trimmed_text = $input;
        $last_space = $length + 1;
        // drop one trailing word at a time until the visible text fits
        while (true)
        {
            $last_space = strrpos($trimmed_text, ' ');
            if ($last_space !== false)
            {
                $trimmed_text = substr($trimmed_text, 0, $last_space);
                if (strlen(strip_tags($trimmed_text)) <= $length)
                {
                    break;
                }
            }
            else
            {
                $trimmed_text = substr($trimmed_text, 0, $length);
                break;
            }
        }
        // close unclosed tags.
        $doc = new DOMDocument();
        $doc->loadHTML($trimmed_text);
        $trimmed_text = $doc->saveHTML();
    }
    // add ellipses (...)
    if ($ellipses)
    {
        $trimmed_text .= '...';
    }
    return $trimmed_text;
}
$str = "<h1><strong><span>Lorem</span></strong> <i>ipsum</i> <p class='some-class'>dolor</p> sit amet, consetetur.</h1>";
// view the HTML
echo htmlentities(limitStrlen($str, 22, false, false, true), ENT_COMPAT, 'UTF-8');
// view the result
echo limitStrlen($str, 22, false, false, true);
Note: There may be a better way to close tags than using DOMDocument. For example, we can put a p tag inside an h1 tag and it will still work, but in that case the heading tag will be closed before the p tag, because strictly speaking a p tag is not allowed inside it. So be careful about HTML's strict standards.
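The function above calls loadHTML() without options, so saveHTML() also returns the doctype and html/body wrappers that DOMDocument adds. A minimal sketch of the tag-closing step on its own, using the same libxml flags as answer #1, might look like this. The name closeOpenTags is hypothetical, and the sketch assumes PHP 5.4 or later, a non-empty fragment with a single root element, and ASCII or otherwise charset-safe input.

```php
<?php
/**
 * Hypothetical helper: let DOMDocument close any tags left open by a cut,
 * without adding a doctype or <html>/<body> wrappers.
 * Assumes PHP >= 5.4 (for the LIBXML_HTML_* flags) and a fragment
 * that has exactly one root element.
 */
function closeOpenTags($fragment)
{
    // Truncated markup is malformed by definition, so collect libxml
    // warnings internally instead of letting them be emitted.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // saveHTML() appends a trailing newline; trim it off.
    return trim($doc->saveHTML());
}
```

A call such as closeOpenTags('<p>Lorem <strong>ipsum dol') would be expected to return the fragment with the strong and p tags closed. The nesting caveat from the note still applies, since the parser may rearrange elements that are not allowed inside each other.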
#5
0
I did this in JS; I hope the logic helps in PHP too.
splitText : function(content, count){
    var originalContent = content;
    content = content.substring(0, count);
    // If there is no occurrence of a match before the breaking point and the cut lands in the middle of an HTML tag.
    if (content.lastIndexOf("<") > content.lastIndexOf(">")){
        content = content.substring(0, content.lastIndexOf('<'));
        count = content.length;
        if(originalContent.indexOf("</", count)!=-1){
            content += originalContent.substring(count, originalContent.indexOf('>', originalContent.indexOf("</", count))+1);
        }else{
            content += originalContent.substring(count, originalContent.indexOf('>', count)+1);
        }
    // If the breaking point is in between tags.
    }else if(content.lastIndexOf("<") != content.lastIndexOf("</")){
        content = originalContent.substring(0, originalContent.indexOf('>', count)+1);
    }
    return content;
},
Hope this logic helps someone.
#6
-1
You can take an XML approach and push elements onto a string variable until the length of the string exceeds 236.
Example (pseudocode):
for each node // text or tag
    push to the string var
    if string length > 236
        break
endfor
For parsing HTML in PHP: http://simplehtmldom.sourceforge.net/
#7
-1
Here is a JS solution: trim-html
The idea is to split the HTML string into an array whose elements are either HTML tags (opening or closing) or plain strings.
var arr = html.replace(/</g, "\n<")
    .replace(/>/g, ">\n")
    .replace(/\n\n/g, "\n")
    .replace(/^\n/g, "")
    .replace(/\n$/g, "")
    .split("\n");
Then we can iterate through the array and count characters.
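Since the question is about PHP, here is a rough PHP sketch of the same split-then-count idea. It is not a port of the trim-html library: the function name truncateHtmlTokens, the short list of void tags, and the tag regex are all assumptions made for illustration, and it needs the mbstring extension. Only visible text counts toward the limit, the cut backs up to the last space, and tags still open at the cut are closed.

```php
<?php
/**
 * Hypothetical helper: truncate $html to at most $limit visible characters
 * without splitting a word or a tag, then close any tags left open.
 * Assumes UTF-8 input and no literal ">" inside attribute values.
 */
function truncateHtmlTokens($html, $limit)
{
    $void = array('br', 'hr', 'img', 'input', 'meta', 'link'); // tags that never get closed
    // Tokens alternate: text at even indexes, tags at odd indexes.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    $count = 0;       // visible characters emitted so far
    $open = array();  // stack of currently open tag names
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            // Tag token: copy it through and keep the stack up to date.
            if (preg_match('/^<\/\s*([a-z0-9]+)/i', $token, $m)) {
                if (end($open) === strtolower($m[1])) {
                    array_pop($open);
                }
            } elseif (preg_match('/^<([a-z0-9]+)/i', $token, $m)) {
                $name = strtolower($m[1]);
                if (!in_array($name, $void) && substr($token, -2) !== '/>') {
                    $open[] = $name;
                }
            }
            $out .= $token;
            continue;
        }
        $len = mb_strlen($token, 'UTF-8');
        if ($count + $len <= $limit) {
            $out .= $token;
            $count += $len;
            continue;
        }
        // This text token crosses the limit: keep only the whole words that fit.
        $slice = mb_substr($token, 0, $limit - $count + 1, 'UTF-8');
        $space = mb_strrpos($slice, ' ', 0, 'UTF-8');
        if ($space !== false) {
            $out .= mb_substr($slice, 0, $space, 'UTF-8');
        }
        break;
    }
    // Close whatever is still open, innermost tag first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}
```

For the question's use case the call would be truncateHtmlTokens($shortdesc, 236). When no whole word of a text token fits, the sketch emits nothing from it, which can leave an empty element such as <strong></strong> at the end.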