Truncating text with an ellipsis after a certain number of characters, at a word boundary

I'm trying to shorten long descriptions with an ellipsis (…) and want the cut to fall on a word boundary.

Here's my current code:

# Assume $body is a long text.
$line = $body;
if(strlen($body) > 300 && preg_match('/^.{1,300}\b/su', $body, $match)) {
    $line = trim($match[0]) . "&hellip;";
}
echo $line;

This works pretty well and I like it, except that sometimes the word boundary is followed by punctuation.

If I use the code above, I get results like the following:

"This is a long description…" or "I have punctuations,…". I would love to remove the punctuation after the last word before adding the ellipsis.

Help?

2 Solutions

#1

Here is a fixed version of your approach:

$body = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam eu congue ex. Nunc sem arcu, fermentum vel feugiat quis, consequat nec enim. Quisque et pulvinar velit, et laoreet justo. Integer quis sapien ac turpis mattis lobortis at at metus. Vestibulum euismod turpis odio, id luctus quam pharetra, at, et. Sed finibus, nunc at ultricies posuere, dui mauris aliquet quam, eget aliquet ligula libero a turpis. Pellentesque eu diam sodales, sollicitudin leo et, sagittis magna. Donec feugiat, velit quis condimentum porttitor, enim sapien varius elit, sit amet pretium risus turpis vitae massa. Sed ac ligula sit amet lorem scelerisque tristique a id ex. Nullam maximus tincidunt magna, vel molestie lectus tempus non. Sed euismod placerat ultricies. Morbi dapibus augue ut odio faucibus, vel maximus nisl pharetra. Aliquam hendrerit dolor in ipsum pharetra, eget tincidunt lacus ultrices.";

$line = $body;
if(strlen($body) > 300 && preg_match('/^(.{1,300})(?!\w)\b\p{P}*/su', $body, $match)) {
    $line = trim($match[1]) . "…";
}
echo $line;

As I noted in the comments, you can optionally match the trailing punctuation with \p{P}*, but I forgot that \b matches both leading and trailing word boundaries. Restricting \b with the negative lookahead (?!\w), as in (?!\w)\b, makes it match only a trailing word boundary.

Besides, a capturing group, (...), is added to the pattern so that Group 1 holds only the string with the trailing punctuation trimmed off; that value is accessed with $match[1].
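
To see the effect on short inputs, here is a minimal sketch that wraps the same pattern in a function. The name truncateAtWord and the $limit parameter are my own additions, not part of the answer. Note that strlen counts bytes while the /u pattern counts characters, so the length check is only exact for single-byte text.

```php
<?php
// Hypothetical helper: the function name and $limit are illustrative only.
function truncateAtWord(string $text, int $limit): string
{
    // Same pattern as above, with the limit interpolated instead of 300.
    $pattern = '/^(.{1,' . $limit . '})(?!\w)\b\p{P}*/su';
    // strlen() counts bytes, as in the original code.
    if (strlen($text) > $limit && preg_match($pattern, $text, $match)) {
        // Group 1 ends at a trailing word boundary, before any punctuation.
        return trim($match[1]) . "…";
    }
    return $text;
}

// Character 21 is the space after the comma; the comma is dropped.
echo truncateAtWord("I have punctuations, and then some more words", 21), "\n";
// prints: I have punctuations…
echo truncateAtWord("This is a long description of something", 28), "\n";
// prints: This is a long description…
```

With the original pattern and the same limit of 21, the first call would stop at the leading boundary before "and" and yield "I have punctuations,…", which is the unwanted result from the question.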

#2

You can use:

$body = preg_replace('/^(.{0,299}\w)\b.*/su', '$1&hellip;', $body);

The \w before \b ensures we don't add the ellipsis after a non-word character.
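
One thing to watch: used on its own, this pattern also matches strings shorter than the limit, so they get an ellipsis too. Below is a minimal sketch with a length guard added. The wrapper name ellipsize, the $limit parameter (assumed to be at least 1) and the guard itself are my assumptions, not part of the answer.

```php
<?php
// Hypothetical wrapper: the function name, $limit and the guard are illustrative.
function ellipsize(string $text, int $limit): string
{
    // Without this guard, a short string such as "Hello world" would
    // still match and come back as "Hello world&hellip;".
    if (strlen($text) <= $limit) {
        return $text;
    }
    // The capture must end in a word character, so trailing punctuation
    // and spaces never precede the ellipsis.
    $pattern = '/^(.{0,' . ($limit - 1) . '}\w)\b.*/su';
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '$1&hellip;', $text) ?? $text;
}

echo ellipsize("I have punctuations, and then some more words", 21), "\n";
// prints: I have punctuations&hellip;
echo ellipsize("Hello world", 21), "\n";
// prints: Hello world
```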
