Is there a way to keep the delimiter while using PHP explode or other similar functions?

Time: 2021-05-24 21:37:31

For example, I have an article that should be split at sentence boundaries such as ".", "?", "!" and ":".

But as we all know, both preg_split and explode remove the delimiter.

Any help would be really appreciated!

EDIT:

I could only come up with the code below, though it works well:

$content=preg_replace('/([\.\?\!\:])/',"\\1[D]",$content);
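
The one-liner above only inserts a "[D]" marker after each delimiter; presumably the text is then split on that marker. A minimal sketch of that round trip, assuming the literal string "[D]" never occurs in the input (the sample text and the explode/array_filter follow-up are illustrative assumptions, not the asker's code):

```php
<?php
// Illustrative input; assumes the literal marker "[D]" never appears in real text.
$content = "Hi. How are you? Fine!";

// Step 1 (the code from the edit): append "[D]" after every ".", "?", "!" or ":".
$content = preg_replace('/([\.\?\!\:])/', "\\1[D]", $content);

// Step 2 (assumed follow-up): split on the marker. Only the marker is removed
// by explode(), so each delimiter stays attached to its sentence.
$pieces = explode('[D]', $content);

// A delimiter at the very end leaves an empty trailing piece; drop empty
// strings and reindex the array.
$sentences = array_values(array_filter($pieces, function ($s) {
    return $s !== '';
}));

// $sentences is ["Hi.", " How are you?", " Fine!"]
```

The preg_split approaches in the answers avoid the marker entirely, so they cannot be confused by a "[D]" that happens to be in the input.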

Thank you, everyone!!! It took only five minutes to get 3 answers! And I must apologize for not reading the PHP manual carefully before asking the question. Sorry.

3 solutions

#1


8  

Use preg_split with the PREG_SPLIT_DELIM_CAPTURE flag.

With a single capturing group around the delimiter, the returned array alternates between text and delimiter: even indices hold the text pieces and odd indices hold the delimiters.
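
A short illustration of the array layout PREG_SPLIT_DELIM_CAPTURE produces for the question's delimiters (the sample sentence is made up):

```php
<?php
$text = "Hi. How are you? Fine!";

// The parentheses make the delimiter a capturing group, so preg_split()
// returns it as a separate element between the surrounding text pieces.
// A limit of -1 means "no limit".
$parts = preg_split('/([.?!:])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// $parts is ["Hi", ".", " How are you", "?", " Fine", "!", ""]
// Even indices hold text, odd indices hold delimiters. The trailing ""
// is the (empty) text after the final "!".
```

Adding PREG_SPLIT_NO_EMPTY (combined with `|`) drops empty pieces such as that trailing "", but then the strict even/odd alternation is no longer guaranteed when two delimiters are adjacent, as in "?!".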

#2


17  

I feel this is worth adding. You can keep the delimiter in the "after" string by using regex lookahead to split:

$input = "The address is http://*.com/";
$parts = preg_split('@(?=http://)@', $input);
// $parts[1] is "http://*.com/"

And if the delimiter is of fixed length, you can keep the delimiter in the "before" part by using lookbehind:

$input = "The address is http://*.com/";
$parts = preg_split('@(?<=http://)@', $input);
// $parts[0] is "The address is http://"

This solution is simpler and cleaner in most cases.
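
Applied to the question's sentence delimiters: a one-character class has a fixed length, so a lookbehind can mark the split point right after each delimiter and leave the delimiter on the sentence before it. A minimal sketch with made-up sample text; the second call is a variant that also consumes the whitespace after each delimiter:

```php
<?php
$text = "Hi. How are you? Fine!";

// Zero-width split point immediately after ".", "?", "!" or ":". Nothing is
// consumed, so each delimiter stays at the end of its sentence.
// PREG_SPLIT_NO_EMPTY drops the empty piece that would follow the final "!".
$sentences = preg_split('/(?<=[.?!:])/', $text, -1, PREG_SPLIT_NO_EMPTY);
// $sentences is ["Hi.", " How are you?", " Fine!"]

// Variant: split only where a delimiter is followed by whitespace, and consume
// that whitespace. Runs like "?!" stay together and pieces lose their leading space.
$trimmed = preg_split('/(?<=[.?!:])\s+/', "Wait?! No. Fine:", -1, PREG_SPLIT_NO_EMPTY);
// $trimmed is ["Wait?!", "No.", "Fine:"]
```

Both patterns still break after an abbreviation such as "e.g."; the whitespace variant at least leaves numbers like "3.14" intact, since no whitespace follows the dot.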

#3


15  

You can set the PREG_SPLIT_DELIM_CAPTURE flag when calling preg_split so that the delimiters are captured too. Then join each text piece at index 2n with the delimiter at index 2n+1:

$parts = preg_split('/([.?!:])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$sentences = array();
for ($i=0, $n=count($parts)-1; $i<$n; $i+=2) {
    $sentences[] = $parts[$i].$parts[$i+1];
}
if ($parts[$n] != '') {
    $sentences[] = $parts[$n];
}

Note that the delimiters must be wrapped in a capturing group; otherwise they won't be captured.
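
The same loop wrapped in a function so it can be run against sample input, as a sketch; the function name splitKeepDelimiters and its optional $pattern parameter are hypothetical additions:

```php
<?php
/**
 * Hypothetical helper: split $text after each delimiter matched by $pattern,
 * keeping the delimiter attached to the preceding piece. $pattern must wrap
 * the delimiter in exactly one capturing group.
 */
function splitKeepDelimiters(string $text, string $pattern = '/([.?!:])/'): array
{
    // Layout: text, delim, text, delim, ..., text (an odd number of elements)
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $sentences = [];
    $n = count($parts) - 1;
    // Glue each text piece (even $i) to the delimiter that follows it ($i + 1)
    for ($i = 0; $i < $n; $i += 2) {
        $sentences[] = $parts[$i] . $parts[$i + 1];
    }
    // The final element is whatever follows the last delimiter; keep it if non-empty
    if ($parts[$n] !== '') {
        $sentences[] = $parts[$n];
    }
    return $sentences;
}

$a = splitKeepDelimiters("Hi. How are you? Fine!");
// $a is ["Hi.", " How are you?", " Fine!"]

$b = splitKeepDelimiters("Note: read this");
// $b is ["Note:", " read this"]

// With "+", a run such as "?!" is captured as one delimiter instead of two.
$c = splitKeepDelimiters("Wait?! No.", '/([.?!:]+)/');
// $c is ["Wait?!", " No."]
```

Without the "+", "Wait?! No." yields "Wait?", "!" and " No.", because the empty text between "?" and "!" gets paired with "!".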
