For example, I have an article that should be split at sentence boundaries such as ".", "?", "!" and ":".
But as we all know, both preg_split and explode remove the delimiter.
Any help would be really appreciated!
EDIT:
The best I could come up with is the code below, and it works well:
$content = preg_replace('/([.?!:])/', "\\1[D]", $content);
Thank you, everyone! Three answers in only five minutes! I must apologize for not reading the PHP manual carefully before asking. Sorry.
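The snippet in the edit only appends a marker after each delimiter; presumably the text is then split on that marker. A minimal sketch of that second step, assuming the marker "[D]" never occurs in the real text and using a made-up sample string:

```php
<?php
// Hypothetical sample text; "[D]" is the marker from the question's snippet.
$content = "Hello world. How are you? Fine!";

// Append the marker after every delimiter, then split on the marker.
$marked = preg_replace('/([.?!:])/', '$1[D]', $content);
$sentences = explode('[D]', $marked);

// When the text ends with a delimiter, the last element is an empty string; drop it.
if (end($sentences) === '') {
    array_pop($sentences);
}

print_r($sentences);
```

Each sentence keeps its closing punctuation, and any whitespace that followed the delimiter stays at the start of the next piece.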
3 Solutions
#1
8
Use preg_split with the PREG_SPLIT_DELIM_CAPTURE flag.
If the delimiter pattern is wrapped in a single capturing group, the returned array alternates between the text pieces (even indices) and the delimiters that followed them (odd indices).
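To make the layout of the returned array concrete, here is a small sketch with a made-up input string (the variable names are only for illustration):

```php
<?php
// Hypothetical input; the delimiter class is wrapped in one capturing group.
$str = "One. Two? Three";

// -1 means "no limit"; PREG_SPLIT_DELIM_CAPTURE keeps the captured delimiters.
$parts = preg_split('/([.?!:])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// Even indices hold text, odd indices hold the delimiter that ended it.
print_r($parts);
```

Here $parts comes out as "One", ".", " Two", "?", " Three", so the delimiters are still there but need to be glued back onto the text.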
#2
17
I feel this is worth adding. You can keep the delimiter at the start of the "after" string by splitting on a regex lookahead:
$input = "The address is http://*.com/";
$parts = preg_split('@(?=http://)@', $input);
// $parts[1] is "http://*.com/"
And if the delimiter has a fixed length, you can keep it at the end of the "before" string by using a lookbehind instead:
$input = "The address is http://*.com/";
$parts = preg_split('@(?<=http://)@', $input);
// $parts[0] is "The address is http://"
In most cases this is simpler and cleaner than reassembling captured delimiters.
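Applied to the sentence delimiters from the question, the lookbehind variant might look like the sketch below. The sample text is made up, and PREG_SPLIT_NO_EMPTY is added here so that no empty piece is returned when the text ends with a delimiter:

```php
<?php
// Hypothetical sample text.
$text = "Hello world. How are you? Fine!";

// Split at the zero-width position right after any of . ? ! :
// Nothing is consumed, so every delimiter stays attached to its sentence.
$sentences = preg_split('/(?<=[.?!:])/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($sentences);
```

Each delimiter is a single character, so the lookbehind has a fixed length as required. Leading spaces remain on the second and third pieces; array_map('trim', $sentences) would strip them if that matters.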
#3
15
You can set the PREG_SPLIT_DELIM_CAPTURE flag when calling preg_split so that the delimiters are captured too. Then take each pair of elements 2n and 2n+1 and join them back together:
$parts = preg_split('/([.?!:])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$sentences = array();
// Join each text piece (even index) with the delimiter after it (odd index).
for ($i = 0, $n = count($parts) - 1; $i < $n; $i += 2) {
    $sentences[] = $parts[$i] . $parts[$i + 1];
}
// Keep any trailing text that has no closing delimiter.
if ($parts[$n] != '') {
    $sentences[] = $parts[$n];
}
Remember to wrap the delimiter pattern in a capturing group; otherwise the delimiters won't be captured.