PHP preg_split by new line with \R

Posted: 2021-11-07 22:08:58

As far as I understand, the following line of code should split a string at newlines (\r, \n and \r\n).

preg_split("%\R%", $str);
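A minimal sketch of how this is meant to be used, assuming UTF-8 input and adding the u modifier. The sample string $text is hypothetical and mixes all three line-ending styles:

```php
<?php
// Hypothetical sample input containing LF, CR and CRLF line endings.
$text = "first\nsecond\rthird\r\nfourth";

// \R matches any newline sequence; the u modifier makes PCRE treat
// the pattern and the subject as UTF-8 rather than as raw bytes.
$lines = preg_split('%\R%u', $text);

var_dump($lines); // 4 elements: "first", "second", "third", "fourth"
```

Because \r\n is tried first inside \R, a CRLF pair counts as one break rather than two.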

Why is it that

var_dump(preg_split("%\R%", "Å"));

outputs

array(2) {
  [0]=>
  string(1) "▒"
  [1]=>
  string(0) ""
}

but

var_dump(preg_split("%(\r|\n|\r\n)%", "Å"));

works as expected and does not split the character? I know that I should use the "u" modifier (PCRE_UTF8) because the string is UTF-8, but why does preg_split think that Å (0xC3 0x85) could contain a newline?

1 answer

#1


As you mentioned, Å is encoded in UTF-8 as the two bytes 0xC3 0x85.
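The byte sequence can be confirmed with bin2hex(). A small sketch; the variable name $char is illustrative, and the \u{...} escape (PHP 7.0+) is used so the bytes do not depend on the encoding the source file is saved in:

```php
<?php
// U+00C5 (Å) written as an escape; PHP emits its UTF-8 encoding.
$char = "\u{00C5}";

// Without the u modifier, PCRE sees exactly these raw bytes.
$hex = bin2hex($char);

echo strlen($char), PHP_EOL; // 2
echo $hex, PHP_EOL;          // c385
```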

According to the PCRE documentation, without the u modifier \R is equivalent to this atomic group:

(?>\r\n|\n|\r|\f|\x0b|\x85)

Note that \x85 appears in both places: it is the second byte of Å, and it is also one of the alternatives in this group.
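A sketch checking this on the PHP side, assuming PHP 7.0+ for the \u{...} escape: splitting Å with \R (no u modifier) and with the explicit byte-level group should give the same two-element result.

```php
<?php
$char = "\u{00C5}"; // bytes 0xC3 0x85

// \R in byte mode versus the equivalent atomic group written out.
// Single quotes keep the backslashes, so PCRE itself parses \r, \x85, etc.
$withR     = preg_split('%\R%', $char);
$withGroup = preg_split('%(?>\r\n|\n|\r|\f|\x0b|\x85)%', $char);

var_dump($withR === $withGroup); // bool(true)
```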

Hence, splitting on \R without the u modifier gives one extra element in the output array: the pattern matches the single byte \x85, so you get \xC3 as the first element and an empty string as the second.
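A sketch of the fix the question already anticipates, assuming UTF-8 input: with the u modifier PCRE decodes Å as the single code point U+00C5, so there is nothing for \R to match. The last call is an extra illustration that a genuine NEL character (U+0085, UTF-8 bytes 0xC2 0x85) is still treated as a line break in UTF mode.

```php
<?php
$char = "\u{00C5}"; // U+00C5, UTF-8 bytes 0xC3 0x85

// Without u: PCRE works byte by byte and \R matches the lone byte 0x85.
$bytewise = preg_split('%\R%', $char);

// With u: PCRE sees one code point, U+00C5, which is not a line break.
$utf8 = preg_split('%\R%u', $char);

// With u, a real U+0085 (NEL) is still a newline for \R.
$nel = preg_split('%\R%u', "a\u{0085}b");

var_dump(count($bytewise)); // int(2)
var_dump(count($utf8));     // int(1)
var_dump(count($nel));      // int(2)
```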
