php - Split by an unknown regular expression

Date: 2022-02-11 21:40:37

I need to split a string by separators, some known to me and one unknown. For example, I know I want to split the string by "\n", "," and ".", but also by one separator that can be user-defined: it can be ";" or "hello" or pretty much anything.

I tried this:



...but that didn't work as expected. As I understand it, | means "or", so this regexp should say: split by "\n" or "," or "." or "hello". I think it's because if I try just [hello], it splits by every letter, not by the whole word. That's strange, because if I try just [\n], it only splits by "\n", not by "\" or "n".

Can someone please explain this to me? :)

6 answers



When you place a bunch of characters in a character class, as in [hello], this defines a token that matches one character that is either h, e, l or o. Also, | has no meaning inside of a character class - it's just matched as a normal character.
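
As a rough illustration of the point above, the following sketch compares a character class with the plain word, and shows that | inside a class is only a literal pipe. The sample strings are made up for this example.

```php
<?php
// Inside [...] every character stands alone, so [hello] means "h, e, l or o".
$parts = preg_split('/[hello]/', 'a-hello-b');
// Each of the five letters acts as a separator, leaving empty strings between them.
print_r($parts);

// Without the square brackets, the pattern matches the whole word.
$whole = preg_split('/hello/', 'a-hello-b');
print_r($whole);

// A | inside a character class is a literal pipe, not "or".
$pipe = preg_split('/[a|b]/', 'x|y');
print_r($pipe);
```

The first call returns six pieces, four of them empty, which is the "splits by every letter" behaviour described in the question.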

The correct solution is not to use a character class - you meant to use normal brackets (parentheses):


By the way - make sure that you escape any regex metacharacters that are in $exp; they need to be escaped with backslashes. In PHP, preg_quote() is a helper function that does this for you.

EDIT: Since you're no longer using a character class, you now need to escape the . with a backslash (\.), because outside a class it is a metacharacter meaning "match any single character" (except a newline, by default). Almost forgot.
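
Putting the advice in this answer together, one possible sketch looks like the following. The function name splitBySeparators and the sample input are hypothetical; preg_quote() is PHP's built-in escaping helper. The sketch assumes $exp is non-empty, since an empty separator would add an empty alternative that matches everywhere.

```php
<?php
// Hypothetical helper: split on "\n", "," and "." plus one user-defined separator.
function splitBySeparators(string $input, string $exp): array
{
    // preg_quote() escapes regex metacharacters; the second argument
    // additionally escapes the delimiter used in the pattern below.
    $custom = preg_quote($exp, '/');

    // Parentheses group the alternatives; the dot is escaped so it is literal.
    // In a single-quoted string \n reaches PCRE as-is, and PCRE reads it as a newline.
    $pattern = '/(\n|,|\.|' . $custom . ')/';

    return preg_split($pattern, $input);
}

print_r(splitBySeparators("a,b.c\ndhelloe", 'hello'));
```

Because no PREG_SPLIT_DELIM_CAPTURE flag is passed, the parentheses only group the alternatives and the separators themselves are not returned.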




\n is actually only one character, a newline (the \ before the n indicates an escape sequence), so that's why it works and hello doesn't.
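
A small check of this, assuming the pattern is written in a double-quoted PHP string as in the question (the sample input is invented for the sketch):

```php
<?php
// In a double-quoted PHP string, \n becomes a single newline character.
$doubleQuoted = strlen("\n");
// In a single-quoted string the backslash and the n stay two separate characters.
$singleQuoted = strlen('\n');
var_dump($doubleQuoted, $singleQuoted);

// "/[\n]/" therefore reaches the regex engine as a class holding one newline.
// The input contains a real newline and also a literal backslash followed by n.
$parts = preg_split("/[\n]/", "a\nb\\nc");
print_r($parts);
```

Only the real newline splits the input; the literal backslash and n stay inside the second piece.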

Also, keep in mind that allowing arbitrary input into a regular expression can be a security risk, depending on what your regular expression is being used for, so be very careful and make sure you sanitize your input to that regular expression.
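
To make the risk a little more concrete, here is a hedged sketch of what can happen when a user-supplied separator contains a metacharacter. The values of $exp and $input are assumptions chosen for the example.

```php
<?php
// Assumed user-supplied separator that happens to contain a regex metacharacter.
$exp = 'a|b';
$input = '1a|b2,3';

// Unescaped: the | is read as alternation, so "a" and "b" each act as separators.
$unsafe = preg_split('/,|' . $exp . '/', $input);

// Escaped: preg_quote() turns the input into the literal text "a|b".
$safe = preg_split('/,|' . preg_quote($exp, '/') . '/', $input);

print_r($unsafe);
print_r($safe);
```

Here the unescaped version silently returns different pieces; other inputs, such as a lone "+", would make the pattern fail to compile instead.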




Try using this regex:


preg_split('#[\n,.]|'.$exp.'#', ...);

Note the single quotes, to avoid \n getting replaced by a newline.
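
A runnable variant of this pattern might look like the sketch below. The sample values are assumptions, and preg_quote() is added here on top of the snippet above, with '#' passed as the delimiter to match the pattern.

```php
<?php
$exp = 'hello';              // assumed user-defined separator
$input = "a\nb,c.dhelloe";   // assumed sample input

// Single quotes keep \n as backslash + n; PCRE itself reads that as a newline.
// Inside the character class the dot is literal, so it needs no escaping here.
$pattern = '#[\n,.]|' . preg_quote($exp, '#') . '#';

$result = preg_split($pattern, $input);
print_r($result);
```

The single-character separators stay in the class, while the user-defined one is a separate alternative and can therefore be a whole word.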




Drop the [ and ] as these define a character class. \n counts as a single character in a double-quoted string. Just using the string without the character class should work as you need:

// The dot is escaped so that it matches a literal "." rather than any character.
preg_split("/\n|,|\\.|$exp/", $input)
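
One caveat with an alternation like this: outside a character class the dot has to be escaped, otherwise it matches any character. A minimal sketch with an invented input shows the difference:

```php
<?php
$input = 'a.b,c';

// Unescaped: . matches any character, so every character is treated as a separator.
$broken = preg_split('/\n|,|./', $input);

// Escaped: \. matches only a literal dot.
$fixed = preg_split('/\n|,|\./', $input);

print_r($broken);
print_r($fixed);
```

With the unescaped dot, the result consists only of empty strings.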



Use preg_split()

For example:


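$exp = '#';
// Works only while $exp is a single character, because it sits inside a character class.
print_r(preg_split("/[,.\n$exp]/", "0\n1,2.3#4"));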


Array ( [0] => 0 [1] => 1 [2] => 2 [3] => 3 [4] => 4)



Here is a simple solution:



Or you can do it like this:




