php - Split by unknown regular expression

Time: 2022-02-11 21:40:37

I need to split a string by separators, some known to me and one unknown. For example, I know I want to split the string by "\n", "," and ".", but also by one separator that can be user-defined: it could be ";" or "hello" or pretty much anything.

I tried this:

"[\n|,|.|".$exp."]"

...but that didn't work as expected. As I understand it, | means "or", so this regexp should split by "\n" or "," or "." or "hello". I think the problem is that if I try just [hello], it splits by every letter, not by the whole word. That's strange, because if I try just [\n], it only splits by "\n", not by "\" or "n".

Can someone please explain this to me? :)

6 answers

#1


6  

When you place a bunch of characters in a character class, as in [hello], this defines a token that matches one character that is either h, e, l or o. Also, | has no meaning inside of a character class - it's just matched as a normal character.

The correct solution is not to use a character class. You meant to use parentheses, which form a group:

(\n|,|\.|".$exp.")

By the way - make sure that you escape any regex metacharacters that are in $exp. Basically, everything in the full list here needs to be escaped with backslashes: http://regular-expressions.info/reference.html In PHP, the helper function preg_quote() does this for you.

EDIT: Since you're no longer using a character class, the . now has to be escaped as \. because outside a character class it is a metacharacter meaning 'match any single character'. Almost forgot.
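
As a rough sketch of the advice in this answer, the following combines the fixed separators with a user-supplied one and uses PHP's preg_quote() for the escaping. The function name splitByKnownAndCustom and the sample input are made up for illustration, and the sketch assumes $exp is a non-empty string.

```php
<?php
// Hypothetical helper: split on "\n", "," and "." plus one user-defined separator.
// Assumes $exp is non-empty; an empty alternative would match everywhere.
function splitByKnownAndCustom(string $input, string $exp): array
{
    // preg_quote() escapes regex metacharacters in $exp; the second argument
    // additionally escapes the pattern delimiter "/".
    $custom = preg_quote($exp, '/');

    // A group with alternation matches whole alternatives, so "hello" is
    // treated as one separator rather than as the letters h, e, l, o.
    $pattern = '/(?:\n|,|\.|' . $custom . ')/';

    return preg_split($pattern, $input);
}

print_r(splitByKnownAndCustom("a,bhelloc.d\ne", 'hello'));
// splits into: a, b, c, d, e
```

The (?: ) form is a non-capturing group; a plain ( ) group behaves the same way with preg_split() unless the PREG_SPLIT_DELIM_CAPTURE flag is passed.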

#2


1  

\n is actually a single character, a newline (the \ before the n marks an escape sequence), which is why [\n] works as expected and [hello] doesn't.
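
A small sketch of that point, using made-up sample strings: it compares how PHP itself treats \n in double-quoted and single-quoted strings, and shows that the regex engine ends up matching a newline in both cases.

```php
<?php
// In a double-quoted PHP string, "\n" is already one newline character.
var_dump(strlen("\n"));   // int(1)

// In a single-quoted string, '\n' is two characters: a backslash and "n".
var_dump(strlen('\n'));   // int(2)

// Either way the regex engine matches a newline: it receives a literal
// newline in the first pattern and interprets the \n escape itself in
// the second.
$double = preg_split("/[\n]/", "a\nb");
$single = preg_split('/[\n]/', "a\nb");
var_dump($double === $single);   // bool(true)
```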

Also, keep in mind that allowing arbitrary input into a regular expression can be a security risk, depending on what your regular expression is being used for, so be very careful and make sure you sanitize your input to that regular expression.
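
To illustrate the concern, here is a minimal sketch of what unescaped input can do to a pattern. The variable names and the sample input are hypothetical, and escaping with preg_quote() is only one way of sanitizing; depending on the use case, more validation may be needed.

```php
<?php
$userInput = '(';   // hypothetical user-supplied separator

// Unescaped, the "(" opens a group that is never closed, so the pattern fails
// to compile; preg_split() raises a warning (suppressed here) and returns false.
$raw = @preg_split('/,|' . $userInput . '/', 'a(b,c');
var_dump($raw);      // bool(false)

// Escaped with preg_quote(), the same input is matched literally.
$safe = preg_split('/,|' . preg_quote($userInput, '/') . '/', 'a(b,c');
print_r($safe);      // a, b, c
```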

#3


1  

Try using this regex:

preg_split('#[\n,.]|'.preg_quote($exp, '#').'#', ...);

Note the single quotes, which keep PHP from replacing \n with a newline character; the regex engine then interprets the \n escape itself.

#4


1  

Drop the [ and ] as these define a character class. \n counts as a single character in a double-quoted string. Just using the string without the character class should work as you need:

preg_split("/\n|,|\\.|" . preg_quote($exp, "/") . "/", $input)

#5


1  

Use preg_split()

For example:

Input:

$exp = '#';
print_r(preg_split("/[,.\n$exp]/", "0\n1,2.3#4"));

Output:

Array ( [0] => 0 [1] => 1 [2] => 2 [3] => 3 [4] => 4)
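
Note that a character class only works while the user-defined separator is a single character. A small sketch of the difference for a multi-character separator, with made-up sample input and variable names:

```php
<?php
$exp   = 'hello';                 // hypothetical multi-character separator
$input = "one,twohellothree";
$quoted = preg_quote($exp, '/');  // escape metacharacters and the delimiter

// Character class: every letter of "hello" becomes a separator on its own,
// so words such as "one" and "three" are cut apart as well.
$byClass = preg_split('/[,.\n' . $quoted . ']/', $input);

// Alternation: "hello" is matched as a whole word.
$byGroup = preg_split('/[,.\n]|' . $quoted . '/', $input);

print_r($byGroup);   // one, two, three
```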

#6


1  

Here is a simple solution:

"(\n|,|\.|".$exp.")"

Or you can do it like this:

"([\n,.]|".$exp.")"
