How to split a string with a Java regex, except when | follows a backslash?

Date: 2022-09-29 21:40:37

I read this string from a file:

abc|abc (abc\|abc)|def

I want to get an array that contains 3 items:

  1. abc
  2. abc (abc\|abc)
  3. def

How do I write the regex correctly? line.split("(?!<=\\)\\|") doesn't work.

3 answers

#1 (score: 2)

Code:

public class __QuickTester {

    public static void main (String [] args) {

        String test = "abc|abc (abc\\|abc)|def|banana\\|apple|orange";

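        // Java literal "\\\\" is the two-character String \\ ,
        // which the regex engine reads as one literal \ .
        // (?<!\\) is a negative look-behind: split on | only if no \ precedes it.
        String[] result = test.split("(?<!\\\\)\\|");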

        for(String part : result) {
            System.out.println(part);
        }
    }
}

Output:

abc
abc (abc\|abc)
def
banana\|apple
orange


Note: You need \\\\ (4 backslashes) in the Java source to get \\ (2 backslashes) in the String, and the regex engine then reads \\ (2 backslashes) as a single literal \.
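A quick way to convince yourself of the two escaping layers is to print the String and then let the regex engine match it. This is a small illustrative sketch; the class name EscapeDemo is made up.

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // The source literal "\\\\" is the String \\ (two characters)
        String regexSource = "\\\\";
        System.out.println(regexSource);          // prints \\
        System.out.println(regexSource.length()); // prints 2

        // The regex engine reads \\ as one literal backslash,
        // so it matches a String holding a single backslash
        String oneBackslash = "\\";
        System.out.println(oneBackslash.matches(regexSource)); // prints true
    }
}
```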

#2 (score: 0)

Try this regex: ([\w()]|(\\|))+

#3 (score: 0)

The main problem with your approach is that \ is special in regex, and also in String literals. So to match a literal \ you need to escape it twice:

  • in regex: \\

  • in a String literal: "\\\\".

So you would need to write it as split("(?<!\\\\)\\|"). (Note also that a negative look-behind is written (?<!...), not (?!<=...).)

But this approach can also cause problems, because deciding whether to split only by checking if | is preceded by \ is error-prone. Since you are using \ as an escape character, a literal \ probably has to be written as \\; for instance, to represent c:\foo\bar\ you would probably write c:\\foo\\bar\\ in your text.
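For context, here is a minimal sketch of the writing side, assuming the file is produced by doubling every \ and then prefixing every | with \. The helper escape and the class EscapeWriter are hypothetical, not part of the question.

```java
public class EscapeWriter {
    // Hypothetical helper: escapes a raw value for a |-separated line.
    // Backslashes are doubled first; otherwise the \ added in front of |
    // would itself be doubled by the second replace.
    static String escape(String raw) {
        return raw.replace("\\", "\\\\").replace("|", "\\|");
    }

    public static void main(String[] args) {
        String path = "c:\\foo\\bar\\";               // raw value c:\foo\bar\
        System.out.println(escape(path));             // prints c:\\foo\\bar\\
        System.out.println(escape("foo|c:\\bar\\"));  // prints foo\|c:\\bar\\
    }
}
```

The order of the two replace calls is the important design choice: escaping | first and doubling backslashes second would turn \| into \\|, which reads back as a literal \ followed by a real separator.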

So, in that case, let's say you want to split text like

abc|foo\|c:\\bar\\|cde

I assume you want to split only at these places:

abc|foo\|c:\\bar\\|cde
   ^              ^

because

  • in abc|foo, the pipe | has no \ before it;

  • in bar\\|cde, although the pipe has a \ before it, we know that this \ wasn't used to escape |; it is part of \\, which represents a literal \ (so in general, a | preceded by zero or an even number of \ characters is OK to split on).

But with split(onEachPipeWithoutBackslashBeforeIt), such as split("(?<!\\\\)\\|"), you will not split bar\\|cde, because the \ right before | prevents that split.
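A minimal sketch reproducing the problem with the sample text above; NaiveSplitDemo and naiveSplit are illustrative names.

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // The simple look-behind: split on | unless a \ directly precedes it.
    static String[] naiveSplit(String line) {
        return line.split("(?<!\\\\)\\|");
    }

    public static void main(String[] args) {
        // Java literal for the text abc|foo\|c:\\bar\\|cde
        String text = "abc|foo\\|c:\\\\bar\\\\|cde";
        // Only the first | is split on; bar\\|cde stays glued together.
        System.out.println(Arrays.toString(naiveSplit(text)));
        // prints [abc, foo\|c:\\bar\\|cde]
    }
}
```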

To solve this problem you could check whether there is an odd number of \ before |, but this is hard to do with a Java regex, since a look-behind must have a bounded (obvious) maximum length.
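The odd/even check itself is easy outside of regex, though. As an alternative sketch (a plain loop, not a regex; PipeSplitter and splitUnescapedPipes are hypothetical names), count the run of backslashes directly in front of each | and split only when that count is even:

```java
import java.util.ArrayList;
import java.util.List;

public class PipeSplitter {
    // Hypothetical helper: split on '|' only when it is preceded by an even
    // number (including zero) of consecutive backslashes.
    static List<String> splitUnescapedPipes(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '|') continue;
            // Count the run of backslashes immediately before this pipe.
            int backslashes = 0;
            for (int j = i - 1; j >= 0 && s.charAt(j) == '\\'; j--) {
                backslashes++;
            }
            // Even count: every \ is paired up, so this | is a real separator.
            if (backslashes % 2 == 0) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // Java literal for the text abc|foo\|c:\\bar\\|cde
        String text = "abc|foo\\|c:\\\\bar\\\\|cde";
        for (String part : splitUnescapedPipes(text)) {
            System.out.println(part);
        }
    }
}
```

Unlike String.split, this keeps trailing empty fields: "a|" gives [a, ""].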

A possible solution would be split("(?<!(?<!\\\\)((\\\\){2}){0,1000}\\\\)\\|"), under the assumption that the string never contains more than 1000 consecutive \ characters, but that seems like overkill.
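To check that this bounded look-behind behaves as described, here is a small sketch that applies it to both the question's line and the sample text; BoundedLookbehindDemo and splitOddAware are illustrative names.

```java
import java.util.Arrays;

public class BoundedLookbehindDemo {
    // Do not split on a | that is preceded by an odd number of consecutive \
    // (odd = an even run of \\ pairs, bounded by {0,1000}, plus one more \).
    static String[] splitOddAware(String line) {
        return line.split("(?<!(?<!\\\\)((\\\\){2}){0,1000}\\\\)\\|");
    }

    public static void main(String[] args) {
        // Java literals for abc|abc (abc\|abc)|def and abc|foo\|c:\\bar\\|cde
        System.out.println(Arrays.toString(splitOddAware("abc|abc (abc\\|abc)|def")));
        System.out.println(Arrays.toString(splitOddAware("abc|foo\\|c:\\\\bar\\\\|cde")));
    }
}
```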

IMO a better solution is to search for the strings you want to find instead of the delimiters you want to split on. The strings you want to find consist of:

  • any character except |

  • any character preceded by \ (including |, since \ simply escapes it).

So our regex could look like "(\\\\.|[^|])+" as a Java String literal, i.e. (\\.|[^|])+ as a regex. (I placed \\. first so that [^|] doesn't consume a \ that is meant to escape the next character.)

Example:

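import java.util.regex.Matcher;
import java.util.regex.Pattern;
String text = "abc|foo\\|c:\\\\bar\\\\|cde"; // the text abc|foo\|c:\\bar\\|cde
// Match runs of "escaped character" or "anything except |"
Pattern p = Pattern.compile("(\\\\.|[^|])+");
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group());
}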

Output:

abc
foo\|c:\\bar\\
cde
