Java regex to split a string on commas but ignore quotes and parentheses [duplicate]

Time: 2021-08-13 03:51:33

This question already has an answer here:

I'm stuck with this regex.

So, my input is:

  • "Crane device, (physical object)"(X1,x2,x4), not "Seen by research nurse (finding)", EntirePatellaBodyStructure(X1,X8), "Besnoitia wallacei (organism)", "Catatropis (organism)"(X1,x2,x4), not IntracerebralRouteQualifierValue, "Diospyros virginiana (organism)"(X1,x2,x4), not SuturingOfHandProcedure(X1)

and what I would like to get in the end is:

  • "Crane device, (physical object)"(X1,x2,x4)
  • not "Seen by research nurse (finding)"
  • EntirePatellaBodyStructure(X1,X8)
  • "Besnoitia wallacei (organism)"
  • "Catatropis (organism)"(X1,x2,x4)
  • not IntracerebralRouteQualifierValue
  • "Diospyros virginiana (organism)"(X1,x2,x4)
  • not SuturingOfHandProcedure(X1)

I've tried this regex:

(\'[^\']*\')|(\"[^\"]*\")|([^,]+)|\\s*,\\s*

It works if I don't have a comma inside parentheses.

4 answers

#1


3  

RegEx

(\w+\s)?("[^"]+"|\w+)(\(\w\d(,\w\d)*\))?

Java Code

String input = ... ;
// The Matcher variable must have the same name where it is declared and used.
Matcher matcher = Pattern.compile(
          "(\\w+\\s)?(\"[^\"]+\"|\\w+)(\\(\\w\\d(,\\w\\d)*\\))?").matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group());
}

Output

"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
"Besnoitia wallacei (organism)"
"Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
"Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)

#2


1  

Don't use regexes for this. Write a simple parser that keeps track of the number of parentheses encountered, and whether or not you are inside quotes. For more information, see: RegEx match open tags except XHTML self-contained tags
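
A minimal sketch of such a parser follows. It is only one way to read the suggestion above, not the answerer's own code. The class name TopLevelSplitter and its methods are hypothetical, and it assumes that quotes are never escaped, that parentheses inside quotes do not count, and that empty items can be dropped.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {

    /** Splits on commas that are outside double quotes and outside parentheses. */
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // number of currently open parentheses
        boolean inQuotes = false; // true between an opening and a closing quote

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // assumption: no escaped quotes
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')' && depth > 0) {
                depth--;
            }
            if (c == ',' && !inQuotes && depth == 0) {
                addPart(parts, current); // top-level comma ends the item
            } else {
                current.append(c);
            }
        }
        addPart(parts, current); // flush the last item
        return parts;
    }

    /** Trims the buffered item, stores it if non-empty, and resets the buffer. */
    private static void addPart(List<String> parts, StringBuilder current) {
        String part = current.toString().trim();
        if (!part.isEmpty()) {
            parts.add(part);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String input = "\"Crane device, (physical object)\"(X1,x2,x4), "
                + "not \"Seen by research nurse (finding)\", "
                + "EntirePatellaBodyStructure(X1,X8)";
        for (String part : split(input)) {
            System.out.println(part);
        }
    }
}
```

Because the depth is a counter rather than a flag, this sketch also tolerates nested parentheses, which the regex-based answers do not attempt to handle.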

#3


0  

Would this do what you need?

System.out.println(yourString.replaceAll(", not", "\nnot"));

#4


0  

Assuming that there is no possibility of nesting () within (), and no possibility of (say) \" within "", you can write something like:

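import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One item is a run of quoted strings, parenthesized groups, and other characters
// that are not a quote, an opening parenthesis, or a comma. The comma must be
// excluded from the last class, otherwise a single match swallows the whole input.
private static final Pattern CUSTOM_SPLIT_PATTERN =
    Pattern.compile("\\s*((?:\"[^\"]*\"|[(][^)]*[)]|[^\"(,]+)+)");
private static final String[] customSplit(final String input) {
    final List<String> ret = new ArrayList<String>();
    final Matcher m = CUSTOM_SPLIT_PATTERN.matcher(input);
    while (m.find()) {
        ret.add(m.group(1)); // group 1 excludes the leading whitespace
    }
    return ret.toArray(new String[ret.size()]);
}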

(disclaimer: not tested).