This question already has an answer here:
- Java: splitting a comma-separated string but ignoring commas in quotes (9 answers)
I'm stuck with this regex.
So, I have input as:
- "Crane device, (physical object)"(X1,x2,x4), not "Seen by research nurse (finding)", EntirePatellaBodyStructure(X1,X8), "Besnoitia wallacei (organism)", "Catatropis (organism)"(X1,x2,x4), not IntracerebralRouteQualifierValue, "Diospyros virginiana (organism)"(X1,x2,x4), not SuturingOfHandProcedure(X1)
and in the end I would like to get:
- "Crane device, (physical object)"(X1,x2,x4)
- not "Seen by research nurse (finding)"
- EntirePatellaBodyStructure(X1,X8)
- "Besnoitia wallacei (organism)"
- "Catatropis (organism)"(X1,x2,x4)
- not IntracerebralRouteQualifierValue
- "Diospyros virginiana (organism)"(X1,x2,x4)
- not SuturingOfHandProcedure(X1)
I've tried this regex:
(\'[^\']*\')|(\"[^\"]*\")|([^,]+)|\\s*,\\s*
It works if I don't have a comma inside parentheses.
4 answers
#1
3
RegEx
(\w+\s)?("[^"]+"|\w+)(\(\w\d(,\w\d)*\))?
Java Code
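String input = ... ;
// Optional leading word ("not "), then a quoted string or a bare word,
// then an optional parenthesised list such as (X1,x2,x4).
Matcher m = Pattern.compile(
"(\\w+\\s)?(\"[^\"]+\"|\\w+)(\\(\\w\\d(,\\w\\d)*\\))?").matcher(input);
while(m.find()) {
System.out.println(m.group());
}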
Output
"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
"Besnoitia wallacei (organism)"
"Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
"Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)
#2
1
Don't use regexes for this. Write a simple parser that keeps track of the number of parentheses encountered, and whether or not you are inside quotes. For more information, see: RegEx match open tags except XHTML self-contained tags
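As a rough illustration of the parser approach described in this answer, here is a minimal sketch. The class name `TopLevelCommaSplitter` and its methods are made up for this example. It assumes double quotes are balanced and never escaped, and it splits only on commas that are outside quotes and outside parentheses.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelCommaSplitter {

    // Hypothetical helper: splits on commas that are outside double quotes
    // and outside parentheses. Assumes quotes are balanced and unescaped.
    public static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // number of currently open parentheses
        boolean inQuotes = false; // true while between a pair of double quotes

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')' && depth > 0) {
                depth--;
            }

            if (c == ',' && !inQuotes && depth == 0) {
                // A top-level comma ends the current token.
                addToken(tokens, current);
            } else {
                current.append(c);
            }
        }
        addToken(tokens, current); // flush the last token
        return tokens;
    }

    // Trims the buffered text, stores it if non-empty, and resets the buffer.
    private static void addToken(List<String> tokens, StringBuilder current) {
        String token = current.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String input = "\"Crane device, (physical object)\"(X1,x2,x4), "
                + "not \"Seen by research nurse (finding)\", "
                + "EntirePatellaBodyStructure(X1,X8)";
        for (String token : split(input)) {
            System.out.println(token);
        }
    }
}
```

Because the depth counter handles nested parentheses, this sketch does not need the "no nesting" assumption that a regex-based approach typically relies on.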
#3
0
Would this do what you need?
System.out.println(yourString.replaceAll(", not", "\nnot"));
#4
0
Assuming that there is no possibility of nesting () within (), and no possibility of (say) \" within "", you can write something like:
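import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each token is a run of quoted strings, parenthesised groups, and plain text.
// The plain-text class excludes ',' so that a top-level comma ends the token.
private static final Pattern CUSTOM_SPLIT_PATTERN =
Pattern.compile("\\s*((?:\"[^\"]*\"|[(][^)]*[)]|[^\"(,]+)+)");
private static final String[] customSplit(final String input) {
final List<String> ret = new ArrayList<String>();
final Matcher m = CUSTOM_SPLIT_PATTERN.matcher(input);
while(m.find()) {
ret.add(m.group(1));
}
return ret.toArray(new String[ret.size()]);
}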
(disclaimer: not tested).