I want to split the following strings into an array using a regex in Java, but I don't know how to do it.
string1="advmod(likes-4, also-3)" ==> advmod, likes, also
string2="nsubj(likes-4, dog24-2)" ==> nsubj, likes, dog24
string3="num(dog24-3, 8-2)" ==> num, dog24, 8
Please help me with this. How can I split a string like "num(dog24-3, 8-2)" into the three tokens num, dog24 and 8, and then put them into a string array?
Thanks a lot.
4 Solutions
#1
2
This is generic:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] string = {
    "advmod(likes-4, also-3)",  // ==> advmod, likes, also
    "nsubj(likes-4, dog24-2)",  // ==> nsubj, likes, dog24
    "num(dog24-3, 8-2)"         // ==> num, dog24, 8
};
// Group 1: relation name; group 2: first token up to its "-"; group 3: second token up to its "-"
Pattern p = Pattern.compile("(\\w+)\\(([^-]+).*, ([^-]+)");
for (int i = 0; i < string.length; i++) {
    Matcher m = p.matcher(string[i]);
    while (m.find()) {
        System.out.print(i + ": ");
        // Print the captured groups separated by ", "
        for (int j = 1; j <= m.groupCount(); j++) {
            System.out.print(m.group(j));
            if (j != m.groupCount()) {
                System.out.print(", ");
            }
        }
        System.out.println();
    }
}
Hope this helps; it works for me.
This is the output:
0: advmod, likes, also
1: nsubj, likes, dog24
2: num, dog24, 8
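Since the question asks for the tokens in a string array rather than printed output, the same pattern can be wrapped in a small helper that returns one. This is only a minimal sketch and not part of the original answer: the class name DependencyTokens and the method tokens are made up for illustration, and it assumes every input has the shape name(token-index, token-index).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DependencyTokens {
    // Same pattern as in the answer above: relation name, then each token up to its "-".
    private static final Pattern P = Pattern.compile("(\\w+)\\(([^-]+).*, ([^-]+)");

    // Hypothetical helper: returns the three captured tokens,
    // or an empty array if the input does not match the pattern.
    public static String[] tokens(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount()];
        for (int j = 1; j <= m.groupCount(); j++) {
            result[j - 1] = m.group(j);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [num, dog24, 8]
        System.out.println(Arrays.toString(tokens("num(dog24-3, 8-2)")));
    }
}
```

Returning an empty array for non-matching input is a design choice of this sketch; throwing an exception would work just as well if malformed input should be treated as an error.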
#2
1
For the third string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String txt = "num(dog24-3, 8-2)"; // the third example string from the question

String re1 = "(num)";   // Word 1
String re2 = ".*?";     // Non-greedy match on filler
String re3 = "(dog24)"; // Alphanum 1
String re4 = ".*?";     // Non-greedy match on filler
String re5 = "(8)";     // Integer Number 1

Pattern p = Pattern.compile(re1 + re2 + re3 + re4 + re5, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find()) {
    String word1 = m.group(1);
    String alphanum1 = m.group(2);
    String int1 = m.group(3);
    System.out.print("(" + word1 + ")" + "(" + alphanum1 + ")" + "(" + int1 + ")" + "\n");
}
#3
1
If you want to split, you could use this:
str.split("\\(|-[0-9]+(?:,\\s+|\\))");
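To see what that split produces, here is a small runnable sketch. The class name SplitDemo and the method splitTokens are hypothetical names used only for this illustration. The regex splits on the opening parenthesis and on each "-index" suffix together with the ", " or ")" that follows it; note that it expects at least one whitespace character after the comma.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical wrapper around the split call from the answer above.
    public static String[] splitTokens(String str) {
        // Delimiters: "(" or "-<digits>" followed by either ",<whitespace>" or ")".
        // String.split drops trailing empty strings, so the final "-2)" leaves no extra element.
        return str.split("\\(|-[0-9]+(?:,\\s+|\\))");
    }

    public static void main(String[] args) {
        // prints [advmod, likes, also]
        System.out.println(Arrays.toString(splitTokens("advmod(likes-4, also-3)")));
        // prints [num, dog24, 8]
        System.out.println(Arrays.toString(splitTokens("num(dog24-3, 8-2)")));
    }
}
```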
#4
0
You really haven't described your grammar, but assuming that it looks something like a Java method or a Prolog statement, try:
// Backslashes must be doubled inside Java string literals.
final static String TOKEN_CHARACTERS = "[\\w\\d-]";
final Pattern p = Pattern.compile("^(" + TOKEN_CHARACTERS + "+)\\((" + TOKEN_CHARACTERS + "+),\\s*(" + TOKEN_CHARACTERS + "+)\\)$");
Then split on the "-"; I presume that it really is there for some reason, and it's not clear that it's always present (if so, you can change the pattern to hard-code the single "-" instead of considering it part of the token). If you allow additional space or such, adjust accordingly.
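The two steps described in this answer (match the whole statement, then cut each token at the "-") could be sketched as follows. This is an illustration under assumptions, not the answerer's code: the class name GrammarMatch and the methods parse and stripIndex are invented here, and the sketch cuts at the first "-" in each token, which assumes the token names themselves contain no hyphen.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrammarMatch {
    private static final String TOKEN_CHARACTERS = "[\\w\\d-]";
    // Anchored pattern: name(token, token), where tokens may include the "-index" suffix.
    private static final Pattern P = Pattern.compile(
            "^(" + TOKEN_CHARACTERS + "+)\\((" + TOKEN_CHARACTERS + "+),\\s*("
            + TOKEN_CHARACTERS + "+)\\)$");

    // Hypothetical helper: drops everything from the first "-" onward, if there is one.
    private static String stripIndex(String token) {
        int dash = token.indexOf('-');
        return dash < 0 ? token : token.substring(0, dash);
    }

    // Hypothetical helper: returns the three tokens, or an empty array if the input does not match.
    public static String[] parse(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] {
            stripIndex(m.group(1)), stripIndex(m.group(2)), stripIndex(m.group(3))
        };
    }

    public static void main(String[] args) {
        // prints [num, dog24, 8]
        System.out.println(Arrays.toString(parse("num(dog24-3, 8-2)")));
    }
}
```

Because the pattern is anchored with ^ and $, input with trailing text or a missing parenthesis is rejected rather than partially matched.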