I've been searching the Internet and, after a big headache, still can't find why this regular expression is wrong:
"\"\w*&&[\p{Punct}]\"["+sepChar+"]\"\w*&&[\p{Punct}]\""
I'm trying to read a master data file with the following pattern (quotes included):
"TEXTVALUE":"TEXTVALUE":"TEXTVALUE"
and split each line with the regular expression above.
So, for example:
"Hello:John":"Hello:World":"Hello:Mark"
will be split into:
{"Hello:John", "Hello:World", "Hello:Mark"}
2 solutions
#1
3
The backslash is the escape character in Java string literals. You need to write two backslashes (\\)
in the source to include a single backslash in the regex.
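As a rough sketch of the escaping rule, and of one way the quoted fields from the question could be split, something like the following might work. It is not the asker's expression: the class name QuotedSplit and the helper splitQuoted are made up for illustration, and the lookaround approach is an assumption about what the split should do. Note also that in Java regex `&&` only means intersection inside a character class; outside one it matches two literal ampersands.

```java
import java.util.regex.Pattern;

class QuotedSplit {
    // Hypothetical helper: splits on sepChar only where it sits directly
    // between a closing quote and an opening quote.
    static String[] splitQuoted(String line, String sepChar) {
        // (?<=") and (?=") are lookarounds: they require the quotes to be there
        // but do not consume them, so every field keeps its surrounding quotes.
        // Pattern.quote makes sepChar literal even if it is a regex metacharacter.
        String regex = "(?<=\")" + Pattern.quote(sepChar) + "(?=\")";
        return line.split(regex);
    }

    public static void main(String[] args) {
        // "\\w+" in Java source is the regex \w+ (one backslash reaches the regex engine).
        System.out.println("Hello".matches("\\w+"));

        String line = "\"Hello:John\":\"Hello:World\":\"Hello:Mark\"";
        for (String field : splitQuoted(line, ":")) {
            System.out.println(field);
        }
    }
}
```

The colon inside Hello:John is left alone because it is not surrounded by quotes. A field that itself contains the sequence quote, separator, quote would still be split, so this only suits data where that cannot happen.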
Try:
"\"\\w*&&[\\p{Punct}]\"["+sepChar+"]\"\\w*&&[\\p{Punct}]\""
#2
0
Ok.
Thanks to @kevin-bowersox for the help.
It seems that Oracle has done a great job improving Java with version 7. With this code:
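// needs java.io.File, java.io.FileReader, java.io.BufferedReader
File file = new File(someFile);
// BufferedReader wraps a Reader, so the File goes through a FileReader first
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while ((line = br.readLine()) != null) {
    // todo
}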
If your file has been formatted with a constant pattern, for example:
"TEXTVALUE":"TEXTVALUE":"TEXTVALUE"
It reads:
"TEXTVALUE-->TEXTVALUE-->TEXTVALUE"
where '-->' stands for tabs ('\t')
So, in the end, my solution is:
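// needs java.io.* and java.util.ArrayList
public ArrayList<String[]> getSplittedTextFromFile(String filePath) throws FileNotFoundException, IOException {
    ArrayList<String[]> ret = null;
    if (!filePath.isEmpty()) {
        File input = new File(filePath);
        // BufferedReader wraps a Reader, so the File goes through a FileReader;
        // try-with-resources (Java 7) closes the reader when the block ends
        try (BufferedReader br = new BufferedReader(new FileReader(input))) {
            String line = null;
            while ((line = br.readLine()) != null) {
                // each line is split on tab characters
                String[] aSplit = line.split("\\t");
                if (ret == null)
                    ret = new ArrayList<>();
                ret.add(aSplit);
            }//while
        }
    }//fi
    // null when filePath is empty or the file has no lines
    return ret;
}//fnc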
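To try the tab split without a file on disk, the same loop can be pointed at a StringReader. This is only a minimal sketch, not the method above: the class name TabSplitDemo, the helper splitLines, and the sample text are made up for illustration, and it returns an empty list rather than null when there is no input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TabSplitDemo {
    // Hypothetical helper: reads every line from any Reader and splits it on tabs.
    static List<String[]> splitLines(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // try-with-resources closes the BufferedReader (and the wrapped Reader)
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                rows.add(line.split("\\t"));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: two lines, three tab-separated fields each
        String sample = "Hello:John\tHello:World\tHello:Mark\nA\tB\tC\n";
        for (String[] row : splitLines(new StringReader(sample))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Taking a Reader instead of a path keeps the parsing separate from the file handling, so the same helper could be given a FileReader for real data.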