I'm currently developing a parser and I'm stuck on one method.
I need to clean specific words out of some sentences, meaning replacing them with a whitespace or a null
character. For now, I came up with this code:
private void clean(String sentence)
{
    // Declared outside the try block so it is still in scope for the loop below.
    List<String> wordList = new ArrayList<String>();
    try {
        FileInputStream fis = new FileInputStream(
                ConfigHandler.getDefault(DictionaryType.CLEANING).getDictionaryFile());
        BufferedReader bis = new BufferedReader(new InputStreamReader(fis));
        String read;
        while ((read = bis.readLine()) != null) {
            wordList.add(read);
        }
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    for (String s : wordList) {
        if (StringUtils.containsIgnoreCase(sentence, s)) { // this comes from Apache Lang
            sentence = sentence.replaceAll("(?i)" + s + "\\b", " ");
        }
    }
    cleanedList.add(sentence);
}
But when I look at the output, every occurrence of the word in my sentence is replaced by a whitespace,
including occurrences that are only part of a longer word.
Can anybody help me replace only the exact words in my sentence?
Thanks in advance!
1 solution
#1
2
There are two problems in your code:
- You are missing the \b before the string
- You will run into issues if any of the words from the file contains special characters
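To make the two problems concrete, here is a small sketch that compares the pattern from the question with one that has a boundary on both sides and quotes the word. The class name, method names and sample sentences are made up for illustration; only `String.replaceAll` and `Pattern.quote` come from the JDK.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // Pattern as written in the question: a boundary only AFTER the word,
    // and the word is interpreted as a regex rather than as literal text.
    public static String trailingBoundaryOnly(String sentence, String word) {
        return sentence.replaceAll("(?i)" + word + "\\b", " ");
    }

    // Boundary on both sides, with the word quoted so that characters
    // such as '.' are matched literally.
    public static String wholeWordOnly(String sentence, String word) {
        return sentence.replaceAll("(?i)\\b" + Pattern.quote(word) + "\\b", " ");
    }

    public static void main(String[] args) {
        // Problem 1: "cat" is also removed from the end of "bobcat".
        System.out.println("[" + trailingBoundaryOnly("The bobcat saw a cat", "cat") + "]");
        System.out.println("[" + wholeWordOnly("The bobcat saw a cat", "cat") + "]");

        // Problem 2: an unquoted '.' matches any character, so "axb" is removed too.
        System.out.println("[" + trailingBoundaryOnly("a.b and axb", "a.b") + "]");
        System.out.println("[" + wholeWordOnly("a.b and axb", "a.b") + "]");
    }
}
```

With these sample inputs, the first variant cuts "cat" out of "bobcat" and treats "axb" as a match for "a.b", while the second variant touches only the standalone "cat" and the literal "a.b".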
To fix both problems, construct your regex as follows:
sentence = sentence.replaceAll("(?i)\\b\\Q" + s + "\\E\\b", " ");
or
sentence = sentence.replaceAll("(?i)\\b" + Pattern.quote(s) + "\\b", " ");
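If the word list is large, one possible variation is to build a single pattern once and reuse it for every sentence, instead of calling `replaceAll` per word. This is only a sketch under assumptions: `SentenceCleaner` is a hypothetical class, not part of the question's code, and it takes the word list as a parameter rather than reading it through `ConfigHandler`.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: compiles all dictionary words into one
// case-insensitive, whole-word pattern.
public class SentenceCleaner {

    // null when there is nothing to remove
    private final Pattern pattern;

    public SentenceCleaner(List<String> words) {
        String alternation = words.stream()
                .filter(w -> !w.isEmpty())   // skip blank lines from the file
                .map(Pattern::quote)         // match each word literally
                .collect(Collectors.joining("|"));
        this.pattern = alternation.isEmpty()
                ? null
                : Pattern.compile("\\b(?:" + alternation + ")\\b",
                                  Pattern.CASE_INSENSITIVE);
    }

    // Replaces every whole-word match with a single space.
    public String clean(String sentence) {
        return pattern == null ? sentence : pattern.matcher(sentence).replaceAll(" ");
    }
}
```

Usage would look like `new SentenceCleaner(wordList).clean(sentence)`. One caveat that applies to every `\b`-based variant here: `\b` only matches between a word character and a non-word character, so an entry that begins or ends with punctuation (for example `c++`) will not match when it is followed by a space.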