I need to create a regex that finds all the sentences containing a specific word (or regex).
For example, suppose I have the following text:
Harrison Ford is working on a new Film. The film is yet to be released
The film has a gud star cast. Most paid actor is Harrison Ford in the film.
If I want to get all the sentences that contain the word Harrison, how should I go about it? The regex should return the following matches:
- Harrison Ford is working on a new Film.
- Most paid actor is Harrison Ford in the film.
The beginning and end of a sentence can be marked by a newline character or a full stop; a sentence can also start at the beginning of the paragraph.
I used the following regex:
.*?((\n|.|^\\s*).*?\\b(Harrison)\\b.*?[.\n]).*
But I am unable to get the text split into sentences: the match I get runs from the start of the text up to the first Harrison Ford.
Please let me know if you have any suggestions.
3 answers
#1
1
If you can guarantee that a newline character or a full stop always ends a sentence and never appears anywhere else, then I suggest you first split the text and then search each piece:
// Split on a full stop or on one or more line breaks (\R requires Java 8+)
String[] sentences = text.split("\\.|\\R+");
for (String se : sentences) {
    if (se.contains("Harrison")) {
        System.out.println(se.trim());
    }
}
Output:
Harrison Ford is working on a new Film
Most paid actor is Harrison Ford in the film
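Wrapped into a complete program, the split-then-search approach might look like the sketch below. The class SplitThenSearch and the method sentencesContaining are hypothetical names, and the sketch still assumes that sentences end only at a full stop or a line break. One deliberate change from the snippet: the word is matched with a \b-bounded pattern instead of a plain substring test, so "Harrison" is not found inside a longer word such as "Harrisonburg".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitThenSearch {
    // Hypothetical helper: splits on full stops or line breaks, then keeps the
    // pieces that contain the given word as a whole word. \R needs Java 8+.
    static List<String> sentencesContaining(String text, String word) {
        // \b on both sides so "Harrison" does not match inside "Harrisonburg";
        // Pattern.quote escapes any regex metacharacters in the word
        Pattern wordPattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        List<String> result = new ArrayList<>();
        for (String piece : text.split("\\.|\\R+")) {
            if (wordPattern.matcher(piece).find()) {
                result.add(piece.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Harrison Ford is working on a new Film. The film is yet to be released\n"
                + "The film has a gud star cast. Most paid actor is Harrison Ford in the film.";
        for (String s : sentencesContaining(text, "Harrison")) {
            System.out.println(s);
        }
    }
}
```

Run as is, it prints the same two lines as the output shown above.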
#2
1
For Java, the following code should do the trick:
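import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindSentences {
    public static void main(String[] args) {
        String data = "Harrison Ford is working on a new Film\n The film is yet to be released. "
                + "The film has a gud star cast. "
                + "Most paid actor is Harrison Ford in the film.";
        // Treat newlines as sentence terminators by turning them into full stops
        String tmpData = data.replace('\n', '.');
        // Capture a run of word characters/whitespace containing "Harrison" that ends at a full stop
        Pattern myPattern = Pattern.compile("([\\w\\s]*Harrison[\\w\\s]*)\\.");
        Matcher m = myPattern.matcher(tmpData);
        while (m.find()) {
            System.out.println("Result: " + m.group(1).trim());
        }
    }
}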
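One caveat with this pattern: it only allows word characters and whitespace inside a sentence, so a sentence with a comma after "Harrison" (e.g. "Harrison Ford, the actor, ...") is not matched at all, and a sentence with a comma before it is cut short. Below is a sketch that keeps the same replace-then-match idea but allows any character except a full stop inside a sentence. WidenedClass and find are hypothetical names, and it assumes every sentence ends with a full stop or a newline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WidenedClass {
    // Hypothetical helper following the answer above: newlines become full stops,
    // then any run of non-full-stop characters containing "Harrison" is captured.
    // [^.] replaces [\w\s], so commas, apostrophes etc. may appear in a sentence.
    static List<String> find(String data) {
        String tmpData = data.replace('\n', '.');
        Pattern p = Pattern.compile("([^.]*Harrison[^.]*)\\.");
        Matcher m = p.matcher(tmpData);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "Harrison Ford, the actor, is working on a new Film\n"
                + "The film is yet to be released. Most paid actor is Harrison Ford in the film.";
        for (String s : find(data)) {
            System.out.println("Result: " + s);
        }
    }
}
```

Like the original, a final sentence with no closing full stop or newline will not be found, because the pattern requires the terminating full stop.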
#3
0
You should use the global flag to match all occurrences in the string. Then use this regex to find all sentences containing "Harrison":
(?:[\w][^.]+)?Harrison[^.]+
See a demo here.
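Java has no global flag; the equivalent is calling Matcher.find() in a loop, which resumes after the previous match each time. Here is a sketch of this regex in Java. GlobalFind and findAll are hypothetical names, and \n has been added to both negated classes as an assumption: the question treats a newline as a sentence boundary, while [^.] on its own would let a match run across lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlobalFind {
    // The regex above with \n added to both negated classes (an assumption, so
    // that a newline ends a sentence just like a full stop does).
    private static final Pattern SENTENCE =
            Pattern.compile("(?:[\\w][^.\\n]+)?Harrison[^.\\n]+");

    // Java has no /g flag: calling find() repeatedly walks through every match.
    static List<String> findAll(String text) {
        Matcher m = SENTENCE.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Harrison Ford is working on a new Film. The film is yet to be released\n"
                + "The film has a gud star cast. Most paid actor is Harrison Ford in the film.";
        findAll(text).forEach(System.out::println);
    }
}
```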