Regex to split out sentences containing a specific word

Time: 2021-09-16 12:46:07

I need to create a regex that finds all the sentences containing a specific word or regex.

For example, suppose I have the following text:

Harrison Ford is working on a new Film. The film is yet to be released

The film has a gud star cast. Most paid actor is Harrison Ford in the film.

Here, if I want to get all the sentences that contain the word Harrison, how should I go about it? The regex should return the following selections:

  • Harrison Ford is working on a new Film.

  • Most paid actor is Harrison Ford in the film.

A sentence's beginning and end can be marked by a newline character or a full stop, or the sentence can simply be the first line of the paragraph.

I used the following regex:

.*?((\n|.|^\\s*).*?\\b(Harrison)\\b.*?[.\n]).*

But I am unable to split the lines. I only get the text from the start up to the first Harrison Ford.

Please let me know if you have any suggestions.

3 solutions

#1


1  

If you can guarantee that every sentence, and nothing but a sentence, ends with a newline character or a full stop, then I suggest you first split the text and then search each piece:

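// Split on a full stop or on one or more line breaks (\R needs Java 8+).
String[] sentences = text.split("\\.|\\R+");
for (String se : sentences) {
    // Keep only the pieces that mention the target word.
    if (se.contains("Harrison")) {
        System.out.println(se.trim());
    }
}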

Output:

Harrison Ford is working on a new Film
Most paid actor is Harrison Ford in the film
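
The split-then-filter idea can be wrapped into a self-contained program. This is only a sketch: the class name `SentenceFilter` and the helper `sentencesContaining` are hypothetical names introduced for illustration, and the whole-word check via `\b` is an added assumption taken from the `\\b(Harrison)\\b` part of the question's regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SentenceFilter {
    // Hypothetical helper: returns the sentences of text that contain word as a whole word.
    static List<String> sentencesContaining(String text, String word) {
        // \b keeps "Harrison" from matching inside a longer word such as "Harrisons";
        // Pattern.quote treats the word literally even if it contains regex metacharacters.
        Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        List<String> hits = new ArrayList<>();
        // Sentence delimiters: a full stop or a run of line breaks.
        for (String sentence : text.split("\\.|\\R+")) {
            if (wholeWord.matcher(sentence).find()) {
                hits.add(sentence.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Harrison Ford is working on a new Film. The film is yet to be released\n"
                + "The film has a gud star cast. Most paid actor is Harrison Ford in the film.";
        sentencesContaining(text, "Harrison").forEach(System.out::println);
    }
}
```

Because the full stop is consumed by `split`, the returned sentences come back without their trailing full stop, as in the output above.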

#2


1  

For Java, the following code should do the trick:

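import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "Harrison Ford is working on a new Film\n The film is yet to be released. "
    + "The film has a gud star cast. "
    + "Most paid actor is Harrison Ford in the film.";

// Treat line breaks as sentence terminators too.
String tmpData = data.replace('\n', '.');
// Inside a character class '|' would be a literal pipe, so the class lists only \w and \s.
Pattern myPattern = Pattern.compile("([\\w\\s]*Harrison[\\w\\s]*)\\.");
Matcher m = myPattern.matcher(tmpData);

while (m.find()) {
    // trim() drops the space left over after the previous full stop.
    System.out.println("Result: " + m.group(1).trim());
}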
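
One limitation of a `\w`/`\s` character class is that a sentence gets cut at any comma, apostrophe or other punctuation. A possible variant, sketched here under the assumption that a sentence is any run of characters without a full stop or line feed, uses a negated character class and so needs no `replace` step. The class name `NegatedClassDemo` and the method `find` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Assumption: a sentence is a run of characters with no full stop and no line feed.
    private static final Pattern SENTENCE_WITH_WORD =
            Pattern.compile("[^.\\n]*\\bHarrison\\b[^.\\n]*");

    // Hypothetical helper: collects every sentence that contains the whole word "Harrison".
    static List<String> find(String data) {
        List<String> results = new ArrayList<>();
        Matcher m = SENTENCE_WITH_WORD.matcher(data);
        while (m.find()) {
            // trim() removes the space that follows the previous sentence's full stop.
            results.add(m.group().trim());
        }
        return results;
    }

    public static void main(String[] args) {
        String data = "Harrison Ford is working on a new Film\n The film is yet to be released. "
                + "The film has a gud star cast. "
                + "Most paid actor is Harrison Ford in the film.";
        find(data).forEach(s -> System.out.println("Result: " + s));
    }
}
```

With this pattern a sentence such as "Yes, Harrison Ford stars." is returned whole, while the `\w`/`\s` class would start the match only after the comma.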

#3


0  

You should use the global flag to match all occurrences in a string. Then use this regex to find all sentences containing "Harrison":

(?:[\w][^.]+)?Harrison[^.]+

See a demo here.
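
Java's `java.util.regex` has no global flag; calling `Matcher.find()` in a loop plays the same role. The sketch below applies the regex from this answer unchanged to the text from the question. The class name `GlobalFindDemo` and the method `allMatches` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlobalFindDemo {
    // The regex from this answer, written as a Java string literal.
    private static final Pattern PATTERN =
            Pattern.compile("(?:[\\w][^.]+)?Harrison[^.]+");

    // Hypothetical helper: each find() call resumes after the previous match,
    // which is the behaviour a global flag gives in other regex flavours.
    static List<String> allMatches(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Harrison Ford is working on a new Film. The film is yet to be released\n"
                + "The film has a gud star cast. Most paid actor is Harrison Ford in the film.";
        allMatches(text).forEach(System.out::println);
    }
}
```

Note that `[^.]` also matches line breaks, so two sentences separated only by a newline would be returned as a single match; if newlines must end a sentence, `[^.\n]` is one possible adjustment.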
