Split a String of Sentences into One Sentence per Line in Java

Time: 2022-11-15 21:40:26

I want to split the sentences into one sentence per line in Java.

Input String: "Volatility returned to the municipal bond market during the first half of the funds’ fiscal year as investors weighed the potential impact of the U.S. presidential election, strengthening economic conditions and rising interest rates. The market was further pressured by a record level of municipal bond issuance in 2016. Against this backdrop, all six funds registered declines, ranging from –0.92% for American Funds Short-Term Tax-Exempt Bond Fund to –3.77% for American High-Income Municipal Bond Fund. (See pages 4 through 10 for fund specific results and information.)"

Output:

Sentence1: Volatility returned to the municipal bond market during the first half of the funds’ fiscal year as investors weighed the potential impact of the U.S. presidential election, strengthening economic conditions and rising interest rates.

Sentence2: The market was further pressured by a record level of municipal bond issuance in 2016. Against this backdrop, all six funds registered declines, ranging from –0.92% for American Funds Short-Term Tax-Exempt Bond Fund to –3.77% for American High-Income Municipal Bond Fund.

Sentence3:(See pages 4 through 10 for fund specific results and information.

I wrote Java code that starts a new line wherever a full stop followed by a space (". ") occurs, but this also breaks the line after "U.S.":

string = string.replace(". ", ".\n");

3 solutions

#1


1  

You could use String::split with regex to accomplish this like so:

String[] sentences = paragraph.split("(?<=[^ ]\\.) (?=[^a-z])");
int count = 0;
for(String str:sentences)
    System.out.println("Sentence " + (++count) + ":" + str);

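This uses lookahead and lookbehind, two zero-width regex assertions: the match is only the space between sentences, so the full stop stays with its sentence. A space after a full stop counts as a sentence boundary only when the next character is not a lowercase letter, which is why "U.S. presidential" is left intact.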
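
As a minimal sketch of the split above, the following self-contained class wraps it in a method and runs it on a shortened version of the question's text. The class name `SentenceSplitter`, the method `splitSentences`, and the sample string are assumptions made for illustration. The heuristic would still break on an abbreviation followed by a capitalized word (for example "U.S. Treasury").

```java
import java.util.Arrays;
import java.util.List;

public class SentenceSplitter {
    // Split on a single space that is preceded by a non-space character plus a full stop
    // (lookbehind) and followed by anything other than a lowercase ASCII letter (lookahead).
    // Both assertions are zero-width, so only the space itself is consumed.
    static List<String> splitSentences(String paragraph) {
        return Arrays.asList(paragraph.split("(?<=[^ ]\\.) (?=[^a-z])"));
    }

    public static void main(String[] args) {
        // Shortened sample text (an assumption, not the full input from the question)
        String paragraph = "Volatility returned as investors weighed the U.S. presidential election. "
                + "The market was further pressured in 2016. (See pages 4 through 10.)";
        int count = 0;
        for (String sentence : splitSentences(paragraph)) {
            System.out.println("Sentence " + (++count) + ": " + sentence);
        }
    }
}
```

Returning a `List<String>` rather than printing directly keeps the split reusable, for instance with `String.join("\n", sentences)` to get one sentence per line.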

#2


0  

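String#split() takes a regex. In a regex, an unescaped . matches any character except a line terminator. Escape the dot with a backslash; inside a Java string literal the backslash must itself be escaped, so the argument becomes "\\.".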
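
To make the difference concrete, here is a small sketch comparing an unescaped and an escaped dot in `String#split`. The class name `DotEscapeDemo`, its helper methods, and the sample string are assumptions for illustration. Note that `String#replace`, which the question uses, treats its arguments as literal text and needs no escaping; `split` and `replaceAll` are the methods that take a regex.

```java
import java.util.regex.Pattern;

public class DotEscapeDemo {
    // Unescaped dot: every character matches the delimiter, so only empty strings
    // remain and split() discards them all, leaving an empty array.
    static String[] splitUnescaped(String text) {
        return text.split(".");
    }

    // Escaped dot: "\\." in Java source is the regex \. which matches a literal full stop.
    static String[] splitEscaped(String text) {
        return text.split("\\.");
    }

    // Pattern.quote builds an equivalent literal pattern without hand-written escaping.
    static String[] splitQuoted(String text) {
        return text.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String text = "a.b.c";
        System.out.println(splitUnescaped(text).length);          // prints 0
        System.out.println(String.join("|", splitEscaped(text))); // prints a|b|c
        System.out.println(String.join("|", splitQuoted(text)));  // prints a|b|c
    }
}
```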

#3


0  

Try something like this inside your code:

import java.util.ArrayList;
import java.util.List;

List<String> eachLine = new ArrayList<String>();
String initialString = "Volatility returned to the municipal bond market during the first half of the funds’ fiscal year as investors weighed the potential impact of the U.S. presidential election, strengthening economic conditions and rising interest rates. The market was further pressured by a record level of municipal bond issuance in 2016. Against this backdrop, all six funds registered declines, ranging from –0.92% for American Funds Short-Term Tax-Exempt Bond Fund to –3.77% for American High-Income Municipal Bond Fund. (See pages 4 through 10 for fund specific results and information.)";

int from = 0; // position from which to search for the next ". "
int stopIndex;
// A full stop followed by a space marks either the end of a sentence or an abbreviation such as U.K. or U.S.
while ((stopIndex = initialString.indexOf(". ", from)) != -1 && stopIndex + 2 < initialString.length()) {
   // A sentence starts with an uppercase letter, so split only when the character after ". " is uppercase
   if (Character.isUpperCase(initialString.charAt(stopIndex + 2))) {
      eachLine.add(initialString.substring(0, stopIndex + 1)); // add the sentence, full stop included, to the list
      initialString = initialString.substring(stopIndex + 2); // keep the rest of the text to be processed again
      from = 0;
   } else {
      from = stopIndex + 2; // abbreviation or non-letter (e.g. "U.S. presidential", ". (See"): keep searching
   }
}
eachLine.add(initialString); // the remaining text is the last sentence

For a general idea of how to check for uppercase, see: Java Program to test if a character is uppercase/lowercase/number/vowel
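
As a sketch of the uppercase check mentioned above, the hypothetical helper `checkForUpperCase` below relies on the standard `Character.isUpperCase` and guards against an index past the end of the string. Its name echoes the snippet, but the two-parameter signature, the class name `UpperCaseCheck`, and the sample text are assumptions.

```java
public class UpperCaseCheck {
    // Returns true when the character at the given index exists and is an uppercase letter.
    static boolean checkForUpperCase(String text, int index) {
        return index >= 0 && index < text.length() && Character.isUpperCase(text.charAt(index));
    }

    public static void main(String[] args) {
        String text = "rising interest rates. The market was pressured by the U.S. presidential election.";
        int stopIndex = text.indexOf(". ");
        // Two positions after the full stop is the first character of the next word ('T' here).
        System.out.println(checkForUpperCase(text, stopIndex + 2));
        int abbreviation = text.indexOf("S. ");
        // Three positions after the 'S' of "U.S." is the 'p' of "presidential".
        System.out.println(checkForUpperCase(text, abbreviation + 3));
    }
}
```

Note that the character to test sits two positions after the full stop, because the position right after it holds the space.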

The code above is only a snippet, meant to show how you might approach the problem.

You can also use regular expressions to find the full stop pattern, but understanding the basic approach might be more useful later.

Regular Expressions in Java: https://www.tutorialspoint.com/java/java_regular_expressions.htm
