String a = "This is a book! I am a boy. He is a good man. My friend Mrs. Roy is good man. He is nice person. Miss. Star is my friend.";
String b = Pattern.compile("([a-zA-Z]+)([.!?])( )([A-Z]+)",Pattern.CASE_INSENSITIVE).matcher(a).replaceAll("$1$2 →$4");
The result is:
This is a book! →I am a boy. →He is a good man. →My friend Mrs. →Roy is good man. →He is nice person. →Miss. Star is my friend.
But what I want is:
This is a book! →I am a boy. →He is a good man. →My friend Mrs. Roy is good man. →He is nice person. →Miss. Star is my friend.
I don't want to add "→" after titles such as "Mr.", "Miss.", "Mrs." and "Ms.". The "→" marks the start of a sentence.
Thanks~!
1 answer
#1
If you want to match well-written sentences in your text, match a run of characters that are not sentence terminators (. ! ?) followed by one of those terminators. Along the way, a terminator that follows a special word such as Dr, Mr or Mrs must not be treated as the end of a sentence.
(([^.!?]|(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?])*[\.!?])
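The pattern above matches sentences but does not by itself produce the "→" markers the question asks for. Below is a minimal Java sketch of one way to get there: find each sentence with Matcher.find(), trim it, and rejoin the sentences with " →". It rests on a few assumptions. Ms and Miss are added to the lookbehind, because the question lists them and the pattern above does not. The outer groups are made non-capturing, since only the whole match is used. The class and method names (SentenceMarker, markSentences) are hypothetical. Note that each backslash is doubled in the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceMarker {

    // The answer's pattern with two assumed changes: Ms and Miss added to the
    // lookbehind (the question lists them), and non-capturing groups, since only
    // the whole match (one sentence) is used.
    private static final Pattern SENTENCE = Pattern.compile(
            "(?:[^.!?]|(?<=Dr|Mr|Mrs|Ms|Miss|\\b[A-Za-z]|\\s)[.!?])*[.!?]");

    // Hypothetical helper: finds each sentence and rejoins them with " →",
    // so the arrow marks the start of every sentence after the first.
    public static String markSentences(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            // A match may start with the whitespace that followed the previous
            // terminator; trim it before rejoining.
            sentences.add(m.group().trim());
        }
        return String.join(" →", sentences);
    }

    public static void main(String[] args) {
        String a = "This is a book! I am a boy. He is a good man. My friend Mrs. Roy is good man. He is nice person. Miss. Star is my friend.";
        // Prints the "What I want" line from the question.
        System.out.println(markSentences(a));
    }
}
```

Any text after the last terminator is not matched and is therefore dropped. Also, because of the \b[A-Za-z] alternative, a sentence that ends in a one-letter word (for example "Plan B.") is not split from the sentence that follows it.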
The explanation:
1st Capturing Group
(([^.!?]|(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?])*[\.!?])
2nd Capturing Group
([^.!?]|(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?])*
*
Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy). A repeated capturing group only captures the last iteration; put a capturing group around the repeated group to capture all iterations, or use a non-capturing group instead if you're not interested in the data.
1st Alternative
[^.!?]
Match a single character not present in the list .!?
2nd Alternative
(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?]
Positive Lookbehind
(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)
Assert that the regex below matches the text immediately before the current position
1st Alternatives
Dr|Mr|Mrs
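Dr|Mr|Mrs matches the characters Dr, Mr or Mrs literally (case sensitive). Other titles from the question, such as Ms and Miss, can be added as further alternatives.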
2nd Alternative
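\b[A-Za-z]
Matches a single letter preceded by a word boundary, i.e. a one-letter word such as an initial
3rd Alternative
\s
Matches a single whitespace character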
Then, at the end of both capturing groups:
[.!?]
Match a single character present in the list .!?
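Two points in this explanation can be checked directly in Java. The first is which terminators the lookbehind alternative protects from ending a sentence. The second is the note that a repeated capturing group keeps only its last iteration. This is a small sketch: the class, the helper names and the sample strings are hypothetical, and the lookbehind is used exactly as in the answer (without Ms or Miss).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPartsDemo {

    // The answer's 2nd alternative on its own: a terminator (. ! ?) preceded by
    // Dr, Mr, Mrs, a one-letter word, or whitespace. Such a terminator is
    // consumed as part of a sentence instead of ending it.
    private static final Pattern PROTECTED =
            Pattern.compile("(?<=Dr|Mr|Mrs|\\b[A-Za-z]|\\s)[.!?]");

    // Hypothetical helper: returns each protected terminator with the word it follows.
    public static List<String> protectedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = PROTECTED.matcher(text);
        while (m.find()) {
            int wordStart = text.lastIndexOf(' ', m.start()) + 1;
            tokens.add(text.substring(wordStart, m.end()));
        }
        return tokens;
    }

    // Hypothetical helper: the inner group (\w+,) is repeated, so it keeps only
    // its last iteration; the outer group around the repetition keeps all of them.
    public static String[] repeatedGroup(String text) {
        Matcher m = Pattern.compile("((\\w+,)+)").matcher(text);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(protectedTokens("Dr. Who met Mr. Bean and J. Smith."));
        String[] groups = repeatedGroup("a,b,c,");
        System.out.println("outer = " + groups[0] + " inner = " + groups[1]);
    }
}
```

With the sample sentence, the first line prints [Dr., Mr., J.]. The period after Smith is the only one that would end a sentence. The second line shows the inner group holding only c, while the outer group holds a,b,c,.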