Regular expression to replace strings while excluding specific strings

Date: 2022-09-13 09:23:48
String a = "This is a book! I am a boy. He is a good man. My friend Mrs. Roy is good man. He is nice person. Miss. Star is my friend.";
String b = Pattern.compile("([a-zA-Z]+)([.!?])( )([A-Z]+)",Pattern.CASE_INSENSITIVE).matcher(a).replaceAll("$1$2 →$4");

The result is:

This is a book! →I am a boy. →He is a good man. →My friend Mrs. →Roy is good man. →He is nice person. →Miss. Star is my friend.

But what I want is:

This is a book! →I am a boy. →He is a good man. →My friend Mrs. Roy is good man. →He is nice person. →Miss. Star is my friend.

I don't want to add "→" after certain strings such as "Mr.", "Miss.", "Mrs." and "Ms.". The "→" marks the start of a sentence.

Thanks~!

1 solution

#1



If you want to match well-formed sentences in your text, match any run of characters that are not sentence-ending punctuation, followed by one such punctuation mark (., ! or ?), and do not treat the punctuation after special abbreviations such as Dr, Mr and Mrs as the end of a sentence.

(([^.!?]|(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?])*[\.!?])

Explanation:

1st Capturing Group

(([^.!?]|(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?])*[\.!?])

2nd Capturing Group

([^.!?]|(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?])*

*

Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy). A repeated capturing group only captures its last iteration. To capture all iterations, put a capturing group around the repeated group; if you are not interested in the data, use a non-capturing group instead.

1st Alternative

[^.!?]

Match a single character not present in the list .!?
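The quantifier note about repeated capturing groups can be checked with a small sketch. The class name `RepeatedGroupDemo`, its two helper methods and the sample input are assumptions made for illustration; only `java.util.regex` is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Group 1 is repeated by *, so after the match it only holds
    // the character from the final iteration.
    static String lastIteration(String input) {
        Matcher m = Pattern.compile("([^.!?])*[.!?]").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // An outer capturing group around the repetition keeps every iteration;
    // the inner group is non-capturing because its data is not needed.
    static String allIterations(String input) {
        Matcher m = Pattern.compile("((?:[^.!?])*)[.!?]").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastIteration("Hello."));  // o
        System.out.println(allIterations("Hello."));  // Hello
    }
}
```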

2nd Alternative

(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)[.!?]

Positive Lookbehind

(?<=Dr|Mr|Mrs|\b[A-Za-z]|\s)

Assert that the Regex below matches

1st Alternatives

Dr|Mr|Mrs

Matches the characters Dr, Mr or Mrs literally (case sensitive)

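2nd Alternatives

\b[A-Za-z]|\s

\b[A-Za-z] matches a single letter that starts a word (for example an initial), and \s matches any whitespace character.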
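To see which punctuation marks the positive lookbehind lets through, here is a minimal sketch that runs the second alternative on its own. The class name `LookbehindDemo`, the method `acceptedPunctuation` and the sample sentence are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The lookbehind from the answer, followed by the punctuation class.
    private static final Pattern ACCEPTED =
            Pattern.compile("(?<=Dr|Mr|Mrs|\\b[A-Za-z]|\\s)[.!?]");

    // Returns the index of every punctuation mark the lookbehind accepts,
    // i.e. marks that should NOT end a sentence.
    static List<Integer> acceptedPunctuation(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = ACCEPTED.matcher(text);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // The dots after "Mrs" (index 3) and "J" (index 14) are accepted;
        // the dots after "Doe" and "Fine" are not.
        System.out.println(acceptedPunctuation("Mrs. Roy met J. Doe. Fine."));  // [3, 14]
    }
}
```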

In both groups:

[.!?]

Match a single character present in the list .!?
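Putting the pieces together, the pattern could be applied to the string from the question roughly as follows. This is a sketch under stated assumptions, not the answerer's own code: the class `SentenceMarker` and the method `mark` are hypothetical, the inner group is made non-capturing, and "Ms" and "Miss" are added to the lookbehind because the question lists them while the original pattern only covers Dr, Mr and Mrs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceMarker {
    private static final String ARROW = "\u2192";  // the marker used in the question

    // The answer's pattern with a non-capturing inner group.
    // ASSUMPTION: "Ms" and "Miss" are added to the lookbehind alternatives.
    private static final Pattern SENTENCE = Pattern.compile(
            "(?:[^.!?]|(?<=Dr|Mr|Mrs|Ms|Miss|\\b[A-Za-z]|\\s)[.!?])*[.!?]");

    // Finds each sentence, trims the leading space, and joins the sentences
    // so that every sentence after the first starts with the arrow.
    static String mark(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group().trim());
        }
        return String.join(" " + ARROW, sentences);
    }

    public static void main(String[] args) {
        String a = "This is a book! I am a boy. He is a good man. "
                + "My friend Mrs. Roy is good man. He is nice person. "
                + "Miss. Star is my friend.";
        System.out.println(mark(a));
    }
}
```

For the sample string this produces the output requested in the question. Text after the last punctuation mark is not matched by the pattern and would be dropped, and because the lookbehind has no word boundary before the titles, any word ending in "Mr", "Dr" or "Ms" is also treated as an abbreviation.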

Regex Link
