I have a large text containing many sentences, some of which contain uppercase keywords. I only need to match the sentences that contain one or more uppercase words, for instance:
This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched.
Any suggestions? Thanks! I am not an advanced user...
3 answers
#1
This is it:
^.*\b[A-Z]+\b.*$
- \b asserts a position at a word boundary
- [A-Z]+ matches one or more characters in the range between A (index 65) and Z (index 90)
https://regex101.com/r/kUN41W/1
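To try the pattern outside the browser, here is a minimal Python sketch. It assumes each sentence sits on its own line, as in the question's example, so re.MULTILINE is needed for ^ and $ to anchor at every line rather than only at the start and end of the whole text.

```python
import re

# The three example sentences from the question, one per line.
text = """This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched."""

# With re.MULTILINE, ^ and $ match at each line boundary. Since "." does not
# match "\n" by default, .* never runs past the end of a line.
pattern = re.compile(r"^.*\b[A-Z]+\b.*$", re.MULTILINE)

matches = pattern.findall(text)
for line in matches:
    print(line)
```

Only the first two lines are printed: "This" is not a match because \b[A-Z]+\b requires a word boundary right after the uppercase run, and "T" is followed by the lowercase "h".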
If the single-letter word I should NOT count as an UPPERCASE word in a sentence that matches your conditions, use this instead:
^.*\b[A-Z]{2,}\b.*$
- {2,} quantifier: matches between 2 and unlimited times, as many times as possible, giving back as needed
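A quick way to see the difference between the two patterns is to run both over a line whose only uppercase word is I. The two test lines below are made up for illustration (they are not from the question), and the sketch again assumes one sentence per line.

```python
import re

# Hypothetical test lines: the first contains only the single-letter word "I",
# the second also contains the multi-letter uppercase keyword "NEWS".
text = """Yesterday I went home.
Then I saw the NEWS."""

one_or_more = re.compile(r"^.*\b[A-Z]+\b.*$", re.MULTILINE)
two_or_more = re.compile(r"^.*\b[A-Z]{2,}\b.*$", re.MULTILINE)

print(one_or_more.findall(text))  # both lines match, because "I" counts
print(two_or_more.findall(text))  # only the line containing "NEWS" matches
```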
#2
Try a tool like https://regexr.com/. It really helps to visualize what effect your regex has.
For your test data, this regex works:
([^\.]*[A-Z]{2,}[^\.]*)\.
It is composed of:
- [^\.]* anything that is not a dot
- [A-Z]{2,} at least 2 consecutive uppercase characters
- [^\.]* anything that is not a dot
The group is followed by \., the literal dot that ends the sentence.
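A minimal sketch of applying this pattern with Python's re.findall to the question's example. Because the pattern contains a capturing group, findall returns only the group, so the closing dot is not included. Also, [^\.] matches newlines too, so every result after the first starts with the newline left over from the previous sentence; strip() removes it.

```python
import re

text = ("This is MY SENTENCE that should be matched.\n"
        "And THIS one should be too.\n"
        "This other sentence should not be matched.")

# The group captures a dot-free run containing at least two consecutive
# uppercase letters; the literal \. after it consumes the sentence's dot.
pattern = re.compile(r"([^\.]*[A-Z]{2,}[^\.]*)\.")

# findall returns the captured group only; strip() drops the leading "\n".
sentences = [m.strip() for m in pattern.findall(text)]
print(sentences)
```

Note that, unlike the patterns in the first answer, [A-Z]{2,} here has no \b around it, so any two adjacent capitals inside a word are enough to match.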
#3
Using Python:
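import re

txt = 'This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words. This other sentence should not be matched. And THIS one should be.'
# Split naively on '.'; each piece loses its trailing dot.
for s in txt.split('.'):
    # \b[A-Z]+\b finds a standalone run of uppercase letters (MY, THIS, I, ...)
    if re.search(r'\b[A-Z]+\b', s):
        print(s)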
Output:
This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words
 And THIS one should be
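str.split('.') drops the dots and leaves a leading space on every sentence after the first. A variant (not part of the original answer) splits with re.split on a lookbehind instead, so each sentence keeps its closing punctuation. It is a naive sketch that assumes sentences end in ".", "!" or "?" followed by whitespace; abbreviations such as "e.g." would be split as well.

```python
import re

txt = ('This is MY SENTENCE and I would like, this sentence, to be matched '
       'because it contains uppercase words. This other sentence should not '
       'be matched. And THIS one should be.')

# Split on whitespace that follows ".", "!" or "?". The lookbehind keeps the
# punctuation attached to its sentence instead of consuming it.
sentences = re.split(r'(?<=[.!?])\s+', txt.strip())

# Keep sentences containing a standalone uppercase word, as in the answer.
matched = [s for s in sentences if re.search(r'\b[A-Z]+\b', s)]
for s in matched:
    print(s)
```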