Regex: how to match everything before and after an uppercase sequence, using a period as the delimiter?

I have a large text made up of many sentences, some of which contain uppercase keywords. I need to match only the sentences that contain one or more uppercase words, for instance:

This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched.

Any suggestions? Thanks! I am not an advanced user...

3 solutions

#1

This should do it (with the multiline flag set, so that ^ and $ anchor at each line):

^.*\b[A-Z]+\b.*$

\b asserts a position at a word boundary

A-Z matches a single character in the range between A (index 65) and Z (index 90)

https://regex101.com/r/kUN41W/1

If a single-letter word such as "I" should NOT count as an uppercase word for your purposes, use this instead:

^.*\b[A-Z]{2,}\b.*$

{2,} Quantifier: matches between 2 and unlimited times, as many times as possible, giving back as needed (greedy)
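As a rough sketch of how the two patterns above behave, here they are in Python against the sample from the question. The use of Python's `re` module, the `re.MULTILINE` flag, and the variable names are assumptions made for illustration; the answer itself only links to regex101.

```python
import re

# Sample text from the question, one sentence per line (assumed layout).
text = """This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched."""

# re.MULTILINE makes ^ and $ anchor at every line, not only at the
# start and end of the whole text.
any_upper = re.compile(r"^.*\b[A-Z]+\b.*$", re.MULTILINE)
two_plus = re.compile(r"^.*\b[A-Z]{2,}\b.*$", re.MULTILINE)

matches_any = any_upper.findall(text)
matches_two_plus = two_plus.findall(text)
print(matches_any)
print(matches_two_plus)

# A sentence whose only uppercase word is "I" separates the two patterns.
single = "I would like this one skipped."
print(bool(any_upper.search(single)))   # "I" counts as an uppercase word
print(bool(two_plus.search(single)))    # "I" is too short for {2,}
```

Both patterns pick out the first two lines of the sample; they only differ on sentences whose sole uppercase word is a single letter.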

#2

Try a tool like https://regexr.com/. It really helps to visualize the effect your regex has.

For your test data this regex is fine:

([^\.]*[A-Z]{2,}[^\.]*)\.

It is composed of:

[^\.]* any run of characters that are not a dot

[A-Z]{2,} at least 2 consecutive uppercase characters

[^\.]* any run of characters that are not a dot

\. the literal dot that ends the sentence
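A minimal sketch of this pattern in Python, assuming the sentences sit on a single line and are separated only by dots. The variable names and the `strip()` call are illustrative additions, not part of the answer.

```python
import re

text = ("This is MY SENTENCE that should be matched. "
        "And THIS one should be too. "
        "This other sentence should not be matched.")

# The capture group holds the sentence without its trailing dot.
pattern = re.compile(r"([^\.]*[A-Z]{2,}[^\.]*)\.")

# findall returns the content of the single capture group for each match;
# strip() drops the space left over after the previous sentence's dot.
sentences = [s.strip() for s in pattern.findall(text)]
print(sentences)

# Unlike the \b version, this pattern has no word boundaries, so two
# uppercase letters inside a mixed-case word are enough to match.
mixed = pattern.findall("This is OKay.")
print(mixed)
```

If mixed-case words such as "OKay" should not count, wrapping the uppercase part in `\b...\b` would be one way to tighten it.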

#3

Using Python:

import re

txt = 'This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words. This other sentence should not be matched. And THIS one should be.' 

for s in txt.split('.'):
    if re.search(r'\b[A-Z]+\b', s): 
        print(s)

output:

This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words
 And THIS one should be
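The same split-and-search idea can be wrapped in a small function that also trims the leading space seen in the output above. This is only a sketch: `sentences_with_uppercase` and its `min_len` parameter are hypothetical names introduced here, not part of the answer.

```python
import re

def sentences_with_uppercase(text, min_len=1):
    """Return the sentences of text (split on '.') that contain an
    all-uppercase word of at least min_len letters."""
    # Hypothetical helper: min_len=2 skips one-letter words such as "I".
    pattern = re.compile(r"\b[A-Z]{%d,}\b" % min_len)
    result = []
    for sentence in text.split("."):
        sentence = sentence.strip()  # drop the space left after each dot
        if sentence and pattern.search(sentence):
            result.append(sentence)
    return result

txt = ('This is MY SENTENCE and I would like, this sentence, to be matched '
       'because it contains uppercase words. This other sentence should not '
       'be matched. And THIS one should be.')

found = sentences_with_uppercase(txt)
for s in found:
    print(s)

# With min_len=2, a sentence whose only uppercase word is "I" is skipped.
found_two = sentences_with_uppercase("I agree. This is FINE.", min_len=2)
print(found_two)
```

Splitting on `'.'` is a simplification: abbreviations or decimal numbers containing dots would be cut into separate pieces.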

