Regex: how to match everything before and after an uppercase sequence, using the period as the delimiter?

Time: 2022-09-13 00:20:18

I have a large text in which some sentences contain uppercase keywords. I only need to match the sentences that contain one or more uppercase words, for instance:

This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched.

Any suggestions? Thanks! I am not an advanced user.

3 Solutions

#1


1  

This should do it:

^.*\b[A-Z]+\b.*$
  • \b asserts position at a word boundary
  • A-Z matches a single character in the range between A (index 65) and Z (index 90)

https://regex101.com/r/kUN41W/1


If the single letter "I" should NOT count as an uppercase word for your purposes, use this instead:

^.*\b[A-Z]{2,}\b.*$
  • {2,} Quantifier: matches between 2 and unlimited times, as many times as possible, giving back as needed
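
As a rough sketch of how the second pattern might be applied, here it is in Python (an assumption, since this answer does not name a language). The ^ and $ anchors only delimit sentences if each sentence sits on its own line and multiline mode is on, which re.MULTILINE enables here. The variable names are illustrative only.

```python
import re

# Sample input taken from the question: one sentence per line.
text = """This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched."""

# re.MULTILINE makes ^ and $ anchor at every line, not only at the
# start and end of the whole string.
line_pattern = re.compile(r'^.*\b[A-Z]{2,}\b.*$', re.MULTILINE)

# With no capture group, findall returns the full text of each match.
matched_lines = line_pattern.findall(text)

for line in matched_lines:
    print(line)
```

If the sentences are not separated by line breaks, this line-based approach will not isolate them, and a period-delimited pattern is likely the better fit.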

#2


1  

Try a tool such as https://regexr.com/. It really helps to visualize what your regex does.

For your test data this regex works:

([^\.]*[A-Z]{2,}[^\.]*)\.

It is composed of:

  • [^\.]* any run of characters other than a period
  • [A-Z]{2,} at least 2 consecutive uppercase characters
  • [^\.]* any run of characters other than a period
  • \. the period that ends the sentence (outside the capture group)
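
A minimal sketch of this pattern in Python (the language is an assumption; the answer itself is language-neutral). It runs the regex over a single string in which the sentences from the question are separated only by periods. The names txt, sentence_pattern and sentences are illustrative.

```python
import re

# The question's sample sentences joined into one string.
txt = ('This is MY SENTENCE that should be matched. '
       'And THIS one should be too. '
       'This other sentence should not be matched.')

# Same pattern as above: the group captures the sentence without its
# closing period.
sentence_pattern = re.compile(r'([^\.]*[A-Z]{2,}[^\.]*)\.')

# With one capture group, findall returns the group contents.
# strip() removes the space left over after the previous period.
sentences = [s.strip() for s in sentence_pattern.findall(txt)]

for s in sentences:
    print(s)
```

Note that this pattern has no \b, so two or more uppercase letters inside a mixed-case word would also count, and any period (for example in an abbreviation) is treated as a sentence end.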

#3


0  

Using Python:

import re

txt = 'This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words. This other sentence should not be matched. And THIS one should be.' 

for s in txt.split('.'):
    if re.search(r'\b[A-Z]+\b', s): 
        print(s)

Output:

This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words
 And THIS one should be
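
The first sentence in that sample also contains the one-letter word "I", which \b[A-Z]+\b matches on its own. A possible variation, sketched below, wraps the same split-and-search idea in a function with a minimum word length and strips the leftover whitespace. The function name sentences_with_uppercase and the min_len parameter are hypothetical, not part of the original answer.

```python
import re

def sentences_with_uppercase(text, min_len=2):
    """Return period-delimited sentences containing an all-caps word.

    min_len is the minimum number of letters the all-caps word must
    have; min_len=1 reproduces the behaviour of \\b[A-Z]+\\b.
    """
    word = re.compile(r'\b[A-Z]{%d,}\b' % min_len)
    # Split on periods as in the snippet above, then trim whitespace.
    parts = [s.strip() for s in text.split('.')]
    # Skip empty pieces (such as the one after the final period).
    return [s for s in parts if s and word.search(s)]

sample = 'I like this one. This has a KEYWORD. Nothing here.'
print(sentences_with_uppercase(sample, min_len=1))
print(sentences_with_uppercase(sample, min_len=2))
```

With min_len=1 the sentence starting with "I" is kept; with min_len=2 only the sentence containing KEYWORD remains.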
