Matching multiple words in a string with a regular expression

Date: 2022-09-13 11:32:58

I am using Python to match a few words within a sentence and am checking the results with unit tests. I want a regular expression that matches all of these words and produces the outputs shown below:

firstword = "<p>This is @Timberlake</p>"
outputfirstword = "@Timberlake"

Find the word that starts with the @ symbol.

secondword = "<p>This is @timber.lake</p>"
outputsecondword = "@timber.lake"

Periods between words are okay.

thirdword = "This is @Timberlake. Yo!"
outputthirdword = "@Timberlake"

If there is a space after the period, then neither the period nor the space counts towards outputthirdword.

fourthword = "This is @Timberlake."
outputfourthword = "@Timberlake"

The final period (.) is not included.

4 solutions

#1


2  

Using this regex:

(?i)@[a-z.]+\b

This extracts the needed part as the full match (group 0), so no extra capturing group is required.

Explanation:

(?i)     # Enable the case-insensitive modifier
@        # Literal @
[a-z.]+  # One or more letters (a to z) or periods
\b       # End at a word boundary, which drops any trailing period
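
As a quick check, the pattern above can be run against the four inputs from the question. This is only a minimal sketch: the helper name `first_mention` and the `cases` table are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer above; (?i) lets [a-z] match upper case too.
MENTION = re.compile(r'(?i)@[a-z.]+\b')

# Inputs and expected outputs copied from the question.
cases = {
    "<p>This is @Timberlake</p>": "@Timberlake",
    "<p>This is @timber.lake</p>": "@timber.lake",
    "This is @Timberlake. Yo!": "@Timberlake",
    "This is @Timberlake.": "@Timberlake",
}

def first_mention(text):
    """Return the first @-word in text, or None if there is none."""
    m = MENTION.search(text)
    return m.group(0) if m else None

for text, expected in cases.items():
    assert first_mention(text) == expected, (text, first_mention(text))
```

The trailing period is dropped because `\b` can only succeed right after a letter here, so the engine backtracks off the final `.`.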

#2


1  

@[a-zA-Z]+\b(?:\.[a-zA-Z]+\b)?

You can use this.

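import re

p = re.compile(r'@[a-zA-Z]+\b(?:\.[a-zA-Z]+\b)?')
test_str = "This is @Timberlake. Yo!\n<p>This is @timber.lake</p>"

# findall returns every non-overlapping match as a list of strings
print(p.findall(test_str))  # ['@Timberlake', '@timber.lake']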

#3


0  

One way is to use the following regex and then strip any trailing dot from the result:

@[a-zA-Z.]+

For example, with re.search you can do:

re.search(r'@[a-zA-Z.]+', my_string).group(0).rstrip('.')
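
Note that re.search returns None when nothing matches, so calling .group(0) on the result directly would raise an AttributeError for input without an @-word. A slightly more defensive sketch of the same strip-based idea follows; the function name `mention_stripped` is hypothetical and only used for illustration.

```python
import re

def mention_stripped(text):
    """Hypothetical helper: first @-word with trailing periods removed."""
    m = re.search(r'@[a-zA-Z.]+', text)
    if m is None:
        return None  # no @-word in the text
    # The raw match may end in '.', e.g. '@Timberlake.'; drop those periods.
    return m.group(0).rstrip('.')

print(mention_stripped("This is @Timberlake. Yo!"))     # @Timberlake
print(mention_stripped("<p>This is @timber.lake</p>"))  # @timber.lake
print(mention_stripped("no mention here"))              # None
```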

Alternatively, you can use the following regex, which doesn't need the strip:

@[a-zA-Z]+\.?[a-zA-Z]+

#4


0  

As @Kasra mentioned, the regex works nicely, but it does not remove the trailing dot.

Use the regex below; I believe it does what you expect.

@[a-zA-Z.]+[a-zA-Z]+

See the example below. It is not in Python, but the regex should be the same.

$ (echo "<p>This is @Timberlake</p>"; echo "<p>This is @timber.lake</p>"; echo "This is @Timberlake."; echo "<p>This is @tim.ber.lake</p>") | grep -Eo '@[a-zA-Z.]+[a-zA-Z]+'
@Timberlake
@timber.lake
@Timberlake
@tim.ber.lake
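
For reference, a rough Python equivalent of the grep command above might look like this. It is a sketch rather than part of the original answer; the `lines` list simply repeats the echoed inputs.

```python
import re

# Same pattern as passed to grep -Eo above.
pattern = re.compile(r'@[a-zA-Z.]+[a-zA-Z]+')

lines = [
    "<p>This is @Timberlake</p>",
    "<p>This is @timber.lake</p>",
    "This is @Timberlake.",
    "<p>This is @tim.ber.lake</p>",
]

# Like grep -o, collect only the matched parts of each line.
matches = [m for line in lines for m in pattern.findall(line)]
print(matches)
# ['@Timberlake', '@timber.lake', '@Timberlake', '@tim.ber.lake']
```

One caveat of this pattern: it needs at least two characters after the @, so a one-letter name such as "@a" is not matched.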
