Regular expression to extract a keyword from tweets

Date: 2022-08-22 11:24:50

I'm trying to match 'apple' in tweet data from Twitter. I want to be able to match it to hashtags too, so a match for 'apple' would be either: 'apple' or '#apple'.

Edit: An example tweet might be:

"Today I am going to eat an apple"

or

"Today I am going to eat an #apple"

I do NOT want to match:

"Today I am going to eat lots of apples"

I managed to match hashtags using the following: \s#([^ ]*). How would I make the hashtag optional?

Eventually I need to create two variations, one case-sensitive and one case-insensitive.

3 answers

#1


1  

You can make the hash optional by appending a question mark:

\s#?([^ ]*)
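
As a rough sketch of what this pattern does in practice (Python's re module is an assumption here, since the question does not name a language, and the helper name tokens_after_space is hypothetical): the optional hash is consumed when present, but the capture group still grabs every space-delimited token that follows whitespace, so it does not by itself single out 'apple'.

```python
import re

# The pattern from this answer: whitespace, an optional '#', then a
# run of non-space characters captured as group 1.
TOKEN_AFTER_SPACE = re.compile(r"\s#?([^ ]*)")


def tokens_after_space(tweet):
    """Return every captured token; a leading '#' is left out of the capture."""
    return TOKEN_AFTER_SPACE.findall(tweet)


if __name__ == "__main__":
    print(tokens_after_space("Today I am going to eat an #apple"))
```

The first word of the tweet is never captured because nothing precedes it for \s to match, and the result still has to be filtered for the keyword afterwards.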

#2


2  

To match apple but not apples, insert a word boundary at the end:

#?apple\b
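
A minimal sketch of this pattern against the example tweets from the question, assuming Python's re module (the helper name find_apple is hypothetical). It also shows one limitation: without a boundary in front, the pattern still matches the tail of a longer word such as 'pineapple'.

```python
import re

# The pattern from this answer: an optional '#', the literal word,
# then a word boundary so that 'apples' is rejected.
APPLE_END_BOUNDARY = re.compile(r"#?apple\b")


def find_apple(tweet):
    """Return the matched text, or None when the tweet does not match."""
    match = APPLE_END_BOUNDARY.search(tweet)
    return match.group(0) if match else None


if __name__ == "__main__":
    for tweet in (
        "Today I am going to eat an apple",
        "Today I am going to eat an #apple",
        "Today I am going to eat lots of apples",
        "Today I am going to eat a pineapple",
    ):
        print(repr(tweet), "->", find_apple(tweet))
```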

#3


0  

As the hashtag is optional, you might also need to precede "apple" with a word boundary:

#?\bapple\b
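
The question also asks for case-sensitive and case-insensitive variations. One possible way to get both from this pattern, sketched under the assumption that Python's re module is in use (the constant and function names are hypothetical), is to compile it twice and pass re.IGNORECASE for the second version.

```python
import re

# The pattern from this answer, compiled in two variations.
PATTERN = r"#?\bapple\b"
CASE_SENSITIVE = re.compile(PATTERN)
CASE_INSENSITIVE = re.compile(PATTERN, re.IGNORECASE)


def mentions_apple(tweet, ignore_case=False):
    """Return True when the tweet contains 'apple' or '#apple' as a whole word."""
    regex = CASE_INSENSITIVE if ignore_case else CASE_SENSITIVE
    return regex.search(tweet) is not None


if __name__ == "__main__":
    print(mentions_apple("Today I am going to eat an #apple"))
    print(mentions_apple("Today I am going to eat an #Apple"))
    print(mentions_apple("Today I am going to eat an #Apple", ignore_case=True))
    print(mentions_apple("Today I am going to eat a pineapple"))
```

The leading \b is what rejects 'pineapple', and the trailing \b is what rejects 'apples'. Since \b treats letters, digits, and the underscore as word characters, a token such as 'apple_pie' is not matched either.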
