I'm trying to match 'apple' in tweet data from Twitter. I want to be able to match it to hashtags too, so a match for 'apple' would be either: 'apple' or '#apple'.
Edit: An example tweet might be:
"Today I am going to eat an apple"
or
"Today I am going to eat an #apple"
I do NOT want to match:
"Today I am going to eat lots of apples"
I managed to match hashtags using the following:
\s#([^ ]*)
How would I make the hashtag optional?
Eventually I need to create two variations: one case-sensitive and one case-insensitive.
3 solutions
#1
1
You can make the hash optional by appending a question mark:
\s#?([^ ]*)
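A minimal Python sketch of what this pattern does, assuming the standard `re` module (the question does not name a regex engine, and the variable names here are only illustrative). Note that the capture group grabs every whitespace-preceded word, not just 'apple', and that a word at the very start of the tweet is skipped because nothing precedes it:

```python
import re

# The answer's pattern: whitespace, an optional '#', then a run of non-space characters.
pattern = re.compile(r"\s#?([^ ]*)")

tweet = "Today I am going to eat an #apple"

# findall returns the contents of the single capture group for each match,
# so the leading '#' is dropped from the hashtag.
words = pattern.findall(tweet)
print(words)
```

So the question mark does make the hash optional, but the pattern still needs to name 'apple' explicitly to narrow the match.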
#2
2
To match apple but not apples, insert a word boundary at the end:
#?apple\b
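As a rough check of this pattern against the tweets from the question, here is a short sketch using Python's `re` module (an assumption, since no engine is specified; the list and variable names are illustrative):

```python
import re

# Optional '#', the literal word, then a word boundary so 'apples' is rejected.
pattern = re.compile(r"#?apple\b")

tweets = [
    "Today I am going to eat an apple",
    "Today I am going to eat an #apple",
    "Today I am going to eat lots of apples",
]

# search() scans the whole string; bool() turns the match object (or None) into True/False.
results = [bool(pattern.search(t)) for t in tweets]
print(results)
```

The first two tweets match and the plural does not, because there is no word boundary between the 'e' and the 's' in 'apples'.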
#3
0
As the hashtag is optional, you might also need to precede "apple" with a word boundary:
#?\bapple\b
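The question also asks for case-sensitive and case-insensitive variations. One way to sketch both in Python, assuming the `re` module and its `re.IGNORECASE` flag (the helper `find_apple` and the constant names are hypothetical, not part of any answer above):

```python
import re

# Same pattern compiled twice: once as-is, once ignoring case.
CASE_SENSITIVE = re.compile(r"#?\bapple\b")
CASE_INSENSITIVE = re.compile(r"#?\bapple\b", re.IGNORECASE)


def find_apple(tweet, ignore_case=False):
    """Return every 'apple' / '#apple' occurrence found in the tweet (hypothetical helper)."""
    pattern = CASE_INSENSITIVE if ignore_case else CASE_SENSITIVE
    return [m.group(0) for m in pattern.finditer(tweet)]


print(find_apple("Today I am going to eat an #apple"))
print(find_apple("I like pineapple"))
print(find_apple("Eat an #Apple", ignore_case=True))
```

The leading `\b` is what keeps 'pineapple' from matching, since there is no boundary between the 'e' and the 'a' inside that word. In other regex engines the case-insensitive variant is usually an `i` flag or an inline `(?i)` prefix, but that depends on the engine in use.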