Is it possible to use a regex in Python to capture a word, or only the leading part of that word, when it appears at the end of a string?
Eg:
target word - potato
string - "this is a sentence about a potato"
string - "this is a sentence about a potat"
string - "this is another sentence about a pota"
Thanks!
5 Solutions
#1
2
import re

def get_matcher(word, minchars):
    # Alternation of prefixes, longest first, e.g. potato|potat|pota
    reg = '|'.join([word[0:i] for i in range(len(word), minchars - 1, -1)])
    # Anchor at the end of the string so only a trailing (partial) word matches
    return re.compile('(%s)$' % (reg))

matcher = get_matcher('potato', 4)
for s in ["this is a sentence about a potato", "this is a sentence about a potat", "this is another sentence about a pota"]:
    print(matcher.search(s).groups())
OUTPUT
('potato',)
('potat',)
('pota',)
#2
1
I don't know how to apply a regex in Python, but the regex would be:
"\bp$|\bpo$|\bpot$|\bpota$|\bpotat$|\bpotato$"
This would match anything from p to potato if it's the last word in the string, and it would not match something like "foopotato", if that is what you want.
The | denotes an alternative. The \b is a "word boundary": it matches a position (not a character) between a word character and a non-word character. The $ matches the end of the string (also a position).
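In Python, a pattern of this shape can be generated from the target word and applied with re.search. This is only a minimal sketch: the helper name build_prefix_pattern is made up for illustration, and re.escape is added as a precaution in case the word contains regex metacharacters.

```python
import re

def build_prefix_pattern(word):
    # Hypothetical helper: one alternative per prefix, longest first,
    # each with \b on the left and $ on the right.
    prefixes = [word[:i] for i in range(len(word), 0, -1)]
    return re.compile("|".join(r"\b%s$" % re.escape(p) for p in prefixes))

pattern = build_prefix_pattern("potato")
for s in ["a sentence about a potat", "a sentence about a foopotato"]:
    m = pattern.search(s)
    # Prints the matched prefix, or None when the last word does not qualify
    print(repr(s), "->", m.group(0) if m else None)
```

Note the raw string (r"...") around \b; in a normal Python string literal "\b" is a backspace character, not a word boundary.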
#3
0
Use $ to match at the end of a string. For example, the following would match 'potato' only at the end of a string (your first example):
"potato$"
This would match all of your examples:
"pota[to]{0,2}$"
However, the character class accepts t and o in any order, so there is some risk of also matching "potao" or "potaot".
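A quick check makes the trade-off of a character-class pattern visible. The sketch below uses the quantifier {0,2} so that the bare prefix pota is accepted too; the list of test endings is my own, chosen to include the false positives mentioned above.

```python
import re

# [to] accepts t or o in any order; {0,2} allows zero, one or two of them
pattern = re.compile(r"pota[to]{0,2}$")

for tail in ["pota", "potat", "potato", "potao", "potaot", "pot"]:
    # True means the pattern accepts this ending
    print(tail, bool(pattern.search(tail)))
```

The three wanted endings are accepted, but so are potao and potaot, which is why the explicit alternation is the safer choice.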
#4
0
import re
patt = re.compile(r'(p|po|pot|pota|potat|potato)$')
patt.search(string)
I was tempted to use r'po?t?a?t?o?$', but that would also match poto or pott.
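One way to keep the compact shape of the optional-letter idea without the false matches is to nest the optional groups, so a letter can only match if every letter before it matched. This is a sketch of that alternative, not something from the answer above; nested_prefix_regex is a hypothetical helper name.

```python
import re

def nested_prefix_regex(word, minchars=1):
    # Hypothetical helper: builds e.g. p(?:o(?:t(?:a(?:t(?:o)?)?)?)?)?$
    # Each optional group wraps all later letters, so none can be skipped.
    head, tail = word[:minchars], word[minchars:]
    body = ""
    for ch in reversed(tail):
        body = "(?:%s%s)?" % (re.escape(ch), body)
    return re.compile(re.escape(head) + body + "$")

patt = nested_prefix_regex("potato")
for tail in ["potato", "potat", "pota", "poto", "pott"]:
    print(tail, bool(patt.search("a sentence about a " + tail)))
```

Like the alternation above, this has no word boundary, so it would still accept "foopotato"; prepend \b if that matters.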
#5
0
No, as far as I know you can't do that with a regex without spelling out all the alternatives (p|po|pot ...), which is excessive. Instead, just pick off the last word and check whether it is a prefix of the target word:
import re

match = re.search(r'\S+$', haystack)
# Guard against no match (empty string or trailing whitespace)
if match and needle.startswith(match.group(0)):
    print("matches")
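Wrapped in a function, the last-word approach might look like the following. This is a sketch under my own assumptions: the name ends_with_prefix_of and the minchars parameter are not from the answer, and "word" here simply means a run of non-whitespace characters.

```python
import re

def ends_with_prefix_of(haystack, needle, minchars=1):
    # Hypothetical helper: True if the last whitespace-delimited word of
    # haystack is a prefix of needle that is at least minchars long.
    match = re.search(r"\S+$", haystack)
    if match is None:  # empty string or trailing whitespace
        return False
    last = match.group(0)
    return len(last) >= minchars and needle.startswith(last)

for s in ["this is a sentence about a potat", "this is a sentence about a foopotato"]:
    print(repr(s), ends_with_prefix_of(s, "potato"))
```

Because \S+ also swallows punctuation, a string ending in "potato." would not count as a match with this version.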