How to match alphanumeric characters in a Python regexp? [duplicate]

This question already has an answer here:

Split Strings with Multiple Delimiters? 29 answers

I'd like to get all the words from a text, including words that contain Unicode characters, but excluding hyphens, underscores, and any other non-alphanumeric characters.

I.e. I want something like this:

>>> getWords('John eats apple_pie')
['John', 'eats', 'apple', 'pie']
>>> getWords(u'André eats apple-pie')
[u'André', u'eats', u'apple', u'pie']

With

getWords = lambda text: re.compile(r'[A-Za-z0-9]+').findall(text)

it works for the first example but not the second (the ASCII-only class stops at "é"), and the other way around with this, because \w also matches the underscore:

getWords = lambda text: re.compile(r'\w+', re.UNICODE).findall(text)
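
To make the mismatch concrete, here is a minimal sketch that runs both attempts on the input each one gets wrong. It assumes Python 3 (where str is Unicode and the u'' prefix is optional); the names ascii_words and unicode_words are hypothetical and used only for illustration.

```python
import re

# Attempt 1: ASCII-only character class. It splits on '_' and '-',
# but it also stops at any non-ASCII letter such as 'é'.
ascii_words = lambda text: re.findall(r'[A-Za-z0-9]+', text)

# Attempt 2: \w is Unicode-aware, but it also matches the underscore,
# so 'apple_pie' is kept as a single word.
unicode_words = lambda text: re.findall(r'\w+', text, re.UNICODE)

print(ascii_words('Andr\u00e9 eats apple-pie'))   # ['Andr', 'eats', 'apple', 'pie']
print(unicode_words('John eats apple_pie'))       # ['John', 'eats', 'apple_pie']
```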

1 answer

#1

You can use str.isalnum() instead of a regular expression in this case:

getWords = lambda x: ''.join(i if i.isalnum() else ' ' for i in x).split()
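
The one-liner above can be spelled out as a named function, and if a regular expression is still preferred, a negated class is one possible alternative. The sketch below assumes Python 3; get_words, get_words_re, and WORD_RE are hypothetical names, and the [^\W_] pattern is a swapped-in technique rather than part of the answer itself.

```python
import re

def get_words(text):
    """Replace every non-alphanumeric character with a space, then split on whitespace."""
    return ''.join(ch if ch.isalnum() else ' ' for ch in text).split()

# Regex alternative: [^\W_] reads as "a word character that is not an underscore".
WORD_RE = re.compile(r'[^\W_]+', re.UNICODE)

def get_words_re(text):
    """Return all runs of word characters, excluding the underscore."""
    return WORD_RE.findall(text)

print(get_words('John eats apple_pie'))            # ['John', 'eats', 'apple', 'pie']
print(get_words_re('Andr\u00e9 eats apple-pie'))   # ['André', 'eats', 'apple', 'pie']
```

Both versions treat hyphens and underscores as separators while keeping accented letters inside words.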
