I'm trying to write a simple program that removes all words containing digits from a received string.
Here is my current implementation:
import re

def checkio(text):
    text = text.replace(",", " ").replace(".", " ").replace("!", " ").replace("?", " ").lower()
    counter = 0
    words = text.split()
    print(words)
    for each in words:
        if bool(re.search(r'\d', each)):
            words.remove(each)
    print(words)

checkio("1a4 4ad, d89dfsfaj.")
However, when I execute this program, I get the following output:
['1a4', '4ad', 'd89dfsfaj']
['4ad']
I can't figure out why '4ad' is printed in the second line, since it contains digits and should have been removed from the list. Any ideas?
4 Solutions
#1
Assuming that your regular expression does what you want, you can do this to avoid removing while iterating.
import re

def checkio(text):
    text = re.sub(r'[,.?!]', ' ', text).lower()
    words = [w for w in text.split() if not re.search(r'\d', w)]
    print(words)  # prints [] in this case
Also, note that I simplified your text = text.replace(...) line.
Additionally, if you do not need to reuse your text variable, you can use a regex to split it directly.
import re

def checkio(text):
    # Split on whitespace as well as punctuation so each word becomes its own token.
    words = [w for w in re.split(r'[\s,.?!]+', text.lower()) if w and not re.search(r'\d', w)]
    print(words)  # prints [] in this case
#2
If you are testing for alphanumeric strings, why not use isalnum() instead of a regex?
In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']
In [1696]: [word for word in x if not word.isalnum()]
Out[1696]: []
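One caveat worth checking before relying on this: isalnum() is true for any string made up of letters and/or digits, so the filter above returns [] for this input only because all three words happen to be alphanumeric. A digit-free word such as 'hello' would be dropped as well. Below is a minimal sketch of an alternative, assuming punctuation has already been stripped; the helper name words_without_digits and the extra sample word 'hello' are hypothetical, added only for illustration.

```python
def words_without_digits(words):
    """Keep only the words that contain no digit characters."""
    return [w for w in words if not any(ch.isdigit() for ch in w)]


sample = ['1a4', '4ad', 'd89dfsfaj', 'hello']

# isalnum() is True for letters-only words too, so 'hello' is filtered out here.
by_isalnum = [w for w in sample if not w.isalnum()]

# Checking each character for digits keeps 'hello' and drops the other three.
by_digit_check = words_without_digits(sample)

print(by_isalnum)
print(by_digit_check)
```

If the words are guaranteed to contain only letters and digits, `w.isalpha()` would likely work as the filter condition too, but it also rejects words containing characters such as apostrophes or hyphens.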
#3
This is possible using re.sub, re.search and a list comprehension.
>>> import re
>>> def checkio(s):
...     print([i for i in re.sub(r'[.,!?]', '', s.lower()).split() if not re.search(r'\d', i)])
...
>>> checkio("1a4 4ad, d89dfsfaj.")
[]
>>> checkio("1a4 ?ad, d89dfsfaj.")
['ad']
#4
What happens here is a concurrent modification problem: you are deleting elements from the list while you are traversing it.
At the first iteration we have words = ['1a4', '4ad', 'd89dfsfaj'] and the loop is at index 0. Since '1a4' contains a digit, we remove it, so now words = ['4ad', 'd89dfsfaj']. At the second iteration the loop moves to index 1, where the current word is now 'd89dfsfaj', and we remove that too. '4ad' is skipped because it has shifted down to index 0, which the for loop has already passed. After the second removal words = ['4ad'], the next index (2) is past the end of the list, and the loop ends.
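The walk-through above can be reproduced with a short sketch. The function names remove_while_iterating and remove_safely are hypothetical, chosen only for this illustration. The first records which words the loop actually visits; the second shows one possible fix, which is to iterate over a copy of the list (words[:]) while removing from the original.

```python
import re


def remove_while_iterating(words):
    """Reproduce the bug: mutate the list that the for loop is walking."""
    words = list(words)  # work on a copy so the caller's list is untouched
    visited = []
    for each in words:
        visited.append(each)
        if re.search(r'\d', each):
            words.remove(each)
    return visited, words


def remove_safely(words):
    """Iterate over a snapshot (words[:]) while removing from the original."""
    words = list(words)
    for each in words[:]:
        if re.search(r'\d', each):
            words.remove(each)
    return words


visited, leftover = remove_while_iterating(['1a4', '4ad', 'd89dfsfaj'])
print(visited)   # ['1a4', 'd89dfsfaj']: '4ad' is never visited
print(leftover)  # ['4ad']
print(remove_safely(['1a4', '4ad', 'd89dfsfaj']))  # []
```

Building a new list with a list comprehension, as in the other answers, avoids the problem in the same way, because the list being iterated is never modified.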