Python: Remove all characters except digits from a string

Time: 2022-01-22 17:06:45

How can I remove all characters except numbers from a string?

12 solutions

#1


98  

In Python 2.x, by far the fastest approach is the .translate method:

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>> 

string.maketrans('', '') makes a translation table (a string of length 256), which in this case is the same as ''.join(chr(x) for x in range(256)), just faster to make ;-). .translate applies the translation table (irrelevant here, since all is essentially the identity mapping) AND deletes the characters present in its second argument, which is the key part. So nodigs = all.translate(all, string.digits) is every character except the digits, and x.translate(all, nodigs) deletes exactly those, leaving only the digits.
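
In Python 3, bytes objects still support this two-argument style: bytes.translate accepts None as the table plus a second argument listing the bytes to delete. The following is a minimal sketch of the same trick, assuming your data is bytes (or ASCII text you are willing to encode); the names NON_DIGIT_BYTES and keep_ascii_digits are illustrative, not part of the answer above.

```python
# Python 3 sketch: the Python 2 str.translate trick, applied to bytes.
# Assumes the input is bytes (e.g. ASCII text encoded with .encode("ascii")).

# Every byte value 0..255 except the ASCII digits 0-9; this plays the role
# of `nodigs` in the Python 2 session.
NON_DIGIT_BYTES = bytes(b for b in range(256) if b not in b"0123456789")


def keep_ascii_digits(data: bytes) -> bytes:
    # table=None means "no remapping"; the second argument lists bytes to delete.
    return data.translate(None, NON_DIGIT_BYTES)


print(keep_ascii_digits(b"aaa12333bb445bb54b5b52"))  # b'1233344554552'
```

For a str you would have to round-trip through encode/decode, e.g. keep_ascii_digits(s.encode("ascii")).decode("ascii"), which only works if s is pure ASCII.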

.translate works very differently on Unicode strings (and on strings in Python 3; I do wish questions specified which major release of Python is of interest!): not quite this simple, not quite this fast, though still quite usable.

Back to 2.x, the performance difference is impressive...:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach...:

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

is 50% slower than RE, so the .translate approach beats it by over an order of magnitude.

In Python 3, or for Unicode, you need to pass .translate a mapping keyed by ordinals (not by the characters themselves) that returns None for each character you want to delete. Here's a convenient way to express deletion of "everything but" a few characters:

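import string

class Del:
    """Translation "table" that keeps only the characters in `keep`."""
    def __init__(self, keep=string.digits):
        # Map each kept character's ordinal to the character itself.
        self.comp = {ord(c): c for c in keep}

    def __getitem__(self, k):
        # Returns None (meaning "delete") for any ordinal not in self.comp.
        return self.comp.get(k)

DD = Del()

x = 'aaa12333bb445bb54b5b52'
x.translate(DD)  # returns '1233344554552'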

This also returns '1233344554552'. However, putting this code in xx.py and timing it, we get...:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

...which shows that, for this kind of "deletion" task, the performance advantage disappears and turns into a slowdown: here translate is roughly 3 times slower than the RE.
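
To make the mapping rules above concrete, here is a small Python 3 sketch (the sample string and table are illustrative). For each character, str.translate looks up its ordinal in the table: None deletes it, a string or an ordinal replaces it, and a missing key (LookupError) leaves it unchanged. str.maketrans also accepts a dict with single-character keys and converts them to ordinals for you.

```python
# Python 3: how str.translate interprets the values in its table.
table = {
    ord("a"): None,       # None        -> delete the character
    ord("b"): "BB",       # str         -> replace with this string
    ord("c"): ord("C"),   # int ordinal -> replace with that code point
}
# "d" has no entry, so the lookup fails and "d" is kept as-is.
print("abcd".translate(table))  # BBCd

# str.maketrans builds the same kind of table from character keys.
same_table = str.maketrans({"a": None, "b": "BB", "c": "C"})
print("abcd".translate(same_table))  # BBCd
```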

#2


145  

Use re.sub, like so:

>>> import re
>>> re.sub(r"\D", "", "aas30dsa20")
'3020'

\D matches any non-digit character, so the code above replaces every non-digit character with the empty string.
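
One detail worth knowing: in Python 3, with str patterns, \d matches any Unicode decimal digit (category Nd), not just 0-9, so \D keeps digits from other scripts too. Passing the re.ASCII flag restricts \d and \D to the ASCII digits. A small sketch; the sample string is made up for illustration.

```python
import re

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit (category Nd).
sample = "a\u0663b4"

# Default (Unicode) matching: \D removes only non-digits, so both digits survive.
unicode_digits = re.sub(r"\D", "", sample)

# With re.ASCII, \D means [^0-9], so the Arabic-Indic digit is removed as well.
ascii_digits = re.sub(r"\D", "", sample, flags=re.ASCII)

print(unicode_digits == "\u0663" + "4")  # True
print(ascii_digits)                      # 4
```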

Or you can use filter, like so (in Python 2):

>>> filter(lambda x: x.isdigit(), "aas30dsa20")
'3020'

In Python 3, filter returns an iterator (in Python 2, filtering a string returns a string), so join the result back together instead:

>>> ''.join(filter(lambda x: x.isdigit(), "aas30dsa20"))
'3020'

#3


53  

s = ''.join(i for i in s if i.isdigit())

Another variant, using a generator expression.
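
A caveat for the isdigit-based answers: str.isdigit() is broader than "0-9" (and broader than \d), because it also accepts characters such as the superscript two. str.isdecimal() accepts only Unicode decimal digits, and membership in string.digits means exactly ASCII 0-9. A minimal comparison, with a made-up sample string:

```python
import string

sample = "x\u00b2y7"  # "\u00b2" is SUPERSCRIPT TWO

# isdigit() accepts superscript digits, so both "\u00b2" and "7" are kept.
via_isdigit = "".join(c for c in sample if c.isdigit())

# isdecimal() accepts only decimal digits (category Nd), so the superscript is dropped.
via_isdecimal = "".join(c for c in sample if c.isdecimal())

# Membership in string.digits means exactly the ASCII characters 0-9.
via_ascii = "".join(c for c in sample if c in string.digits)
```

If the input may contain such characters and you need something int() can parse, the stricter tests are the safer choice.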

#4


16  

You can use filter:

filter(lambda x: x.isdigit(), "dasdasd2313dsa")

In Python 3 you have to join the result (kinda ugly :( ):

''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))
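
If the lambda is what makes this feel ugly, you can pass the unbound method str.isdigit to filter directly; for str input it behaves the same. A minimal sketch (the function name digits_only is just for illustration):

```python
def digits_only(text: str) -> str:
    # filter calls str.isdigit(ch) for each character ch of text.
    return "".join(filter(str.isdigit, text))


print(digits_only("dasdasd2313dsa"))  # 2313
```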

#5


10  

Along the lines of bayer's answer:

''.join(i for i in s if i.isdigit())

#6


7  

x.translate(None, string.digits)

will delete all digits from the string (Python 2 only). To delete the letters and keep the digits, along with any other non-letter characters such as spaces or punctuation, do this:

x.translate(None, string.letters)
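
For Python 3, where str.translate takes a single table argument, the equivalent is to build a deletion table with the three-argument form of str.maketrans, whose third argument lists the characters to map to None. A sketch, using string.ascii_letters in place of Python 2's locale-dependent string.letters (an assumption that you only care about ASCII letters); the sample string is made up:

```python
import string

# Third argument of str.maketrans: characters to delete (mapped to None).
DELETE_DIGITS = str.maketrans("", "", string.digits)
DELETE_ASCII_LETTERS = str.maketrans("", "", string.ascii_letters)

x = "ab12-cd34"
print(x.translate(DELETE_DIGITS))         # ab-cd
print(x.translate(DELETE_ASCII_LETTERS))  # 12-34  (the "-" survives)
```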

#7


7  

You can easily do it using a regex:

>>> import re
>>> re.sub(r"\D", "", "£70,000")
'70000'

#8


5  

The OP mentions in the comments that he wants to keep the decimal point. This can be done with the re.sub method (as in the second, and IMHO best, answer) by explicitly listing the characters to keep, e.g.:

>>> re.sub(r"[^0-9.]", "", "poo123.4and5fish")
'123.45'
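
A small sketch building on this, in case you then need an actual number: keep the digits and ".", then convert with float(). Be aware that this character-class approach glues separate numbers together ("123.4" and "5" become "123.45" above), and float() raises ValueError if more than one "." survives. The helper name and sample value are illustrative.

```python
import re

# Anything that is not an ASCII digit or a literal "." (no escape needed inside [...]).
_NOT_DIGIT_OR_DOT = re.compile(r"[^0-9.]")


def strip_to_decimal(text: str) -> str:
    return _NOT_DIGIT_OR_DOT.sub("", text)


price = strip_to_decimal("£70,000.50")
print(price)         # 70000.50
print(float(price))  # 70000.5
```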

#9


4  

A fast version for Python 3:

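# xx3.py
from collections import defaultdict
import string

# Calling NoneType() returns None in Python 3, so this factory supplies None
# (meaning "delete") for every character that is not being kept.
_NoneType = type(None)

def keeper(keep):
    table = defaultdict(_NoneType)
    # Kept characters map (by ordinal) to themselves.
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)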

Here's a performance comparison vs. regex:

$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop

So, for me, it's a little more than 3 times faster than the regex. It's also faster than the Del class above, because defaultdict does all its lookups in C rather than in (slow) Python. Here's that version on the same system, for comparison:

$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop
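
If you want to reproduce this kind of comparison on your own interpreter, the timeit module can also be driven from a script. The sketch below times three of the approaches from this thread on the same sample string; it first checks that they agree, then prints whatever timings your machine produces (no particular numbers are claimed here). The function names and NUMBER are illustrative choices.

```python
import re
import string
import timeit
from collections import defaultdict

SAMPLE = "aaa12333bb445bb54b5b52"
NUMBER = 10_000  # calls per timing run; adjust to taste

_NON_DIGIT = re.compile(r"\D")

# Table in the style of the keeper() answer: digit ordinals map to themselves,
# and type(None)() supplies None (= delete) for every other ordinal.
_DIGIT_TABLE = defaultdict(type(None))
_DIGIT_TABLE.update({ord(c): c for c in string.digits})


def with_regex(s: str) -> str:
    return _NON_DIGIT.sub("", s)


def with_translate(s: str) -> str:
    return s.translate(_DIGIT_TABLE)


def with_join(s: str) -> str:
    return "".join(c for c in s if c.isdigit())


CANDIDATES = (with_regex, with_translate, with_join)

if __name__ == "__main__":
    # Sanity check first: every approach must give the same answer.
    assert {f(SAMPLE) for f in CANDIDATES} == {"1233344554552"}

    for f in CANDIDATES:
        # Best of 5 runs, converted to microseconds per call.
        best = min(timeit.repeat(lambda: f(SAMPLE), number=NUMBER, repeat=5))
        print(f"{f.__name__:15s} {best / NUMBER * 1e6:.2f} usec per loop")
```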

#10


2  

Ugly but works:

>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>

#11


1  

Use a generator expression:

>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")
>>> new_s
'200'

#12


0  

Not a one-liner, but very simple:

buffer = ""
some_str = "aas30dsa20"

# Keep only the digit characters.
for char in some_str:
    if char.isdigit():
        buffer += char

print(buffer)  # 3020
