Removing all non-numeric characters from a string in Python

Time: 2022-12-03 09:37:39

How do we remove all non-numeric characters from a string in Python?


6 solutions

#1


171  

>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'
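
If the same substitution runs many times, the pattern can be compiled once up front. A minimal sketch for Python 3; the helper name digits_only is an assumption for illustration and not part of the answer above:

```python
import re

# Compile once; [^0-9] matches any character that is not an ASCII digit.
_NON_DIGITS = re.compile(r"[^0-9]")

def digits_only(text):
    """Return text with every character other than 0-9 removed (hypothetical helper)."""
    return _NON_DIGITS.sub("", text)

print(digits_only("sdkjh987978asd098as0980a98sd"))  # 987978098098098
```

Note that r"\D" is not quite equivalent on Python 3 str input: it also keeps non-ASCII decimal digits such as Arabic-Indic numerals, whereas [^0-9] removes them.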

#2


63  

Not sure if this is the most efficient way, but:


>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'

The ''.join part joins all the resulting characters together with nothing in between. The rest is a generator expression (not a list comprehension, since there are no square brackets) that keeps only the characters for which isdigit() returns True.
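
One thing worth knowing is that, in Python 3, str.isdigit() also accepts some non-ASCII characters such as superscript digits. The sketch below contrasts it with an ASCII-only filter; the name ascii_digits_only is made up for this example:

```python
def ascii_digits_only(text):
    # Keep only the ten ASCII digits, ignoring other Unicode digit characters.
    return "".join(c for c in text if "0" <= c <= "9")

sample = "x\u00b2 = 49"  # contains SUPERSCRIPT TWO
print("".join(c for c in sample if c.isdigit()))  # ²49
print(ascii_digits_only(sample))                  # 49
```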


#3


12  

This should work for strings and unicode objects:


# python <3.0
def only_numerics(seq):
    return filter(type(seq).isdigit, seq)

# python ≥3.0
def only_numerics(seq):
    seq_type = type(seq)
    return seq_type().join(filter(seq_type.isdigit, seq))
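
In Python 3, iterating over a bytes object yields integers rather than one-byte strings, so the second version above is only reliable for str. A possible sketch that handles both cases, assuming a hypothetical helper name only_numerics_py3:

```python
def only_numerics_py3(seq):
    """Keep digit characters from a str, bytes, or bytearray (hypothetical helper)."""
    if isinstance(seq, (bytes, bytearray)):
        # Iterating bytes yields ints in Python 3; 48..57 are b"0"..b"9".
        return type(seq)(b for b in seq if 48 <= b <= 57)
    return "".join(filter(str.isdigit, seq))

print(only_numerics_py3("abc123def456"))   # 123456
print(only_numerics_py3(b"abc123def456"))  # b'123456'
```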

#4


5  

The fastest approach, if you need to perform more than one or two such removals (or even just one, but on a very long string), is to rely on the translate method of strings, even though it needs some preparation. The session below is Python 2 code (note xrange):


>>> import string
>>> allchars = ''.join(chr(i) for i in xrange(256))
>>> identity = string.maketrans('', '')
>>> nondigits = allchars.translate(identity, string.digits)
>>> s = 'abc123def456'
>>> s.translate(identity, nondigits)
'123456'
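
For Python 3 byte strings, a roughly equivalent sketch uses bytes.translate with None as the table and the bytes to delete as the second argument. The names DIGITS, NON_DIGITS, and digits_only_bytes are assumptions for illustration:

```python
# Python 3: bytes.translate accepts None as the table plus the bytes to delete.
DIGITS = b"0123456789"
NON_DIGITS = bytes(b for b in range(256) if b not in DIGITS)

def digits_only_bytes(data):
    """Delete every byte that is not an ASCII digit (hypothetical helper)."""
    return data.translate(None, NON_DIGITS)

print(digits_only_bytes(b"abc123def456"))  # b'123456'
```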

The translate method works differently on Unicode strings than on byte strings, and is maybe a tad simpler to use, btw:


>>> unondig = dict.fromkeys(xrange(65536))
>>> for x in string.digits: del unondig[ord(x)]
... 
>>> s = u'abc123def456'
>>> s.translate(unondig)
u'123456'

You might want to use a mapping class rather than an actual dict, especially if your Unicode string may potentially contain characters with very high ord values (that would make the dict excessively large;-). For example:


>>> class keeponly(object):
...   def __init__(self, keep): 
...     self.keep = set(ord(c) for c in keep)
...   def __getitem__(self, key):
...     if key in self.keep:
...       return key
...     return None
... 
>>> s.translate(keeponly(string.digits))
u'123456'
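
On Python 3, where every str is Unicode, the same idea can be sketched with a dict subclass whose __missing__ hook returns None for any code point that is not listed. This is one possible variant, not the answer's own code; the class name KeepOnly is an assumption:

```python
import string

class KeepOnly(dict):
    """Translation table that keeps the given characters and deletes all others."""

    def __init__(self, keep):
        # Map each kept code point to itself.
        super().__init__((ord(c), ord(c)) for c in keep)

    def __missing__(self, key):
        # Any code point not listed maps to None, which str.translate deletes.
        return None

keep_digits = KeepOnly(string.digits)
print("abc123def456".translate(keep_digits))  # 123456
```

Because __missing__ does not store anything, the table stays at ten entries no matter how high the code points in the input are.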

#5


4  

Just to add another option to the mix, there are several useful constants within the string module. While more useful in other cases, they can be used here.


>>> from string import digits
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'

There are several constants in the module, including:


  • ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • hexdigits (0123456789abcdefABCDEF)

If you use these constants heavily, it can be worthwhile to convert them to a frozenset. That gives O(1) average-case membership tests rather than O(n), where n is the length of the original string constant.


>>> digits = frozenset(digits)
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'
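
The same pattern generalizes to any of the string constants. A small sketch, assuming a hypothetical helper named keep_chars:

```python
import string

def keep_chars(text, allowed):
    """Return only the characters of text that appear in allowed (hypothetical helper)."""
    allowed = frozenset(allowed)  # hashed membership tests
    return "".join(c for c in text if c in allowed)

print(keep_chars("abc123def456", string.digits))    # 123456
print(keep_chars("0xDEADbeef!", string.hexdigits))  # 0DEADbeef
```

For a one-off call on a ten-character constant the conversion is unlikely to matter much; it pays off mainly when the set is built once and reused.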

#6


-8  

user = input()
print("hello")
