I want to remove characters from a string in Python:
string.replace(',', '').replace("!", '').replace(":", '').replace(";", '')...
But I have many characters to remove. I thought about using a list:
list = [',', '!', '.', ';'...]
But how can I use the list to replace the characters in the string?
16 solutions
#1
246
If you're using Python 2 and your inputs are strings (not unicodes), the absolute best method is str.translate:
>>> chars_to_remove = ['.', '!', '?']
>>> subj = 'A.B!C?'
>>> subj.translate(None, ''.join(chars_to_remove))
'ABC'
Otherwise, there are the following options to consider:
A. Iterate the subject char by char, omit unwanted characters and join the resulting list:
>>> sc = set(chars_to_remove)
>>> ''.join([c for c in subj if c not in sc])
'ABC'
(Note that the generator version ''.join(c for c ...) will be less efficient.)
B. Create a regular expression on the fly and use re.sub to replace its matches with an empty string:
>>> import re
>>> rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
>>> re.sub(rx, '', subj)
'ABC'
(re.escape ensures that characters like ^ or ] won't break the regular expression.)
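To see why re.escape matters, here is a small Python 3 sketch (the function name and sample strings are illustrative, not from the answer). Without escaping, a list whose first character is ^ silently turns the character class into a negated one:

```python
import re

def remove_chars_re(subj, chars):
    # Build a character class such as "[\^\]\-]" from the unwanted characters;
    # re.escape stops '^', ']', '-' and '\' from being read as regex syntax.
    rx = '[' + re.escape(''.join(chars)) + ']'
    return re.sub(rx, '', subj)

# Escaped: only the listed characters are removed.
print(remove_chars_re('a^b]c-d\\e', ['^', ']', '-', '\\']))  # abcde

# Unescaped: "[^x]" means "any character except x", so almost everything goes.
unescaped = '[' + ''.join(['^', 'x']) + ']'
print(re.sub(unescaped, '', 'x^yx'))  # xx
```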
C. Use the mapping variant of translate:
>>> chars_to_remove = [u'δ', u'Γ', u'ж']
>>> subj = u'AжBδCΓ'
>>> dd = {ord(c):None for c in chars_to_remove}
>>> subj.translate(dd)
u'ABC'
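The same mapping idea in Python 3, where every str is Unicode, might look like the sketch below. It uses dict.fromkeys to build the table instead of a dict comprehension; both give each ordinal the value None, which str.translate treats as "delete".

```python
chars_to_remove = ['δ', 'Γ', 'ж']
subj = 'AжBδCΓ'

# dict.fromkeys maps every ordinal to None, meaning "delete this character".
table = dict.fromkeys(map(ord, chars_to_remove))
print(subj.translate(table))  # ABC

# str.maketrans accepts the characters themselves as keys and converts them to ordinals.
print(subj.translate(str.maketrans(dict.fromkeys(chars_to_remove))))  # ABC
```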
Full testing code and timings:
#coding=utf8
import re

def remove_chars_iter(subj, chars):
    sc = set(chars)
    return ''.join([c for c in subj if c not in sc])

def remove_chars_re(subj, chars):
    return re.sub('[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_re_unicode(subj, chars):
    return re.sub(u'(?u)[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_translate_bytes(subj, chars):
    return subj.translate(None, ''.join(chars))

def remove_chars_translate_unicode(subj, chars):
    d = {ord(c):None for c in chars}
    return subj.translate(d)

import timeit, sys

def profile(f):
    assert f(subj, chars_to_remove) == test
    t = timeit.timeit(lambda: f(subj, chars_to_remove), number=1000)
    print ('{0:.3f} {1}'.format(t, f.__name__))

print (sys.version)
PYTHON2 = sys.version_info[0] == 2

print ('\n"plain" string:\n')
chars_to_remove = ['.', '!', '?']
subj = 'A.B!C?' * 1000
test = 'ABC' * 1000
profile(remove_chars_iter)
profile(remove_chars_re)
if PYTHON2:
    profile(remove_chars_translate_bytes)
else:
    profile(remove_chars_translate_unicode)

print ('\nunicode string:\n')
if PYTHON2:
    chars_to_remove = [u'δ', u'Γ', u'ж']
    subj = u'AжBδCΓ'
else:
    chars_to_remove = ['δ', 'Γ', 'ж']
    subj = 'AжBδCΓ'
subj = subj * 1000
test = 'ABC' * 1000
profile(remove_chars_iter)
if PYTHON2:
    profile(remove_chars_re_unicode)
else:
    profile(remove_chars_re)
profile(remove_chars_translate_unicode)
Results:
2.7.5 (default, Mar 9 2014, 22:15:05) [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]

"plain" string:
0.637 remove_chars_iter
0.649 remove_chars_re
0.010 remove_chars_translate_bytes

unicode string:
0.866 remove_chars_iter
0.680 remove_chars_re_unicode
1.373 remove_chars_translate_unicode
---
3.4.2 (v3.4.2:ab2c023a9432, Oct 5 2014, 20:42:22) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

"plain" string:
0.512 remove_chars_iter
0.574 remove_chars_re
0.765 remove_chars_translate_unicode

unicode string:
0.817 remove_chars_iter
0.686 remove_chars_re
0.876 remove_chars_translate_unicode
(As a side note, the figure for remove_chars_translate_bytes might give us a clue why the industry was reluctant to adopt Unicode for such a long time.)
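A trimmed Python 3 rerun of this benchmark might look like the sketch below. It is an adaptation, not the original script: it swaps in the three-argument str.maketrans for the Python 2-only translate(None, ...) call, and the run helper is a hypothetical name. No expected timings are given, since they depend heavily on the interpreter version and the machine.

```python
import re
import timeit

def remove_chars_iter(subj, chars):
    sc = set(chars)
    return ''.join([c for c in subj if c not in sc])

def remove_chars_re(subj, chars):
    return re.sub('[' + re.escape(''.join(chars)) + ']', '', subj)

def remove_chars_translate(subj, chars):
    # Python 3: the third argument of str.maketrans lists characters to delete.
    return subj.translate(str.maketrans('', '', ''.join(chars)))

def run(subj, chars, expected, number=1000):
    for f in (remove_chars_iter, remove_chars_re, remove_chars_translate):
        assert f(subj, chars) == expected  # all variants must agree
        t = timeit.timeit(lambda: f(subj, chars), number=number)
        print('{0:.3f} {1}'.format(t, f.__name__))

if __name__ == '__main__':
    run('A.B!C?' * 1000, ['.', '!', '?'], 'ABC' * 1000)
    run('AжBδCΓ' * 1000, ['δ', 'Γ', 'ж'], 'ABC' * 1000)
```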
#2
106
In Python 2, you can use str.translate():
s.translate(None, ",!.;")
Example:
>>> s = "asjo,fdjk;djaso,oio!kod.kjods;dkps"
>>> s.translate(None, ",!.;")
'asjofdjkdjasooiokodkjodsdkps'
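The two-argument call above only works on Python 2 strings. A Python 3 sketch of the same deletion might use the three-argument form of str.maketrans for str, while bytes objects still accept the Python 2 style (None as the table plus the bytes to delete):

```python
s = "asjo,fdjk;djaso,oio!kod.kjods;dkps"

# Python 3 str: build a deletion table with the three-argument str.maketrans.
print(s.translate(str.maketrans('', '', ",!.;")))  # asjofdjkdjasooiokodkjodsdkps

# Python 3 bytes: table None plus the characters to delete, as in Python 2.
b = s.encode('ascii')
print(b.translate(None, b",!.;"))  # b'asjofdjkdjasooiokodkjodsdkps'
```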
#4
14
''.join(c for c in myString if c not in badTokens)
#5
8
Another approach using regex:
import re
''.join(re.split(r'[.;!?,]', s))
#6
7
If you are using Python 3 and looking for the translate solution: the method was changed and now takes one parameter instead of two.
That parameter is a table (it can be a dictionary) where each key is the Unicode ordinal (int) of the character to find and the value is the replacement: a Unicode ordinal, a string to map the key to, or None to delete the character.
Here is a usage example:
>>> list = [',', '!', '.', ';']
>>> s = "This is, my! str,ing."
>>> s.translate({ord(x): '' for x in list})
'This is my string'
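The kinds of values described above can be combined in one table. This sketch uses made-up sample input to show each one side by side:

```python
table = {
    ord(','): None,       # None: delete the character
    ord('!'): '',         # empty string: also deletes it
    ord(';'): ord(':'),   # ordinal: replace with the character ':'
    ord('.'): ' END',     # string: may be longer than one character
}
print("a,b!c;d.".translate(table))  # abc:d END
```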
#7
6
You could use something like this:
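def replace_all(text, dic):
    # dic maps each substring to its replacement;
    # .items() works on Python 2 and 3 (.iteritems() is Python 2 only)
    for i, j in dic.items():
        text = text.replace(i, j)
    return text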
This code is not my own; it comes from here. It's a great article that discusses doing this in depth.
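A usage sketch for replace_all (Python 3; the sample dictionaries are made up). Because the replacements run one after another, a later replacement can act on text produced by an earlier one, which str.translate avoids by mapping each character once:

```python
def replace_all(text, dic):
    # Applies each replacement in turn, in the dict's iteration order
    # (insertion order on Python 3.7+).
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

print(replace_all("a,b!c;", {',': '', '!': '', ';': ''}))  # abc

# Sequential: 'a' becomes 'b', then every 'b' (including the new one) becomes 'c'.
print(replace_all("abc", {'a': 'b', 'b': 'c'}))  # ccc
# str.translate maps every character at once instead.
print("abc".translate(str.maketrans('ab', 'bc')))  # bcc
```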
#8
5
Why not a simple loop?
for i in replace_list:
    string = string.replace(i, '')
Also, avoid naming lists 'list': it shadows the built-in function list.
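A short Python 3 sketch of what the shadowing warning means in practice, assuming the code runs at module level (the variable names after the del are illustrative):

```python
list = [',', '!', '.', ';']   # shadows the built-in list() in this module

try:
    list('abc')
except TypeError as exc:
    print(exc)  # 'list' object is not callable

del list                      # drop the module-level name; the built-in is visible again
print(list('abc'))            # ['a', 'b', 'c']

# A clearer name avoids the problem entirely:
unwanted = [',', '!', '.', ';']
text = "a,b!c.d;"
for ch in unwanted:
    text = text.replace(ch, '')
print(text)  # abcd
```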
#9
3
Also an interesting related topic: removing accents from a Unicode string by converting each character to its standard unaccented form:
What is the best way to remove accents in a python unicode string?
Code extracted from that topic:
import unicodedata

def remove_accents(input_str):
    nkfd_form = unicodedata.normalize('NFKD', input_str)
    return u"".join([c for c in nkfd_form if not unicodedata.combining(c)])
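A usage sketch in Python 3 with made-up sample words. NFKD splits an accented letter into a base letter plus combining marks, and the filter drops the marks. Letters that have no decomposition, such as 'Ø', pass through unchanged, so this is not a full transliteration.

```python
import unicodedata

def remove_accents(input_str):
    # NFKD turns 'é' into 'e' + U+0301 COMBINING ACUTE ACCENT; combining marks are then dropped.
    nfkd_form = unicodedata.normalize('NFKD', input_str)
    return ''.join(c for c in nfkd_form if not unicodedata.combining(c))

print(remove_accents('café naïve'))  # cafe naive
# 'Ø' has no decomposition, so it is left as-is.
print(remove_accents('Ørsted'))  # Ørsted
```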
#10
3
Perhaps a more modern and functional way to achieve what you wish:
>>> subj = 'A.B!C?'
>>> unwanted = set([',', '!', '.', ';', '?'])
>>> ''.join(filter(lambda x: x not in unwanted, subj))
'ABC'
Please note that for this particular purpose it's overkill, but once you need more complex conditions, filter comes in handy.
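A sketch of the "more complex conditions" case in Python 3, where filter returns a lazy iterator rather than a string, so the result is joined back together (keep is a hypothetical helper name):

```python
unwanted = {',', '!', '.', ';', '?'}

print(''.join(filter(lambda ch: ch not in unwanted, 'A.B!C?')))  # ABC

def keep(ch):
    # A more complex condition: drop the punctuation and any digit.
    return ch not in unwanted and not ch.isdigit()

print(''.join(filter(keep, 'A1.B2!C3?')))  # ABC
```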
#11
2
A simple way:
import re

s = 'this is string ! >><< (foo---> bar) @-tuna-# sandwich-%-is-$-* good'
# condense multiple spaces into one
s = ' '.join(s.split())
# replace each space with a dash
s = s.replace(" ", "-")
# take out any char that matches the regex
s = re.sub('[!@#$%^&*()_+<>]', '', s)
print(s)
Output:
this-is-string---foo----bar--tuna--sandwich--is---good
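The first two steps above can be folded together, since '-'.join(s.split()) both condenses whitespace runs and inserts the dashes. A sketch comparing that with a single re.sub on r'\s+' (an alternative, not part of the original answer):

```python
import re

s = 'this is string ! >><< (foo---> bar) @-tuna-# sandwich-%-is-$-* good'

# Split on whitespace and rejoin with dashes, then strip the unwanted characters.
step = '-'.join(s.split())
step = re.sub('[!@#$%^&*()_+<>]', '', step)

# Same result with a regex for the whitespace: \s+ matches each run of whitespace.
short = re.sub('[!@#$%^&*()_+<>]', '', re.sub(r'\s+', '-', s.strip()))

print(step == short)  # True
print(short)          # this-is-string---foo----bar--tuna--sandwich--is---good
```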
#12
1
These days I am diving into Scheme, and now I think I am good at recursion and eval. HAHAHA. Just sharing some new ways:
First, eval it:
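# builds e.g. 'string.replace(",","").replace("!","")' and evaluates it;
# breaks if an item contains a double quote or backslash
print(eval('string%s' % (''.join(['.replace("%s","")' % i for i in replace_list]))))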
Second, recurse it:
def repn(string, replace_list):
    if replace_list == []:
        return string
    else:
        # note: pop() removes items from the caller's list as a side effect
        return repn(string.replace(replace_list.pop(), ""), replace_list)

print(repn(string, replace_list))
Hey, don't downvote. I just want to share some new ideas.
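A variation on the recursive repn that leaves the caller's list untouched might look like this Python 3 sketch: it slices instead of calling pop(). It still uses one stack frame per character to remove, so very long lists would hit Python's default recursion limit of 1000.

```python
def repn(string, replace_list):
    # Base case: nothing left to remove.
    if not replace_list:
        return string
    # Remove the first character, then recurse on the rest of the list.
    # Slicing builds a new list, so the caller's replace_list is not modified.
    return repn(string.replace(replace_list[0], ''), replace_list[1:])

chars = [',', '!', '.', ';']
print(repn("a,b!c.d;", chars))  # abcd
print(chars)                     # [',', '!', '.', ';']
```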
#13
1
How about this: a one-liner.
from functools import reduce  # reduce is a built-in only on Python 2
reduce(lambda x, y: x.replace(y, ""), [',', '!', '.', ';'], ";Test , , !Stri!ng ..")
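To see what reduce passes along at each step, itertools.accumulate can list the intermediate strings. This sketch assumes Python 3.8+ for accumulate's initial= argument; drop is a hypothetical helper name.

```python
from functools import reduce
from itertools import accumulate

chars = [',', '!', '.', ';']
text = ";Test , , !Stri!ng .."

def drop(acc, ch):
    return acc.replace(ch, '')

result = reduce(drop, chars, text)

# accumulate yields the starting text, then the string after each removal.
for step in accumulate(chars, drop, initial=text):
    print(repr(step))

print(result == list(accumulate(chars, drop, initial=text))[-1])  # True
```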
#14
1
I think this is simple enough and will do!
list = [",", ",", "!", ";", ":"]  # the list goes on.....
theString = "dlkaj;lkdjf'adklfaj;lsd'fa'dfj;alkdjf"  # is an example string
newString = ""  # the string free of unwanted characters
for i in range(len(theString)):
    if theString[i] in list:
        newString += ""  # concatenate an empty string
    else:
        newString += theString[i]
This is one way to do it. But if you are tired of keeping a list of characters that you want to remove, you can instead use the ordinal of each character you iterate through, which for ASCII characters is its ASCII code. The ASCII code for the character '0' is 48 and for lowercase 'z' is 122, so the code below keeps only characters in that range (note that the range also includes some punctuation, such as ; : ? @ [ ] _):
theString = "lkdsjf;alkd8a'asdjf;lkaheoialkdjf;ad"
newString = ""
for i in range(len(theString)):
    if ord(theString[i]) < 48 or ord(theString[i]) > 122:  # ord() => ASCII code
        newString += ""
    else:
        newString += theString[i]
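The sketch below contrasts the ord() range check with a whitelist built from the standard string module's ascii_letters and digits. The whitelist is an alternative technique, not part of the original answer. Because 48..122 spans ':;<=>?@', '[\]^_' and '`', the range check keeps every ';' in the sample:

```python
import string

theString = "lkdsjf;alkd8a'asdjf;lkaheoialkdjf;ad"

# Range check as in the answer: only the apostrophe (ASCII 39) falls outside 48..122.
by_range = ''.join(c for c in theString if 48 <= ord(c) <= 122)
print(by_range)  # lkdsjf;alkd8aasdjf;lkaheoialkdjf;ad

# Whitelisting ASCII letters and digits removes every punctuation character instead.
allowed = set(string.ascii_letters + string.digits)
by_whitelist = ''.join(c for c in theString if c in allowed)
print(by_whitelist)  # lkdsjfalkd8aasdjflkaheoialkdjfad
```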
#15
0
I am thinking about a solution for this. First I would turn the input string into a list. Then I would replace the unwanted items of the list. Then, using join, I would return the list as a string. The code could look like this:
def the_replacer(text):
    test = []
    for m in range(len(text)):
        test.append(text[m])
        if test[m] == ',' \
                or test[m] == '!' \
                or test[m] == '.' \
                or test[m] == '\'' \
                or test[m] == ';':  # ....
            test[m] = ''
    return ''.join(test)
This would remove the listed characters from the string. What do you think about that?
#16
0
Here is a more_itertools approach:
import more_itertools as mit

s = "A.B!C?D_E@F#"
blacklist = ".!?_@#"
"".join(mit.flatten(mit.split_at(s, pred=lambda x: x in set(blacklist))))
# 'ABCDEF'
Here we split upon items found in the blacklist, flatten the results and join the string.