Python: the maketrans and translate functions of the string module

First, here are the official definitions of the two functions:

string.maketrans(from, to): Return a translation table suitable for passing to translate(), that will map each character in from into the character at the same position in to; from and to must have the same length.

string.translate(s, table[, deletechars]): Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed.
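
As a quick illustration of the two definitions, here is a minimal sketch. The import shim at the top is an assumption added so that the same lines also run on Python 3, where string.maketrans no longer exists and bytes.maketrans plays the same role; the sample strings are made up for the example.

```python
try:
    from string import maketrans      # Python 2
except ImportError:
    maketrans = bytes.maketrans       # Python 3: same idea, applied to bytes

# Map each of 'a', 'b', 'c' to the character at the same position in 'xyz'.
table = maketrans(b'abc', b'xyz')

# Plain translation through the table.
swapped = b'aabbcc-123'.translate(table)                    # b'xxyyzz-123'

# Characters in deletechars are removed first, then the rest is translated.
swapped_no_digits = b'aabbcc-123'.translate(table, b'0123456789')  # b'xxyyzz-'

# With table=None only the deletion step is performed.
only_deleted = b'aabbcc-123'.translate(None, b'0123456789')  # b'aabbcc-'
```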

The following code wraps these two functions:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import string

def translator(frm='', to='', delete='', keep=None):
    if len(to) == 1:
        to = to * len(frm)
    
    trans = string.maketrans(frm, to)
    if keep is not None:
        trans_all = string.maketrans('', '')
        # keep.translate(trans_all, delete): remove the characters to delete from the characters to keep
        # trans_all.translate(trans_all, ...): delete the kept characters from the full table, i.e. take the complement of the keep set
        delete = trans_all.translate(trans_all, keep.translate(trans_all, delete))
        
    def translate(s):
        return s.translate(trans, delete)
        
    return translate

if __name__ == '__main__':
    #result:12345678
    digits_only = translator(keep=string.digits)
    print digits_only('Eric chen: 1234-5678')
    
    #result:Eric chen: -
    no_digits = translator(delete=string.digits)
    print no_digits('Eric chen: 1234-5678')
    
    #result:Eric chen: ****-****
    digits_to_hash = translator(frm=string.digits, to='*')
    print digits_to_hash('Eric chen: 1234-5678')
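
The step that turns keep into a delete string is the least obvious part of the wrapper, so here it is in isolation. This is only a sketch: the import shim and the sample values of keep and delete are assumptions added for illustration, not part of the code above.

```python
try:
    from string import maketrans      # Python 2
except ImportError:
    maketrans = bytes.maketrans       # Python 3 (assumption: operate on bytes)

identity = maketrans(b'', b'')        # all 256 byte values, in order
keep = b'0123456789'
delete = b'13579'                     # deleted even though they are in keep

# Step 1: drop the explicitly deleted characters from the keep set.
really_keep = keep.translate(identity, delete)          # b'02468'

# Step 2: delete the kept characters from the full table. What remains is
# every character that must be removed: the complement of really_keep.
to_delete = identity.translate(identity, really_keep)

filtered = b'Eric chen: 1234-5678'.translate(identity, to_delete)  # b'2468'
```

Because deletion always wins over keeping, a character listed in both keep and delete ends up removed.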

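When maketrans is called as string.maketrans('', ''), the translation table is a string t of exactly 256 characters in which every character maps to itself. Ignoring the non-printable characters, the table reads “!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~”, which is the printable part of the ASCII table.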

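In fact, the mapping is already built by the time maketrans returns. For example, after calling string.maketrans('ABCD', 'abcd'), the resulting 256-character table (ignoring non-printable characters) is “!"#$%&'()*+,-./0123456789:;<=>?@abcdEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~”: the positions originally held by “ABCD” now hold “abcd”.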
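
The two tables can be compared directly. A minimal sketch, again assuming an import shim so that it also runs on Python 3; slices are used instead of single-index lookups because indexing bytes on Python 3 returns an integer rather than a one-character string.

```python
try:
    from string import maketrans      # Python 2
except ImportError:
    maketrans = bytes.maketrans       # Python 3

identity = maketrans(b'', b'')
lowered = maketrans(b'ABCD', b'abcd')

# The identity table is every byte value from 0 to 255, in order.
is_identity = identity == bytes(bytearray(range(256)))

# Positions where the two tables differ: the ordinals of 'A' through 'D'.
changed = [i for i in range(256) if identity[i:i + 1] != lowered[i:i + 1]]

# The entries stored at those positions in the modified table.
replaced = lowered[65:69]             # b'abcd'
```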

When you pass t as the first argument to the translate method, every character c in the original string is translated to the character t[ord(c)].
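
The lookup rule t[ord(c)] can be reproduced by hand and compared with translate. This is a sketch under the same assumption as before (an import shim for Python 3); iterating over a bytearray is used because it yields integer ordinals on both Python 2 and Python 3.

```python
try:
    from string import maketrans      # Python 2
except ImportError:
    maketrans = bytes.maketrans       # Python 3

table = maketrans(b'0123456789', b'**********')
text = b'Eric chen: 1234-5678'

# For each ordinal o of the input, table[o:o + 1] is the entry t[ord(c)].
by_hand = b''.join(table[o:o + 1] for o in bytearray(text))

# The manual lookup and the built-in method agree.
same = by_hand == text.translate(table)
```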

For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.
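
The three kinds of values allowed in a Unicode translation table can be shown in one small mapping. A minimal sketch with made-up sample data; the u'' prefix keeps it runnable on Python 2 and on Python 3.3 or later.

```python
# Keys are Unicode ordinals; values may be an ordinal, a string, or None.
table = {
    ord(u'a'): ord(u'A'),   # ordinal -> ordinal
    ord(u'b'): u'[bee]',    # ordinal -> Unicode string (may be longer than 1)
    ord(u'c'): None,        # ordinal -> None: the character is deleted
}

# 'd' has no entry in the mapping, so it is left untouched.
result = u'abcd'.translate(table)     # u'A[bee]d'
```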

The following code filters a unicode string:

class Keeper(object):
    def __init__(self, keep):
        # The built-in set replaces the deprecated sets.Set
        self.keep = set(map(ord, keep))
    
    def __getitem__(self, n):
        if n not in self.keep:
            return None
        return unichr(n)
    
    def __call__(self, s):
        return unicode(s).translate(self)

makeFilter = Keeper

if __name__ == '__main__':
    #result:人民
    just_people = makeFilter(u'人民')
    print just_people(u'中华人民共和国成立了')

    # Delete unicode characters
    #result:中华共和国成立了!
    translate_table = dict((ord(char), None) for char in u'人民')
    print unicode(u'中华人民共和国成立了!').translate(translate_table)
    
    # Replace unicode characters
    #result:中华**共和国成立了!
    translate_table = dict((ord(char), u'*') for char in u'人民')
    print unicode(u'中华人民共和国成立了!').translate(translate_table)

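The translate method of a Unicode string takes a single argument: a sequence or mapping that is indexed by the code point of each character in the string. A character whose code point is not a key of the mapping (or not a valid index of the sequence) is copied unchanged. The value stored for a code point must be a Unicode ordinal or a unicode string (the replacement for that character), or None (meaning the character is deleted). In practice a dict or a list is usually passed to translate to replace or delete certain characters.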
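
The dict form appears in the code above; the list form is less common, so here is a minimal sketch of it. The table contents and the sample string are assumptions made up for illustration.

```python
# Sequence form: index i holds the replacement for code point i.
# Digits map to None (deleted); every other ASCII code point maps to itself.
table = [None if chr(i).isdigit() else i for i in range(128)]

# Code points past the end of the list raise IndexError inside translate(),
# which is treated as "no mapping", so the character is copied unchanged.
result = u'Eric chen: 1234-5678 \u00e9'.translate(table)
```

A list only pays off when the code points of interest are small and dense; for a handful of scattered characters a dict is the more economical table.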
