Python regex replacement using a dictionary

Posted: 2020-12-22 18:17:07

I have the following regex to find strings inside brackets and strip the brackets (and any single space padding) around them:

>>> import re
>>> a = 'a[b]cdef[g ]hi[ j]klmno[ p ]'
>>> re.sub(r'\[\s?(.*?)\s?\]', r'\1', a)
'abcdefghijklmnop'

But what I want to do is use the text inside the brackets as a key into a dictionary. Let's say I have the following dictionary:

d = {'b':2,'g':7,'j':10,'p':16}

When I run my desired regex, it should produce the string 'a2cdef7hi10klmno16'.

However, I cannot simply pass d['\1'] as the replacement argument of sub, because that raises KeyError: '\x01'.
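
The '\x01' in that error can be reproduced without any regex. In a non-raw string literal, '\1' is an octal escape for the character with code 1, and the dictionary lookup is evaluated before sub is ever called, so no capture group is involved. A minimal sketch, reusing d from above (the names key, failed and missing_key are only for illustration):

```python
d = {'b': 2, 'g': 7, 'j': 10, 'p': 16}

# '\1' in a normal (non-raw) literal is the octal escape for chr(1).
key = '\1'
assert key == '\x01' == chr(1)

# The lookup runs before re.sub would be called, so it fails right away.
try:
    d[key]
    failed = False
except KeyError as exc:
    failed = True
    missing_key = exc.args[0]  # the key that was not found
```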

Is there a simple way to replace each match with the dictionary value for its captured group?

4 answers

#1


3  

You can use format, assuming a doesn't contain substrings of the form {...}:

>>> import re
>>> a = 'a[b]cdef[g ]hi[ j]klmno[ p ]'
>>> d = {'b':2,'g':7,'j':10,'p':16}
>>> 
>>> re.sub(r'\[\s?(.*?)\s?\]',r'{\1}',a).format(**d)
'a2cdef7hi10klmno16'
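
The caveat about {...} matters in practice: str.format treats any braces already present in the string as replacement fields. A small sketch of the failure, using a hypothetical input (a_with_braces is not from the question) that contains literal braces:

```python
import re

d = {'b': 2, 'g': 7, 'j': 10, 'p': 16}
a_with_braces = 'x{y}[b]'  # hypothetical input containing literal braces

# Turn [b] into {b}; the pre-existing {y} is left as it is.
templated = re.sub(r'\[\s?(.*?)\s?\]', r'{\1}', a_with_braces)

# format() now also tries to look up 'y' in d and fails.
try:
    templated.format(**d)
    format_error = None
except KeyError as exc:
    format_error = exc.args[0]
```

Here templated is 'x{y}{b}', and the call to format raises KeyError for 'y'.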

Or you can use a lambda:

>>> re.sub(r'\[\s?(.*?)\s?\]', lambda m: str(d[m.group(1)]), a)
'a2cdef7hi10klmno16'
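
If some bracketed names might be missing from d, the lambda above raises KeyError. One possible variant, shown only as a sketch: a named replacement function (replace_from_dict is a hypothetical helper, not part of re) that leaves unknown brackets untouched. The input string is made up for the example.

```python
import re

d = {'b': 2, 'g': 7, 'j': 10, 'p': 16}

def replace_from_dict(match):
    """Return the dictionary value for the captured key, or the original text."""
    key = match.group(1)
    if key in d:
        return str(d[key])
    return match.group(0)  # unknown key: keep the bracketed text as it was

# 'zz' is not a key in d, so '[zz]' survives unchanged.
result = re.sub(r'\[\s?(.*?)\s?\]', replace_from_dict, 'a[b]cdef[zz]hi[ j]')
```

With this input, result is 'a2cdef[zz]hi10'. Whether to keep, drop, or flag unknown keys depends on the use case; keeping them is just one choice.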

The lambda solution appears to be roughly twice as fast in this measurement:

>>> from timeit import timeit
>>>
>>> setup = """
... import re
... a = 'a[b]cdef[g ]hi[ j]klmno[ p ]'
... d = {'b':2,'g':7,'j':10,'p':16}
... """
>>>
>>> timeit(r"re.sub(r'\[\s?(.*?)\s?\]',r'{\1}',a).format(**d)", setup)
13.796708106994629
>>> timeit(r"re.sub(r'\[\s?(.*?)\s?\]', lambda m: str(d[m.group(1)]), a)", setup)
6.593755006790161

#2


0  

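import re
# Replace every character that is a key in d, then strip the brackets.
newstring = ''.join(str(d[i]) if i in d else i for i in a)
re.sub(r'\[\s?(.*?)\s?\]', r'\1', newstring)

This does what you want by first substituting the characters, then removing the brackets. The str() call is needed because the dictionary values here are integers; if they were already strings, d[i] alone would do. Note that every character matching a key is replaced, including characters outside the brackets.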

#3


0  

Python's re.sub accepts a function as the replacement argument; it is called with each match object and returns the replacement string:

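import re
d = {'b': 2, 'g': 7, 'j': 10, 'p': 16}

def repl_fn(matchobj):
    # group(0) is the whole match, here a single character that is a key in d
    return str(d[matchobj.group(0)])

# Character class built from the keys; the input already has its brackets removed
regex = re.compile('[' + ''.join(re.escape(k) for k in d) + ']')
print(regex.sub(repl_fn, 'abcdefghijklmnop'))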
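
A character class only works for single-character keys. For longer keys, one common approach is an alternation built from the escaped keys, longest first. The following is a sketch under that assumption; the words mapping and the input string are hypothetical and not from the question.

```python
import re

words = {'cat': 'dog', 'catalog': 'index', 'a.b': 'X'}  # hypothetical mapping

# Longest keys first so 'catalog' is tried before 'cat';
# re.escape keeps the '.' in 'a.b' from matching any character.
alternatives = sorted(words, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in alternatives))

result = pattern.sub(lambda m: words[m.group(0)], 'cat catalog a.b axb')
```

Here result is 'dog index X axb': 'axb' is left alone because the dot was escaped.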

#4


0  

I'm not sure how to do this with a regex alone, but you can do it this way:

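import re
d = {'b': 2, 'g': 7, 'j': 10, 'p': 16}
a = 'a[b]cdef[g ]hi[ j]klmno[ p ]'
# Strip the brackets first, then replace each character that is a key in d.
result = re.sub(r'\[\s?(.*?)\s?\]', r'\1', a)
newresult = result
for char in result:
    value = d.get(char)
    if value is not None:  # a plain truth test would skip a value of 0
        newresult = newresult.replace(char, str(value))
print(newresult)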
