Experts of Python regular expressions! I'm trying to change a line in an XML document. The original line is:
<Tag name="low" Value="%hello%\dir"/>
The result I want to see is:
<Tag name="low" Value="C:\art"/>
My failed straightforward attempt is:
lines = re.sub("%hello%\dir", "C:\art", a)
This doesn't work: it doesn't change anything in the doc. Is it something to do with the %?
For testing purposes I tried:
lines = re.sub("dir", "C:\art", a)
And I get:
<Tag name="low" Value="%hello%\C:BELrt"/>
The problem is that \a = BEL.
I've tried a bunch of other things, but to no avail. How do I go about this problem?
3 solutions
#1
Your issue is that you've got some characters which have a specific meaning in regexes.
\d means any digit, so %hello%\dir is read as %hello%[0-9]ir.
You need to escape those backslashes and use raw strings to get around this:
import re
a = r'''<Tag name="low" Value="%hello%\dir"/>'''
# \\ in the pattern matches one literal backslash; \\ in the replacement emits one
lines = re.sub(r"%hello%\\dir", r"C:\\art", a)
print(lines)  # <Tag name="low" Value="C:\art"/>
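To see why the unescaped pattern silently matches nothing, here is a minimal sketch (Python 3 assumed; the names line, unescaped, escaped and fixed are only illustrative and not from the question):

```python
import re

line = r'<Tag name="low" Value="%hello%\dir"/>'

# Unescaped: the regex engine reads \d as "any digit", so the literal
# backslash in the XML line never matches.
unescaped = re.compile(r"%hello%\dir")
print(unescaped.search(line))                 # None
print(bool(unescaped.search("%hello%7ir")))   # True: a digit matches instead

# Escaped: \\ in the regex matches exactly one literal backslash.
escaped = re.compile(r"%hello%\\dir")
print(bool(escaped.search(line)))             # True

# In the replacement, \\ likewise produces a single backslash.
fixed = escaped.sub(r"C:\\art", line)
print(fixed)                                  # <Tag name="low" Value="C:\art"/>
```

Because re.sub returns the input unchanged when nothing matches, the failed attempt gives no error, just an unmodified document.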
#2
In Python, use the r prefix on a string literal to keep from having to escape your backslashes for Python itself. Then escape the backslash for the regex engine, so that \d is not read as "match a digit":
lines = re.sub(r"%hello%\\dir", r"C:\\art", a)
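The two layers of escaping (Python string literal first, regex second) can be checked directly. A small sketch, assuming Python 3; the variable names are made up for illustration:

```python
bell = "\a"       # one character: BEL (0x07)
raw_a = r"\a"     # two characters: a backslash followed by 'a'

print(len(bell), len(raw_a))    # 1 2
print(bell == "\x07")           # True
print(raw_a == "\\a")           # True

# This is why "C:\art" ends up containing BEL instead of backslash + 'a'.
broken_path = "C:\art"
print("\x07" in broken_path)    # True

# The raw-string pattern and its non-raw spelling are the same string.
raw_pattern = r"%hello%\\dir"
plain_pattern = "%hello%\\\\dir"
print(raw_pattern == plain_pattern)   # True
```

So the r prefix only removes the Python-level escaping; the doubled backslash that remains is there for the regex engine.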
#3
It is a good question. It shows three issues with a text representation at once:
- '\a' as a Python string literal is a single BELL character. To input a backslash followed by the letter 'a' in Python source code, you need to either use a raw literal, r'\a', or escape the backslash, '\\a'.
- r'\d' (two characters) has a special meaning when interpreted as a regular expression: it matches a digit. In addition to the rules for Python string literals, you also need to escape possible regex metacharacters. You could use re.escape(your_string) in the general case, or just r'\\d' or '\\\\d'. '\a' in the repl part should also be escaped (twice in your case: r'\\a' or '\\\\a'):
>>> xml = r'<Tag name="low" Value="%hello%\dir"/>'
>>> old, new = r'%hello%\dir', r'C:\art'
>>> print re.sub(re.escape(old), new.encode('string-escape'), xml)
<Tag name="low" Value="C:\art"/>
btw, you don't need regular expressions at all in this case:
>>> print xml.replace(old, new)
<Tag name="low" Value="C:\art"/>
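The session above uses Python 2 syntax (the print statement and the 'string-escape' codec, neither of which exists in Python 3). A rough Python 3 equivalent might look like the sketch below; passing a function as repl is a substitution of mine rather than part of the original answer, and xml_line and the via_* names are illustrative only:

```python
import re

xml_line = r'<Tag name="low" Value="%hello%\dir"/>'
old, new = r'%hello%\dir', r'C:\art'

# re.escape() neutralises regex metacharacters in the pattern. When repl is a
# function, its return value is inserted verbatim, so the backslash in `new`
# is not interpreted as an escape.
via_function = re.sub(re.escape(old), lambda match: new, xml_line)

# Alternative closer to the original: double the backslashes in the
# replacement so the regex template turns each pair back into one.
via_escaped_repl = re.sub(re.escape(old), new.replace("\\", "\\\\"), xml_line)

# No regex at all: plain substring replacement.
via_replace = xml_line.replace(old, new)

print(via_function)   # <Tag name="low" Value="C:\art"/>
print(via_function == via_escaped_repl == via_replace)   # True
```

For a fixed literal like this one, str.replace is the simplest of the three since nothing needs escaping.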
- Lastly, an XML attribute value can't contain certain characters, which also have to be escaped, e.g., '&', '"', "<", etc.
In general you should not use regex to manipulate XML. Python's stdlib has XML parsers.
>>> import xml.etree.cElementTree as etree
>>> xml = r'<Tag name="low" Value="%hello%\dir"/>'
>>> tag = etree.fromstring(xml)
>>> tag.set('Value', r"C:\art & design")
>>> etree.dump(tag)
<Tag Value="C:\art &amp; design" name="low" />
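On current Python 3 the cElementTree module is gone (it was removed in 3.9), so the same idea would go through xml.etree.ElementTree. A minimal sketch, assuming Python 3; xml_line, serialized and round_trip are names I made up:

```python
import xml.etree.ElementTree as ET

xml_line = r'<Tag name="low" Value="%hello%\dir"/>'

tag = ET.fromstring(xml_line)
# The parser deals in unescaped values; no regex or backslash escaping needed.
tag.set("Value", r"C:\art & design")

# Serialising escapes '&' as '&amp;' automatically.
serialized = ET.tostring(tag, encoding="unicode")
print(serialized)

# Reading it back gives the original, unescaped value.
round_trip = ET.fromstring(serialized).get("Value")
print(round_trip == r"C:\art & design")   # True
```

The attribute order in the serialised output can differ between Python versions, so it is safer to compare parsed values than raw strings.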