I'm having issues with .replace(). My XML parser does not like '&', but will accept '&amp;'. I'd like to use .replace('&', '&amp;'), but this does not seem to be working. I keep getting the error:
lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 51, column 41
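A minimal way to reproduce this class of error without lxml, using the standard library's xml.etree.ElementTree as a stand-in (an assumption: it rejects the same input, though its error message differs from lxml's). A bare & is rejected as malformed, while &amp; parses and reads back as a plain &. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<a>Tom & Jerry</a>"))      # False: a bare '&' starts an entity reference with no name
print(parses("<a>Tom &amp; Jerry</a>"))  # True
print(ET.fromstring("<a>Tom &amp; Jerry</a>").text)  # Tom & Jerry
```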
So far I have tried a straightforward file = file.replace('&', '&amp;'), but this doesn't work. I've also tried:
xml_file = infile
file = xml_file.readlines()
for line in file:
    for char in line:
        char.replace('&', '&amp;')
infile = open('a', 'w')
file = '\n'.join(file)
infile.write(file)
infile.close()
infile = open('a', 'r')
xml_file = infile
What would be the best way to fix my issue?
4 answers
#1 (5 votes)
str.replace creates and returns a new string. It can't alter strings in place, because strings are immutable. Try replacing:
file=xml_file.readlines()
with
file = [line.replace('&', '&amp;') for line in xml_file]
This uses a list comprehension to build a list equivalent to .readlines(), but with the replacement already made.
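A short Python 3 sketch of both points: replace() leaves the original string untouched, and the list comprehension collects the new strings. io.StringIO stands in for the real file object, and the sample XML is made up for illustration.

```python
import io

# replace() returns a new string; the original is unchanged.
s = "Tom & Jerry"
result = s.replace("&", "&amp;")
print(s)       # Tom & Jerry
print(result)  # Tom &amp; Jerry

# io.StringIO stands in for the opened XML file (illustrative data).
xml_file = io.StringIO("<a>Tom & Jerry</a>\n<b>R&D</b>\n")
lines = [line.replace("&", "&amp;") for line in xml_file]
print(lines)  # ['<a>Tom &amp; Jerry</a>\n', '<b>R&amp;D</b>\n']

# Each line keeps its trailing newline, so join with '' rather than '\n'.
text = "".join(lines)
```

Two caveats: lines read this way keep their '\n', so writing them back with '\n'.join(...) doubles the line breaks; and a blanket replace also turns any &amp; already in the file into &amp;amp;.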
#2 (2 votes)
str.replace() returns a new string object with the change made. It does not change data in place, and you are ignoring the return value.
You want to apply it to each line instead:
file = [line.replace('&', '&amp;') for line in file]
You could use the fileinput module to do the transformation and have it replace the original file in place (pass backup='.bak' to keep a copy of the original):
import fileinput
import sys

# inplace=True redirects stdout into the file being read
for line in fileinput.input('filename', inplace=True, backup='.bak'):
    sys.stdout.write(line.replace('&', '&amp;'))
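A runnable Python 3 sketch of the fileinput approach on a throwaway temporary file, using FileInput as a context manager so the file is closed and stdout restored afterwards. The file contents are made up. backup=".bak" is what keeps a copy of the original; without a backup argument, the temporary backup is deleted once processing finishes.

```python
import fileinput
import os
import sys
import tempfile

# Create a throwaway sample file (made-up contents) to edit in place.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as f:
    f.write("<a>Tom & Jerry</a>\n<b>R&D</b>\n")

# With inplace=True, whatever is written to stdout inside the loop becomes
# the new file contents; backup=".bak" keeps the original as path + ".bak".
with fileinput.FileInput(path, inplace=True, backup=".bak") as lines:
    for line in lines:
        sys.stdout.write(line.replace("&", "&amp;"))

with open(path) as f:
    new_text = f.read()
with open(path + ".bak") as f:
    old_text = f.read()

print(new_text)
print(old_text)

# Clean up the temporary files.
os.remove(path)
os.remove(path + ".bak")
```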
#3 (0 votes)
Oh... you need to decode the HTML notation for special symbols. Python has a module to deal with it, HTMLParser; see its documentation.
Here is an example:
import HTMLParser

htmlparser = HTMLParser.HTMLParser()  # unescape() is an instance method
out_file = ....
file = xml_file.readlines()
parsed_lines = []
for line in file:
    parsed_lines.append(htmlparser.unescape(line))
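In Python 3 the HTMLParser module became html.parser, and the function to use is html.unescape (HTMLParser.unescape was deprecated and later removed). The sketch below also shows that unescaping turns &amp; back into &, the opposite direction from what the question needs; html.escape with quote=False goes the other way. The sample strings are made up.

```python
import html

escaped = "Tom &amp; Jerry &lt;3"

# unescape() turns entity references back into plain characters.
plain = html.unescape(escaped)
print(plain)  # Tom & Jerry <3

# escape() goes the other way; quote=False leaves quote characters alone.
print(html.escape(plain, quote=False))  # Tom &amp; Jerry &lt;3

# Applied line by line, as in the answer above (illustrative lines).
lines = ["<a>R&amp;D</a>\n", "<b>Q&amp;A</b>\n"]
parsed_lines = [html.unescape(line) for line in lines]
```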
#4 (0 votes)
Slightly off topic, but might it be good to use some escaping?
I often use urllib's quote and unquote, which percent-encode special characters and decode them again (this is URL escaping, not HTML escaping):
>>> import urllib
>>> result = urllib.quote("filename&fileextension")
>>> result
'filename%26fileextension'
>>> urllib.unquote(result)
'filename&fileextension'
Might that help with consistency?
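For comparison, a Python 3 sketch: quote and unquote now live in urllib.parse. Percent-encoding is URL escaping, so for text going into XML, xml.sax.saxutils.escape (which replaces &, < and >) is swapped in here as the XML-specific counterpart; treat that choice as a suggestion rather than part of the original answer.

```python
from urllib.parse import quote, unquote
from xml.sax.saxutils import escape, unescape

name = "filename&fileextension"

# URL (percent) encoding: '&' becomes '%26'.
url_safe = quote(name)
print(url_safe)           # filename%26fileextension
print(unquote(url_safe))  # filename&fileextension

# XML escaping: '&' becomes '&amp;' (also handles '<' and '>').
xml_safe = escape(name)
print(xml_safe)            # filename&amp;fileextension
print(unescape(xml_safe))  # filename&fileextension
```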