How do I replace '&' with '&amp;' in Python?

Time: 2021-10-01 21:07:38

I'm having issues with .replace(). My XML parser does not like '&', but will accept '&amp;'. I'd like to use .replace('&','&amp;'), but this does not seem to be working. I keep getting the error:

lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 51, column 41

So far I have tried a straightforward file=file.replace('&','&amp;'), but this doesn't work. I've also tried:

xml_file = infile
file=xml_file.readlines()
for line in file:
        for char in line:
                char.replace('&','&amp;')
infile=open('a','w')
file='\n'.join(file)
infile.write(file)
infile.close()
infile=open('a','r')
xml_file=infile

What would be the best way to fix my issue?

4 answers

#1


5  

str.replace creates and returns a new string. It can't alter strings in-place - they're immutable. Try replacing:

file=xml_file.readlines()

with

file = [line.replace('&','&amp;') for line in xml_file]

This uses a list comprehension to build a list equivalent to .readlines() but with the replacement already made.
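
To make the point concrete, here is a minimal sketch of the difference between calling replace() and keeping what it returns. The sample strings are made up for illustration; they stand in for lines read from the XML file.

```python
# Strings are immutable: replace() returns a new string and leaves the original alone.
line = "Tom & Jerry\n"
line.replace("&", "&amp;")             # return value discarded, so nothing changes
unchanged = line

escaped = line.replace("&", "&amp;")   # keep the return value instead

# Hypothetical sample lines standing in for the contents of the XML file.
lines = ["<a>Tom & Jerry</a>\n", "<b>plain</b>\n"]
fixed = [l.replace("&", "&amp;") for l in lines]
print(fixed)
```

Note that lines read from a file already end with a newline, so when writing them back, ''.join(fixed) or writelines(fixed) avoids the doubled line breaks that '\n'.join would produce.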

#2


2  

str.replace() returns a new string object with the change made. It does not change data in-place. You are ignoring the return value.

You want to apply it to each line instead:

file = [line.replace('&', '&amp;') for line in file]

You could use the fileinput module to do the transformation and have it handle replacing the original file. With inplace=True the original is moved to a backup file while the loop runs; pass backup='.bak' if you want that backup kept afterwards:

import fileinput
import sys

for line in fileinput.input('filename', inplace=True):
    sys.stdout.write(line.replace('&', '&amp;'))
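
One caveat with a blanket replace: if the file already contains valid references such as &amp; or &lt;, they would turn into &amp;amp; and &amp;lt;. The sketch below escapes only ampersands that do not already start a reference. The name escape_bare_ampersands and the sample string are hypothetical, and the pattern is a simplification (it accepts only ASCII letters and digits in entity names), so treat it as a starting point rather than a complete XML fixer.

```python
import re

# Matches an '&' that does NOT start a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference. Simplified: ASCII entity names only.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Return text with stray '&' escaped and existing references untouched."""
    return BARE_AMP.sub("&amp;", text)


# Hypothetical sample mixing a bare ampersand with existing references.
sample = "<t>Tom & Jerry &amp; friends &#38; co &lt;3</t>"
result = escape_bare_ampersands(sample)
print(result)
```

It can be dropped into either approach above in place of line.replace('&', '&amp;').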

#3


0  

Oh... you need to decode the HTML notation for special symbols. Python has a module to deal with it, HTMLParser; see its documentation.

Here is an example:

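import HTMLParser  # Python 2 module; in Python 3 use html.unescape() instead

htmlparser = HTMLParser.HTMLParser()  # unescape() is a method of a parser instance

out_file = ....
file = xml_file.readlines()
parsed_lines = []
for line in file:
    parsed_lines.append(htmlparser.unescape(line))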
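
For reference, a small sketch of both directions using the Python 3 html module, assuming Python 3 is available. Unescaping turns '&amp;' back into '&', which is the opposite of what the question asks for; escaping is the direction the parser needs. The sample text is made up.

```python
import html

raw = "Tom & Jerry"

# Escape: '&' becomes '&amp;' (quote=False leaves quote characters alone).
escaped = html.escape(raw, quote=False)

# Unescape: the reverse direction, which is what this answer's code performs.
restored = html.unescape(escaped)

print(escaped)
print(restored)
```

html.escape also converts '<' and '>', so it suits text content rather than a whole file that already contains markup.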

#4


0  

Slightly off topic, but it might be good to use some escaping?

I often use urllib's quote and unquote, which apply and remove URL percent-encoding:

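 >>> import urllib  # Python 2; in Python 3 these functions live in urllib.parse
 >>> result = urllib.quote("filename&fileextension")
 >>> result
 'filename%26fileextension'
 >>> urllib.unquote(result)
 'filename&fileextension'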

Might help for consistency?
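
The same round trip as a Python 3 sketch, where quote and unquote are imported from urllib.parse. Keep in mind this is percent-encoding meant for URLs, so it sidesteps the '&' character rather than producing the '&amp;' an XML parser expects.

```python
from urllib.parse import quote, unquote

# Percent-encode the ampersand, then decode it again.
encoded = quote("filename&fileextension")
decoded = unquote(encoded)

print(encoded)
print(decoded)
```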
