I have the following code.
我有以下代码。
from xml.dom.minidom import Document
doc = Document()
root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)
text = doc.createTextNode('Some text here')
main.appendChild(text)
print doc.toprettyxml(indent='\t')
The result is:
其结果是:
<?xml version="1.0" ?>
<root>
<Text>
Some text here
</Text>
</root>
This is all fine and dandy, but what if I want the output to look like this?
这些都很好,但是如果我想让输出变成这样呢?
<?xml version="1.0" ?>
<root>
<Text>Some text here</Text>
</root>
Can this easily be done?
这很容易做到吗?
Orjanp...
Orjanp……
6 个解决方案
#1
7
Can this easily be done?
这很容易做到吗?
It depends what exact rule you want, but generally no, you get little control over pretty-printing. If you want a specific format you'll usually have to write your own walker.
这取决于你想要的确切规则,但通常不会,你很难控制漂亮的印刷。如果你想要一个特定的格式,你通常需要自己写步行者。
The DOM Level 3 LS parameter format-pretty-print in pxdom comes pretty close to your example. Its rule is that if an element contains only a single TextNode, no extra whitespace will be added. However it (currently) uses two spaces for an indent rather than four.
pxdom中的DOM Level 3 LS参数format-pretty-print与示例非常接近。它的规则是,如果一个元素只包含一个TextNode,则不会添加额外的空格。然而,它(目前)使用两个空格作为缩进,而不是四个空格。
>>> doc= pxdom.parseString('<a><b>c</b></a>')
>>> doc.domConfig.setParameter('format-pretty-print', True)
>>> print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
<b>c</b>
</a>
(Adjust encoding and output format for whatever type of serialisation you're doing.)
(为正在进行的任何类型的序列化调整编码和输出格式。)
If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, eg.:
如果这是您想要的规则,并且您可以使用它,您也可以使用monkey-patch minidom的元素。writexml,即可,如:
>>> from xml.dom import minidom
>>> def newwritexml(self, writer, indent= '', addindent= '', newl= ''):
... if len(self.childNodes)==1 and self.firstChild.nodeType==3:
... writer.write(indent)
... self.oldwritexml(writer) # cancel extra whitespace
... writer.write(newl)
... else:
... self.oldwritexml(writer, indent, addindent, newl)
...
>>> minidom.Element.oldwritexml= minidom.Element.writexml
>>> minidom.Element.writexml= newwritexml
All the usual caveats about the badness of monkey-patching apply.
所有关于monkeypatching的缺点的通常警告都是适用的。
#2
2
I was looking for exactly the same thing, and I came across this post. (the indenting provided in xml.dom.minidom broke another tool that I was using to manipulate the XML, and I needed it to be indented) I tried the accepted solution with a slightly more complex example and this was the result:
我在找同样的东西,我发现了这篇文章。在xml.dom中提供的缩进。minidom破坏了我用来操作XML的另一个工具,我需要它被缩进)我尝试了一个稍微复杂一点的解决方案,结果是:
In [1]: import pxdom
In [2]: xml = '<a><b>fda</b><c><b>fdsa</b></c></a>'
In [3]: doc = pxdom.parseString(xml)
In [4]: doc.domConfig.setParameter('format-pretty-print', True)
In [5]: print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
<b>fda</b><c>
<b>fdsa</b>
</c>
</a>
The pretty printed XML isn't formatted correctly, and I'm not too happy about monkey patching (i.e. I barely know what it means, and understand it's bad), so I looked for another solution.
漂亮打印的XML格式不正确,我也不太喜欢monkey patching(也就是说,我几乎不知道它是什么意思,也不知道它不好),所以我寻找了另一种解决方案。
I'm writing the output to file, so I can use the xmlindent program for Ubuntu ($sudo aptitude install xmlindent). So I just write the unformatted to the file, then call the xmlindent from within the python program:
我正在编写输出文件,因此我可以使用Ubuntu的xmlindent程序($sudo aptitude install xmlindent)。因此,我只将未格式化的内容写入文件,然后在python程序中调用xmlindent:
from subprocess import Popen, PIPE
Popen(["xmlindent", "-i", "2", "-w", "-f", "-nbe", file_name],
stderr=PIPE,
stdout=PIPE).communicate()
The -w switch causes the file to be overwritten, but annoyingly leaves a named e.g. "myfile.xml~" which you'll probably want to remove. The other switches are there to get the formatting right (for me).
w开关导致文件被覆盖,但令人恼火的是,它留下了一个命名的例子。“myfile。您可能想要删除的xml~。其他的开关是为了正确地格式化(对我来说)。
P.S. xmlindent is a stream formatter, i.e. you can use it as follows:
P.S. xmlindent是一个流格式化程序,即您可以使用它如下:
cat myfile.xml | xmlindent > myfile_indented.xml
So you might be able to use it in a python script without writing to a file if you needed to.
因此,如果需要的话,您可以在python脚本中使用它,而不必编写到文件中。
#3
1
This could be done with toxml(), using regular expressions to tidy things up.
这可以通过toxml()实现,使用正则表达式来整理。
>>> from xml.dom.minidom import Document
>>> import re
>>> doc = Document()
>>> root = doc.createElement('root')
>>> _ = doc.appendChild(root)
>>> main = doc.createElement('Text')
>>> _ = root.appendChild(main)
>>> text = doc.createTextNode('Some text here')
>>> _ = main.appendChild(text)
>>> out = doc.toxml()
>>> niceOut = re.sub(r'><', r'>\n<', re.sub(r'(<\/.*?>)', r'\1\n', out))
>>> print niceOut
<?xml version="1.0" ?>
<root>
<Text>Some text here</Text>
</root>
#4
0
The pyxml package offers a simple solution to this by using the xml.dom.ext.PrettyPrint() function. It can also print to a file descriptor.
pyxml包通过使用xml.dom.ext.PrettyPrint()函数提供了一个简单的解决方案。它还可以打印到文件描述符。
But the pyxml package is no longer maintained.
但是pyxml包不再被维护。
Oerjan Pettersen
Oerjan佩特森工作室内由手工制作完成
#5
0
This solution worked for me without monkey patching or ceasing to use minidom:
这个解决方案对我来说是有效的,没有猴子补丁或停止使用minidom:
from xml.dom.ext import PrettyPrint
from StringIO import StringIO
def toprettyxml_fixed (node, encoding='utf-8'):
tmpStream = StringIO()
PrettyPrint(node, stream=tmpStream, encoding=encoding)
return tmpStream.getvalue()
http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/最佳解决方案
#6
0
Easiest way to do this is to use prettyxml, and remove the \n and \t inside the tags. That way you keep your indent as you required in your example.
最简单的方法是使用prettyxml,并删除标记中的\n和\t。这样,您就可以按照示例中的要求保留缩进。
xml_output = doc.toprettyxml() nojunkintags = re.sub('>(\n|\t)</', '', xml_output) print nojunkintags
xml_output = doc.toprettyxml() nojunkintags = re.sub('>(\n|\t)
#1
7
Can this easily be done?
这很容易做到吗?
It depends what exact rule you want, but generally no, you get little control over pretty-printing. If you want a specific format you'll usually have to write your own walker.
这取决于你想要的确切规则,但通常不会,你很难控制漂亮的印刷。如果你想要一个特定的格式,你通常需要自己写步行者。
The DOM Level 3 LS parameter format-pretty-print in pxdom comes pretty close to your example. Its rule is that if an element contains only a single TextNode, no extra whitespace will be added. However it (currently) uses two spaces for an indent rather than four.
pxdom中的DOM Level 3 LS参数format-pretty-print与示例非常接近。它的规则是,如果一个元素只包含一个TextNode,则不会添加额外的空格。然而,它(目前)使用两个空格作为缩进,而不是四个空格。
>>> doc= pxdom.parseString('<a><b>c</b></a>')
>>> doc.domConfig.setParameter('format-pretty-print', True)
>>> print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
<b>c</b>
</a>
(Adjust encoding and output format for whatever type of serialisation you're doing.)
(为正在进行的任何类型的序列化调整编码和输出格式。)
If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, eg.:
如果这是您想要的规则,并且您可以使用它,您也可以使用monkey-patch minidom的元素。writexml,即可,如:
>>> from xml.dom import minidom
>>> def newwritexml(self, writer, indent= '', addindent= '', newl= ''):
... if len(self.childNodes)==1 and self.firstChild.nodeType==3:
... writer.write(indent)
... self.oldwritexml(writer) # cancel extra whitespace
... writer.write(newl)
... else:
... self.oldwritexml(writer, indent, addindent, newl)
...
>>> minidom.Element.oldwritexml= minidom.Element.writexml
>>> minidom.Element.writexml= newwritexml
All the usual caveats about the badness of monkey-patching apply.
所有关于monkeypatching的缺点的通常警告都是适用的。
#2
2
I was looking for exactly the same thing, and I came across this post. (the indenting provided in xml.dom.minidom broke another tool that I was using to manipulate the XML, and I needed it to be indented) I tried the accepted solution with a slightly more complex example and this was the result:
我在找同样的东西,我发现了这篇文章。在xml.dom中提供的缩进。minidom破坏了我用来操作XML的另一个工具,我需要它被缩进)我尝试了一个稍微复杂一点的解决方案,结果是:
In [1]: import pxdom
In [2]: xml = '<a><b>fda</b><c><b>fdsa</b></c></a>'
In [3]: doc = pxdom.parseString(xml)
In [4]: doc.domConfig.setParameter('format-pretty-print', True)
In [5]: print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
<b>fda</b><c>
<b>fdsa</b>
</c>
</a>
The pretty printed XML isn't formatted correctly, and I'm not too happy about monkey patching (i.e. I barely know what it means, and understand it's bad), so I looked for another solution.
漂亮打印的XML格式不正确,我也不太喜欢monkey patching(也就是说,我几乎不知道它是什么意思,也不知道它不好),所以我寻找了另一种解决方案。
I'm writing the output to file, so I can use the xmlindent program for Ubuntu ($sudo aptitude install xmlindent). So I just write the unformatted to the file, then call the xmlindent from within the python program:
我正在编写输出文件,因此我可以使用Ubuntu的xmlindent程序($sudo aptitude install xmlindent)。因此,我只将未格式化的内容写入文件,然后在python程序中调用xmlindent:
from subprocess import Popen, PIPE
Popen(["xmlindent", "-i", "2", "-w", "-f", "-nbe", file_name],
stderr=PIPE,
stdout=PIPE).communicate()
The -w switch causes the file to be overwritten, but annoyingly leaves a named e.g. "myfile.xml~" which you'll probably want to remove. The other switches are there to get the formatting right (for me).
w开关导致文件被覆盖,但令人恼火的是,它留下了一个命名的例子。“myfile。您可能想要删除的xml~。其他的开关是为了正确地格式化(对我来说)。
P.S. xmlindent is a stream formatter, i.e. you can use it as follows:
P.S. xmlindent是一个流格式化程序,即您可以使用它如下:
cat myfile.xml | xmlindent > myfile_indented.xml
So you might be able to use it in a python script without writing to a file if you needed to.
因此,如果需要的话,您可以在python脚本中使用它,而不必编写到文件中。
#3
1
This could be done with toxml(), using regular expressions to tidy things up.
这可以通过toxml()实现,使用正则表达式来整理。
>>> from xml.dom.minidom import Document
>>> import re
>>> doc = Document()
>>> root = doc.createElement('root')
>>> _ = doc.appendChild(root)
>>> main = doc.createElement('Text')
>>> _ = root.appendChild(main)
>>> text = doc.createTextNode('Some text here')
>>> _ = main.appendChild(text)
>>> out = doc.toxml()
>>> niceOut = re.sub(r'><', r'>\n<', re.sub(r'(<\/.*?>)', r'\1\n', out))
>>> print niceOut
<?xml version="1.0" ?>
<root>
<Text>Some text here</Text>
</root>
#4
0
The pyxml package offers a simple solution to this by using the xml.dom.ext.PrettyPrint() function. It can also print to a file descriptor.
pyxml包通过使用xml.dom.ext.PrettyPrint()函数提供了一个简单的解决方案。它还可以打印到文件描述符。
But the pyxml package is no longer maintained.
但是pyxml包不再被维护。
Oerjan Pettersen
Oerjan佩特森工作室内由手工制作完成
#5
0
This solution worked for me without monkey patching or ceasing to use minidom:
这个解决方案对我来说是有效的,没有猴子补丁或停止使用minidom:
from xml.dom.ext import PrettyPrint
from StringIO import StringIO
def toprettyxml_fixed (node, encoding='utf-8'):
tmpStream = StringIO()
PrettyPrint(node, stream=tmpStream, encoding=encoding)
return tmpStream.getvalue()
http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/最佳解决方案
#6
0
Easiest way to do this is to use prettyxml, and remove the \n and \t inside the tags. That way you keep your indent as you required in your example.
最简单的方法是使用prettyxml,并删除标记中的\n和\t。这样,您就可以按照示例中的要求保留缩进。
xml_output = doc.toprettyxml() nojunkintags = re.sub('>(\n|\t)</', '', xml_output) print nojunkintags
xml_output = doc.toprettyxml() nojunkintags = re.sub('>(\n|\t)