Python: Which XML parser supports DTD !ENTITY definitions?

Time: 2022-09-17 20:28:28

I have the XML file below. I am currently using minidom, and for this example the documentElement's tagName comes back as xyz:widget, which tells me that it ignores the !ENTITY definitions and therefore the !DOCTYPE reference.

Which XML parser supports Document Type Definitions, so that the !ENTITY definitions and the !DOCTYPE reference will not be ignored:

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE widget [
<!ENTITY widgets-ns "http://www.w3.org/ns/widgets">
<!ENTITY pass "pass&amp;.html">
]>
<xyz:widget xmlns:xyz="&widgets-ns;">
  <xyz:content src="&pass;"/>
  <xyz:name>bv</xyz:name>
</xyz:widget>

So that, for the above example, you can get the equivalent XML using Python:

<widget xmlns="http://www.w3.org/ns/widgets">
  <content src="pass&amp;.html"/>
  <name>bv</name>
</widget>

or get a DOM whose documentElement is widget, with content and name as its childNodes, an xmlns attribute on widget with the value http://www.w3.org/ns/widgets, etc.

I may not have used the correct terminology, but I hope the examples above make my question clear.

1 solution

#1


6  

LXML handles this just fine:

>>> from lxml import etree
>>> s = """<?xml version="1.0" standalone="yes" ?>
... <!DOCTYPE widget [
... <!ENTITY widgets-ns "http://www.w3.org/ns/widgets">
... <!ENTITY pass "pass&amp;.html">
... ]>
... <xyz:widget xmlns:xyz="&widgets-ns;">
...   <xyz:content src="&pass;"/>
...   <xyz:name>bv</xyz:name>
... </xyz:widget>
... """
>>> etree.fromstring(s)
<Element {http://www.w3.org/ns/widgets}widget at 7f4de2cc58e8>
>>> etree.fromstring(s).xpath("//xyz:content/@src",
...                           namespaces={"xyz": "http://www.w3.org/ns/widgets"})
['pass&.html']
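
A side note on the question's premise: in the DOM, tagName is the qualified name (prefix included), so seeing xyz:widget does not by itself show that the entities were ignored. The sketch below is a rough cross-check rather than a definitive answer. It assumes the standard-library minidom, which is backed by expat, expands internal entities declared in the document's own DOCTYPE subset. The helper name summarize and the constants XML_TEXT and WIDGETS_NS are made up for illustration.

```python
from xml.dom import minidom

# The document from the question, with entities declared in the internal subset.
XML_TEXT = """<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE widget [
<!ENTITY widgets-ns "http://www.w3.org/ns/widgets">
<!ENTITY pass "pass&amp;.html">
]>
<xyz:widget xmlns:xyz="&widgets-ns;">
  <xyz:content src="&pass;"/>
  <xyz:name>bv</xyz:name>
</xyz:widget>
"""

WIDGETS_NS = "http://www.w3.org/ns/widgets"


def summarize(xml_text):
    """Hypothetical helper: collect namespace-aware facts about the root element."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # Look the child up by namespace URI and local name, not by prefix.
    content = root.getElementsByTagNameNS(WIDGETS_NS, "content")[0]
    return {
        "tagName": root.tagName,            # qualified name, prefix included
        "localName": root.localName,        # name without the prefix
        "namespaceURI": root.namespaceURI,  # value taken from the entity
        "src": content.getAttribute("src"),
    }


summary = summarize(XML_TEXT)
for key, value in summary.items():
    print(key, "=", value)
```

If the assumption holds, namespaceURI and src carry the expanded entity values while tagName still reads xyz:widget, which would mean that comparing localName and namespaceURI is the more reliable check than comparing tagName.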
