What are the differences between the XML parsing libraries in PHP5?

Time: 2021-06-12 13:24:54

The original question is below, but I changed the title because I think it will make it easier for others with the same doubt to find it. In the end, an XHTML document is an XML document.

It's a beginner question, but I would like to know which library you think is the best for parsing XHTML documents in PHP5.

I generated the XHTML from HTML files (which were created using Word :S) with Tidy, and now I need to replace some elements in them (for example the … and … elements, and replace some attributes in … tags).

I haven't used XML very much. There seem to be many options for parsing it in PHP (SimpleXML, DOM, etc.), and I don't know whether all of them can do what I need, or which is the easiest one to use.

Sorry for my English, I'm from Argentina. Thanks!

A bit more information: I have a lot of HTML pages, made in Word 97. I used Tidy to clean them up and turn them into XHTML Strict, so now they are all XML compatible. I want to use an XML parser to find some elements and replace them (the logic by which I do this doesn't matter). For example, I want all of the pages to use the same CSS stylesheet and class attributes, for a unified appearance. They are all static pages which contain legal documents, nothing strange there. Which of the extensions should I use? Is SimpleXML enough? Should I learn DOM in spite of it being more difficult?

7 answers

#1


4  

Just to clear up the confusion here. PHP has a number of XML libraries, because PHP4 didn't have very good options in that direction. From PHP5, you have the choice between SimpleXML, DOM and the SAX-based Expat parser. The latter also existed in PHP4. PHP4 also had a DOM extension, which is not the same as PHP5's.

DOM and SimpleXML are alternative solutions in the same problem domain; they load the document into memory and let you access it as a tree structure. DOM is a rather bulky API, but it's also very consistent and it's implemented in many languages, meaning that you can re-use your knowledge across languages (in JavaScript, for example). SimpleXML may be easier initially.

The SAX parser is a different beast. It treats an xml document as a stream of tags. This is useful if you are dealing with very large documents, since you don't need to hold it all in memory.

For your usage, I would probably use the DOM api.
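
As a rough illustration of what the DOM route could look like for the task in the question, here is a minimal sketch. The function name `unifyAppearance`, the file name `legal.css` and the class name `body-text` are made up for this example, and it assumes the input is already well-formed XML (which is what Tidy was used for).

```php
<?php
// Hypothetical helper: point every stylesheet <link> at one CSS file and
// give every <p> the same class attribute. All names here are illustrative.
function unifyAppearance($xhtml, $cssHref, $paragraphClass)
{
    $doc = new DOMDocument();
    $doc->loadXML($xhtml); // the input must be well-formed XML

    // Elements in the default XHTML namespace carry no prefix, so the
    // plain tag name is enough to find them.
    foreach ($doc->getElementsByTagName('link') as $link) {
        if ($link->getAttribute('rel') === 'stylesheet') {
            $link->setAttribute('href', $cssHref);
        }
    }
    foreach ($doc->getElementsByTagName('p') as $p) {
        $p->setAttribute('class', $paragraphClass);
    }

    return $doc->saveXML();
}

$page = '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
      . '<link rel="stylesheet" href="old.css"/></head>'
      . '<body><p class="MsoNormal">Text</p></body></html>';

echo unifyAppearance($page, 'legal.css', 'body-text');
```

Only attributes are changed inside the loops, so iterating over the live node lists is safe here; removing nodes while looping would need more care.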

#2


6  

You could use SimpleXML, which is included in a default PHP install. This extension offers easy object-oriented access to XML structures.

There's also DOM XML. A "downside" to this extension is that it is a bit harder to use and that it is not included by default.

#3


4  

  • DOM is a standard, language-independent API for hierarchical data such as XML, standardized by the W3C. It is a rich API with much functionality. It is object-based, in that each node is an object.

    DOM is good when you not only want to read or write, but also want to do a lot of manipulation of nodes in an existing document, such as inserting nodes between others, changing the structure, etc.

  • SimpleXML is a PHP-specific API which is also object-based but is intended to be a lot less verbose than the DOM: simple tasks such as finding the value of a node or finding its child elements take a lot less code. Its API is not as rich as DOM's, but it still includes features such as XPath lookups, and a basic ability to work with multiple-namespace documents. And, importantly, it still preserves all features of your document such as XML CDATA sections and comments, even though it doesn't include functions to manipulate them.

    SimpleXML is very good for read-only use: if all you want to do is read the XML document and convert it to another form, then it'll save you a lot of code. It's also fairly good when you want to generate a document, or do basic manipulations such as adding or changing child elements or attributes, but it can become complicated (though not impossible) to do a lot of manipulation of existing documents. It's not easy, for example, to add a child element in between two others; addChild only inserts after other elements. SimpleXML also cannot do XSLT transformations. It doesn't have things like 'getElementsByTagName' or 'getElementById', but if you know XPath you can still do that kind of thing with SimpleXML.

    The SimpleXMLElement object is somewhat 'magical'. The properties it exposes if you var_dump/print_r/var_export it don't correspond to its complete internal representation. It exposes some of its child elements as if they were properties which can be accessed with the -> operator, but still preserves the full document internally. You can also read attributes with the [] operator as if the element were an associative array, and reach a child element whose name is not a valid PHP identifier with the ->{'name'} syntax.
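
To make the 'magical' access a little more concrete, here is a small sketch. The sample document is invented for illustration; the behaviour shown ([] with a string key reading an attribute, braces for names that are not valid identifiers) is how SimpleXML works as far as I know.

```php
<?php
$xml = simplexml_load_string(
    '<doc><title lang="en">Contract</title><!-- note -->'
    . '<item>A</item><item>B</item><sub-title>Draft</sub-title></doc>'
);

echo $xml->title, "\n";          // child element read like a property
echo $xml->title['lang'], "\n";  // [] with a string key reads an attribute
echo $xml->item[1], "\n";        // [] with an integer picks the n-th sibling
echo $xml->{'sub-title'}, "\n";  // braces reach names that are not valid identifiers
echo count($xml->item), "\n";    // number of <item> children

// The comment is not exposed as a property, but it is still part of the
// document and comes back out when the document is serialized.
echo $xml->asXML();
```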

You don't have to fully commit to one or the other, because PHP implements the functions:

  • simplexml_import_dom(DOMNode)
  • dom_import_simplexml(SimpleXMLElement)

This is helpful if you are using SimpleXML and need to work with code that expects a DOM node or vice versa.
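
A minimal sketch of mixing the two, assuming a tiny made-up list document: it uses dom_import_simplexml to borrow DOM's insertBefore for placing an element between two others, then carries on with SimpleXML.

```php
<?php
$list = simplexml_load_string('<ul><li>first</li><li>third</li></ul>');

// Cross over to DOM for the one operation SimpleXML lacks.
$ul     = dom_import_simplexml($list);         // DOMElement for <ul>
$third  = dom_import_simplexml($list->li[1]);  // DOMElement for the second <li>
$second = $ul->ownerDocument->createElement('li', 'second');
$ul->insertBefore($second, $third);

// Both views share the same underlying document, so SimpleXML sees the change.
echo $list->asXML();

// Going the other way: wrap a DOM node in a SimpleXMLElement.
$again = simplexml_import_dom($ul);
echo $again->li[0], "\n";
```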

PHP also offers a third XML library:

  • XML Parser (an implementation of SAX, a language-independent interface, but not referred to by that name in the manual) is a much lower-level library, which serves quite a different purpose. It doesn't build objects for you. It basically just makes it easier to write your own XML parser, because it does the job of advancing to the next token and finding out the type of token, such as what the tag name is and whether it's an opening or closing tag, for you. Then you have to write callbacks that should be run each time a token is encountered. All tasks such as representing the document as objects/arrays in a tree, manipulating the document, etc. will need to be implemented separately, because all you can do with the XML parser is write a low-level parser.

    The XML Parser functions are still quite helpful if you have specific memory or speed requirements. With them, it is possible to write a parser that can parse a very long XML document without holding all of its contents in memory at once. Also, if you are not interested in all of the data, and don't need or want it to be put into a tree or set of PHP objects, then it can be quicker. An example would be scanning through an XHTML document to find all the links when you don't care about the structure.
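
The link-scanning case could be sketched roughly like this with the XML Parser functions. The helper name `findLinks` is invented for this example, and the closures assume PHP 5.3 or later; on older versions named callback functions would be needed instead.

```php
<?php
// Hypothetical helper: collect the href of every <a> element using the
// event-based XML Parser functions. No tree is ever built.
function findLinks($xhtml)
{
    $links  = array();
    $parser = xml_parser_create();
    // Keep tag and attribute names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$links) {  // opening tag
            if ($name === 'a' && isset($attrs['href'])) {
                $links[] = $attrs['href'];
            }
        },
        function ($parser, $name) {                        // closing tag: nothing to do
        }
    );

    // For a very large file, xml_parse() could instead be called once per
    // chunk read from disk, passing true only with the last chunk.
    if (!xml_parse($parser, $xhtml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }

    return $links;
}

print_r(findLinks('<body><p><a href="a.html">A</a> and <a href="b.html">B</a></p></body>'));
```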

#4


1  

I prefer SimpleXMLElement as it's pretty easy to use to loop through elements.

Edit: It says no version info is available, but it is available in PHP5: at least in 5.2.5, and probably earlier.

It's really a personal choice though; there are plenty of XML extensions.

Bear in mind that many XML parsers will balk if you have invalid markup. XHTML should be XML, but it isn't always!
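
One way to find out up front whether a file will be rejected is to let libxml collect its errors rather than emit warnings. This is only a sketch; `xmlErrors` is a made-up helper name.

```php
<?php
// Hypothetical helper: try to load a string as XML and return the libxml
// error messages if loading fails, instead of emitting PHP warnings.
function xmlErrors($source)
{
    $previous = libxml_use_internal_errors(true);  // collect errors quietly
    $doc = new DOMDocument();
    $ok  = $doc->loadXML($source);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the old setting

    return $ok ? array() : $messages;
}

print_r(xmlErrors('<p><b>unclosed</p>'));
```

An empty array means the document loaded; anything else lists what the parser objected to, which can help decide whether a page needs another pass through Tidy.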

#5


0  

It's been a long time (2 years or more) since I worked with XML parsing in PHP, but I always had good, usable results from the XML_Parser PEAR package. Having said that, I have had minimal exposure to PHP5, so I don't really know if there are better, built-in alternatives these days.

#6


0  

I did a little bit of XML parsing in PHP5 last year and decided to use a combination of SimpleXML.

DOM is a bit more useful if you want to create a new XML tree or add to an existing one; it's slightly more flexible.

#7


0  

It really depends on what you're trying to accomplish. For pulling rather large amounts of data, i.e. many records of, say, product information from a store website, I'd probably use Expat, since it's supposedly a bit faster... Personally, I've had XML files large enough to see a noticeable performance boost. At those quantities you might as well be using SQL.

I recommend using SimpleXML. It's pretty intuitive and easy to use/write. Also, it works great with XPath.
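
A small sketch of SimpleXML together with XPath on an XHTML page. One thing to watch for: XHTML elements live in a default namespace, so the XPath query needs a registered prefix. The prefix `x` and the class names below are arbitrary choices for this example.

```php
<?php
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       . '<p class="MsoNormal">One</p><p class="MsoTitle">Two</p></body></html>';

$page = simplexml_load_string($xhtml);

// XPath 1.0 cannot match elements in a default namespace without a prefix,
// so bind one to the XHTML namespace URI first.
$page->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');

foreach ($page->xpath('//x:p[@class="MsoNormal"]') as $p) {
    $p['class'] = 'body-text';  // assigning through [] rewrites the attribute
}

echo $page->asXML();
```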

I never really got to use DOM much, but if you're using the XML Parser for something as large as you're describing, you might want to use it, since it's a bit more functional than SimpleXML.

You can read about all three at W3Schools:

http://www.w3schools.com/php/php_xml_parser_expat.asp

http://www.w3schools.com/php/php_xml_simplexml.asp

http://www.w3schools.com/php/php_xml_dom.asp

