Which is faster, XPath or Regexp?

Date: 2021-08-17 03:00:29

I am making an add-on for Firefox that loads an HTML page using Ajax (the add-on has its own XUL panel).

At this point, I have not looked for a way to create a document object, place the contents of the Ajax response into it, and then use XPath to find what I need.
Instead, I am loading the contents and parsing them as text with a regular expression.

But I have a question. Which would be better to use, XPath or a regular expression? Which is faster?

The HTML page would consist of hundreds of elements that contain the same text, and what I basically want to do is count how many such elements there are.

I want my add-on to work as fast as possible, and I do not know the mechanics behind regexps or XPath, so I don't know which is more effective.

I hope I was clear. Thanks.

1 Solution

#1


17  

Whenever you are dealing with XML, use XPath (or XSLT, XQuery, SAX, DOM, or any other XML-aware method) to go through your data. Never use regular expressions for this task.
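
For the counting task in the question, an XML-aware query might look like the sketch below. It is only an illustration under assumptions: Python's standard `xml.etree.ElementTree` (which supports a limited XPath subset) stands in for the browser, the `PAGE` markup is made up, and the input is assumed to be well-formed. Inside Firefox, the rough equivalents would be `DOMParser` to build the document and `document.evaluate` to run the XPath expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from the question).
PAGE = """<html><body>
  <div class="row">same text</div>
  <div class="row">other text</div>
  <div class="row">same text</div>
  <p>same text</p>
</body></html>"""

root = ET.fromstring(PAGE)

# ElementTree's XPath subset: every <div>, at any depth, whose complete
# text content equals "same text".
matches = root.findall(".//div[.='same text']")

print(len(matches))  # 2
```

The predicate keeps the "which element, which text" logic in one declarative expression, which is the part that tends to get fragile when written as a regex.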

Why? XML processing is intricate. Dealing with all its oddities (external, parsed, and unparsed entities, DTDs, processing instructions, whitespace handling and collapsing, Unicode normalization, CDATA sections, and so on) makes it very hard to build a reliable regex-based way of getting at your data. The fact that it has taken the industry years to learn how best to parse XML should be reason enough not to try this yourself.
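
A small sketch of how a few of these oddities trip up a regex. The sample document and the pattern are invented for illustration, and Python's standard library is used in place of the browser APIs; the point is only that a textual match cannot tell real elements from look-alike text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: two real <item> elements, plus one look-alike inside
# a comment and one inside a CDATA section (neither is an element).
DOC = """<list>
  <item>apple</item>
  <!-- <item>apple</item> -->
  <note><![CDATA[<item>apple</item>]]></note>
  <item >apple</item>
</list>"""

# Naive regex: counts textual occurrences, so it picks up the comment and
# the CDATA content, and misses the element written as "<item >".
regex_count = len(re.findall(r"<item>apple</item>", DOC))

# Parser-based count: only real <item> elements are visited.
root = ET.fromstring(DOC)
parser_count = sum(1 for el in root.iter("item") if el.text == "apple")

print(regex_count, parser_count)  # 3 2
```

The regex can of course be patched for each case, but every patch is a partial reimplementation of an XML parser.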

To answer your question: when it comes to speed (which should not be your primary concern here), it depends heavily on the implementation of the XPath or regex compiler/processor. Sometimes XPath will be faster (e.g., when using keys, where possible, or compiled XSLT); other times regexes will be faster (if you can use a precompiled regex and your query is simple). But regexes are never simple with HTML/XML, because of the problem of matching nested tags (the balanced-parentheses problem), which cannot be reliably solved with regexes alone.
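
Since the outcome depends on the implementation, the practical way to settle it is to measure in the environment you care about. The following is a rough sketch of such a measurement, not a benchmark of Firefox: it uses Python's `re`, `timeit`, and `xml.etree.ElementTree`, and the input, the pattern, and the function names are all assumptions made for the example.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical input: 500 sibling elements with identical text.
DOC = "<ul>" + "<li>same text</li>" * 500 + "</ul>"

# Compile once, outside the timed code, so only matching is measured.
PATTERN = re.compile(r"<li>same text</li>")


def count_with_regex(doc):
    return sum(1 for _ in PATTERN.finditer(doc))


def count_with_parser(doc):
    # Includes the cost of building the tree, which the regex never pays.
    root = ET.fromstring(doc)
    return len(root.findall(".//li"))


if __name__ == "__main__":
    for fn in (count_with_regex, count_with_parser):
        seconds = timeit.timeit(lambda: fn(DOC), number=200)
        print(f"{fn.__name__}: {fn(DOC)} matches, {seconds:.4f}s for 200 runs")
```

The timings will differ between machines and engines, so treat any single result as a data point for that setup only.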

If the input is huge, regex will tend to be faster, unless the XPath implementation can do streaming processing (which I believe is not how Firefox does it).
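
To make "streaming processing" concrete, here is a minimal sketch of counting elements while the input is still being read, instead of building the full tree first. It uses `iterparse` from Python's standard `xml.etree.ElementTree` as a stand-in; the helper name `count_streaming` and the generated input are assumptions for illustration, and nothing here describes what Firefox does internally.

```python
import io
import xml.etree.ElementTree as ET


def count_streaming(source, tag, text):
    """Count <tag> elements whose text equals `text`, discarding each
    element's content as soon as it has been examined."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag and elem.text == text:
            count += 1
        # Drop this element's text and children; only an empty shell
        # stays attached to its parent.
        elem.clear()
    return count


# Hypothetical large input, built in memory for the example.
big = io.BytesIO(b"<ul>" + b"<li>same text</li>" * 100_000 + b"</ul>")
print(count_streaming(big, "li", "same text"))  # 100000
```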

You wrote:

"which is more effective"

The one that gets you most quickly to a reliable and stable implementation that's comparatively speedy. Use XPath. It is also what's used inside Firefox and other browsers, if you need your code to run in a browser.
