How can I get Hpricot to play nicely with HTML5?

Date: 2021-02-09 16:23:19

I am using Hpricot to parse a theme file. However, I have noticed that if I feed a valid HTML5 document into Hpricot(), it auto-closes HTML5 tags (like <section>) and alters the DOCTYPE.

Are there any extensions to Hpricot, or perhaps a flag I need to set, that will allow HTML5 documents to be parsed correctly?

2 Answers

#1


2  

I know this sidesteps the direct question, but I would suggest trying Nokogiri (http://nokogiri.org/), as mentioned in some of the comments on your question. I've had no issues using it to parse any HTML/XML-like structured text, including HTML5.

#2


0  

I think Hpricot's to_original_html method is exactly what you're looking for.

From the docs for to_original_html:

Attempts to preserve the original HTML of the document, only outputting new tags for elements which have changed.
