How do I textilize and sanitize HTML?

I've run into an awkward situation. I want users to be able to use Textile, but they shouldn't be able to break the valid HTML surrounding their entry. So I have to escape the HTML somehow.

html_escape(textilize("</body>Foo")) would break Textile, while

textilize(html_escape("</body>Foo")) would work, but breaks various Textile features like links (written like "Linkname":http://www.wheretogo.com/), since the quotes would be transformed into &quot; and thus would no longer be detected by Textile.
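
To see why escaping first loses the links, here is a minimal sketch using only Ruby's standard library. It assumes html_escape behaves like ERB::Util.html_escape; the helper name escape_before_textile is made up for illustration, and no Textile processing is performed here.

```ruby
require 'erb'

# Hypothetical helper: stands in for the html_escape call that would run
# before textilize in the second variant above.
def escape_before_textile(text)
  ERB::Util.html_escape(text)
end

link = '"Linkname":http://www.wheretogo.com/'
tag  = '</body>Foo'

# The rogue tag is neutralised: its angle brackets become entities.
puts escape_before_textile(tag)

# The link markup is damaged too: its double quotes become &quot; entities,
# so the literal quotes that mark a Textile link are gone.
puts escape_before_textile(link)
puts escape_before_textile(link).include?('"')
```

The last line shows that no literal double quote survives the escaping, which is what the link syntax depends on.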

sanitize doesn't do a better job.

Any suggestions on that one? I would prefer not to use Tidy for this problem. Thanks in advance.

3 Solutions

#1

For those who run into the same problem: If you are using the RedCloth gem you can just define your own method (in one of your helpers).

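# Render user-supplied Textile to HTML with raw HTML escaped (uses the RedCloth gem).
# Returns nil when given nil.
def safe_textilize(s)
  return unless s && s.respond_to?(:to_s)

  doc = RedCloth.new(s.to_s)
  doc.filter_html = true # escape HTML that was not created by the Textile processor
  doc.to_html
end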

Excerpt from the Documentation:

Accessors for setting security restrictions.

This is a nice thing if you're using RedCloth for formatting in public places (e.g. Wikis) where you don't want users to abuse HTML for bad things.

If filter_html is set, HTML which wasn't created by the Textile processor will be escaped. Alternatively, if sanitize_html is set, HTML can pass through the Textile processor but unauthorized tags and attributes will be removed.

#2

This works for me and guards against every XSS attack I've tried including onmouse... handlers in pre and code blocks:

<%= RedCloth.new( sanitize( @comment.body ), [:filter_html, :filter_styles, :filter_classes, :filter_ids] ).to_html -%>

The initial sanitize removes a lot of potential XSS exploits including mouseovers.

As far as I can tell, :filter_html escapes most HTML tags apart from code and pre. The other filters are there because I don't want users applying any classes, ids or styles.

I just tested my comments page with your example

"</body>Foo"

and it completely removed the rogue body tag

I am using RedCloth version 4.2.3 and Rails version 2.3.5.

#3

Looks like textile simply doesn't support what you want.

You really want to only allow a carefully controlled subset of HTML, but textile is designed to allow arbitrary HTML. I don't think you can use textile at all in this situation (unless it supports that kind of restriction).

What you need is probably a special "restricted" version of Textile that only allows "safe" markup (though defining what counts as safe might already be tricky). I do not know if that exists.

You might have a look at BBCode, which lets you restrict the possible markup.
