How do I escape '/' in HTML tags inside JSON using Python?

Time: 2022-03-29 06:00:22

Note: This question is very close to Embedding JSON objects in script tags, but the responses to that question provide what I already know (that in JSON, / == \/). I want to know how to do that escaping.

The HTML spec prohibits closing HTML tags anywhere within a <script> element. So, this causes parse errors:

<script>
var assets = [{
  "asset_created": null, 
  "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", 
  "body": "<script></script>"
}];
</script>

In my case, I'm generating the invalid situation by rendering a JSON string inside a Django template, i.e.:


<script>
var assets = {{ json_string }};
</script>

I know that JSON parses \/ the same as /, so if I can just escape my closing HTML tags in the JSON string, I'll be good. But, I'm not sure of the best way to do this.


My naive approach would just be this:


json_string = '[{"asset_created": null, "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "<script></script>"}]'
escaped_json_string = json_string.replace('</', r'<\/')
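
A quick way to sanity-check this replacement, as a minimal sketch using only the standard-library json module (the shortened sample data is an assumption for illustration): the literal closing tag should disappear from the output, while a JSON parser should still decode the same data.

```python
import json

json_string = '[{"asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "<script></script>"}]'
escaped_json_string = json_string.replace('</', r'<\/')

# The closing tag no longer appears literally in the output...
assert '</script>' not in escaped_json_string
# ...but a JSON parser still decodes it to the same data.
assert json.loads(escaped_json_string) == json.loads(json_string)
```

Note that this only rewrites the `</` sequence; other sequences such as `<!--` are left untouched.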

Is there a better way? Or any gotchas that I'm overlooking?


1 Answer

#1


6  

Updated Answer

Okay, I assumed a few things incorrectly. For escaping the JSON, the simplejson library has a JSONEncoderForHTML class that can be used. You may need to install simplejson via pip or easy_install first. Then you can do something like this:

import simplejson

# Parse the existing JSON string, then re-encode it with HTML-safe escapes
assets_json = simplejson.loads(json_string)
encoded = simplejson.encoder.JSONEncoderForHTML().encode(assets_json)

The value of encoded will then look like this:

'{"asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "\\u003cscript\\u003e\\u003c/script\\u003e", "asset_created": null}'

This is a more general solution than replacing the slash, as it also escapes the other characters that are unsafe inside HTML (<, > and &).

The loads step is only needed because the JSON has already been encoded. It can be avoided by generating the JSON with simplejson directly rather than with Django, if possible:

simplejson.dumps(your_object_to_encode, cls=simplejson.encoder.JSONEncoderForHTML)
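
If installing simplejson is not an option, the same idea can be approximated with the standard-library json module. The following is only a sketch: dumps_for_html is a hypothetical helper name, not part of json or simplejson. It relies on the fact that <, > and & can only occur inside string values in JSON text, so replacing them with Unicode escapes after serialization does not change the decoded data.

```python
import json

# Characters that are unsafe inside HTML, mapped to JSON Unicode escapes.
_HTML_UNSAFE = {'<': '\\u003c', '>': '\\u003e', '&': '\\u0026'}


def dumps_for_html(obj):
    """Serialize obj to JSON, then escape <, > and & as Unicode escapes.

    Hypothetical helper for illustration; not a library function.
    """
    text = json.dumps(obj)
    for char, escape in _HTML_UNSAFE.items():
        text = text.replace(char, escape)
    return text


assets = [{
    "asset_created": None,
    "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f",
    "body": "<script></script>",
}]
encoded = dumps_for_html(assets)

# The escaped text still decodes to the original data.
assert json.loads(encoded) == assets
```

The resulting string contains no literal < or >, so it cannot close the surrounding script element early.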

Old Answer

Try wrapping your script in CDATA:


<script>
//<![CDATA[
var assets = [{
  "asset_created": null, 
  "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", 
  "body": "<script></script>"
}];
//]]>
</script>

CDATA is meant to tell the parser to treat the script contents as character data rather than markup. Otherwise, you'll need to use the character escapes that have been mentioned.
