JavaScript parse error on the "\u2028" Unicode character

Whenever I use the \u2028 character literal in my JavaScript source with the content type set to "text/html; charset=utf-8", I get a JavaScript parse error.

Example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

<html lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>json</title>

    <script type="text/javascript" charset="utf-8">
    var string = '[U+2028]'; // [U+2028] stands for the raw LINE SEPARATOR character, which is invisible here
    </script>
</head>
<body>

</body>
</html>

If the <meta http-equiv> tag is left out, everything works as expected. I've tested this on Safari and Firefox; both exhibit the same problem.

Any ideas on why this is happening and how to properly fix this (without removing the encoding)?

Edit: After some more research, I found that the specific problem was that the problem character was returned via JSONP. The response was then interpreted by the browser, which reads U+2028 as a newline and throws an error about an invalid newline in a string.

4 Answers

#1

Yes, it's a feature of the JavaScript language, documented in the ECMAScript standard (3rd edition section 7.3), that the U+2028 and U+2029 characters count as line endings. Consequently a JavaScript parser will treat any unencoded U+2028/9 character in the same way as a newline. Since you can't put a newline inside a string literal, you get a syntax error.

This is an unfortunate oversight in the design of JSON: it is not actually a proper subset of JavaScript. Raw U+2028/9 characters are valid in string literals in JSON, and will be accepted by JSON.parse, but not so in JavaScript itself.

Hence it is only safe to generate JavaScript code using a JSON encoder if you're sure it explicitly \u-escapes those characters. Some do, some don't; many \u-escape all non-ASCII characters, which avoids the problem.
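
As a rough illustration of the "escape all non-ASCII characters" approach, here is a minimal sketch. The helper name stringifyAscii is hypothetical and not part of any library; it simply post-processes the output of JSON.stringify, which by itself leaves U+2028 and U+2029 unescaped. Newer engines that implement ES2019 also accept the raw characters in string literals, but the escaped form should be safe in older ones too.

```javascript
// Hypothetical helper (not a library API): serialize a value to JSON
// and \u-escape every non-ASCII UTF-16 code unit in the result.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, function (ch) {
    // Turn each code unit into a six-character \uXXXX escape.
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

const out = stringifyAscii({ text: 'caf\u00e9\u2028end' });
console.log(out); // {"text":"caf\u00e9\u2028end"} (with literal backslashes)
```

Because the escapes are valid in both JSON and JavaScript string literals, the output can be parsed by JSON.parse or embedded in script source.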

#2

Alright, to answer my own question.

Normally a JSON parser handles these problem characters. Because I was retrieving JSONP, I wasn't using a JSON parser; instead, the browser tried to parse the JSON itself as soon as the callback was called.

The only way to fix it was to make sure the server never returns these characters when a JSONP resource is requested.
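
A server-side fix along these lines might look like the following sketch. It assumes a JavaScript (for example Node.js) backend; the function name buildJsonpBody and the callback name handleData are made up for illustration. Rather than dropping the characters, it replaces the raw U+2028 and U+2029 with their escape sequences, so the data survives intact.

```javascript
// Hypothetical server-side helper; name and signature are assumptions.
function buildJsonpBody(callbackName, data) {
  // Accept only simple identifiers so the callback name cannot inject script.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error('invalid callback name');
  }
  // JSON.stringify leaves U+2028/U+2029 raw, so escape them explicitly.
  const json = JSON.stringify(data)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
  return callbackName + '(' + json + ');';
}

// Simulate the browser evaluating the JSONP response as script.
const body = buildJsonpBody('handleData', { note: 'a\u2028b' });
const received = [];
new Function('handleData', body)(function (d) { received.push(d); });
console.log(received[0].note === 'a\u2028b'); // true
```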

P.S. My question was about \u2028, but according to Douglas Crockford's json2 library, all of the following characters can cause these problems:

'\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff'
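
The character list above can be dropped into a regular expression character class. The sketch below is modeled on how json2 treats such characters (replacing each with a \u escape before evaluating the text), but it is an illustrative reimplementation, not a copy of the library; the names dangerous and escapeDangerous are hypothetical.

```javascript
// Character class built from the list quoted above.
const dangerous = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;

// Hypothetical helper: replace each listed character with its \uXXXX escape.
function escapeDangerous(text) {
  return text.replace(dangerous, function (ch) {
    return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

const escaped = escapeDangerous('{"a":"x\u2028y\ufeffz"}');
console.log(escaped); // {"a":"x\u2028y\ufeffz"} (with literal backslashes)
```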

#3

Could you just use the escape sequence \u2028 instead of the real character? Because U+2028 is the Unicode line separator, browsers treat it as a real line break character, like \n.

We cannot write something like this:

x = "

"

Right? But we can write x = "\n", so the same concept might apply here.

#4

Well, that makes sense, since you are telling the browser that the HTML and script are both using UTF-8, but then you specify a character that is not UTF-8 encoded. When you specify "charset=UTF-8", you are responsible for making sure the bytes transmitted to the browser are actually UTF-8. The web server and browser will not do it for you in this situation.
