Whenever I use a literal U+2028 character in my JavaScript source with the content type set to "text/html; charset=utf-8", I get a JavaScript parse error.
Example:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>json</title>
<script type="text/javascript" charset="utf-8">
// A raw U+2028 (LINE SEPARATOR) character sits between the quotes below.
var string = '
';
</script>
</head>
<body>
</body>
</html>
If the <meta http-equiv> is left out, everything works as expected. I've tested this on Safari and Firefox; both exhibit the same problem.
Any ideas on why this is happening and how to properly fix this (without removing the encoding)?
Edit: After some more research, the specific problem was that the offending character was returned via JSONP. The response was then interpreted by the browser, which reads U+2028 as a newline and throws an error about an invalid newline in a string.
4 answers
#1
70
Yes, it's a feature of the JavaScript language, documented in the ECMAScript standard (3rd edition section 7.3), that the U+2028 and U+2029 characters count as line endings. Consequently a JavaScript parser will treat any unencoded U+2028/9 character in the same way as a newline. Since you can't put a newline inside a string literal, you get a syntax error.
This is an unfortunate oversight in the design of JSON: it is not actually a proper subset of JavaScript. Raw U+2028/9 characters are valid in string literals in JSON, and will be accepted by JSON.parse, but not so in JavaScript itself.
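The JSON side of that mismatch can be checked directly. The snippet below is a minimal sketch (the variable names are made up for illustration) showing that JSON.parse accepts a raw U+2028 inside a JSON string, and that JSON.stringify emits the character unescaped. Whether the same text then fails as JavaScript source depends on the engine: the behaviour described above applies to engines that follow the older editions of the standard, while ES2019 and later allow these two characters inside string literals.

```javascript
// Build the input with an escape so this source file itself contains
// no raw U+2028 character.
var raw = "a\u2028b";              // 3 code units: 'a', U+2028, 'b'
var jsonText = '"' + raw + '"';    // JSON text with a raw U+2028 inside the string

// JSON.parse accepts the raw character inside a JSON string.
var parsed = JSON.parse(jsonText);

// JSON.stringify does not escape U+2028, so its output contains the raw character.
var encoded = JSON.stringify(raw);
var containsRawSeparator = encoded.indexOf("\u2028") !== -1;
```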
Hence it is only safe to generate JavaScript code using a JSON encoder if you're sure it explicitly \u-escapes those characters. Some do, some don't; many \u-escape all non-ASCII characters, which avoids the problem.
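If the encoder in use does not escape these characters, one option is to post-process its output. The following is a minimal sketch, not a drop-in library function: the name toScriptSafeJson is hypothetical, and it assumes the value is something JSON.stringify can serialize to a string.

```javascript
// Hypothetical helper: serialize a value as JSON, then \u-escape the two
// characters that JavaScript parsers may treat as line terminators.
// Assumes JSON.stringify(value) returns a string (not undefined).
function toScriptSafeJson(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")   // LINE SEPARATOR
    .replace(/\u2029/g, "\\u2029");  // PARAGRAPH SEPARATOR
}
```

The result is still valid JSON, because \u2028 and \u2029 are legal escape sequences inside JSON strings, so JSON.parse returns the original value.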
#2
11
Alright, to answer my own question.
Normally a JSON parser strips out these problem characters. Because I was retrieving JSONP, I wasn't using a JSON parser; instead, the browser tried to parse the JSON itself as soon as the callback was called.
The only way to fix it was to make sure the server never returns these characters in their raw form when a JSONP resource is requested.
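On a JavaScript-based server, that could look roughly like the sketch below. The function name buildJsonpBody is made up for illustration, and the code assumes the callback name has already been validated elsewhere.

```javascript
// Hypothetical server-side helper: build the body of a JSONP response.
// Assumption: callbackName has already been checked to be a safe identifier.
function buildJsonpBody(callbackName, data) {
  var json = JSON.stringify(data)
    .replace(/\u2028/g, "\\u2028")   // never emit a raw LINE SEPARATOR
    .replace(/\u2029/g, "\\u2029");  // never emit a raw PARAGRAPH SEPARATOR
  return callbackName + "(" + json + ");";
}
```

Because the characters are escaped rather than dropped, the client-side callback still receives the original string contents.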
P.S. My question was about U+2028, but according to Douglas Crockford's json2 library, all of the following characters can cause these problems:
'\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff'
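As a rough sketch of how that character class can be used, the snippet below turns it into a regular expression and replaces every match with its \uXXXX escape. The names dangerous and escapeDangerous are hypothetical, and the sketch assumes the input is JSON text in which these characters only occur inside string literals.

```javascript
// Character class quoted above, as a global regular expression.
var dangerous = /[\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;

// Hypothetical helper: replace each matching character with a \uXXXX escape.
function escapeDangerous(text) {
  return text.replace(dangerous, function (ch) {
    // Zero-pad the hexadecimal code unit to four digits.
    return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}
```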
#3
2
Could you just use the escape sequence \u2028 instead of the real character? U+2028 is the Unicode line separator, so browsers treat it as a real line break character, like \n.
We cannot write
x = "
"
right? But we can write x = "\n", so it might be the same concept.
#4
-4
Well, that makes sense, since you are telling the browser that the HTML and script are both using UTF-8, but then you specify a character that is not UTF-8 encoded. When you specify "charset=UTF-8", you are responsible for making sure the bytes transmitted to the browser are actually UTF-8. The web server and browser will not do it for you in this situation.