I'm building up a row to insert in a table using jQuery by creating an HTML string, i.e.
var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='"+data.name+"'/></td>";
row += "</tr>";
data.name is a string returned from an ajax call and could contain any characters. If it contains a single quote ('), it will break the HTML by terminating the attribute value early.
How can I ensure that the string is rendered correctly in the browser?
6 answers
#1
29
You just need to replace any ' characters with the equivalent HTML character reference:
data.name.replace(/'/g, "&#39;");
Alternatively, you could create the whole thing using jQuery's DOM manipulation methods:
var row = $("<tr>").append("<td>Name</td><td></td>");
$("<input>", { value: data.name }).appendTo(row.children("td:eq(1)"));
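As a minimal sketch of the first suggestion applied to the string-building code from the question (the helper name escapeApostrophes and the sample data object are made up for illustration, they are not part of jQuery or of the question):

```javascript
// Hypothetical helper: replaces every apostrophe with its numeric
// character reference so it cannot close a single-quoted attribute.
function escapeApostrophes(s) {
    return String(s).replace(/'/g, "&#39;");
}

// Sample object standing in for the ajax response (an assumption).
var data = { name: "O'Brien" };

var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='" + escapeApostrophes(data.name) + "'/></td>";
row += "</tr>";

console.log(row);
// <tr><td>Name</td><td><input value='O&#39;Brien'/></td></tr>
```

This only deals with the apostrophe. A value that may also contain &, <, > or double quotes needs a fuller escaping function.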
#2
65
Actually, you may need one of these two functions (which one depends on the context of use). These functions handle all kinds of string quotes, and also protect against the HTML/XML syntax.
1. The quoteattr() function for embedding text into HTML/XML:
The quoteattr() function is used in a context where the result will not be evaluated by JavaScript but must be interpreted by an XML or HTML parser, and where it must absolutely avoid breaking the syntax of an element attribute.
Newlines are natively preserved when generating the content of a text element. However, if you're generating the value of an attribute, the assigned value will be normalized by the DOM as soon as it is set: all whitespace (SPACE, TAB, CR, LF) is compressed, stripping leading and trailing whitespace and reducing every inner run of whitespace to a single SPACE.
But there's an exception: the CR character will be preserved and not treated as whitespace, but only if it is represented by a numeric character reference (NCR)! The result will be valid for all element attributes except those of type NMTOKEN, ID or NMTOKENS: the presence of the referenced CR makes the assigned value invalid for those attributes (for example the id="..." attribute of HTML elements), and an invalid value is ignored by the DOM. In other attributes (of type CDATA), all CR characters represented by a numeric character reference are preserved and not normalized. Note that this trick does not preserve other whitespace (SPACE, TAB, LF), even if it is represented by an NCR, because the normalization of all whitespace (with the exception of the NCR for CR) is mandatory in all attributes.
Note that this function itself does not perform any HTML/XML normalization of whitespace, so it remains safe when generating the content of a text element (don't pass the second preserveCR parameter in that case).
So if you pass the optional second parameter (which defaults to false) and it evaluates to true, newlines will be preserved using this NCR. Use this when you want to generate a literal attribute value and the attribute is of type CDATA (for example a title="..." attribute) and not of type ID, IDLIST, NMTOKEN or NMTOKENS (for example an id="..." attribute).
function quoteattr(s, preserveCR) {
    preserveCR = preserveCR ? '&#13;' : '\n';
    return ('' + s) /* Forces the conversion to string. */
        .replace(/&/g, '&amp;') /* This MUST be the 1st replacement. */
        .replace(/'/g, '&apos;') /* The 4 other predefined entities, required. */
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        /*
        You may add other replacements here for HTML only
        (but it's not necessary).
        Or for XML, only if the named entities are defined in its DTD.
        */
        .replace(/\r\n/g, preserveCR) /* Must be before the next replacement. */
        .replace(/[\r\n]/g, preserveCR);
}
Warning! This function still does not check that the source string (which in JavaScript is just an unrestricted stream of 16-bit code units) is valid plain text, nor that it is valid source for an HTML/XML document.
- It should be updated to detect and reject (by throwing an exception):
- any code units representing code points assigned to non-characters (like \uFFFE and \uFFFF): this is a Unicode requirement only for valid plain text;
- any surrogate code units that are not correctly paired to form a valid UTF-16-encoded code point: this is a Unicode requirement for valid plain text;
- any valid pair of surrogate code units representing a valid Unicode code point in the supplementary planes that is assigned to a non-character (like U+10FFFE or U+10FFFF): this is a Unicode requirement only for valid plain text;
- most C0 and C1 controls (in the ranges \u0000..\u001F and \u007F..\u009F, with the exception of TAB and the newline controls): this is not a Unicode requirement but an additional requirement for valid HTML/XML.
- Despite this limitation, the code above is almost always what you'll want. Normally, a modern JavaScript engine should provide this function natively in the default system object, but in most cases it does not fully ensure strict plain-text validity, nor HTML/XML validity. The HTML/XML document object from which your JavaScript code is called should redefine this native function.
- This limitation is usually not a problem, because the source string is typically computed from strings that come from the HTML/XML DOM.
- But this may fail if the JavaScript extracts substrings and breaks surrogate pairs, or if it generates text from computed numeric sources (converting any 16-bit code value into a string containing that single code unit, and appending those short strings, or inserting them via replacement operations). If you try to insert the encoded string into an HTML/XML DOM text element, attribute value or element name, the DOM will itself reject the insertion and throw an exception; if your JavaScript inserts the resulting string into a local binary file or sends it via a binary network socket, no exception will be thrown. Such non-plain-text strings can also result from reading a binary file (such as a PNG, GIF or JPEG image file) or from reading a binary-safe network socket (one whose I/O stream passes 16-bit code units rather than just 8-bit units; most binary I/O streams are byte-based anyway, and text I/O streams require you to specify a charset to decode files into plain text, so that invalid encodings found in the text stream throw an I/O exception in your script).
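The checks listed in the warning above could be sketched roughly as follows. This is only an illustration: the function name assertValidPlainText is invented here, and the set of allowed controls (TAB, LF, CR and NEL) is an assumption based on the "TAB and newline controls" wording.

```javascript
// Hypothetical helper (not part of the original answer): throws if `s`
// contains code units that the warning above says should be rejected.
function assertValidPlainText(s) {
    s = '' + s;
    for (var i = 0; i < s.length; i++) {
        var unit = s.charCodeAt(i);
        var codePoint = unit;
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            var next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
            if (next < 0xDC00 || next > 0xDFFF)
                throw new Error('unpaired high surrogate at index ' + i);
            codePoint = 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
            i++; // Skip the low surrogate that was just consumed.
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw new Error('unpaired low surrogate at index ' + i);
        }
        // Non-characters: U+FDD0..U+FDEF and the last two code points of every plane.
        if ((codePoint >= 0xFDD0 && codePoint <= 0xFDEF) || (codePoint & 0xFFFE) === 0xFFFE)
            throw new Error('non-character U+' + codePoint.toString(16).toUpperCase());
        // C0 and C1 controls, except TAB, LF, CR and NEL (U+0085).
        var isControl = codePoint <= 0x1F || (codePoint >= 0x7F && codePoint <= 0x9F);
        var isAllowed = codePoint === 0x09 || codePoint === 0x0A ||
                        codePoint === 0x0D || codePoint === 0x85;
        if (isControl && !isAllowed)
            throw new Error('control character U+' + codePoint.toString(16).toUpperCase());
    }
    return s;
}
```

Calling it on the source string before quoteattr() would turn the silent failure modes described above into explicit exceptions.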
Note that this function, as implemented (and once augmented to correct the limitations noted in the warning above), can also safely be used to quote the content of a literal text element in HTML/XML (so that no interpretable HTML/XML elements from the source string value are left), not just the content of a literal attribute value! It would therefore be better named quoteml(); the name quoteattr() is kept only by tradition.
This is the case in your example:
data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = '';
row += '<tr>';
row += '<td>Name</td>';
row += '<td><input value="' + quoteattr(data.value) + '" /></td>';
row += '</tr>';
Alternative to quoteattr(), using only the DOM API:
The alternative, if the HTML code you generate will be part of the current HTML document, is to create each HTML element individually using the DOM methods of the document, so that you can set attribute values directly through the DOM API instead of inserting the full HTML content through the innerHTML property of a single element:
data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = document.createElement('tr');
var cell = document.createElement('td');
cell.innerText = 'Name';
row.appendChild(cell);
cell = document.createElement('td');
var input = document.createElement('input');
input.setAttribute('value', data.value);
cell.appendChild(input);
row.appendChild(cell);
/*
The HTML code is generated automatically and is now accessible in the
row.innerHTML property, which you are not required to insert in the
current document.
But you can continue by appending row into a 'tbody' element object, and then
insert this into a new 'table' element object, which you can append or insert
as a child of a DOM object of your document.
*/
Note that this alternative does not attempt to preserve newlines present in data.value, because you're generating the content of a text element, not an attribute value here. If you really want to generate an attribute value that preserves newlines using &#13;, see the start of section 1 and the code within quoteattr() above.
2. The escape() function for embedding into a JavaScript/JSON string literal:
In other cases, you'll use the escape() function below when the intent is to quote a string that will be part of a generated JavaScript code fragment and that you also want to be preserved (and that may optionally first be parsed by an HTML/XML parser, into which a larger piece of JavaScript code could be inserted):
function escape(s) {
return ('' + s) /* Forces the conversion to string. */
.replace(/\\/g, '\\\\') /* This MUST be the 1st replacement. */
.replace(/\t/g, '\\t') /* These 2 replacements protect whitespaces. */
.replace(/\n/g, '\\n')
.replace(/\u00A0/g, '\\u00A0') /* Useful but not absolutely necessary. */
.replace(/&/g, '\\x26') /* These 5 replacements protect from HTML/XML. */
.replace(/'/g, '\\x27')
.replace(/"/g, '\\x22')
.replace(/</g, '\\x3C')
.replace(/>/g, '\\x3E')
;
}
Warning! This source code does not check that the encoded document is a valid plain-text document. However, it should never raise an exception (except for an out-of-memory condition): JavaScript/JSON source strings are just unrestricted streams of 16-bit code units; they do not need to be valid plain text and are not restricted by HTML/XML document syntax. This means that the code is incomplete, and should also replace:
- all other code units representing C0 and C1 controls (with the exception of TAB and LF, handled above, which may also be left intact without substitution), using the \xNN notation;
- all code units that are assigned to non-characters in Unicode, which should be replaced using the \uNNNN notation (for example \uFFFE or \uFFFF);
- all code units usable as Unicode surrogates in the range \uD800..\uDFFF, like this:
- if they are not correctly paired into a valid UTF-16 pair representing a valid Unicode code point in the full range U+0000..U+10FFFF, these surrogate code units should be individually replaced using the notation \uDNNN;
- otherwise, if the code point that the code unit pair represents is not valid in Unicode plain text because it is assigned to a non-character, the two code units should be replaced using the notation \U00NNNNNN;
- finally, if the code point represented by the code unit (or by the pair of code units representing a code point in a supplementary plane), regardless of whether that code point is assigned or reserved/unassigned, is also invalid in HTML/XML source documents (see their specifications), the code point should be replaced using the \uNNNN notation (if the code point is in the BMP) or the \U00NNNNNN notation (if the code point is in a supplementary plane).
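A rough sketch of these extra replacements, written as a second pass over the output of escape(), might look like this. The function name escapeUnsafeCodeUnits is invented for illustration. One deliberate deviation is an assumption of this sketch: JavaScript string literals have no \U00NNNNNN escape, so a surrogate pair that encodes a non-character is emitted as two \uNNNN escapes instead.

```javascript
// Hypothetical follow-up pass (not part of the original answer): run it on
// the OUTPUT of escape(), so that the backslashes it adds are not doubled.
function escapeUnsafeCodeUnits(s) {
    // Upper-case hexadecimal, left-padded with zeros to `width` digits.
    function hex(n, width) {
        var h = n.toString(16).toUpperCase();
        while (h.length < width) h = '0' + h;
        return h;
    }
    var out = '';
    s = '' + s;
    for (var i = 0; i < s.length; i++) {
        var unit = s.charCodeAt(i);
        var next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
        if (unit >= 0xD800 && unit <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
            // Well-formed surrogate pair: escape it only if it encodes a non-character.
            var codePoint = 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
            if ((codePoint & 0xFFFE) === 0xFFFE)
                out += '\\u' + hex(unit, 4) + '\\u' + hex(next, 4);
            else
                out += s.charAt(i) + s.charAt(i + 1);
            i++; // Skip the low surrogate that was just consumed.
        } else if (unit >= 0xD800 && unit <= 0xDFFF) {
            out += '\\u' + hex(unit, 4); // Lone surrogate.
        } else if ((unit >= 0xFDD0 && unit <= 0xFDEF) || unit >= 0xFFFE) {
            out += '\\u' + hex(unit, 4); // Non-character in the BMP.
        } else if (unit <= 0x1F || (unit >= 0x7F && unit <= 0x9F)) {
            out += '\\x' + hex(unit, 2); // C0 or C1 control.
        } else {
            out += s.charAt(i);
        }
    }
    return out;
}

console.log(escapeUnsafeCodeUnits('a\u0000b')); // a\x00b
console.log(escapeUnsafeCodeUnits('\uD800x'));  // \uD800x
```

Because TAB and LF have already been turned into \t and \n by escape(), this pass only sees the remaining controls, such as CR.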
Note also that the last 5 replacements are not strictly necessary. But if you don't include them, you'll sometimes need to use the <![CDATA[ ... ]]> compatibility "hack", for example when further including the generated JavaScript in HTML or XML (see the example below, where this "hack" is used in a <script>...</script> HTML element).
The escape() function has the advantage that it does not insert any HTML/XML character reference: the result is interpreted first by JavaScript, and the exact string length is preserved later at runtime, when the resulting string is evaluated by the JavaScript engine. It saves you from having to manage mixed contexts throughout your application code (see the final section about them and about the related security considerations). Notably, if you used quoteattr() in this context, the JavaScript evaluated and executed later would have to handle character references explicitly to decode them again, which would not be appropriate. Use cases include:
- when the replaced string will be inserted in a generated JavaScript event handler surrounded by some other HTML code, where the JavaScript fragment will contain attributes surrounded by literal quotes;
- when the replaced string will be part of a setTimeout() parameter which will later be eval()ed by the JavaScript engine.
Example 1 (generating only JavaScript, no HTML content generated):
var title = "It's a \"title\"!";
var msg = "Both strings contain \"quotes\" & 'apostrophes'...";
setTimeout(
'__forceCloseDialog("myDialog", "' +
escape(title) + '", "' +
escape(msg) + '")',
2000);
Example 2 (generating valid HTML):
var msg =
"It's just a \"sample\" <test>.\n\tTry & see yourself!";
/* This is similar to the above, but this JavaScript code will be reinserted below: */
var scriptCode =
'alert("' +
escape(msg) + /* important here!, because part of a JS string literal */
'");';
/* First case (simple when inserting in a text element): */
document.write(
'<script type="text/javascript">' +
'\n//<![CDATA[\n' + /* (not really necessary but improves compatibility) */
scriptCode +
'\n//]]>\n' + /* (not really necessary but improves compatibility) */
'</script>');
/* Second case (more complex when inserting in an HTML attribute value): */
document.write(
'<span onclick="' +
quoteattr(scriptCode) + /* important here, because part of an HTML attribute */
'">Click here !</span>');
In this second example, you can see that both encoding functions are used together: the part of the generated text that is embedded in JavaScript literals is encoded with escape(), and the generated JavaScript code (containing the generated string literal) is itself embedded again and re-encoded using quoteattr(), because that JavaScript code is inserted in an HTML attribute (in the second case).
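To make the two layers visible, here is a small sketch that prints the intermediate strings for the sample message. The two functions are condensed copies of escape() and quoteattr() (without the newline handling), repeated only so that the sketch runs on its own.

```javascript
// Condensed copies of the two functions from this answer.
function escape(s) {
    return ('' + s)
        .replace(/\\/g, '\\\\')
        .replace(/\t/g, '\\t').replace(/\n/g, '\\n')
        .replace(/&/g, '\\x26').replace(/'/g, '\\x27').replace(/"/g, '\\x22')
        .replace(/</g, '\\x3C').replace(/>/g, '\\x3E');
}
function quoteattr(s) {
    return ('' + s)
        .replace(/&/g, '&amp;').replace(/'/g, '&apos;').replace(/"/g, '&quot;')
        .replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

var msg = "It's just a \"sample\" <test>.\n\tTry & see yourself!";

// Layer 1: the message becomes the body of a JavaScript string literal.
var scriptCode = 'alert("' + escape(msg) + '");';
// Layer 2: the whole script becomes the body of an HTML attribute value.
var attribute = quoteattr(scriptCode);

console.log(scriptCode);
// alert("It\x27s just a \x22sample\x22 \x3Ctest\x3E.\n\tTry \x26 see yourself!");
console.log(attribute);
// alert(&quot;It\x27s just a \x22sample\x22 \x3Ctest\x3E.\n\tTry \x26 see yourself!&quot;);
```

After the first layer, the only characters left for quoteattr() to touch are the two double quotes that delimit the JavaScript literal, which is why the layers do not interfere with each other.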
3. General considerations for safely encoding texts to embed in syntactic contexts:
So in summary,
- the quoteattr() function must be used when generating the content of an HTML/XML attribute literal, where the surrounding quotes are added externally within a concatenation to produce complete HTML/XML code.
- the escape() function must be used when generating the content of a JavaScript string constant literal, where the surrounding quotes are added externally within a concatenation to produce complete JavaScript code.
- If used carefully, everywhere you find variable content to insert safely into another context, and only under these rules (with the functions implemented exactly as above, which take care of the "special characters" used in both contexts), you may mix both via multiple escaping, and the transform will still be safe and will not require additional code to decode them in the application using those literals. Do not use these functions in any other context.
Those functions are only safe in those strict contexts (i.e. only HTML/XML attribute values for quoteattr(), and only JavaScript string literals for escape()).
There are other contexts using different quoting and escaping mechanisms (e.g. SQL string literals, or Visual Basic string literals, or regular expression literals, or text fields of CSV datafiles, or MIME header values), which will each require their own distinct escaping function used only in these contexts:
- Never assume that quoteattr() or escape() will be safe, or will not alter the semantics of the escaped string, before first checking that the syntax of (respectively) HTML/XML attribute values or JavaScript string literals is natively understood and supported in those contexts.
- For example, the syntax of JavaScript string literals generated by escape() is also appropriate in two other contexts, string literals in Java source code and text values in JSON data, provided that the five \xNN sequences are emitted as \u00NN instead, since neither Java nor JSON recognizes the \xNN escape.
But the reverse is not always true. For example:
- Interpreting encoded, escaped literals that were initially generated for contexts other than JavaScript string literals (for example, string literals in PHP source code) is not always safe for direct use as JavaScript literals. Do not use the JavaScript eval() system function to decode generated string literals that were not escaped using escape(), because those other string literals may contain special characters generated specifically for their initial contexts, which JavaScript will interpret incorrectly. These could include additional escapes such as "\Uxxxxxxxx", "\e", "${var}" and "$$", additional concatenation operators such as ' + " which change the quoting style, or "transparent" delimiters such as "<!--" and "-->" or "<![CDATA[" and "]]>" (which may be found, and be safe, only within a different, more complex context supporting multiple escaping syntaxes: see below the last paragraph of this section about mixed contexts).
- The same applies to the interpretation/decoding of encoded, escaped literals that were initially generated for contexts other than HTML/XML attribute values in documents created using their standard textual representation (for example, trying to interpret string literals that were generated for embedding in a non-standard binary representation of HTML/XML documents!).
- This also applies to interpreting/decoding, with the JavaScript function eval(), string literals that were only safely generated for inclusion in HTML/XML attribute literals using quoteattr(): this is not safe, because the contexts have been incorrectly mixed.
- This also applies to interpreting/decoding, with an HTML/XML text document parser, attribute value literals that were only safely generated for inclusion in a JavaScript string literal using escape(): this is not safe either, because the contexts have again been incorrectly mixed.
4. Safely decoding the value of embedded syntactic literals:
If you want to decode or interpret string literals in contexts where the decoded string values will be used interchangeably and indistinctly, without change, in another context (so-called mixed contexts; for example, naming some identifiers in HTML/XML with string literals initially safely encoded with quoteattr(), or naming some programming variables for JavaScript from strings initially safely encoded with escape(), and so on), you'll need to prepare and use a new escaping function (which will also check the validity of the string value before encoding it, or reject it, or truncate/simplify/filter it), as well as a new decoding function, to enforce the safety of such a mapping! The decoding function must carefully avoid interpreting sequences that are valid but unsafe (accepted internally, but not acceptable from unsafe external sources). This also means that a decoding function such as eval() in JavaScript must be absolutely avoided for decoding JSON data sources, for which you'll need to use a safer native JSON decoder; a native JSON decoder will not interpret valid JavaScript sequences such as quoting delimiters included in the literal expression, operators, or sequences like "{$var}".
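The difference between eval() and a native JSON decoder can be seen in a few lines. This is a minimal sketch: the hostile payload is invented for illustration, and JSON.parse() stands in for the "safer native JSON decoder" mentioned above.

```javascript
// A hypothetical payload from an untrusted source: a string literal followed
// by extra JavaScript that eval() will execute.
var hostile = '"harmless" + (function () { return " but code ran"; })()';

// eval() treats the text as code and runs the embedded function.
var viaEval = eval(hostile);

// JSON.parse() accepts only a JSON value and rejects everything else.
var rejected = false;
try {
    JSON.parse(hostile);
} catch (e) {
    rejected = e instanceof SyntaxError;
}

// A plain JSON string literal decodes normally.
var decoded = JSON.parse('"It\\u0027s a \\"sample\\""');

console.log(viaEval);  // harmless but code ran
console.log(rejected); // true
console.log(decoded);  // It's a "sample"
```

Note that JSON.parse() also rejects the \xNN escape, so a string produced with \x27-style sequences would have to use \u0027-style sequences before being handed to a JSON decoder.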
These last considerations about decoding literals in mixed contexts, when the literals were only safely encoded, with some data-transport syntax, for a single more restrictive context, are absolutely critical for the security of your application or web service. Never mix those contexts between the place of encoding and the place of decoding if those places do not belong to the same security realm (and even in that case, using mixed contexts is always very dangerous, because they are very difficult to track precisely in your code).
For this reason I recommend that you never use or assume mixed contexts anywhere in your application: instead, write a safe encoding and decoding function for a single precise context, with precise length and validity rules on the decoded string values, and precise length and validity rules on the encoded string literals. Ban those mixed contexts: for each change of context, use another matching pair of encoding/decoding functions (which function of the pair is used depends on which context is embedded in the other; the pair of matching functions is also specific to each pair of contexts).
This means that:
- To safely decode an HTML/XML attribute value literal that was initially encoded with quoteattr(), you must not assume that it was encoded using other named entities, whose values depend on the specific DTD defining them. You must instead initialize the HTML/XML parser to support only the few default named character entities generated by quoteattr() and, optionally, the numeric character references (which are also safe in such a context: the quoteattr() function only generates a few of them and could generate more, but it must not generate other named character entities that are not predefined in the default DTD). All other named entities must be rejected by your parser as invalid in the source string literal to decode. Alternatively, you'll get better performance by defining an unquoteattr() function (which will reject any literal quotes within the source string, as well as unsupported named entities).
- To safely decode a JavaScript string literal (or JSON string literal) that was initially encoded with escape(), you must use the safe unescape() function defined below, not the unsafe JavaScript eval() function!
Examples for these two associated safe decoding functions follow.
5. The unquoteattr() function to parse text embedded in HTML/XML text elements or attribute value literals:
function unquoteattr(s) {
/*
Note: this can be implemented more efficiently by a loop searching for
ampersands, from start to end of the source string, and parsing the
character(s) found immediately after the ampersand.
*/
s = ('' + s); /* Forces the conversion to string type. */
/*
You may optionally start by detecting CDATA sections (like
`<![CDATA[` ... `]]>`), whose contents must not be reparsed by the
following replacements, but separated, filtered out of the CDATA
delimiters, and then concatenated into an output buffer.
The following replacements are only for sections of source text
found *outside* such CDATA sections, that will be concatenated
in the output buffer only after all the following replacements and
security checkings.
This will require a loop starting here.
The following code is only for the alternate sections that are
not within the detected CDATA sections.
*/
/* Decode by reversing the initial order of replacements. */
s = s
.replace(/\r\n/g, '\n') /* To do before the next replacement. */
.replace(/[\r\n]/g, '\n')
.replace(/&#13;&#10;/g, '\n') /* These 3 replacements keep whitespaces. */
.replace(/&#1[03];/g, '\n')
.replace(/&#9;/g, '\t')
.replace(/&gt;/g, '>') /* The 4 other predefined entities required. */
.replace(/&lt;/g, '<')
.replace(/&quot;/g, '"')
.replace(/&apos;/g, "'")
;
/*
You may add other replacements here for predefined HTML entities only
(but it's not necessary). Or for XML, only if the named entities are
defined in *your* assumed DTD.
But you can add these replacements only if these entities will *not*
be replaced by a string value containing *any* ampersand character.
Do not decode the '&amp;' sequence here !
If you choose to support more numeric character entities, their
decoded numeric value *must* be assigned characters or unassigned
Unicode code points, but *not* surrogates or assigned non-characters,
and *not* most C0 and C1 controls (except a few ones that are valid
in HTML/XML text elements and attribute values: TAB, LF, CR, and
NL='\x85').
If you find valid Unicode code points that are invalid characters
for XML/HTML, this function *must* reject the source string as
invalid and throw an exception.
In addition, the four possible representations of newlines (CR, LF,
CR+LF, or NL) *must* be decoded only as if they were '\n' (U+000A).
See the XML/HTML reference specifications !
*/
/* Required check for security! */
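/* Every remaining ampersand sequence must be exactly '&amp;'. */
var found = s.match(/&[^;]*;?/g);
for (var i = 0; found && i < found.length; i++)
    if (found[i] != '&amp;')
        throw new Error('unsafe entity found in the attribute literal content');
/* This MUST be the last replacement. */
s = s.replace(/&amp;/g, '&');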
/*
The loop needed to support CDATA sections will end here.
This is where you'll concatenate the replaced sections (CDATA or
not), if you have split the source string to detect and support
these CDATA sections.
Note that all backslashes found in CDATA sections do NOT have the
semantic of escapes, and are *safe*.
On the opposite, CDATA sections not properly terminated by a
matching `]]>` section terminator are *unsafe*, and must be rejected
before reaching this final point.
*/
return s;
}
Note that this function does not parse the quote delimiters that surround HTML attribute values. It can in fact also decode the content of any HTML/XML text element, which may contain literal quotes; these are safe there. It is your responsibility to parse the HTML code, extract the quoted strings used in HTML/XML attributes, and strip the matching quote delimiters before calling the unquoteattr() function.
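The delimiter stripping mentioned above could look something like the sketch below. The helper name stripAttrDelimiters() is hypothetical (it is not part of the answer), and it assumes the literal has already been cut out of the markup with its two surrounding quotes still attached:

```javascript
// Hypothetical helper (not part of the answer above): removes one pair of
// matching quote delimiters from an attribute literal as written in markup.
function stripAttrDelimiters(literal) {
  var text = '' + literal;
  var first = text.charAt(0);
  var last = text.charAt(text.length - 1);
  if (text.length < 2 || (first !== '"' && first !== "'") || first !== last) {
    throw new Error('attribute literal is not surrounded by matching quotes');
  }
  var inner = text.slice(1, -1);
  // The delimiter itself must not appear unencoded inside the value.
  if (inner.indexOf(first) !== -1) {
    throw new Error('unencoded delimiter found inside the attribute literal');
  }
  return inner;
}

var content = stripAttrDelimiters('"It&apos;s a &quot;sample&quot;"');
// content no longer has its delimiters and can be passed to unquoteattr().
```

Rejecting a literal whose own delimiter appears unencoded inside it is a design choice of this sketch: it treats such input as malformed rather than guessing where the value ends.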
6. The unescape() function to parse text contents embedded in Javascript/JSON literals:
function unescape(s) {
/*
Note: this can be implemented more efficiently by a loop searching for
backslashes, from start to end of source string, and parsing and
dispatching the character found immediately after the backslash, if it
must be followed by additional characters such as an octal or
hexadecimal 7-bit ASCII-only encoded character, or a hexadecimal Unicode
encoded valid code point, or a valid pair of hexadecimal UTF-16-encoded
code units representing a single Unicode code point.
8-bit encoded code units for non-ASCII characters should not be used, but
if they are, they should be decoded into 16-bit code units keeping their
numeric value, i.e. like the numeric value of an equivalent Unicode
code point (which means ISO 8859-1, not Windows 1252, including C1 controls).
Note that Javascript or JSON does NOT require code units to be paired when
they encode surrogates; and Javascript/JSON will also accept any Unicode
code point in the valid range representable as UTF-16 pairs, including
NULL, all controls, and code units assigned to non-characters.
This means that all code points in \U00000000..\U0010FFFF are valid,
as well as all 16-bit code units in \u0000..\uFFFF, in any order.
It's up to your application to restrict these valid ranges if needed.
*/
s = ('' + s) /* Forces the conversion to string. */
/* Decode by reversing the initial order of replacements */
.replace(/\\x3E/g, '>')
.replace(/\\x3C/g, '<')
.replace(/\\x22/g, '"')
.replace(/\\x27/g, "'")
.replace(/\\x26/g, '&') /* These 5 replacements protect from HTML/XML. */
.replace(/\\u00A0/g, '\u00A0') /* Useful but not absolutely necessary. */
.replace(/\\n/g, '\n')
.replace(/\\t/g, '\t') /* These 2 replacements protect whitespaces. */
;
/*
You may optionally add here support for other numerical or symbolic
character escapes.
But you can add these replacements only if these entities will *not*
be replaced by a string value containing *any* backslash character.
Do not decode to any doubled backslashes here !
*/
/* Required check for security! */
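/* Every backslash still present must be part of a doubled backslash. */
var found = s.match(/\\[\s\S]?/g) || [];
for (var i = 0; i < found.length; i++)
    if (found[i] !== '\\\\')
        throw new Error('Unsafe or unsupported escape found in the literal string content');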
/* This MUST be the last replacement. */
return s.replace(/\\\\/g, '\\');
}
Note that this function does not parse the quote delimiters that surround Javascript or JSON string literals. It is your responsibility to parse the Javascript or JSON source code, extract the quoted string literals, and strip the matching quote delimiters before calling the unescape() function.
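One limitation of decoding with a chain of independent replacements is that it cannot tell an escaped backslash followed by the text x3E from the escape \x3E itself. The comment at the top of unescape() suggests scanning for backslashes in one pass instead. Here is a minimal sketch of that idea, under some assumptions: the names escapeJs(), unescapeJs() and DECODED are hypothetical, escapeJs() just repeats the replacements of the escape() function from section 2 so that the example is self-contained, and only the escapes produced by that function are supported.

```javascript
// Hypothetical names: escapeJs() mirrors the escape() function of section 2,
// unescapeJs() is a single-pass alternative to a chain of replacements.
function escapeJs(s) {
  return ('' + s)
    .replace(/\\/g, '\\\\') /* Must stay the first replacement. */
    .replace(/\t/g, '\\t')
    .replace(/\n/g, '\\n')
    .replace(/\u00A0/g, '\\u00A0')
    .replace(/&/g, '\\x26')
    .replace(/'/g, '\\x27')
    .replace(/"/g, '\\x22')
    .replace(/</g, '\\x3C')
    .replace(/>/g, '\\x3E');
}

// Maps what follows a backslash to the decoded character.
var DECODED = {
  '\\': '\\', 't': '\t', 'n': '\n', 'u00A0': '\u00A0',
  'x26': '&', 'x27': "'", 'x22': '"', 'x3C': '<', 'x3E': '>'
};

function unescapeJs(s) {
  // Each backslash starts exactly one escape; anything unknown is rejected.
  return ('' + s).replace(/\\(\\|t|n|u00A0|x26|x27|x22|x3C|x3E)?/g,
    function (match, code) {
      if (!code) {
        throw new Error('Unsafe or unsupported escape found: ' + match);
      }
      return DECODED[code];
    });
}

var original = 'C:\\x3E is "odd" & it\'s <ok>\n';
var roundTrip = unescapeJs(escapeJs(original));
// roundTrip === original, including the literal backslash before x3E.
```

Because the regular expression consumes a doubled backslash as one unit, the characters that follow it are never mistaken for the start of another escape.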
#3
8
" = &quot; or &#34;
' = &#39;
Examples:
<div attr="Tim &quot;The Toolman&quot; Taylor"
<div attr='Tim "The Toolman" Taylor'
<div attr="Tim 'The Toolman' Taylor"
<div attr='Tim &#39;The Toolman&#39; Taylor'
In JavaScript strings, you use \ to escape the quote character:
var s = "Tim \"The Toolman\" Taylor";
var s = 'Tim \'The Toolman\' Taylor';
So, quote your attribute values with " and use a function like this:
function escapeAttrNodeValue(value) {
    return value.replace(/(&)|(")|(\u00A0)/g, function(match, amp, quote) {
        if (amp) return "&amp;";
        if (quote) return "&quot;";
        return "&nbsp;";
    });
}
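Applied to the row from the question, this might be used as sketched below. The sample value of data.name is made up for illustration; in the question it comes from the ajax call. The function is repeated so the snippet runs on its own.

```javascript
function escapeAttrNodeValue(value) {
  return value.replace(/(&)|(")|(\u00A0)/g, function (match, amp, quote) {
    if (amp) return "&amp;";
    if (quote) return "&quot;";
    return "&nbsp;";
  });
}

// Sample value standing in for the data.name returned by the ajax call.
var data = { name: 'Tim "The Toolman" O\'Neil & Sons' };

var row = "";
row += "<tr>";
row += "<td>Name</td>";
// The attribute is delimited with double quotes, so only ", & and the
// non-breaking space need encoding; the apostrophe can stay as it is.
row += "<td><input value=\"" + escapeAttrNodeValue(data.name) + "\"/></td>";
row += "</tr>";
```

Since all three characters are handled in a single replace() pass, the ampersands introduced by &quot; and &nbsp; are not encoded a second time.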
#4
3
I think you could do:
var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value=\""+data.name+"\"/></td>";
row += "</tr>";
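Because the attribute value is delimited with double quotes here, a single quote already present in data.name no longer ends the attribute (a double quote in data.name still would).
Better still, create the INPUT element and set its value to data.name through the DOM (for example input.value = data.name) instead of building the markup as a string.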
#5
3
My answer is partially based on Andy E's, and I still recommend reading what verdy_p wrote, but here it is:
$("<a>", { href: 'very<script>\'b"ad' }).text('click me')[0].outerHTML
Disclaimer: this does not answer the exact question, only "how to escape an attribute".
#6
1
The given answers seem rather complicated, so for my use case I tried the built-in encodeURIComponent and decodeURIComponent and found that they worked well.
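As a rough illustration of this approach (the helper name encodeForAttr() is made up here): encodeURIComponent leaves the apostrophe untouched, so the sketch below additionally replaces it with %27, which decodeURIComponent turns back into an apostrophe. The value stored in the attribute is then the percent-encoded text, not the original string, so whatever reads it must decode it again.

```javascript
// Hypothetical helper: percent-encodes a value before it is placed in an
// attribute. encodeURIComponent does not encode ', so that is done by hand.
function encodeForAttr(text) {
  return encodeURIComponent(text).replace(/'/g, '%27');
}

var name = 'It\'s a "sample" & <test>';

var plain = encodeURIComponent(name);
// plain still contains the apostrophe: It's%20a%20%22sample%22%20%26%20%3Ctest%3E

var encoded = encodeForAttr(name);
var html = "<input value='" + encoded + "'/>";

// When reading the attribute back, decode it to recover the original text.
var decoded = decodeURIComponent(encoded);
```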
#1
29
You just need to swap any '
characters with the equivalent HTML entity character code:
您只需要用等价的HTML实体字符代码交换任何“字符”:
data.name.replace(/'/g, "'");
Alternatively, you could create the whole thing using jQuery's DOM manipulation methods:
或者,您可以使用jQuery的DOM操作方法创建整个内容:
var row = $("<tr>").append("<td>Name</td><td></td>");
$("<input>", { value: data.name }).appendTo(row.children("td:eq(1)"));
#2
65
Actually you may need one of these two functions (this depends on the context of use). These functions handle all kind of string quotes, and also protect from the HTML/XML syntax.
实际上,您可能需要这两个函数中的一个(这取决于使用的上下文)。这些函数处理所有类型的字符串引号,并且还保护不受HTML/XML语法的影响。
1. The quoteattr()
function for embeding text into HTML/XML:
The quoteattr()
function is used in a context, where the result will not be evaluated by javascript but must be interpreted by an XML or HTML parser, and it must absolutely avoid breaking the syntax of an element attribute.
quoteattr()函数用于上下文中,其中结果不会由javascript计算,而是必须由XML或HTML解析器解释,它必须绝对避免破坏元素属性的语法。
Newlines are natively preserved if generating the content of a text elements. However, if you're generating the value of an attribute this assigned value will be normalized by the DOM as soon as it will be set, so all whitespaces (SPACE, TAB, CR, LF) will be compressed, stripping leading and trailing whitespaces and reducing all middle sequences of whitespaces into a single SPACE.
如果生成文本元素的内容,则会保留新行。然而,如果你生成一个属性的值指定值将被规范化DOM就将被设置,所以所有空白(空格、制表符、铬、低频)将被压缩,剥离前导和尾随空白,减少中间的空白成一个单一的空间序列。
But there's an exception: the CR character will be preserved and not treated as whitespace, only if it is represented with a numeric character reference! The result will be valid for all element attributes, with the exception of attributes of type NMTOKEN or ID, or NMTOKENS: the presence of the referenced CR will make the assigned value invalid for those attributes (for example the id="..." attribute of HTML elements): this value being invalid, will be ignored by the DOM. But in other attributes (of type CDATA), all CR characters represented by a numeric character reference will be preserved and not normalized. Note that this trick will not work to preserve other whitespaces (SPACE, TAB, LF), even if they are represented by NCR, because the normalization of all whitespaces (with the exception of the NCR to CR) is mandatory in all attributes.
但是有一个例外:只有用数字字符引用表示CR字符时,才会保留CR字符而不将其视为空格!结果将是有效的对所有元素的属性,除了NMTOKEN或ID类型的属性,或NMTOKEN:引用CR的存在将使这些属性的指定值无效(例如HTML元素的ID = "…"属性):这个值是无效的,将被忽略的DOM。但在其他属性(类型CDATA)中,由数字字符引用表示的所有CR字符都将被保留,而非规范化。请注意,这个技巧将不会用于保存其他的空白(空格、制表符、LF),即使它们是用NCR表示的,因为所有的属性都是强制性的(除了NCR到CR的例外)。
Note that this function itself does not perform any HTML/XML normalization of whitespaces, so it remains safe when generating the content of a text element (don't pass the second preserveCR parameter for such case).
请注意,这个函数本身不执行任何白空间的HTML/XML规范化,因此在生成文本元素的内容时仍然是安全的(对于这种情况,不要传递第二个防腐剂r参数)。
So if you pass an optional second parameter (whose default will be treated as if it was false) and if that parameter evaluates as true, newlines will be preserved using this NCR, when you want to generate a literal attribute value, and this attribute is of type CDATA (for example a title="..." attribute) and not of type ID, IDLIST, NMTOKEN or NMTOKENS (for example an id="..." attribute).
如果你通过一个可选的第二个参数(其违约将被视为如果它是假的)如果这个参数的值为true,换行将被保留下来使用这个NCR,当你想生成一个文字属性值,和这个属性的类型是CDATA(例如标题= "…"属性),而不是类型的ID,IDLIST NMTOKEN或NMTOKEN(例如一个ID = "…"属性)。
function quoteattr(s, preserveCR) {
preserveCR = preserveCR ? ' ' : '\n';
return ('' + s) /* Forces the conversion to string. */
.replace(/&/g, '&') /* This MUST be the 1st replacement. */
.replace(/'/g, ''') /* The 4 other predefined entities, required. */
.replace(/"/g, '"')
.replace(/</g, '<')
.replace(/>/g, '>')
/*
You may add other replacements here for HTML only
(but it's not necessary).
Or for XML, only if the named entities are defined in its DTD.
*/
.replace(/\r\n/g, preserveCR) /* Must be before the next replacement. */
.replace(/[\r\n]/g, preserveCR);
;
}
Warning! This function still does not check the source string (which is just, in Javascript, an unrestricted stream of 16-bit code units) for its validity in a file that must be a valid plain text source and also as valid source for an HTML/XML document.
警告!该函数仍然不检查源字符串(在Javascript中,它只是一个16位代码单元的无限制流)在文件中的有效性,该文件必须是一个有效的纯文本源,并且也是HTML/XML文档的有效源。
- It should be updated to detect and reject (by an exception):
- any code units representing code points assigned to non-characters (like \uFFFE and \uFFFF): this is an Unicode requirement only for valid plain-texts;
- 表示分配给非字符的代码点的任何代码单元(如\uFFFE和\uFFFF):这是对有效的明文的Unicode要求;
- any surrogate code units which are incorrectly paired to form a valid pair for an UTF-16-encoded code point: this is an Unicode requirement for valid plain-texts;
- 对于utf -16编码的代码点,任何不正确地成对形成有效对的代理代码单元:这是对有效明文的Unicode要求;
- any valid pair of surrogate code units representing a valid Unicode code point in supplementary planes, but which is assigned to non-characters (like U+10FFFE or U+10FFFF): this is an Unicode requirement only for valid plain-texts;
- 任何有效的代理代码单元表示辅助平面上的有效的Unicode代码点,但是分配给非字符(如U+10FFFE或U+10FFFF):这是只针对有效的纯文本的Unicode要求;
- most C0 and C1 controls (in the ranges \u0000..\u1F and \u007F..\u009F with the exception of TAB and newline controls): this is not an Unicode requirement but an additional requirement for valid HTML/XML.
- 大多数C0和C1控件(在\u0000范围内)。\ u1F和\ u007F . .\u009F(除了制表符和换行控件):这不是Unicode要求,而是有效的HTML/XML的附加要求。
- 它应该被更新以检测和拒绝(由一个例外):表示分配给非字符的代码点的任何代码单元(如\uFFFE和\uFFFF):这是仅对有效的纯文本的Unicode要求;任何不正确成对的代理代码单元对utf -16编码的代码点形成有效的一对:这是对有效的纯文本的Unicode要求;表示补充层中有效的Unicode代码点的任何有效代理代码单元对,但它被分配给非字符(如U+10FFFE或U+10FFFF):这是仅对有效明文的Unicode要求;大多数C0和C1控件(在\u0000范围内)。\ u1F和\ u007F . .\u009F(除了制表符和换行控件):这不是Unicode要求,而是有效的HTML/XML的附加要求。
- Despite of this limitation, the code above is almost what you'll want to do. Normally. Modern javascript engine should provide this function natively in the default system object, but in most cases, it does not completely ensure the strict plain-text validity, not the HTML/XML validity. But the HTML/XML document object from which your Javascript code will be called, should redefine this native function.
- 尽管有这些限制,上面的代码几乎就是您想要做的。正常。现代javascript引擎应该在默认的系统对象中提供这个功能,但是在大多数情况下,它不能完全保证严格的明文有效性,而不能保证HTML/XML的有效性。但是调用Javascript代码的HTML/XML文档对象应该重新定义这个本机函数。
- This limitation is usually not a problem in most cases, because the source string are the result of computing from sources strings coming from the HTML/XML DOM.
- 这种限制在大多数情况下通常都不是问题,因为源字符串是来自HTML/XML DOM的源字符串计算的结果。
- But this may fail if the javascript extract substrings and break pairs of surrogates, or if it generates text from computed numeric sources (converting any 16-bit code value into a string containing that one-code unit, and appending those short strings, or inserting these short strings via replacement operations): if you try to insert the encoded string into a HTML/XML DOM text element or in an HTML/XML attribute value or element name, the DOM will itself reject this insertion and will throw an exception; if your javascript inserts the resulting string in a local binary file or sends it via a binary network socket, there will be no exception thrown for this emission. Such non-plain text strings would also be the result of reading from a binary file (such as an PNG, GIF or JPEG image file) or from your javascript reading from a binary-safe network socket (such that the IO stream passes 16-bit code units rather than just 8-bit units: most binary I/O streams are byte-based anyway, and text I/O streams need that you specify a charset to decode files into plain-text, so that invalid encodings found in the text stream will throw an I/O exception in your script).
- 但是,如果javascript提取子字符串并中断代理对,或者从计算的数字源生成文本(将任何16位的代码值转换为包含一个代码单元的字符串,并附加这些短字符串,或者通过替换操作插入这些短字符串),这可能会失败:如果您试图将编码的字符串插入到HTML/XML DOM文本元素或HTML/XML属性值或元素名称中,DOM本身将拒绝此插入并抛出异常;如果您的javascript将结果字符串插入到本地二进制文件中,或者通过二进制网络套接字发送,则不会为该发射抛出任何异常。这样non-plain文本字符串也会阅读的结果从一个二进制文件(如一个PNG、GIF或JPEG图像文件)或从您的javascript阅读从一个二进制安全网络套接字(比如IO流将16位代码单元而不是8位单位:大多数二进制文件I / O流byte-based无论如何,和文本I / O流需要你指定字符集解码为纯文本文件,因此无效编码文本流中会抛出一个I / O异常脚本)。
Note that this function, the way it is implemented (if it is augmented to correct the limitations noted in the warning above), can be safely used as well to quote also the content of a literal text element in HTML/XML (to avoid leaving some interpretable HTML/XML elements from the source string value), not just the content of a literal attribute value ! So it should be better named quoteml()
; the name quoteattr()
is kept only by tradition.
注意这个函数,它实现的方式(如果它是增强纠正在上面的警告)提到的限制,可以安全地使用引用的内容文字文本元素的HTML / XML(避免留下一些可判断的HTML / XML元素从源字符串值),而不仅仅是文字的内容属性值!所以它应该被更好地命名为quoteml();quoteattr()这个名字是传统的。
This is the case in your example:
这就是你的例子:
data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = '';
row += '<tr>';
row += '<td>Name</td>';
row += '<td><input value="' + quoteattr(data.value) + '" /></td>';
row += '</tr>';
Alternative to quoteattr()
, using only the DOM API:
The alternative, if the HTML code you generate will be part of the current HTML document, is to create each HTML element individually, using the DOM methods of the document, such that you can set its attribute values directly through the DOM API, instead of inserting the full HTML content using the innerHTML property of a single element :
的选择,如果您生成的HTML代码将当前HTML文档的一部分,分别是创建每个HTML元素,使用DOM文档的方法,这样你可以设置它的属性值直接通过DOM API,而不是插入完整的HTML内容使用单个元素的innerHTML属性:
data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = document.createElement('tr');
var cell = document.createElement('td');
cell.innerText = 'Name';
row.appendChild(cell);
cell = document.createElement('td');
var input = document.createElement('input');
input.setAttribute('value', data.value);
cell.appendChild(input);
tr.appendChild(cell);
/*
The HTML code is generated automatically and is now accessible in the
row.innerHTML property, which you are not required to insert in the
current document.
But you can continue by appending tr into a 'tbody' element object, and then
insert this into a new 'table' element object, which ou can append or insert
as a child of a DOM object of your document.
*/
Note that this alternative does not attempt to preserve newlines present in the data.value, becase you're generating the content of a text element, not an attribute value here. If you really want to generate an attribute value preserving newlines using
, see the start of section 1, and the code within quoteattr()
above.
注意,此替代方法不尝试保存数据中出现的新行。值,因为您生成的是文本元素的内容,而不是属性值。如果您真的想要使用#13生成一个属性值保存新行,请参阅第1节的开头,以及上面quoteattr()中的代码。
2. The escape()
function for embedding into a javascript/JSON literal string:
In other cases, you'll use the escape()
function below when the intent is to quote a string that will be part of a generated javascript code fragment, that you also want to be preserved (that may optionally also be first parsed by an HTML/XML parser in which a larger javascript code could be inserted):
在其他情况下,您将使用下面的逃避()函数时,目的是引用一个字符串,将生成的javascript代码片段的一部分,你还想保留(也可以选择是首先解析HTML / XML解析器在一个更大的可能会插入javascript代码):
function escape(s) {
return ('' + s) /* Forces the conversion to string. */
.replace(/\\/g, '\\\\') /* This MUST be the 1st replacement. */
.replace(/\t/g, '\\t') /* These 2 replacements protect whitespaces. */
.replace(/\n/g, '\\n')
.replace(/\u00A0/g, '\\u00A0') /* Useful but not absolutely necessary. */
.replace(/&/g, '\\x26') /* These 5 replacements protect from HTML/XML. */
.replace(/'/g, '\\x27')
.replace(/"/g, '\\x22')
.replace(/</g, '\\x3C')
.replace(/>/g, '\\x3E')
;
}
Warning! This source code does not check for the validity of the encoded document as a valid plain-text document. However it should never raise an exception (except for out of memory condition): Javascript/JSON source strings are just unrestricted streams of 16-bit code units and do not need to be valid plain-text or are not restricted by HTML/XML document syntax. This means that the code is incomplete, and should also replace:
警告!此源代码不检查编码文档作为有效纯文本文档的有效性。但是,它不应该引发异常(内存不足的情况除外):Javascript/JSON源字符串只是16位代码单元的无限制流,不需要是有效的纯文本,也不受HTML/XML文档语法的限制。这意味着该守则不完整,也应取代:
- all other code units representing C0 and C1 controls (with the exception of TAB and LF, handled above, but that may be left intact without substituting them) using the \xNN notation;
- 所有其他表示C0和C1控件的代码单元(除了上面处理的选项卡和LF之外,但不替换它们可能保持完整)使用\xNN表示法;
- all code units that are assigned to non-characters in Unicode, which should be replaced using the \uNNNN notation (for example \uFFFE or \uFFFF);
- 所有在Unicode中分配给非字符的代码单元,应该使用\uNNNN表示法(例如\uFFFE或\uFFFF)替换;
- all code units usable as Unicode surrogates in the range \uD800..\DFFF, like this:
- if they are not correctly paired into a valid UTF-16 pair representing a valid Unicode code point in the full range U+0000..U+10FFFF, these surrogate code units should be individually replaced using the notation \uDNNN;
- 如果它们没有被正确地配对成一个有效的UTF-16对,表示在全范围U+0000中有效的Unicode码点。U+10FFFF,这些代理代码单元应该单独使用符号\uDNNN替换;
- else if if the code point that the code unit pair represents is not valid in Unicode plain-text, because the code point is assigned to a non-character, the two code points should be replaced using the notation \U00NNNNNN;
- 否则,如果代码单元对表示的代码点在Unicode纯文本中无效,因为代码点被分配给非字符,那么应该使用记号(\U00NNNNNN)替换这两个代码点;
- 所有代码单元可用作Unicode代理,范围为\uD800。\DFFF:如果它们不能正确地配对成有效的UTF-16对,表示在U+0000范围内有效的Unicode代码点。U+10FFFF,这些代理代码单元应该单独使用符号\uDNNN替换;否则,如果代码单元对表示的代码点在Unicode纯文本中无效,因为代码点被分配给非字符,那么应该使用记号(\U00NNNNNN)替换这两个代码点;
- finally, if the code point represented by the code unit (or the pair of code units representing a code point in a supplementary plane), independantly of if that code point is assigned or reserved/unassigned, is also invalid in HTML/XML source documents (see their specification), the code point should be replaced using the \uNNNN notation (if the code point is in the BMP) or the \u00NNNNNN (if the code point is in a supplementary plane) ;
- 最后,如果代码点所代表的代码单元(或副代码单元代表一个代码点补充平面),独立于如果代码点分配或保留/未赋值的,也是无效的HTML / XML源文档中(见他们的规范),代码点应该取代使用\ uNNNN符号(BMP)如果代码点或\ u00NNNNNN(如果代码点补充平面);
Note also that the 5 last replacements are not really necessary. But it you don't include them, you'll sometimes need to use the <![CDATA[ ... ]]>
compatibility "hack" in some cases, such as further including the generated javascript in HTML or XML (see the example below where this "hack" is used in a <script>...</script>
HTML element).
请注意,最后的5个替换实际上并不是必需的。但是如果您不包含它们,您有时需要使用…在某些情况下,>兼容“hack”,例如进一步在HTML或XML中包含生成的javascript(参见下面的示例,在<script>…< /脚本> HTML元素)。</p>
The escape()
function has the advantage that it does not insert any HTML/XML character reference, the result will be first interpreted by Javascript and it will keep later at runtime the exact string length when the resulting string will be evaluated by the javascript engine. It saves you from having to manage mixed context throughout your application code (see the final section about them and about the related security considerations). Notably because if you use quoteattr()
in this context, the javascript evaluated and executed later would have to explicitty handle character references to redecode them, something that would not be appropriate. Usage cases include:
escape()函数的优点是它不插入任何HTML/XML字符引用,结果将首先由Javascript解释,当Javascript引擎对结果字符串进行计算时,它将在运行时保持准确的字符串长度。它使您不必在整个应用程序代码中管理混合上下文(请参阅有关它们的最后一节以及有关安全性的注意事项)。值得注意的是,如果您在此上下文中使用quoteattr(),那么稍后评估和执行的javascript将必须处理字符引用以重新解码它们,这是不合适的。使用情况包括:
- when the replaced string will be inserted in a generated javascript event handler surrounded by some other HTML code where the javascript fragment will contain attributes surrounded by literal quotes).
- 当替换的字符串被插入到一个生成的javascript事件处理程序中,该事件处理程序由一些其他HTML代码包围,其中javascript片段将包含由文字引号包围的属性)。
- when the replaced string will be part of a settimeout() parameter which will be later eval()ed by the Javascript engine.
- 当替换的字符串将成为settimeout()参数的一部分时,该参数稍后将由Javascript引擎进行eval()。
Example 1 (generating only JavaScript, no HTML content generated):
var title = "It's a \"title\"!";
var msg = "Both strings contain \"quotes\" & 'apostrophes'...";
setTimeout(
'__forceCloseDialog("myDialog", "' +
escape(title) + '", "' +
escape(msg) + '")',
2000);
Exemple 2 (generating valid HTML):
var msg =
"It's just a \"sample\" <test>.\n\tTry & see yourself!";
/* This is similar to the above, but this JavaScript code will be reinserted below: */
var scriptCode =
'alert("' +
escape(msg) + /* important here!, because part of a JS string literal */
'");';
/* First case (simple when inserting in a text element): */
document.write(
'<script type="text/javascript">' +
'\n//<![CDATA[\n' + /* (not really necessary but improves compatibility) */
scriptCode +
'\n//]]>\n' + /* (not really necessary but improves compatibility) */
'</script>');
/* Second case (more complex when inserting in an HTML attribute value): */
document.write(
'<span onclick="' +
quoteattr(scriptCode) + /* important here, because part of an HTML attribute */
'">Click here !</span>');
In this second example, you see that both encoding functions are simultaneously used on the part of the generated text that is embedded in JavasSript literals (using escape()
), with the the generated JavaScript code (containing the generated string literal) being itself embedded again and reencoded using quoteattr()
, because that JavaScript code is inserted in an HTML attribute (in the second case).
在第二个例子中,你可以看到这两个编码的功能同时使用生成的文本嵌入在JavasSript文字(使用转义()),与生成的JavaScript代码(包含生成的字符串)再次被本身嵌入并使用quoteattr reencoded(),因为JavaScript代码插入在HTML属性(在第二种情况下)。
3. General considerations for safely encoding texts to embed in syntaxic contexts:
So in summary,
所以总的来说,
- the
quotattr()
function must be used when generating the contant of an HTML/XML attribute literal, where the surrounding quotes are added externally within a concatenation to produce a complete HTML/XML code. - 在生成HTML/XML属性文本的内容时,必须使用quotattr()函数,其中在外部将引号添加到一个连接中,以生成完整的HTML/XML代码。
- the
escape()
function must be used when generating the content of a JavaScript string constant literal, where the surrounding quotes are added externally within a concatenation to produce a complete HTML/XML code. - 在生成JavaScript字符串常量文本内容时,必须使用escape()函数,其中将在外部连接中添加周围的引号,以生成完整的HTML/XML代码。
- If used carefully, and everywhere you will find variable contents to safely insert into another context, and under only these rules (with the functions implemented exactly like above which takes care of "special characters" used in both contexts), you may mix both via multiple escaping, and the transform will still be safe, and will not require additional code to decode them in the application using those literals. Do not use these functions.
- 如果谨慎使用,你会发现到处都安全地插入到另一个上下文变量内容,只在这些规则(就像上面实现的函数负责“特殊字符”用于上下文),你可以通过多个混合逃离,变换仍然是安全的,不需要额外的代码在应用程序中使用这些文字解码它们。不要使用这些函数。
Those functions are only safe in those strict contexts (i.e. only HTML/XML attribute values for quoteattr()
, and only Javascript string literals for escape()
).
这些函数只在那些严格的上下文中是安全的(例如,quoteattr()只使用HTML/XML属性值,escape()只使用Javascript字符串常量)。
There are other contexts using different quoting and escaping mechanisms (e.g. SQL string literals, or Visual Basic string literals, or regular expression literals, or text fields of CSV datafiles, or MIME header values), which will each require their own distinct escaping function used only in these contexts:
还有其他使用不同的引用和转义机制的上下文(例如SQL string literals,或者Visual Basic string literals,或者正则表达式literals,或者CSV数据文件的文本字段,或者MIME header值),它们都需要各自独立的转义函数,只在这些上下文中使用:
- Never assume that
quoteattr()
orescape()
will be safe or will not alter the semantic of the escaped string, before checking first, that the syntax of (respectively) HTML/XML attribute values or JavaScript string litterals will be natively understood and supported in those contexts. - 在检查之前,不要假设quoteattr()或escape()将是安全的,或者不会改变转义字符串的语义,(分别)HTML/XML属性值或JavaScript字符串分隔符的语法将在这些上下文中被本机理解和支持。
- For example the syntax of Javascript string literals generated by
escape()
is also appropriate and natively supported in the two other contexts of string literals used in Java programming source code, or text values in JSON data. - 例如,由escape()生成的Javascript字符串的语法在Java编程源代码中使用的字符串文本的另外两种上下文环境中也得到了适当的支持,或者在JSON数据中使用文本值。
But the reverse is not always true. For example:
但事实并非总是如此。例如:
- Interpreting the encoded escaped literals initially generated for other contexts than Javascript string literals (including for example string literals in PHP source code), is not always safe for direct use as Javascript literals. through the javascript
eval()
system function to decode those generated string literals that were not escaped usingescape()
, because those other string literals may contain other special characters generated specificly to those other initial contexts, which will be incorrectly interpreted by Javascript, this could include additionnal escapes such as "\Uxxxxxxxx
", or "\e
", or "${var}
" and "$$
", or the inclusion of additional concatenation operators such as' + "
which changes the quoting style, or of "transparent" delimiters, such as "<!--
" and "-->
" or "<[DATA[
" and "]]>
" (that may be found and safe within a different only complex context supporting multiple escaping syntaxes: see below the last paragraph of this section about mixed contexts). - 解释最初为Javascript字符串常量(包括PHP源代码中的字符串常量)以外的其他上下文生成的已编码转义文字,并不总是作为Javascript常量直接使用安全的。通过javascript eval()系统功能来解码这些生成的字符串没有逃过使用转义(),因为这些其他字符串可能包含特殊字符生成日志其他初始上下文,将由javascript错误的解释,这可能包括additionnal转义,如“\ Uxxxxxxxx”,或“\ e”,或“$ { var }”和“$ $”,或加入更多的连接操作符如“+”改变引用的风格,或“透明”分隔符,如“ ”或“<[DATA] [and "] >”(可能在支持多个转义语法的不同复杂上下文中找到并安全:请参阅本节最后一段关于混合上下文的内容)。
- The same will apply to the interpretation/decoding of encoded escaped literals that were initially generated for other contexts that HTML/XML attributes values in documents created using their standard textual representation (for example, trying to interpret the string literals that were generated for embedding in a non standard binary format representation of HTML/XML documents!)
- 同样适用于解释编码/解码了文字,最初生成的HTML / XML属性值对于其他上下文使用标准文本表示创建的文档(例如,试图解释为嵌入到生成的字符串是一个非标准的二进制格式表示HTML / XML文档!)
- This will also apply to the interpretation/decoding with the javascript function
eval()
of string literals that were only safely generated for inclusion in HTML/XML attribute literals usingquotteattr()
, which will not be safe, because the contexts have been incorrectly mixed. - 这也适用于使用javascript函数eval()对字符串文本进行解释/解码,这些字符串仅使用quotteattr()安全地生成以包含在HTML/XML属性文本中,这并不安全,因为上下文混合不正确。
- This will also apply to the interpretation/decoding with an HTML/XML text document parser of attribute value literals that were only safely generated for inclusion in a Javascript string literal using
escape()
, which will not be safe, because the contexts have also been incorrectly mixed. - 这也适用于使用HTML/XML文本文档解析器进行解释/解码,该解析器包含属性值文字,仅使用escape()安全地生成,以便包含在Javascript字符串文字中,这并不安全,因为上下文也被错误地混合在一起。
4. Safely decoding the value of embedded syntaxic literals:
If you want to decode or interpret string literals in contexts were the decoded resulting string values will be used interchangeably and undistinctly without change in another context, so called mixed contexts (including, for example: naming some identifiers in HTML/XML with string literals initially dafely encoded with quotteattr()
; naming some programming variables for Javascript from strings initially safely encoded with escape()
; and so on...), you'll need to prepare and use a new escaping function (which will also check the validity of the string value before encoding it, or reject it, or truncate/simplify/filter it), as well as a new decoding function (which will also carefully avoid interpreting valid but unsafe sequences, only accepted internally but not acceptable for unsafe external sources, which also means that decoding function such as eval()
in javascript must be absolutely avoided for decoding JSON data sources, for which you'll need to use a safer native JSON decoder; a native JSON decoder will not be interpreting valid Javascript sequences, such as the inclusion of quoting delimiters in the literal expression, operators, or sequences like "{$var}
"), to enforce the safety of such mapping!
如果您想解码或解释上下文中的字符串文字,那么解码后的字符串值将在另一个上下文中交替地、不明显地使用,也就是所谓的混合上下文中(包括,例如:在HTML/XML中命名一些标识符,使用字符串文字,最初是用quotteattr()笨拙地编码的;从最初使用escape()安全地编码的字符串中命名一些Javascript编程变量;等等…),你将需要准备和使用新的转义函数(还将检查字符串值的有效性在编码之前,或者拒绝,或者截断/简化/过滤),以及一个新的解码函数(这也会小心避免解释有效但不安全的序列,只接受内部但不可接受的不安全的外部资源,这也意味着,要解码JSON数据源,必须绝对避免使用javascript中的eval()等解码函数,因此需要使用更安全的本机JSON解码器;本机JSON解码器将不会解释有效的Javascript序列,例如在文字表达式、操作符或像“{$var}”这样的序列中包含引号分隔符,以加强这种映射的安全性!
These last considerations about the decoding of literals in mixed contexts, that were only safely encoded with any syntax for the transport of data to be safe only a a more restrictive single context, is absolutely critical for the security of your application or web service. Never mix those contexts between the encoding place and the decoding place, if those places do not belong to the same security realm (but even in that case, using mixed contexts is always very dangerous, it is very difficult to track precisely in your code.
对于在混合上下文中解码文字,这些最后的考虑对于应用程序或web服务的安全性来说是绝对关键的,因为只有使用任何用于传输数据的语法才能保证安全的。如果这些地方不属于相同的安全领域,那么不要在编码位置和解码位置之间混合使用这些上下文(但是即使在这种情况下,使用混合上下文总是非常危险的,在代码中精确跟踪是非常困难的。
For this reason I recommend you never use or assume mixed contexts anywhere in your application: instead write a safe encoding and decoding function for a single precide context that has precise length and validity rules on the decoded string values, and precise length and validity rules on the encoded string string literals. Ban those mixed contexts: for each change of context, use another matching pair of encoding/decoding functions (which function is used in this pair depends on which context is embedded in the other context; and the pair of matching functions is also specific to each pair of contexts).
因为这个原因我建议你不要使用或认为混合环境中任何地方在您的应用程序:而不是写一个安全的编码和解码函数为单个precide上下文,精确的长度和有效性规则解码字符串值,和精确的长度和有效性规则字符串编码的字符串。禁止混合上下文:对于上下文的每个更改,使用另一对匹配的编码/解码函数(在这一对中使用的函数取决于在另一个上下文中嵌入的上下文;并且配对函数也特定于每一对上下文)。
This means that:
这意味着:
- To safely decode an HTML/XML attribute value literal that has been initially encoded with
quoteattr()
, you must '''not''' assume that it has been encoded using other named entities whose value will depend on a specific DTD defining it. You must instead initialize the HTML/XML parser to support only the few default named character entities generated byquoteattr()
and optionally the numeric character entities (which are also safe is such context: thequoteattr()
function only generates a few of them but could generate more of these numeric character references, but must not generate other named character entities which are not predefined in the default DTD). All other named entities must be rejected by your parser, as being invalid in the source string literal to decode. Alternatively you'll get better performance by defining anunquoteattr
function (which will reject any presence of literal quotes within the source string, as well as unsupported named entities). - 要安全地解码最初用quoteattr()编码的HTML/XML属性值文本,您必须“'not "假设它是使用其他命名实体编码的,这些实体的值将取决于定义它的特定DTD。你必须初始化HTML / XML解析器只支持quoteattr所产生的一些默认的命名字符实体()和可选的数字字符实体(这也是安全的是上下文:quoteattr只()函数生成一个很少但能产生更多的数字字符引用,但不能产生其他命名字符实体没有预定义的默认DTD)。所有其他命名实体必须被解析器拒绝,因为在要解码的源字符串文本中无效。或者,通过定义一个unquoteattr函数(它将拒绝源字符串中出现的任何文字引号,以及不支持的命名实体),您将获得更好的性能。
- To safely decode a Javascript string literal (or JSON string literal) that was initially encoded with escape(), you must use the safe unescape() function, not the unsafe Javascript eval() function! (Here escape() and unescape() are the functions defined in this answer, not the legacy built-in globals of the same names.)
Examples for these two associated safe decoding functions follow.
5. The unquoteattr() function to parse text embedded in HTML/XML text elements or attribute value literals:
function unquoteattr(s) {
/*
Note: this can be implemented more efficiently by a loop searching for
ampersands, from start to end of the source string, and parsing the
character(s) found immediately after the ampersand.
*/
s = ('' + s); /* Forces the conversion to string type. */
/*
You may optionally start by detecting CDATA sections (like
`<![CDATA[` ... `]]>`), whose contents must not be reparsed by the
following replacements, but separated, filtered out of the CDATA
delimiters, and then concatenated into an output buffer.
The following replacements are only for sections of source text
found *outside* such CDATA sections, that will be concatenated
in the output buffer only after all the following replacements and
security checkings.
This will require a loop starting here.
The following code is only for the alternate sections that are
not within the detected CDATA sections.
*/
/* Decode by reversing the initial order of replacements. */
s = s
.replace(/\r\n/g, '\n') /* To do before the next replacement. */
.replace(/[\r\n]/g, '\n')
.replace(/&#13;&#10;/g, '\n') /* These 3 replacements keep whitespaces. */
.replace(/&#1[03];/g, '\n')
.replace(/&#9;/g, '\t')
.replace(/&gt;/g, '>') /* The 4 other predefined entities required. */
.replace(/&lt;/g, '<')
.replace(/&quot;/g, '"')
.replace(/&apos;/g, "'")
;
/*
You may add other replacements here for predefined HTML entities only
(but it's not necessary). Or for XML, only if the named entities are
defined in *your* assumed DTD.
But you can add these replacements only if these entities will *not*
be replaced by a string value containing *any* ampersand character.
Do not decode the '&amp;' sequence here!
If you choose to support more numeric character entities, their
decoded numeric value *must* be assigned characters or unassigned
Unicode code points, but *not* surrogates or assigned non-characters,
and *not* most C0 and C1 controls (except a few ones that are valid
in HTML/XML text elements and attribute values: TAB, LF, CR, and
NL='\x85').
If you find valid Unicode code points that are invalid characters
for XML/HTML, this function *must* reject the source string as
invalid and throw an exception.
In addition, the four possible representations of newlines (CR, LF,
CR+LF, or NL) *must* be decoded only as if they were '\n' (U+000A).
See the XML/HTML reference specifications !
*/
/* Required check for security! */
/* At this point every remaining ampersand must begin '&amp;'. */
if (/&(?!amp;)/.test(s))
throw new Error('unsafe entity found in the attribute literal content');
/* This MUST be the last replacement. */
s = s.replace(/&amp;/g, '&');
/*
The loop needed to support CDATA sections will end here.
This is where you'll concatenate the replaced sections (CDATA or
not), if you have split the source string to detect and support
these CDATA sections.
Note that all backslashes found in CDATA sections do NOT have the
semantic of escapes, and are *safe*.
On the opposite, CDATA sections not properly terminated by a
matching `]]>` section terminator are *unsafe*, and must be rejected
before reaching this final point.
*/
return s;
}
Note that this function does not parse the surrounding quote delimiters used around HTML attribute values. It can in fact also decode any HTML/XML text element content, possibly containing literal quotes, which are safe there. It is your responsibility to parse the HTML code, extract the quoted strings used in HTML/XML attributes, and strip the matching quote delimiters before calling the unquoteattr() function.
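The comment at the top of unquoteattr() mentions that a single loop over the ampersands would be more efficient. A minimal sketch of that idea follows, under stated assumptions: the name decodeAttrOnce and both lookup tables are hypothetical, and the whitelist covers only the five predefined named entities plus the TAB, LF, and CR numeric references. Because each entity is decoded exactly once and the output is never rescanned, '&amp;lt;' yields the text '&lt;' rather than '<'.

```javascript
// Hypothetical single-pass decoder; names and whitelist are illustrative only.
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };
var NUMERIC_ENTITIES = { 9: '\t', 10: '\n', 13: '\r' };

function decodeAttrOnce(s) {
  s = '' + s; // force conversion to string
  var out = '';
  var i = 0;
  while (i < s.length) {
    var amp = s.indexOf('&', i);
    if (amp < 0) { out += s.slice(i); break; }
    out += s.slice(i, amp); // copy the plain text before the ampersand
    var semi = s.indexOf(';', amp);
    if (semi < 0) throw new Error('unterminated entity in attribute literal');
    var name = s.slice(amp + 1, semi);
    if (Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)) {
      out += NAMED_ENTITIES[name];
    } else if (/^#\d+$/.test(name) &&
        Object.prototype.hasOwnProperty.call(NUMERIC_ENTITIES, +name.slice(1))) {
      out += NUMERIC_ENTITIES[+name.slice(1)];
    } else {
      throw new Error('unsupported entity in attribute literal: &' + name + ';');
    }
    i = semi + 1; // resume after the entity; decoded output is never rescanned
  }
  // Newline forms (CR+LF, CR, LF) are all reported as '\n'.
  return out.replace(/\r\n?/g, '\n');
}
```

Anything outside the whitelist, including a bare ampersand, makes the function throw instead of guessing.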
6. The unescape() function to parse text contents embedded in Javascript/JSON literals:
function unescape(s) {
/*
Note: this can be implemented more efficiently by a loop searching for
backslashes, from start to end of source string, and parsing and
dispatching the character found immediately after the backslash, if it
must be followed by additional characters such as an octal or
hexadecimal 7-bit ASCII-only encoded character, or a hexadecimal Unicode
encoded valid code point, or a valid pair of hexadecimal UTF-16-encoded
code units representing a single Unicode code point.
8-bit encoded code units for non-ASCII characters should not be used, but
if they are, they should be decoded into 16-bit code units keeping their
numeric value, i.e. like the numeric value of an equivalent Unicode
code point (which means ISO 8859-1, not Windows 1252, including C1 controls).
Note that Javascript or JSON does NOT require code units to be paired when
they encode surrogates; and Javascript/JSON will also accept any Unicode
code point in the valid range representable as UTF-16 pairs, including
NULL, all controls, and code units assigned to non-characters.
This means that all code points in \U00000000..\U0010FFFF are valid,
as well as all 16-bit code units in \u0000..\uFFFF, in any order.
It's up to your application to restrict these valid ranges if needed.
*/
s = ('' + s) /* Forces the conversion to string. */
/* Decode by reversing the initial order of replacements */
.replace(/\\x3E/g, '>')
.replace(/\\x3C/g, '<')
.replace(/\\x22/g, '"')
.replace(/\\x27/g, "'")
.replace(/\\x26/g, '&') /* These 5 replacements protect from HTML/XML. */
.replace(/\\u00A0/g, '\u00A0') /* Useful but not absolutely necessary. */
.replace(/\\n/g, '\n')
.replace(/\\t/g, '\t') /* These 2 replacements protect whitespaces. */
;
/*
You may optionally add here support for other numerical or symbolic
character escapes.
But you can add these replacements only if these entities will *not*
be replaced by a string value containing *any* backslash character.
Do not decode to any doubled backslashes here !
*/
/* Required check for security! */
/* Once every doubled backslash is set aside, no backslash may remain. */
if (/\\/.test(s.replace(/\\\\/g, '')))
throw new Error('Unsafe or unsupported escape found in the literal string content');
/* This MUST be the last replacement. */
return s.replace(/\\\\/g, '\\');
}
Note that this function does not parse the surrounding quote delimiters used around Javascript or JSON string literals. It is your responsibility to parse the Javascript or JSON source code, extract the quoted string literals, and strip the matching quote delimiters before calling the unescape() function.
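The note inside unescape() suggests a loop that searches for backslashes and dispatches on the character that follows. A minimal sketch of that approach is below, with assumptions stated: decodeJsLiteralOnce and SIMPLE_ESCAPES are hypothetical names, and only the simple escapes listed in the table plus \xHH and \uHHHH are supported. Reading left to right lets an escaped backslash followed by 'n' decode to a backslash and the letter n, which sequential replacements cannot tell apart from a newline escape.

```javascript
// Hypothetical single-pass decoder for backslash escapes; illustrative only.
var SIMPLE_ESCAPES = { n: '\n', t: '\t', '\\': '\\', '"': '"', "'": "'" };

function decodeJsLiteralOnce(s) {
  s = '' + s; // force conversion to string
  var out = '';
  for (var i = 0; i < s.length; i++) {
    var c = s.charAt(i);
    if (c !== '\\') { out += c; continue; }
    var next = s.charAt(++i); // '' when the backslash is the last character
    if (Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, next)) {
      out += SIMPLE_ESCAPES[next];
    } else if (next === 'x' || next === 'u') {
      var len = next === 'x' ? 2 : 4; // \xHH or \uHHHH
      var hex = s.slice(i + 1, i + 1 + len);
      if (hex.length !== len || !/^[0-9A-Fa-f]+$/.test(hex)) {
        throw new Error('malformed \\' + next + ' escape in literal');
      }
      out += String.fromCharCode(parseInt(hex, 16));
      i += len; // skip the hex digits just consumed
    } else {
      throw new Error('unsafe or unsupported escape in literal');
    }
  }
  return out;
}
```

As with unescape(), the caller is expected to have stripped the surrounding quote delimiters already.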
#3
8
" = &quot; or &#34;
' = &#39;
Examples:
<div attr="Tim &quot;The Toolman&quot; Taylor">
<div attr='Tim "The Toolman" Taylor'>
<div attr="Tim 'The Toolman' Taylor">
<div attr='Tim &#39;The Toolman&#39; Taylor'>
In JavaScript strings, you use \ to escape the quote character:
var s = "Tim \"The Toolman\" Taylor";
var s = 'Tim \'The Toolman\' Taylor';
So, quote your attribute values with " and use a function like this:
function escapeAttrNodeValue(value) {
return value.replace(/(&)|(")|(\u00A0)/g, function(match, amp, quote) {
if (amp) return "&amp;";
if (quote) return "&quot;";
return "&nbsp;";
});
}
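To tie this back to the row-building code in the question, here is a small sketch. It repeats escapeAttrNodeValue with the entities written out so that it runs on its own; buildNameRow is a hypothetical helper, and the sample name is made up for illustration. The attribute is quoted with double quotes, so a single quote in the data needs no escaping.

```javascript
// Same escaping as above, with the entity strings spelled out.
function escapeAttrNodeValue(value) {
  return value.replace(/(&)|(")|(\u00A0)/g, function (match, amp, quote) {
    if (amp) return '&amp;';
    if (quote) return '&quot;';
    return '&nbsp;';
  });
}

// Hypothetical helper: builds the row from the question as an HTML string.
function buildNameRow(data) {
  var row = '';
  row += '<tr>';
  row += '<td>Name</td>';
  // The value is wrapped in double quotes, which the escaper neutralises.
  row += '<td><input value="' + escapeAttrNodeValue(String(data.name)) + '"/></td>';
  row += '</tr>';
  return row;
}

var sampleRow = buildNameRow({ name: 'Tim "The Toolman" O\'Neil & Co' });
```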
#4
3
I think you could do:
var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value=\""+data.name+"\"/></td>";
row += "</tr>";
This covers the case where data.name already contains a single quote (a double quote in data.name would still need escaping).
Better still, create an INPUT element and then set its value to data.name.
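A minimal sketch of that element-based approach, using plain DOM calls instead of string concatenation. The function name makeNameRow is hypothetical, and passing the document in as a parameter is an assumption made only so the sketch can be exercised outside a browser. Since the value is assigned as a property and never parsed as HTML, quotes in data.name need no escaping.

```javascript
// Hypothetical helper: builds the row with DOM nodes; `doc` is the page's document.
function makeNameRow(doc, name) {
  var row = doc.createElement('tr');

  var labelCell = doc.createElement('td');
  labelCell.textContent = 'Name';

  var inputCell = doc.createElement('td');
  var input = doc.createElement('input');
  input.value = name; // assigned as a property, never parsed as markup
  inputCell.appendChild(input);

  row.appendChild(labelCell);
  row.appendChild(inputCell);
  return row;
}
```

In a browser this would be called as makeNameRow(document, data.name), and the returned row appended to the table body.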
#5
3
My answer is partially based on Andy E's, and I still recommend reading what verdy_p wrote, but here it is:
$("<a>", { href: 'very<script>\'b"ad' }).text('click me')[0].outerHTML
Disclaimer: this does not answer the exact question, just "how to escape an attribute".
#6
1
The given answers seem rather complicated, so for my use case I have tried the built-in encodeURIComponent and decodeURIComponent and have found they worked well.
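A short sketch of how that could look, with made-up sample data and hypothetical variable names. Two caveats apply: encodeURIComponent leaves the single quote untouched, so the attribute should be wrapped in double quotes, and the attribute then holds the percent-encoded text, so it has to be passed through decodeURIComponent when it is read back.

```javascript
var sampleName = 'O\'Brien "x" <b>';

// Percent-encodes the double quote, angle brackets and spaces, but not the single quote.
var encodedName = encodeURIComponent(sampleName);

// Double-quoted attribute: the unescaped single quote cannot terminate it.
var inputHtml = '<input value="' + encodedName + '"/>';

// Reading the value back requires the matching decode step.
var decodedName = decodeURIComponent(encodedName);
```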