What characters are allowed in the HTML name attribute of an input tag?

Time: 2022-02-26 22:32:02

I have a PHP script that will generate <input>s dynamically, so I was wondering if I needed to filter any characters in the name attribute.

I know that the name has to start with a letter, but I don't know any other rules. I figure square brackets must be allowed, since PHP uses these to create arrays from form data. How about parentheses? Spaces?

5 solutions

#1


28  

The only real restriction on what characters can appear in form control names applies when a form is submitted with GET:

"The "get" method restricts form data set values to ASCII characters." (HTML 4 spec)

There's a good thread on it here.

#2


47  

Note that not all characters in the name attributes of form fields are submitted unchanged (even when using POST)!

White-space characters are trimmed, and inner white-space characters as well as the . character are replaced by _. (Tested in Chrome 23, Firefox 13 and Internet Explorer 9, all on Win7.)

#3


37  

Any character you can include in an [X]HTML file is fine to put in an <input name>. As Allain's comment says, <input name> is defined as containing CDATA, so the only things you can't put in there are the control codes and invalid codepoints that the underlying standard (SGML or XML) disallows.

Allain quoted the W3C's HTML 4 spec:

Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire ISO10646 character set.

However, this isn't really true in practice.

The theory is that application/x-www-form-urlencoded data doesn't have a mechanism to specify an encoding for the form's names or values, so using non-ASCII characters in either is “not specified” as working and you should use POSTed multipart/form-data instead.

Unfortunately, in the real world, no browser specifies an encoding for fields even when it theoretically could, in the subpart headers of a multipart/form-data POST request body. (I believe Mozilla tried to implement it once, but backed out as it broke servers.)

And no browser implements the astonishingly complex and ugly RFC2231 standard that would be necessary to insert encoded non-ASCII field names into the multipart's subpart headers. In any case, the HTML spec that defines multipart/form-data doesn't directly say that RFC2231 should be used, and, again, it would break servers if you tried.

So the reality of the situation is that there is no way to know what encoding is being used for the names and values in a form submission, no matter what type of form it is. What browsers do with field names and values that contain non-ASCII characters is the same for GET and for both types of POST form: they encode them using the encoding of the page containing the form. Non-ASCII GET form names are no more broken than everything else.
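
To make this concrete, here is a small Python sketch (not part of the original answer) that mimics what a browser does with a non-ASCII field name: the same name produces different bytes on the wire depending on the page's encoding, and nothing in an application/x-www-form-urlencoded body records which one was used. It relies only on the standard library's urllib.parse; the field name "café" and the two encodings are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical field name containing a non-ASCII character.
fields = {"café": "1"}

# The same form on a page served as UTF-8 vs. one served as ISO-8859-1.
utf8_body = urlencode(fields, encoding="utf-8")
latin1_body = urlencode(fields, encoding="latin-1")

print(utf8_body)    # caf%C3%A9=1
print(latin1_body)  # caf%E9=1

# The receiving side has to guess. Decoding the Latin-1 body as UTF-8
# yields a replacement character (U+FFFD) instead of "é".
print(parse_qs(latin1_body, encoding="utf-8"))
```

Neither body carries a charset label, which is why server-side code usually just assumes the page encoding (in practice, UTF-8 everywhere).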

DLH:

So name has a different data type for <input> than it does for other elements?

Actually the only element whose name attribute is not CDATA is <meta>. See the HTML4 spec's attribute list for all the different uses of name; it's an overloaded attribute name, having many different meanings on the different elements. This is generally considered a bad thing.

However, typically these days you would avoid name except on form fields (where it's a control name) and param (where it's a plugin-specific parameter identifier). That's only two meanings to grapple with. The old-school use of name for identifying elements like <form> or <a> on the page should be avoided (use id instead).

#4


4  

While Allain's comment did answer the OP's direct question and bobince provided some brilliant in-depth information, I believe many people come here seeking an answer to a more specific question: "Can I use a dot character in a form input's name attribute?"

As this thread came up as the first result when I searched for this, I figured I might as well share what I found.

Firstly, Matthias claimed that:

character . are replaced by _

This is untrue. I don't know whether browsers actually did this kind of operation back in 2013, though I doubt it. Browsers send dot characters as they are (in POST data)! You can check this in the developer tools of any decent browser.
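
As a rough illustration of what goes over the wire, Python's urllib.parse.urlencode produces an application/x-www-form-urlencoded body much like the one a browser builds for a POSTed form (the two serializers differ on a few characters such as ~ and *, but not on dots, spaces or brackets). The field names below are made up for the example.

```python
from urllib.parse import urlencode

# Made-up field names containing a dot, a space, and brackets.
body = urlencode({"foo.bar": "1", "test this": "2", "foo[foo.bar]": "3"})

# The dot survives unchanged; the space becomes "+" and the brackets are
# percent-encoded. Nothing here turns "." into "_".
print(body)  # foo.bar=1&test+this=2&foo%5Bfoo.bar%5D=3
```

So if a name arrives as foo_bar, the renaming happened on the server side, not in the browser.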

Please notice this small comment by abluejelly, which many people probably miss:

I'd like to note that this is a server-specific thing, not a browser thing. Tested on Win7 FF3/3.5/31, IE5/7/8/9/10/Edge, Chrome39, and Safari Windows 5, and all of them sent "    test this.stuff" (four leading spaces) as the name in POST to the ASP.NET dev server bundled with VS2012.

I checked it with Apache HTTP Server (v2.4.25), and indeed an input name like "foo.bar" is changed to "foo_bar". But in a name like "foo[foo.bar]", the dot is not replaced by _!
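
This matches PHP's documented handling of incoming variable names (the PHP manual states that dots and spaces in variable names are converted to underscores), so the renaming observed here is likely done by PHP running under Apache rather than by Apache itself. That is an assumption; the answer does not say which server-side language was used. Below is a simplified, hypothetical Python emulation of that rule. It ignores other PHP edge cases (an unmatched "[", for example) and only shows why "foo[foo.bar]" is left alone: just the part before the first "[" is rewritten.

```python
def php_style_name(raw: str) -> str:
    """Hypothetical, simplified emulation of how PHP renames form field
    names before they land in $_GET/$_POST. Not PHP's actual code."""
    # Leading spaces are dropped.
    name = raw.lstrip(" ")
    # Split into the base name and any "[...]" suffix.
    cut = name.find("[")
    base, suffix = (name, "") if cut == -1 else (name[:cut], name[cut:])
    # Dots and spaces are replaced only in the base name.
    base = base.replace(".", "_").replace(" ", "_")
    return base + suffix


for raw in ["foo.bar", "foo[foo.bar]", "    test this.stuff"]:
    print(repr(raw), "->", repr(php_style_name(raw)))
```

In real PHP the bracketed suffix is further parsed into an array key, so the value of "foo[foo.bar]" ends up in $_POST['foo']['foo.bar'] with its dot intact.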

My conclusion: you can use dots, but I wouldn't, as this may lead to unexpected behaviour depending on the HTTP server used.

#5


0  

Do you mean the id and name attributes of the HTML input tag?

If so, I'd be very tempted to restrict (or convert) the allowed input name characters to only a-z (A-Z), 0-9 and a limited range of punctuation (".", ",", etc.), if only to limit the potential for XSS exploits.
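
A minimal Python sketch of that whitelisting idea, assuming you want to convert rather than reject: every character outside A-Z, a-z, 0-9 and a small, configurable set of punctuation is replaced with "_". The helper names and the default punctuation set ("_" and "-") are assumptions; the dot is left out by default because of the server-side renaming described in the other answers. The result is still passed through html.escape when written into the attribute, as a second line of defence.

```python
import html
import re


def sanitize_field_name(raw: str, allowed_punct: str = "_-") -> str:
    """Replace every character that is not an ASCII letter, digit, or one
    of `allowed_punct` with an underscore (hypothetical helper)."""
    disallowed = re.compile("[^A-Za-z0-9" + re.escape(allowed_punct) + "]")
    return disallowed.sub("_", raw)


def render_input(raw_name: str) -> str:
    # Escape anyway when building the attribute, in case the whitelist
    # is later widened to include characters that matter in HTML.
    name = html.escape(sanitize_field_name(raw_name), quote=True)
    return f'<input name="{name}">'


print(render_input('a b"><script>'))  # <input name="a_b___script_">
```

Whether to allow "." or "," is a trade-off: pass allowed_punct="_-.," if your server leaves those characters alone.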

Additionally, why let the user control any aspect of the input tag? (Might it not ultimately be easier, from a validation perspective, to keep the input tag names as 'custom_1', 'custom_2', etc. and then map these as required?)
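
A rough Python sketch of the 'custom_1', 'custom_2' approach, assuming the user-supplied text is only needed as a label: the generated name attributes are always safe, the label is HTML-escaped for display, and a mapping kept on the server (for example in the session) translates submitted names back. The function names build_fields and decode_submission are hypothetical.

```python
import html


def build_fields(labels):
    """Return HTML for one input per label plus a name -> label mapping."""
    mapping = {}
    rows = []
    for i, label in enumerate(labels, start=1):
        name = f"custom_{i}"  # always a safe, predictable control name
        mapping[name] = label
        rows.append(f'<label>{html.escape(label)} <input name="{name}"></label>')
    return "\n".join(rows), mapping


def decode_submission(form, mapping):
    """Translate submitted custom_N keys back to their labels, ignoring
    any field the page did not generate."""
    return {mapping[key]: value for key, value in form.items() if key in mapping}


markup, mapping = build_fields(["First name", "a.b <i>"])
print(markup)
print(decode_submission({"custom_2": "x", "evil": "y"}, mapping))
```

Because the browser and server only ever see custom_N, dots, spaces and non-ASCII characters in the labels never reach the name attribute at all.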
