What should you check for in HTML form text fields?

Time: 2022-01-09 16:34:20

I'm writing a PHP script to grab text box data from a submitted form. These are simple text boxes and I don't want to accept any HTML tags. I think I should at least use strip_tags() and addslashes(). Anything else? I wouldn't mind restricting the input to alphanumerics, should I use a regular expression to seek out nonstandard characters?

This is a simple form that actually (ugh) gets emailed to the person processing it. (No database, sadly.) And it's simple data, first and last name sort of things.

Edit: I'd also like to know specifically what I should be looking for. What's the consensus on reasonable input filtering?

3 Solutions

#1


Use the PHP filter functions.

You can use them for sanitizing input and validating input (eg email addresses).
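
Here is a minimal sketch of both uses. The helper names validate_email() and validate_age(), and the 0-150 range, are illustrative assumptions and do not come from the answer. A FILTER_VALIDATE_* filter returns the value on success and false on failure. A FILTER_SANITIZE_* filter returns a cleaned copy of the input.

```php
<?php
// Validation: FILTER_VALIDATE_* returns the value on success and false on failure.
function validate_email(string $input): ?string
{
    $email = filter_var(trim($input), FILTER_VALIDATE_EMAIL);
    return $email === false ? null : $email;
}

// Validation with options: an integer in an assumed 0-150 range (e.g. an "age" field).
function validate_age(string $input): ?int
{
    $age = filter_var($input, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    return $age === false ? null : $age;
}

// Sanitizing: FILTER_SANITIZE_NUMBER_INT keeps only digits and + / - signs.
$digits = filter_var('12abc3', FILTER_SANITIZE_NUMBER_INT);

var_dump(validate_email('someone@example.com')); // string(19) "someone@example.com"
var_dump(validate_email('not an email'));        // NULL
var_dump(validate_age('42'));                    // int(42)
var_dump(validate_age('-5'));                    // NULL
var_dump($digits);                               // string(3) "123"
```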

There are two approaches to validation (this also applies to security and lots of other things).

Firstly, you can default to allowing anything except what is explicitly disallowed. Or you can default to disallowing everything except what is specifically allowed.

Generally speaking the latter approach is more secure and should be used except in cases where you have a compelling reason not to (eg it's simply too hard to know what's allowed, you're doing an app for users who aren't deemed to be a security threat and so on).
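
The sketch below makes the contrast concrete. The blocked patterns and the allowed character set are assumptions chosen for illustration. The blacklist misses an attack it did not anticipate. The whitelist rejects anything that is not a letter, combining mark, space, apostrophe or hyphen, so names like O'Brien-Smith still pass.

```php
<?php
// Default-allow (blacklist): reject only input containing a known-bad pattern.
// The pattern list is an illustrative assumption and can never be complete.
function blacklist_ok(string $input): bool
{
    foreach (['<script', 'javascript:'] as $bad) {
        if (stripos($input, $bad) !== false) {
            return false;
        }
    }
    return true;
}

// Default-deny (whitelist): accept only input made entirely of allowed characters:
// letters from any script (\p{L}), combining marks (\p{M}), space, apostrophe, hyphen.
// \z anchors at the true end of the string, so a trailing newline is not accepted.
function whitelist_ok(string $input): bool
{
    return preg_match('/^[\p{L}\p{M}\' -]+\z/u', $input) === 1;
}

$attack = '<img src=x onerror=alert(1)>';
var_dump(blacklist_ok($attack));          // bool(true): not on the list, so it slips through
var_dump(whitelist_ok($attack));          // bool(false)
var_dump(whitelist_ok("O'Brien-Smith"));  // bool(true)
var_dump(whitelist_ok('José'));           // bool(true)
```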

You have to be careful with this, however. In people's names, characters like ' and - are perfectly valid, but naive implementations may reject them. What you generally want to avoid is:

  • SQL injection: always escape any input that goes into an SQL query with mysql_real_escape_string() (or its mysqli/PDO successors, since the old mysql extension was removed in PHP 7);

  • XSS (cross-site scripting): generally speaking, you should strip HTML tags out of user input. You will of course sometimes have to allow them (e.g. rich-text editor boxes), but even then you should have a list of allowed tags and strip out all others (especially <script> tags); and

  • Typically you should strip out low characters (the control characters below ASCII 32, i.e. 0x20); and

  • Depending on your internationalization requirements, you may want to strip out high characters (above ASCII 127).
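
PHP's filter extension can handle the last two bullets directly. FILTER_UNSAFE_RAW with FILTER_FLAG_STRIP_LOW removes bytes below 32, and FILTER_FLAG_STRIP_HIGH removes bytes above 127. The sketch below uses strip_low_high(), a hypothetical helper name. Two side effects are worth knowing. STRIP_LOW also removes newlines and tabs. STRIP_HIGH works on bytes, so it deletes the whole UTF-8 encoding of characters such as é.

```php
<?php
// Strip bytes below ASCII 32 (control characters, including \n and \t) and,
// optionally, bytes above ASCII 127, using the filter extension's flags.
function strip_low_high(string $input, bool $stripHigh = false): string
{
    $flags = FILTER_FLAG_STRIP_LOW;
    if ($stripHigh) {
        $flags |= FILTER_FLAG_STRIP_HIGH;
    }
    return filter_var($input, FILTER_UNSAFE_RAW, ['flags' => $flags]);
}

var_dump(strip_low_high("Ann\x00e\x07"));   // string(4) "Anne"
var_dump(strip_low_high("José"));           // string(5) "José" (é is two bytes in UTF-8)
var_dump(strip_low_high("José", true));     // string(3) "Jos"
```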

A good default filter to use is:

$var = filter_var($var, FILTER_SANITIZE_STRING);

but pick the right filter for the situation. (Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.)

#2


This is a very common question with a lot of not-so-clear answers. Functions like addslashes() can actually do more harm than good in some setups. Some basic rules for dealing with user input: don't trust anything, and if it's not in the format you expect, don't try to fix it; just raise an error.

If you only require alphanumeric input, a simple regex will handle that, but a little more information would help.
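
Here is a minimal sketch of the validate-and-reject approach. It assumes a hypothetical rule of 1 to 50 ASCII letters or digits. The value is either returned unchanged or an exception is thrown. There is no attempt to repair it.

```php
<?php
// Validate-and-reject: return the value unchanged if it matches the expected format,
// otherwise throw. Hypothetical rule: 1-50 ASCII letters or digits, nothing else.
function require_alnum(string $field, string $value): string
{
    if (preg_match('/^[A-Za-z0-9]{1,50}\z/', $value) !== 1) {
        throw new InvalidArgumentException("Field '$field' must be 1-50 letters or digits.");
    }
    return $value;
}

try {
    $first = require_alnum('first_name', 'Alice');   // accepted unchanged
    $last  = require_alnum('last_name', "O'Brien");  // rejected: contains an apostrophe
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

A strict alphanumeric rule like this also rejects legitimate names such as O'Brien. Widen the character class if you need to accept them.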

What are you going to be doing with the data? How are you currently handling the input (or planning to)? For example: the user submits a form, you process it, and you store the data in a DB to display later (like a comment engine).

Edit: If it is as simple as emailing the text-box contents to a human to process, my biggest concerns would be XSS and SMTP header injection (depending on how the email is being sent). Go with the simplest solution: if you only need to receive alphanumeric data for now, use a regex and accept only that. Another option is to use htmlentities() with ENT_QUOTES.
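
The sketch below covers both points. It assumes the form is sent with PHP's mail(). header_safe() and build_body() are hypothetical helpers, and the field names are made up. Header injection works by smuggling CR/LF into a header value, so any value bound for a header is rejected if it contains a line break. htmlentities() with ENT_QUOTES only matters if the message is rendered as HTML.

```php
<?php
// Reject CR/LF in anything that will be placed in a mail header (subject, From name,
// Reply-To, ...): a line break would let an attacker append headers such as Bcc:.
function header_safe(string $value): string
{
    if (preg_match('/[\r\n]/', $value) === 1) {
        throw new InvalidArgumentException('Line breaks are not allowed in this field.');
    }
    return $value;
}

// Build the message body from the (hypothetical) form fields. Entity-encode with
// ENT_QUOTES only when the message will be rendered as HTML.
function build_body(array $fields, bool $asHtml = false): string
{
    $lines = [];
    foreach ($fields as $label => $value) {
        $text = $asHtml ? htmlentities($value, ENT_QUOTES, 'UTF-8') : $value;
        $lines[] = $label . ': ' . $text;
    }
    return implode("\n", $lines);
}

$name = 'Alice Smith';
$subject = header_safe("Contact form: $name");
$body = build_body(['First name' => 'Alice', 'Last name' => "O'Brien <b>"], true);
echo $body, "\n";
// $subject and $body would then be passed to mail($to, $subject, $body, $headers).
```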

#3


I don't want to accept any HTML tags. I think I should at least use strip_tags()

Maybe, but not if you want to allow people to type ‘<’/‘>’ characters that just mean less-than and greater-than, and aren't anything to do with tags.

On input for free-text fields you won't really want to filter out much more than the non-newline control characters (which you usually don't want anywhere), and, if you are using UTF-8, invalid/redundant sequences.
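
Here is a minimal sketch of that input step. It assumes UTF-8 input and treats CR and LF as the only control characters worth keeping. clean_free_text() is a hypothetical name. preg_match() with the /u modifier returns false for malformed UTF-8, which gives you a validity check without needing the mbstring extension.

```php
<?php
// Clean a free-text field: reject malformed UTF-8, then drop control characters
// other than CR and LF (assumed to be the only ones worth keeping).
function clean_free_text(string $input): string
{
    // An empty pattern with the /u modifier fails (returns false) on invalid UTF-8.
    if (preg_match('//u', $input) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    // \x00-\x09, \x0B-\x0C and \x0E-\x1F are the C0 controls minus LF/CR; \x7F is DEL.
    return preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/u', '', $input);
}

var_dump(clean_free_text("Hello\x07 world\r\nBye")); // the BEL character is removed
```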

Then when you output the value back to the page you will of course remember to use htmlspecialchars() so that ‘<’ gets escaped to ‘&lt;’ and appears as a literal ‘<’ on-screen, right? You need to be using htmlspecialchars() any time you output a text value into HTML in a template, regardless of whether that string came from a form submission, or the database, or somewhere else.
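
For example (h() is just a hypothetical short alias for htmlspecialchars() with explicit ENT_QUOTES and UTF-8 arguments):

```php
<?php
// Escape plain text for an HTML context at the moment it is output.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = 'x < y && y > "z"';  // stored and processed as plain text
echo '<p>' . h($comment) . '</p>', "\n";
// <p>x &lt; y &amp;&amp; y &gt; &quot;z&quot;</p>
```

Passing ENT_QUOTES also escapes single quotes. That matters when the value ends up inside a single-quoted HTML attribute.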

For non-free-text fields where you want all input to match a particular restricted format, then yes, a regexp can be a good way to match this.
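
As an illustration, take a hypothetical date field in YYYY-MM-DD format. The regex checks the shape. checkdate() then rejects well-formed but impossible dates such as 2023-02-30.

```php
<?php
// Fixed-format field, e.g. a hypothetical date field that must be "YYYY-MM-DD".
// Returns the parsed parts, or null if the input is rejected.
function parse_iso_date(string $input): ?array
{
    // Shape check: exactly four digits, dash, two digits, dash, two digits.
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})\z/', $input, $m) !== 1) {
        return null;
    }
    [, $year, $month, $day] = array_map('intval', $m);
    // Range check: the regex alone would accept 2023-02-30.
    if (!checkdate($month, $day, $year)) {
        return null;
    }
    return ['year' => $year, 'month' => $month, 'day' => $day];
}

var_dump(parse_iso_date('2022-01-09') !== null); // bool(true)
var_dump(parse_iso_date('2023-02-30'));          // NULL
var_dump(parse_iso_date('9 Jan 2022'));          // NULL
```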

and addslashes().

addslashes() is almost always the wrong thing. A good rule of thumb is: don't use this.

addslashes() is inadequate for SQL escaping because it does not match the actual SQL string literal escape format, so you can construct strings that are still dangerous when addslashed. When you're using MySQL, you should use mysql_real_escape_string() instead. Other databases have their own particular escaping functions. Use them (or, easier, use parameterised queries so you don't have to manually escape text to SQL at all).
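
Here is a minimal sketch of the parameterised-query approach using PDO. It assumes the pdo_sqlite extension and uses an in-memory SQLite database so it runs standalone. With MySQL you would open the connection with a mysql: DSN instead, and the prepare()/execute() calls stay the same. The table and helper names are hypothetical.

```php
<?php
// Parameterised queries with PDO: values are sent separately from the SQL text,
// so no manual escaping (addslashes or otherwise) is involved.
// Assumes pdo_sqlite; an in-memory database keeps the sketch self-contained.
function add_person(PDO $pdo, string $first, string $last): void
{
    $stmt = $pdo->prepare('INSERT INTO people (first_name, last_name) VALUES (?, ?)');
    $stmt->execute([$first, $last]);
}

function find_by_last_name(PDO $pdo, string $last): array
{
    $stmt = $pdo->prepare('SELECT first_name FROM people WHERE last_name = ?');
    $stmt->execute([$last]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE people (first_name TEXT, last_name TEXT)');

add_person($pdo, 'Conan', "O'Brien");          // the apostrophe needs no escaping
add_person($pdo, 'Eve', "x' OR '1'='1");       // stored as an ordinary string
$found = find_by_last_name($pdo, "O'Brien");   // returns ['Conan']
$none  = find_by_last_name($pdo, "' OR '1'='1"); // returns []: the quote trick does nothing
```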

(addslashes() is inadequate for HTML escaping because it doesn't attempt to do anything with HTML special characters at all. That's not what it's for.)

In any case, trying to cope with output-escaping at the input filtering stage is backwards. Instead, keep all the strings that are internal to your application as plain text, and escape them on the way out of the application: mysql_real_escape_string when they're going out to take part in an SQL query, htmlspecialchars() when they're going out onto an HTML page, and so on.
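
The sketch below shows escaping on the way out. It sends one plain-text value to three different destinations (search.php is a hypothetical URL). Each output context gets its own encoder, and the stored value itself is never modified.

```php
<?php
// Keep the value as plain text internally; escape only when it leaves the
// application, using the encoder that matches the destination.
$lastName = "O'Brien & <Sons>";          // plain text, never pre-escaped

// HTML page
$html = '<td>' . htmlspecialchars($lastName, ENT_QUOTES, 'UTF-8') . '</td>';

// URL query string (search.php is a hypothetical target)
$url = 'search.php?last=' . rawurlencode($lastName);

// JSON for a script or API response
$json = json_encode(['last' => $lastName]);

echo $html, "\n", $url, "\n", $json, "\n";
```

Suppose the value had been run through htmlspecialchars() at input time instead. The URL would then contain %26amp%3B where the user typed &. Escaping on output avoids exactly this kind of corruption.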
