Best practices for HTML-escaping user-supplied data with PHP (and ZF)

Date: 2022-11-06 06:52:51

Note: I'm using Zend Framework, but I think most of this applies to PHP coding in general.

I'm trying to choose a strategy for writing view scripts, possibly with the help of a templating engine. Motivations: clarity and security. I'm just not happy with writing .phtml scripts. The syntax is awfully verbose for the most frequently needed thing, outputting a variable:

<?php echo $this->escape($this->myVariable); ?>

In addition to the code being lengthy, IMHO the template author shouldn't have to remember (and bother) to write an escape call every time he/she wants to output a variable. Forgetting the call will almost certainly result in an XSS vulnerability.

I have two possible solutions for this problem:

Solution 1: A template engine with automatic escaping

I think at least Smarty has an option for automatically escaping html entities when outputting variables. There are points against Smarty, but maybe at least some of them are addressed in the upcoming 3.0 - I haven't checked yet.

XML based template engines like PHPTAL will also escape any data by default. They might look quite odd for a beginner, though. Maybe still worth trying?

Solution 2: Escape the data in the Model

Of course, the other option would be to escape the needed data already in the Model (or even the controller?). The Model should already know the content type (mainly plain text or HTML text) of each field, so it would be kind of logical to escape the data there. The view could then treat all data as safe HTML. This would allow, e.g., changing the datatype of a field from plain text to HTML without touching the view script, only by changing the Model.

But then again, it doesn't feel like good MVC practice. In addition, there are problems with this approach as well:

  • sometimes the view only wants to print the first n characters, and we don't want to end up truncating the data foo & bar as foo &am (having first escaped it as foo &amp; bar)
  • maybe the view wants to construct a URL with varName=$varName in the query string; again, escaping already in the Model would be bad.

(These problems could be addressed by providing two versions of the data, or unescaping in the template. Seems bad to me.)
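Both pitfalls can be reproduced in a few lines. This is only a sketch to illustrate the ordering problem: the variable names and the /search URL are made up for the example, and substr() is used for brevity (multibyte text would need a multibyte-aware truncation instead).

```php
<?php
$raw = 'foo & bar';

// Escaping first and truncating afterwards cuts the entity in half.
$escapedFirst = substr(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'), 0, 7);

// Truncating the raw text first and escaping afterwards keeps the output valid.
$truncatedFirst = htmlspecialchars(substr($raw, 0, 7), ENT_QUOTES, 'UTF-8');

echo $escapedFirst, "\n";   // foo &am
echo $truncatedFirst, "\n"; // foo &amp; b

// Building a URL needs URL-encoding of the raw value first; HTML-escaping
// is applied last, to the finished URL, when it is written into the page.
$varName = 'a&b c';
$url  = '/search?varName=' . rawurlencode($varName) . '&page=2';
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo '<a href="', $href, '">search</a>', "\n";
```

In both cases the view needs the raw value, which is why pre-escaped Model data gets in the way.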

Ideas? Am I missing something? What do you consider "the best practice"?

PS. This post is about finding a general solution for any user-supplied plain-text data that may contain < or > or any other characters. So, filtering data before saving it to the database isn't the solution.

Update:

Thanks for all comments so far. I did some more research and will next evaluate Twig and possibly Open Power Template. Both seem interesting: Twig looks very straightforward, but the project is young. On the XML side, OPT's syntax looks a bit nicer than PHPTAL's. Both Twig and OPT are quite well documented.

4 answers

#1


10  

  1. Filter as soon as possible. You should ensure that all text input is proper UTF-8, to make your text manipulation functions work predictably.

    But don't try to filter out "dangerous" characters or fragments! That doesn't work. Only fix or reject incorrect data on input. There's nothing incorrect in < or ' characters.

  2. Escape as late as possible. Add SQL escaping in your SQL query function (or better – use prepared statements). HTML-escape in your HTML templates. Quoted-Printable-escape in your e-mail generation functions, shell-escape when running CLI commands, etc.

    Don't let escaped data spread all over your application, because the longer escaped data lives, the bigger chance you'll mix it up with unescaped data or break escaping during processing.
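As a rough sketch of these two rules: validate the encoding once on input, keep the value raw, and escape separately for each output context at the moment of output. The helper name isValidUtf8 and the sample string are assumptions made for this example; preg_match with the /u modifier is used here as one common way to check for well-formed UTF-8.

```php
<?php
// Input side: only reject data that is actually malformed.
function isValidUtf8(string $s): bool
{
    // With the /u modifier, preg_match() fails on invalid UTF-8 subjects.
    return preg_match('//u', $s) === 1;
}

$rawComment = "Tom & Jerry's <b>show</b>";

if (!isValidUtf8($rawComment)) {
    throw new InvalidArgumentException('Input is not valid UTF-8');
}

// Output side: the same raw value is escaped differently per context,
// as late as possible, and the escaped copies are not stored anywhere.
$forHtml  = htmlspecialchars($rawComment, ENT_QUOTES, 'UTF-8');
$forShell = escapeshellarg($rawComment);
$forMail  = quoted_printable_encode($rawComment);

echo $forHtml, "\n";
```

The characters < and ' pass the input check untouched; they only get special treatment in the context where they are dangerous.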

#2


2  

This isn't a total solution, but one extremely helpful thing in this sort of situation is Hungarian-style notation. Hungarian notation used all the time is just annoying to me, but this is the kind of place where that sort of metadata in the variable name is very valuable. A good practice is to name your variables with a prefix that says what to expect from them, e.g. $rawUserInput, $escapedUserInput, etc.

This doesn't totally solve the problem, but it's a good coding practice. Then when you see a snippet of code that says

'SELECT * FROM table WHERE username = ' . $rawUserName

it's immediately obvious that there's an injection vulnerability, because you know the raw prefix means you haven't escaped it.
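A tiny sketch of the convention applied to HTML output, with made-up variable names: the prefix records which form a value is in, so a raw value reaching an echo stands out during review.

```php
<?php
// "raw" prefix: untouched user input, unsafe in any output context.
$rawUserName = '<script>alert(1)</script>';

// "html" prefix: already escaped for HTML, and only for HTML.
$htmlUserName = htmlspecialchars($rawUserName, ENT_QUOTES, 'UTF-8');

// Only html-prefixed values are interpolated into markup.
echo "<p>Hello, {$htmlUserName}</p>\n";
```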

#3


2  

But then again, it doesn't feel like good MVC practice.

Totally agree, the Model's the wrong place for such presentation concerns and storing both an HTML and a raw version of every variable would make it easy for them to get out of sync. Forget solution 2.

That leaves you with alternative templating engines, or sticking with PHP and learning to bear the load of calling htmlspecialchars all the time. I'm open to the idea of alternative templating engines, but I haven't really been happy with the ones I've tried so far.

(Many discard PHP syntax and implement their own limited expression languages. That means you lose the advantage of the language you already know and are stuck with a noddy language that makes more complex presentation logic impossible, so you end up doing it yourself in PHP with strings full of HTML, which is absolutely not a win.)

So for the moment I'd suggest a Solution 0a to add to the pile: define a global function with a short name to take the pain out of HTML-escaping:

<?php
    // Short helper: HTML-escape a value (both quote styles included) and print it.
    function h($s) {
        echo(htmlspecialchars($s, ENT_QUOTES));
    }
?>
...

My lovely variable is <?php h($this->myVariable); ?>.

I've no idea why PHP doesn't define a shortcut for this, which is as you say by far the most common use case. Now they've dumped the short-tags for XML-PI-style tags, why isn't there one with another name to do the right thing, like say <?phph?
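For what it's worth, in PHP 5.4 and later the short echo tag <?= is always available regardless of the short_open_tag setting, so a variant of the helper that returns the escaped string instead of echoing it gets fairly close to such a shortcut. A minimal sketch, where the function name e and the sample value are assumptions for illustration:

```php
<?php
// Returns the escaped string so it can be used with the short echo tag.
function e($s): string
{
    return htmlspecialchars((string) $s, ENT_QUOTES, 'UTF-8');
}

$myVariable = 'Tom & "Jerry"';
?>
<p>My lovely variable is <?= e($myVariable) ?>.</p>
<?php // back in PHP mode for the rest of the script
```

The template line renders as `<p>My lovely variable is Tom &amp; &quot;Jerry&quot;.</p>`. It is still opt-in escaping: a forgotten e() call goes unnoticed just like a forgotten escape() call.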

#4


0  

There are a dozen ways of doing this. Here are a few:

  • You could write your custom View class, as described in the Zend Framework Manual, and escape any variables when they are assigned to or requested from the View.
  • In the case of datasets, you could wrap them into a custom ArrayIterator that does output escaping when fetching items from it, along with any other stuff you want to automate on output.
  • Or you could use the View Script approach.
  • Or, if you don't want to have your template authors write any PHP or template syntax whatsoever, you could ask them to write just the structured HTML and then insert the values through the DomDocument extension.
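The first two ideas might look roughly like the sketch below. The class names AutoEscapingView and EscapingIterator and the raw() method are hypothetical; this is not Zend_View's actual API, just an illustration of escaping on read while keeping the raw value reachable.

```php
<?php
// Hypothetical view container: values are stored raw and escaped on read.
class AutoEscapingView
{
    private $vars = [];

    public function __set($name, $value)
    {
        $this->vars[$name] = $value;
    }

    // Reading a property returns the HTML-escaped value by default.
    public function __get($name)
    {
        return htmlspecialchars((string) $this->vars[$name], ENT_QUOTES, 'UTF-8');
    }

    // Explicit opt-out for truncation, URL building, or trusted HTML.
    public function raw($name)
    {
        return $this->vars[$name];
    }
}

// Hypothetical iterator: each item is escaped as it is fetched.
class EscapingIterator extends ArrayIterator
{
    public function current(): string
    {
        return htmlspecialchars((string) parent::current(), ENT_QUOTES, 'UTF-8');
    }
}

$view = new AutoEscapingView();
$view->title = 'Cats & <Dogs>';

echo $view->title, "\n";        // escaped for HTML
echo $view->raw('title'), "\n"; // original value

foreach (new EscapingIterator(['<b>', 'a & b']) as $item) {
    echo $item, "\n";
}
```

Making the escaped form the default and the raw form the explicit exception means a forgotten call fails safe rather than producing an XSS hole.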

As for PHP in a template being verbose, well... it might not offer the shortest notation, but then again, it does provide a notation and it comes with no overhead. Even for non-PHP template authors, it should be easier to learn a few method calls in PHP than an (often weird) template language that basically reinvents a subset of what PHP can do out of the box.

You could also use the Alternative PHP Syntax and NowDoc or HereDoc in your templates to get rid of <?php and echo calls, so you could end up with something like

<?php
// get some partial block done first
// (note: values are interpolated as-is here, so they must already be escaped)
$loopdata = '';
foreach ($this->books as $book):
$loopdata .= <<<LOOPDATA
<li>{$book->title} - {$book->author} - {$book->publisher}</li>
LOOPDATA;
endforeach;

// render entire template
echo <<<HTML
<h1>{$this->title}</h1>
<ul>{$loopdata}</ul>
HTML;

Personally, I don't find this too appealing, but as you can see, there are many ways to write your templates with PHP. Just pick one.
