Security concerns with user-defined regular expressions

Are there any security concerns if I run a user-defined regular expression on my server against a user-defined input string? I'm not asking about a single language but about any language, with PHP as one of the main languages I would like to know about.

For example, if I have the code below:

<?php

if(isset($_POST['regex'])) {
    preg_match($_POST['regex'], $_POST['match'], $matches);
    var_dump($matches);
}

?>
<form action="" method="post">
<input type="text" name="regex">
<textarea name="match"></textarea>
<input type="submit">
</form>

Given that this is not a controlled environment (i.e. the user can't be trusted), what are the risks of the above code? If similar code is written in other languages, are there risks in those languages too? If so, which languages are at risk?

I already found out about 'evil regular expressions'; however, no matter what I try on my computer, they seem to work fine. See below.

PHP

<?php
php > preg_match('/^((ab)*)+$/', 'ababab', $matches);var_dump($matches);
array(3) {
  [0] =>
  string(6) "ababab"
  [1] =>
  string(0) ""
  [2] =>
  string(2) "ab"
}
php > preg_match('/^((ab)*)+$/', 'abababa', $matches);var_dump($matches);
array(0) {
}

JavaScript

phantomjs> /^((ab)*)+$/g.exec('ababab');
{
   "0": "ababab",
   "1": "ababab",
   "2": "ab",
   "index": 0,
   "input": "ababab"
}
phantomjs> /^((ab)*)+$/g.exec('abababa');
null

This leads me to believe that PHP and JavaScript have a fail-safe mechanism for evil regexes. Based on that, I would assume that other languages have similar features.

Is this a correct assumption?

Finally, for any or all of the languages that may be affected, are there any ways to make sure the regular expressions don't cause damage?

1 Answer

#1

When you run a user-defined regex against a user-defined string on your server, the user can craft a regex that triggers catastrophic backtracking, usually paired with a failing input, to cause a denial of service on your system.

Using your example ^((ab)*)+$, you need a slightly longer, failing input to cause catastrophic backtracking to take effect: "ababababababababababababababababababababababd".
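To see the effect of input length, here is a minimal sketch, assuming Node.js 16+ (for the global `performance` object). The helper names `failingInput` and `timeMatch` are hypothetical and only for illustration. It times the same regex against failing inputs of increasing length; the exact timings depend on the engine and machine, so none are shown here, and the lengths are kept small so the loop finishes quickly.

```javascript
// Build a failing input: n copies of "ab" followed by a character
// that prevents the anchored pattern from ever matching.
function failingInput(n) {
  return 'ab'.repeat(n) + 'd';
}

// Run a single exec() call and report whether it matched and how long it took.
function timeMatch(regex, input) {
  const start = performance.now();
  const result = regex.exec(input);
  const elapsedMs = performance.now() - start;
  return { matched: result !== null, elapsedMs };
}

// The pattern from the question (no "g" flag, so lastIndex is not involved).
const evil = /^((ab)*)+$/;

// Each step adds two more "ab" pairs to the failing input.
for (let n = 10; n <= 20; n += 2) {
  const { matched, elapsedMs } = timeMatch(evil, failingInput(n));
  console.log(`n=${n} matched=${matched} elapsed=${elapsedMs.toFixed(2)}ms`);
}
```

On a backtracking engine without a built-in limit, the elapsed time should grow rapidly as `n` increases, which is why a short failing input like 'abababa' looks harmless.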

For the PHP version, a call to preg_last_error() should return PREG_BACKTRACK_LIMIT_ERROR.

For the JS version, the code above does not cause catastrophic backtracking in Firefox 26, and the browser returns false. In Chrome 31.0.1650.63 m and Internet Explorer 11, catastrophic backtracking can be observed.

Depending on the language or library, the API may provide an option to limit the number of backtracking attempts or to set a timeout on the operation; it is strongly recommended that you set such a limit in order to prevent DoS on your server.

By default, PCRE stops after 10 million backtracking attempts, and the number can be configured.

The .NET Regex class comes with an API to limit the time taken for matching.

If the language doesn't come with such a convenient API, it is strongly recommended that you implement your own timeout mechanism to abort the match when it runs too long.
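As one possible way to build such a timeout in server-side JavaScript, here is a sketch that assumes Node.js and uses the `timeout` option of the built-in `vm` module. The function name `execWithTimeout` and the shape of its return value are hypothetical. The pattern and input are passed as data on the context object and never interpolated into the script text. Note that `vm` is not a security sandbox; it is used here only to interrupt a long-running match.

```javascript
const vm = require('vm');

// Run a user-supplied pattern against a user-supplied input, giving up
// after timeoutMs milliseconds. Returns { status, match } where status
// is 'ok', 'timeout', or 'error'.
function execWithTimeout(pattern, flags, input, timeoutMs) {
  // The script text is a fixed string; user values are plain data.
  const context = { pattern, flags, input, result: null };
  try {
    vm.runInNewContext(
      'result = new RegExp(pattern, flags).exec(input);',
      context,
      { timeout: timeoutMs }
    );
  } catch (err) {
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { status: 'timeout', match: null };
    }
    // For example, a SyntaxError raised for an invalid pattern.
    return { status: 'error', match: null, message: String(err && err.message) };
  }
  // Copy the match array out of the other context into a plain array.
  const match = context.result === null ? null : Array.from(context.result);
  return { status: 'ok', match };
}

// Usage: a short matching input, then a long failing input with a 100 ms budget.
console.log(execWithTimeout('^((ab)*)+$', '', 'ababab', 100).status);
console.log(execWithTimeout('^((ab)*)+$', '', 'ab'.repeat(40) + 'd', 100).status);
```

The timeout here blocks the event loop for up to `timeoutMs`, so a worker thread or a separate process may be a better fit for a busy server; this sketch only shows the idea of bounding the match time.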

Unless the specification of the regex engine includes a requirement to prevent catastrophic backtracking (e.g. PCRE has a default backtracking limit), you shouldn't rely on the behavior of a specific implementation (as in the case of Firefox described above).
