In which languages is using a user-supplied regular expression a security hole?

Edit: tchrist has informed me that my original accusations about Perl's insecurity are unfounded. However, the question still stands.

~~I know that in Perl, you can embed arbitrary code in a regular expression, so obviously accepting a user-supplied regex and matching it allows arbitrary code execution and is a clear security hole.~~ But is this true for all languages that use regular expressions? Is it true for all languages that use "Perl-compatible" regular expressions? In which languages are user-supplied regexes safe to use, and in which languages do they allow arbitrary code execution or other security holes?

7 answers

#1

In most languages, allowing users to supply regular expressions means that you open yourself up to a denial-of-service attack.

Some types of regular expressions are extremely CPU-intensive to execute. So in general it's a bad idea to let users enter regular expressions that will be executed on a remote system.

For more info, read this page: http://www.regular-expressions.info/catastrophic.html
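
One common way to contain that cost is to run the match somewhere it can be killed. The following is only a minimal Python sketch, assuming it is acceptable to start a child interpreter per match; `search_with_timeout`, `_CHILD` and the exit codes 10/11 are names and conventions made up for this illustration.

```python
import subprocess
import sys

# Child program: exit 10 on a match, 11 on no match. Any other status means
# the pattern did not compile (an uncaught exception exits with status 1).
_CHILD = """
import re, sys
pattern, text = sys.argv[1], sys.argv[2]
sys.exit(10 if re.search(pattern, text) else 11)
"""


def search_with_timeout(pattern, text, seconds=2.0):
    """Return True/False for match/no match, or None if the time limit hit."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            capture_output=True,
            timeout=seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode == 10:
        return True
    if proc.returncode == 11:
        return False
    raise ValueError("pattern was rejected by the regex compiler")
```

With this in place, a pattern such as `^(a+)+$` run against `"a" * 40 + "b"` should hit the time limit and come back as `None` instead of tying up the server. Passing the text on the command line keeps the sketch short; real input would more likely go through stdin.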

#2

This is not true: you cannot execute code callbacks in Perl by sneaking them into an interpolated regex. This is forbidden. You have to specifically override that with a lexically scoped

use re "eval";

if you expect to have both interpolation and code escapes happening in the same pattern.

Watch:

% perl -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
Eval-group not allowed at runtime, use re 'eval' in regex m/(?{ die naughty })/ at -e line 1.
Exit 255

% perl -Mre=eval -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
naughty at (re_eval 1) line 1.
Exit 255

#3

It's generally dynamic languages with an eval facility that tend to have the ability to execute code from regular expressions. In static languages (i.e. those requiring a separate compilation step) there is generally no way to execute code that wasn't compiled, so evaluating code from within a regex is impossible.

Without a way to embed code in a regex, the worst a user can do is write a regex that takes a long time to evaluate.

#4

1) Vulnerabilities are found in regex libraries themselves, such as a buffer overflow that affects WebKit and allows an attacker to gain remote code execution by reaching the regex library from JavaScript.

2) It is a DoS condition in C#.

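3) User-supplied regexes can be dangerous in PHP because of modifiers. Adding the /e modifier makes preg_replace eval() the replacement string after substitution. In this case system() will be eval()'ed. (The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.)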

preg_replace("/.*/e","system('echo /etc/passwd')");

Or in the form of a vulnerability:

preg_replace($_GET['regex'],$_GET['check']);

#5

Regular expressions are a programming language. I don't think they're quite Turing-complete, but they're close enough that allowing your users to enter them into your web site IS allowing other people to run code on your server. QED, yes, it's a security hole.

You might be able to get away with allowing a subset of whatever regexp language you want to use, whitelist a particular set of constructs to make it a not-big-enough-to-sweat-over hole... other people have already mentioned the possible dooms of nesting and * . How much you're willing to let people load down your server is up to you. Personally, I'd be comfortable with letting 'em have one SQL "CONTAINS" statement and maybe a "BETWEEN()". :)
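
To make the "allow only a subset" idea concrete, here is a rough Python sketch under one particular assumption: the only construct users get is a `*` wildcard, and everything else they type is matched literally. The function name `wildcard_to_regex` and the `max_wildcards` limit are invented for this example.

```python
import re


def wildcard_to_regex(user_text, max_wildcards=3):
    """Compile user text where '*' is the only special character."""
    parts = user_text.split("*")
    if len(parts) - 1 > max_wildcards:
        raise ValueError("too many wildcards")
    # re.escape neutralises every regex metacharacter in the literal parts,
    # so groups, nested quantifiers and the like never reach the engine.
    return re.compile(".*?".join(re.escape(p) for p in parts), re.DOTALL)
```

For instance, `wildcard_to_regex("foo*bar").search("foo123bar")` finds a match, while `(a+)+$` is treated as seven literal characters. Capping the number of wildcards is a design choice to keep the worst-case matching time modest.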

#6

I suspect Ruby would allow /#{system("rm -rf really_important_directory")}/ - is that the kind of thing you're worried about?

#7

AFAIK, you can do it safely in C#: you can supply the regex string to the Regex constructor, and if it fails to parse it'll throw. I'm not sure about others.
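
The same parse-and-reject pattern, sketched in Python rather than C# (so this is an analogy, not the .NET API): `compile_user_regex` and its `max_length` limit are hypothetical names chosen for the example.

```python
import re


def compile_user_regex(source, max_length=200):
    """Return a compiled pattern, or None if the input is rejected."""
    if len(source) > max_length:
        return None
    try:
        return re.compile(source)
    except (re.error, OverflowError):
        # re.error: syntax problems; OverflowError: absurd repeat counts.
        return None
```

A successful compile only tells you the pattern is syntactically valid. It says nothing about how long the pattern takes to run, so the denial-of-service concern raised in the other answers still applies.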
