Is it safe to read regular expressions from a file?

Assuming a Perl script that allows users to specify several text filter expressions in a config file, is there a safe way to let them enter regular expressions as well, without the possibility of unintended side effects or code execution? Without actually parsing the regexes and checking them for problematic constructs, that is. There won't be any substitution, only matching.

As an aside, is there a way to test if the specified regex is valid before actually using it? I'd like to issue warnings if something like /foo (bar/ was entered.

Thanks, Z.

EDIT:
Thanks for the very interesting answers. I've since found out that the following dangerous constructs will only be evaluated in regexes if the use re 'eval' pragma is used:

(?{code})
(??{code})
${code}
@{code}

The default is no re 'eval'; so unless I'm missing something, it should be safe to read regular expressions from a file, with the only check being the eval/catch posted by Axeman. At least I haven't been able to hide anything evil in them in my tests.
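
One way to check this claim is a small experiment along the following lines. It is only a sketch: the variable names and the sample pattern are made up for illustration, and it assumes that no `use re 'eval'` is in scope where the pattern is compiled.

```perl
use strict;
use warnings;

our $executed = 0;   # would be set to 1 only if the embedded code block ran

# Hypothetical pattern text, as it might arrive from a config file.
my $from_file = '(?{ $main::executed = 1 })';

# No "use re 'eval'" is in effect here, so compiling the interpolated
# pattern is expected to die instead of running the embedded code.
my $compiled = eval { qr/$from_file/ };
my $error    = $@;

print defined $compiled ? "compiled\n" : "rejected: $error";
```

If the pattern is rejected, `$executed` stays at 0 and `$error` carries Perl's explanation of why the code block was refused.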

Thanks again. Z.

5 answers

#1

Depending on what you're matching against, and the version of Perl you're running, there might be some regexes that act as an effective denial of service attack by using excessive lookaheads, lookbehinds, and other assertions.

You're best off allowing only a small, well-known subset of regex patterns and expanding it cautiously as you and your users learn how to use the system, in the same way that many blog commenting systems allow only a small subset of HTML tags.
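
As a rough illustration of the subset idea, a pattern could be checked against a character allowlist before it is compiled. The particular allowlist and the name `is_allowed_pattern` below are assumptions chosen for this sketch, not a recommended policy.

```perl
use strict;
use warnings;

# Hypothetical allowlist: word characters, spaces, and a few metacharacters.
# Parentheses, braces and backslashes are deliberately absent, which rules
# out groups, lookaround assertions, counted quantifiers and code blocks.
my $allowed = qr/\A[\w .*+?|^\$\[\]-]+\z/;

# Returns 1 if the pattern text uses only allowlisted characters, else 0.
sub is_allowed_pattern {
    my ($text) = @_;
    return $text =~ $allowed ? 1 : 0;
}

for my $text ( 'foo.*bar', '[a-z]+', '(?{ die })', 'a{1000}' ) {
    printf "%-12s %s\n", $text,
        is_allowed_pattern($text) ? 'accepted' : 'rejected';
}
```

A pattern that passes this check can still be invalid or slow to match, so it would still need to be compiled inside an `eval` before use.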

Eventually Parse::RecDescent might become useful, if you need to do complex analysis of regexes.

#2

This

eval {
    qr/$re/;    # dies if $re is not a valid pattern
};
if ( $@ ) {
    # $@ holds the compile error; do something, e.g. warn the user
}

compiles an expression, and lets you recover from an error.
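
For the warning the question asks about, the same idea can be wrapped in a small helper. This is a minimal sketch; the name `compile_pattern` and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (compiled_regex, undef) on success,
# or (undef, error_message) if the pattern text does not compile.
sub compile_pattern {
    my ($text) = @_;
    my $compiled = eval { qr/$text/ };
    return ( $compiled, undef ) if defined $compiled;
    return ( undef, $@ );
}

for my $text ( 'foo (bar', 'foo (bar)' ) {
    my ( $regex, $error ) = compile_pattern($text);
    if ( defined $regex ) {
        print "ok: $text\n";
    }
    else {
        warn "Skipping invalid pattern '$text': $error";
    }
}
```

Keeping the compiled `qr//` object around also means the pattern text is compiled once and can then be reused for every match.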

Since you're only going to do matching, you can watch for malicious expressions by looking for these patterns, which would allow arbitrary code to be run:

(?: \( \?{1,2} \{  # '(' followed by '?' or '??', and then '{'
|   \@ \{ \s* \[   # a dereference of a literal array, which may be arbitrary code.
)

Make sure you compile this with the x flag.
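
Put together, the check might look like the sketch below. The helper name `contains_code_construct` and the sample inputs are assumptions for illustration. The scan is purely textual, so it may also flag some harmless patterns.

```perl
use strict;
use warnings;

# The scanning pattern from above, compiled with /x so that whitespace
# and comments inside it are ignored.
my $code_construct = qr/
    (?: \( \?{1,2} \{     # opening paren, one or two question marks, then a brace
    |   \@ \{ \s* \[      # at-sign, brace, optional spaces, opening bracket
    )
/x;

# Returns 1 if the pattern text contains one of the constructs, else 0.
sub contains_code_construct {
    my ($text) = @_;
    return $text =~ $code_construct ? 1 : 0;
}

for my $text ( 'foo|bar', '(?{ system("x") })', '(??{ "a" })', '@{[ 1 ]}' ) {
    printf "%-20s %s\n", $text,
        contains_code_construct($text) ? 'suspicious' : 'looks fine';
}
```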

#3

You will probably have to do some level of sanitisation. For example, the perlre man page describes the following construct:

(?{ code })

which allows executable code inside a pattern match.

#4

I would suggest not trusting any regular expressions from users. If you are actually determined to do so, please run perl in taint (-T) mode. In that case, you'll need some form of validation. Instead of using Parse::RecDescent for writing your own regular expression parser as another answer suggests, you should use the existing YAPE::Regex regexp parser which is probably faster, was written by an expert and works like a charm.

Finally, since perl 5.10.0, you can plug different regular expression engines into perl (lexically!). You could check whether there's a less powerful regular expression engine available whose syntax is more easily verifiable. If you want to go down that route, read the API description, Avar's re::engine::Plugin, or in general check out any of Avar's plugin engines.

#5

Would the Safe module be of any use with regard to compiling/executing untrusted regular expressions?
