How can I safely validate an untrusted regex in Perl?

This answer explains that to validate an arbitrary regular expression, one simply uses eval:

while (<>) {
    eval "qr/$_/;";
    print $@ ? "Not a valid regex: $@\n" : "That regex looks valid\n";
}

However, this strikes me as very unsafe, for what I hope are obvious reasons. Someone could input, say:

foo/; system('rm -rf /'); qr/

or whatever devious scheme they can devise.

The natural way to prevent such things is to escape special characters, but if I escape too many characters, I severely limit the usefulness of the regex in the first place. A strong argument can be made, I believe, that at least []{}()/-,.*?^$! and white space characters ought to be permitted (and probably others), un-escaped, in a user regex interface, for the regexes to have minimal usefulness.

Is it possible to secure myself from regex injection, without limiting the usefulness of the regex language?

2 Solutions

#1

The solution is simply to change

eval("qr/$_/")

to

eval("qr/\$_/")

This can be written more clearly as follows:

eval('qr/$_/')
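To see why the quoting matters, here is a minimal sketch that only prints the strings each form would hand to eval; nothing is evaluated. The input line is a harmless stand-in invented for illustration.

```perl
use strict;
use warnings;

# Harmless stand-in for a malicious input line (illustrative only).
local $_ = q{foo/; print "injected\n"; qr/};

my $interpolated = "qr/$_/";    # user text is pasted into the Perl source
my $escaped      = "qr/\$_/";   # the backslash keeps $_ literal
my $single       = 'qr/$_/';    # single quotes never interpolate

# Show the source code that a string eval would compile in each case.
print "$interpolated\n";
print "$escaped\n";
print "$single\n";
```

Only the first string contains the user's text as code. The other two are the fixed source `qr/$_/`, so the input is read as pattern data when that code runs.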

But that's still not optimal. The following would be far better as it doesn't involve generating and compiling Perl code at run-time:

eval { qr/$_/ }
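As a rough sketch of how the block form might be wrapped up for validation (the helper name `compile_user_regex` and the sample patterns are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the compiled pattern, or undef if the
# pattern is invalid (the error message is then left in $@).
sub compile_user_regex {
    my ($pattern) = @_;
    # Block eval only traps the die from a malformed pattern. No Perl
    # source is built from the input, so it cannot inject statements.
    my $re = eval { qr/$pattern/ };
    return $re;
}

# Sample inputs, including the injection attempt from the question
# (with a harmless command) and an embedded-code construct.
my @samples = (
    'fo+',
    '(',
    'foo/; system("echo injected"); qr/',
    '(?{ print "hi" })',
);

for my $pattern (@samples) {
    if (defined compile_user_regex($pattern)) {
        print "That regex looks valid: $pattern\n";
    }
    else {
        print "Not a valid regex: $@";
    }
}
```

The injection attempt is simply compiled as an odd-looking but valid pattern. The `(?{ ... })` construct is rejected because Perl refuses embedded code in interpolated patterns unless `use re 'eval'` is in effect, so that pragma should stay off wherever untrusted patterns are compiled.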

Note that neither solution protects you from denial-of-service attacks. It's quite easy to write a pattern that will take longer than the life of the universe to complete. To handle that situation, you could execute the regex match in a child process for which a CPU ulimit has been set.
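The child-process idea can be sketched with core Perl only. Note that this sketch swaps in a wall-clock `alarm` for the CPU ulimit suggested above, purely to avoid non-core modules; a real CPU limit would need something like the shell's `ulimit -t` or a resource-limit module. The helper name `match_with_timeout` and its return convention are assumptions made for illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 (matched), 0 (no match), or undef
# (invalid pattern, or the child was killed by the timeout).
sub match_with_timeout {
    my ($pattern, $text, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: with the default SIGALRM disposition the OS terminates
        # the process, even in the middle of a long-running match.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;
        my $re = eval { qr/$pattern/ };
        POSIX::_exit(2) unless defined $re;        # invalid pattern
        POSIX::_exit($text =~ $re ? 0 : 1);        # 0 = match, 1 = no match
    }

    # Parent: wait for the child and decode its status.
    waitpid($pid, 0);
    return undef if $? & 127;                      # killed by a signal
    my $code = $? >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

my $result = match_with_timeout('fo+', 'foo', 5);
print defined $result ? "result: $result\n" : "invalid or timed out\n";
```

The parent never runs the untrusted pattern itself, so a runaway match costs at most the time limit and one short-lived process.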

#2

There is some discussion about this over at The Monastery (PerlMonks).

TLDR: use re::engine::RE2 (-strict => 1);

Make sure to add (-strict => 1) to your use statement, or re::engine::RE2 will fall back to Perl's own regex engine for patterns it cannot handle.

The following is a quote from junyer, owner of the project on GitHub:

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.
