Sanitizing user-supplied regular expressions in PHP

I want to create a website where users can test regular expressions (there are many out there already...such as this one: http://www.pagecolumn.com/tool/pregtest.htm). Basically, the user provides a regular expression and some sample text, and the results of the regex evaluation will be spit back.

I want to evaluate the regex on the server side with the PHP "preg_*" functions. Is there a way to sanitize the supplied regex? What are the security vulnerabilities that I should be concerned about?

5 answers

#1

I think PHP itself will check the regex: preg_match() returns false when the pattern fails to compile. Here's a sample script I made:

// check for input, and set max size of input
if(!empty($_POST['regex'])
    && !empty($_POST['text'])
    && strlen($_POST['regex'])<1000
    && strlen($_POST['text'])<2000
    ){
    // set script timeout in case something goes wrong (SAFE MODE must be OFF on PHP < 5.4)
    $old_time=(int)ini_get('max_execution_time');
    if(!set_time_limit(1)) die('SAFE MODE MUST BE OFF'); // 1 sec is more than enough

    // trim input, it's up to you to do more checks
    $regex=trim($_POST['regex']);
    // don't trim the text, it can be needed
    $input=$_POST['text'];
    // escape slashes that are not already escaped, so they can't end the pattern early
    $regex=preg_replace('~(?<!\\\\)((?:\\\\\\\\)*)/~', '$1\\/', $regex);

    // go for the regex; preg_match() returns false if the pattern is invalid
    if(false!==$matched=@preg_match('/'.$regex.'/', $input, $matches)){
            // regex was tested, show results
            echo 'Matches: '.$matched.'<br />';
            if($matched>0){
                    echo 'matches: <br />';
                    foreach($matches as $i =>  $match){
                            // escape output, the text is user input too
                            echo $i.' = '.htmlspecialchars($match).'<br />';
                    }
            }
    }
    // set back original execution time
    set_time_limit($old_time);
}

Anyways, NEVER EVER use eval() with user submitted strings.

Additionally, you can do some simple minimalistic sanitizing, but that's up to you. ;)

#2

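If you allow user-submitted values for preg_replace, make sure you disallow the e flag! It makes PHP evaluate the replacement string as code (the flag was deprecated in PHP 5.5 and removed in PHP 7.0, but older installs still honor it). Not doing so could allow a malicious user to delete your entire site, or worse.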

Otherwise, the worst thing that can happen is what the other answers already point out. Set a low script timeout, and maybe you should even make sure that the page can only be called X times per minute.
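
One way to keep the e flag (or any other unwanted modifier) out is to never let the user write the delimiters or modifiers directly. The following is only a minimal sketch under that assumption: build_pattern() and its allow-list of modifiers are hypothetical names chosen for illustration, not part of PHP.

```php
<?php
// Hypothetical helper: wraps a user-supplied regex body in fixed '/' delimiters
// and appends modifiers only if every one of them is on an allow-list.
function build_pattern(string $body, string $flags = ''): ?string
{
    // Reject NUL bytes outright; a regex tester has no real use for them.
    if (strpos($body, "\0") !== false) {
        return null;
    }
    // Assumed allow-list of modifiers; 'e' is deliberately not in it.
    if (!preg_match('/\A[imsxuU]*\z/', $flags)) {
        return null;
    }
    // Escape each '/' that is preceded by an even number of backslashes,
    // i.e. each slash that would otherwise terminate the pattern.
    $body = preg_replace('~(?<!\\\\)((?:\\\\\\\\)*)/~', '$1\\/', $body);

    return '/' . $body . '/' . $flags;
}
```

The returned string still has to compile, so the result of preg_match() or preg_replace() should be checked for false as usual.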

#3

The only problem I can think of is that someone can DOS you by entering a bad regex (one that is O(2^n) or O(n!) or whatever), and the easiest way to prevent this might be to set your page timeout short.
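
Besides a short page timeout, PCRE has its own budget for this: the pcre.backtrack_limit ini setting, with preg_last_error() reporting when it was exhausted. Below is a minimal sketch assuming you are happy to lower that limit per request; limited_match() and the default of 10000 are hypothetical choices, and turning off pcre.jit is an assumption made here so the limit behaves predictably.

```php
<?php
// Hypothetical wrapper: runs preg_match() under a reduced backtracking budget
// and reports failure instead of letting a pathological regex run on.
function limited_match(string $pattern, string $subject, int $limit = 10000): array
{
    $oldLimit = ini_get('pcre.backtrack_limit');
    $oldJit   = ini_get('pcre.jit'); // false if this build has no such setting

    ini_set('pcre.backtrack_limit', (string) $limit);
    ini_set('pcre.jit', '0'); // assumption: run the interpreter, not the JIT

    $result = @preg_match($pattern, $subject, $matches);
    $error  = preg_last_error();

    // Restore the previous settings for the rest of the request.
    ini_set('pcre.backtrack_limit', (string) $oldLimit);
    if ($oldJit !== false) {
        ini_set('pcre.jit', (string) $oldJit);
    }

    if ($result === false) {
        return ['ok' => false, 'error' => $error, 'matches' => []];
    }
    return ['ok' => true, 'error' => PREG_NO_ERROR, 'matches' => $matches];
}

// Usage: a pattern with nested quantifiers that cannot match its subject.
$r = limited_match('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar', 1000);
if (!$r['ok'] && $r['error'] === PREG_BACKTRACK_LIMIT_ERROR) {
    echo "Regex gave up: backtrack limit exhausted\n";
}
```

This does not replace the script timeout or rate limiting; it only makes the common runaway case fail fast.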

#4

If the regex is being stored in a database, you should use whatever method you would normally use to escape the data, such as prepared statements.

Otherwise, my only concern is that the user could supply a malicious regex, in the sense of a mischievously complex one, and I'm not sure there is a way to check for that.

One thought is that you could make your regex evaluator entirely client-side by doing it in JS, but there are inconsistencies between PHP's preg functions and JavaScript's regex functions.

#5

Afaik there are no "vulnerabilities" when trying to evaluate user-supplied regexps. The worst thing that could possibly happen is, like erik points out, a DoS attack or a fatal error within your script.

I'm afraid I have to tell you that you won't be able (even theoretically) to "sanitize" every possible regexp out there. The best you can do is to check for lexical and/or syntactic errors.
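
A syntax check of that kind can be had from PCRE itself by compiling the pattern against an empty subject. This is a minimal sketch; is_valid_pattern() is a hypothetical name, and it expects the full pattern including delimiters.

```php
<?php
// Hypothetical helper: returns true when PCRE can compile the pattern.
function is_valid_pattern(string $pattern): bool
{
    // preg_match() returns false (and raises a warning, silenced here) when
    // the pattern does not compile; 0 or 1 means it compiled and ran.
    return @preg_match($pattern, '') !== false;
}

// Usage: an unbalanced parenthesis is rejected, a well-formed pattern is not.
var_dump(is_valid_pattern('/a+b/'));
var_dump(is_valid_pattern('/a(b/'));
```

It only tells you the regex is well-formed, not that it is cheap to run.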
