How to perform preg_replace on a string in PHP?

Date: 2021-06-16 08:45:16

I have some simple code that does a preg match:

$bad_words = array('dic', 'tit', 'fuc',); // for this example I replaced the real bad words

for($i = 0; $i < sizeof($bad_words); $i++)
{
    if(preg_match("/$bad_words[$i]/", $str, $matches))
    {
        $rep = str_pad('', strlen($bad_words[$i]), '*');
        $str = str_replace($bad_words[$i], $rep, $str);
    }
}
echo $str;

So if $str is "dic", the result will be "***", and so on.

Now there is a small problem if $str is "f.u.c". One solution might be to use:

$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = '***';
$foo =  preg_replace($pattern, $replacement, $str);

With this I get *** in every case. My problem is putting all this code together.
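One thing worth checking about this pattern: each (.*) matches any run of characters, and the trailing (.*) swallows everything after the "c", so the pattern also fires on ordinary text that merely contains an f, a u and a c in that order. A minimal sketch to try it; greedyCensor() is a hypothetical helper and the sample strings are made up:

```php
<?php
// Hypothetical helper wrapping the pattern from the question.
function greedyCensor(string $s): string
{
    // ~...~i: case-insensitive; each (.*) may match any characters at all,
    // and the last (.*) consumes the rest of the string.
    return preg_replace('~f(.*)u(.*)c(.*)~i', '***', $s);
}

echo greedyCensor('f.u.c'), "\n";           // ***
echo greedyCensor('FUC'), "\n";             // ***
echo greedyCensor('a fun cool trip'), "\n"; // a ***
```

Restricting what may appear between the letters, rather than allowing (.*), avoids this kind of over-matching.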

I've tried:

$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = 'fuc';
$fuc =  preg_replace($pattern, $replacement, $str);

$bad_words = array('dic', 'tit', $fuc,); 

for($i = 0; $i < sizeof($bad_words); $i++)
{
    if(preg_match("/$bad_words[$i]/", $str, $matches))
    {
        $rep = str_pad('', strlen($bad_words[$i]), '*');
        $str = str_replace($bad_words[$i], $rep, $str);
    }
}
echo $str;

The idea is that $fuc becomes "fuc", then I put it in the array and the loop does its job, but this doesn't seem to work.
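To see why this attempt fails, take $str = "f.u.c". preg_replace() returns the modified string, which is stored in $fuc, but it does not change $str. The loop therefore searches the original dotted text for the plain word "fuc" and finds nothing. A sketch reproducing the question's code with that input:

```php
<?php
// Reproduces the attempt from the question with $str = "f.u.c".
$str = 'f.u.c';

// preg_replace() returns a NEW string; $str itself is left untouched.
$fuc = preg_replace('~f(.*)u(.*)c(.*)~i', 'fuc', $str); // "fuc"

$bad_words = array('dic', 'tit', $fuc);

foreach ($bad_words as $word) {
    // Looking for "fuc" inside "f.u.c" fails, because the dots are
    // still present in $str.
    if (preg_match("/$word/", $str)) {
        $str = str_replace($word, str_repeat('*', strlen($word)), $str);
    }
}

echo $str; // still "f.u.c"
```

Note also that $fuc holds the whole of $str after substitution, so for a longer input such as "hey f.u.c" the array entry would become "hey fuc" rather than "fuc".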

1 answer

First of all, you can do all of the bad word replacements with one (dynamically generated) regex, like this:

$bad_words = array('dic', 'tit', 'fuc',);

$str = preg_replace_callback("/\b(?:" . implode( '|', $bad_words) . ")\b/", 
    function( $match) {
        return str_repeat( '*', strlen( $match[0])); 
}, $str);
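The same idea wrapped in a small function, as a sketch: censor() is a hypothetical helper name, and the preg_quote() call is an addition that escapes any regex metacharacters in the word list. Because of the \b word boundaries, a longer word such as "dictionary" is left alone.

```php
<?php
// Hypothetical helper: one alternation regex built from the word list.
function censor(string $str, array $bad_words): string
{
    // Escape each word so characters like "." or "|" are matched literally.
    $quoted = array_map(function (string $w): string {
        return preg_quote($w, '/');
    }, $bad_words);

    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';

    // Replace every match with as many asterisks as it has characters.
    return preg_replace_callback($pattern, function (array $match): string {
        return str_repeat('*', strlen($match[0]));
    }, $str);
}

echo censor('dic and tit, but not dictionary', array('dic', 'tit', 'fuc'));
// *** and ***, but not dictionary
```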

Now you have the problem of people putting periods between the letters of a word, which you can search for with another regex and replace as well. However, keep in mind that . matches any character in a regex, so it must be escaped (with preg_quote() or a backslash).

$bad_words = array_map( function( $el) { 
    return implode( '\.', str_split( $el));
}, $bad_words);

This will create a $bad_words array similar to:

array(
    'd\.i\.c',
    't\.i\.t',
    'f\.u\.c'
)

Now you can use this new $bad_words array exactly like the first one to replace the obfuscated spellings.
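As a sketch of how the two arrays might be combined, the snippet below keeps the plain words and the dotted patterns in separate arrays and runs the same callback replacement twice. censorPatterns() and $dotted are hypothetical names, and the word list is assumed to contain only letters, since the split letters are inserted into the regex unquoted.

```php
<?php
// Hypothetical helper: apply one alternation regex built from regex fragments.
function censorPatterns(string $str, array $patterns): string
{
    return preg_replace_callback('/\b(?:' . implode('|', $patterns) . ')\b/',
        function (array $match): string {
            return str_repeat('*', strlen($match[0]));
        }, $str);
}

$bad_words = array('dic', 'tit', 'fuc');

// 'fuc' => 'f\.u\.c' (escaped dots only match a literal ".")
$dotted = array_map(function ($el) {
    return implode('\.', str_split($el));
}, $bad_words);

$str = 'fuc or f.u.c, but not fxuxc';
$str = censorPatterns($str, $bad_words); // plain spellings
$str = censorPatterns($str, $dotted);    // dotted spellings
echo $str; // *** or *****, but not fxuxc
```

Because the dot is escaped, "fxuxc" is not touched; with an unescaped . it would be.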

Hint: you can make this array_map() call smarter so that it catches more kinds of obfuscation. For example, to catch a bad word whose letters are separated by a period, a whitespace character or a comma, you can do:

$bad_words = array_map( function( $el) { 
    return implode( '(?:\.|\s|,)', str_split( $el));
}, $bad_words);
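With the \b wrapper from the first snippet, this version requires exactly one period, whitespace character or comma between each pair of letters, so the plain "fuc" and a partly dotted "f.uc" are not matched. A sketch that prints which spellings match the generated pattern for "fuc":

```php
<?php
// Build the "separator required" pattern for a single word.
$pattern = implode('(?:\.|\s|,)', str_split('fuc')); // f(?:\.|\s|,)u(?:\.|\s|,)c
$regex = '/\b(?:' . $pattern . ')\b/';

foreach (array('f.u.c', 'f u c', 'f,u,c', 'f.uc', 'fuc') as $candidate) {
    echo $candidate, ' => ', preg_match($regex, $candidate) ? 'match' : 'no match', "\n";
}
// f.u.c => match
// f u c => match
// f,u,c => match
// f.uc => no match
// fuc => no match
```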

Now, if you make that separator group optional, you'll catch a lot more bad words:

$bad_words = array_map( function( $el) { 
    return implode( '(?:\.|\s|,)?', str_split( $el));
}, $bad_words);

Now the patterns should match spellings such as:

f.u.c
f,u.c
f u c 
fu c
f.uc

And many more.
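Putting the pieces together, a sketch that censors the spellings listed above using the optional separator group. censorVariants() is a hypothetical helper; the separators inside a match are replaced with asterisks too, so the output keeps the original length.

```php
<?php
// Hypothetical helper: optional separators between letters, one regex.
function censorVariants(string $str, array $bad_words): string
{
    // 'fuc' => 'f(?:\.|\s|,)?u(?:\.|\s|,)?c'
    $patterns = array_map(function ($el) {
        return implode('(?:\.|\s|,)?', str_split($el));
    }, $bad_words);

    return preg_replace_callback('/\b(?:' . implode('|', $patterns) . ')\b/',
        function (array $match): string {
            // Star out the whole match, separators included.
            return str_repeat('*', strlen($match[0]));
        }, $str);
}

$bad_words = array('dic', 'tit', 'fuc');
foreach (array('f.u.c', 'f,u.c', 'f u c ', 'fu c', 'f.uc', 'fuc') as $s) {
    echo '"', censorVariants($s, $bad_words), '"', "\n";
}
// "*****"
// "*****"
// "***** "
// "****"
// "****"
// "***"
```

Because the separator group is optional, this one pattern also matches the plain spelling, so it can stand in for the first array instead of being run in addition to it.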
