preg_replace ignores non-letter characters when detecting words

I have an array of words and a string, and I want to prefix a hashtag to every word in the string that has a match in the array. I use this loop to find and replace the words:

foreach($testArray as $tag){
   $str = preg_replace("~\b".$tag."~i","#\$0",$str);
}

Problem: let's say I have the words "is" and "isolated" in my array. I get ##isolated in the output. This means the word "isolated" is matched twice, once for "is" and once for "isolated". The pattern ignores the fact that "#isolated" no longer starts with "is"; it starts with "#".

Here is an example, but it is only an example: I don't want to solve just this case but every other possibility as well:

$str = "this is isolated is an  example of this and that";
$testArray = array('is','isolated','somethingElse');

Output will be:

this #is ##isolated #is an  example of this and that

2 Solutions

#1

You may build a regex with an alternation group enclosed with word boundaries on both ends and replace all the matches in one pass:

$str = "this is isolated is an  example of this and that";
$testArray = array('is','isolated','somethingElse');
echo preg_replace('~\b(?:' . implode('|', $testArray) . ')\b~i', '#$0', $str);
// => this #is #isolated #is an  example of this and that

See the PHP demo.

The regex will look like

~\b(?:is|isolated|somethingElse)\b~

See its online demo.
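
As a minimal sketch, the same one-pass idea can be wrapped in a function. The function name `tagWords` is hypothetical, and the `preg_quote` call is an assumption added here so that tags containing regex metacharacters are matched literally; neither is part of the original answer.

```php
<?php
// Hypothetical helper: prefix '#' to every whole word in $str that appears in $tags.
function tagWords(string $str, array $tags): string
{
    if ($tags === []) {
        return $str; // an empty alternation would match at every word boundary
    }
    // Escape each tag so characters like '.' or '+' are matched literally;
    // '~' is passed as the delimiter so it gets escaped as well.
    $escaped = array_map(fn($tag) => preg_quote($tag, '~'), $tags);
    $pattern = '~\b(?:' . implode('|', $escaped) . ')\b~i';
    return preg_replace($pattern, '#$0', $str);
}

echo tagWords("this is isolated is an  example of this and that",
              ['is', 'isolated', 'somethingElse']);
// => this #is #isolated #is an  example of this and that
```

Because the group is closed by a trailing `\b`, the shorter tag "is" cannot match the beginning of "isolated", so each word is tagged at most once.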

If you want to make your own approach work, you can add a negative lookbehind after \b: "~\b(?<!#)".$tag."~i","#\$0". The lookbehind fails any match that is immediately preceded by #. See this PHP demo.
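
For illustration, the loop from the question with the lookbehind applied might look like the sketch below. The wrapper name `tagWordsLoop` is hypothetical, and `preg_quote` is an added assumption to keep tags literal; the pattern itself follows the suggestion above.

```php
<?php
// Hypothetical wrapper around the question's loop, with the lookbehind added.
function tagWordsLoop(string $str, array $tags): string
{
    foreach ($tags as $tag) {
        // (?<!#) rejects a match whose preceding character is '#',
        // so a word tagged in an earlier pass is not tagged again.
        $pattern = '~\b(?<!#)' . preg_quote($tag, '~') . '~i';
        $str = preg_replace($pattern, '#$0', $str);
    }
    return $str;
}

echo tagWordsLoop("this is isolated is an  example of this and that",
                  ['is', 'isolated', 'somethingElse']);
// => this #is #isolated #is an  example of this and that
```

Note that this pattern has no trailing \b, so a tag such as "is" still matches the start of a longer word such as "island"; appending \b after the tag is one way to restrict it to whole words.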

#2

One way to do that is to split your string on word boundaries and build an associative array from your original array of words (to avoid the use of in_array):

$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');

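// Lowercased words become keys, so isset() can replace in_array().
$hash = array_flip(array_map('strtolower', $testArray));

// Splitting on word boundaries puts the words at the odd indexes.
$parts = preg_split('~\b~', $str);

for ($i=1; $i<count($parts); $i+=2) {
    $low = strtolower($parts[$i]);
    // Append '#' to the separator that precedes a matching word.
    if (isset($hash[$low])) $parts[$i-1] .= '#';
}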

$result = implode('', $parts);

echo $result;

This way, your string is processed only once, whatever the number of words in your array.
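
A related single-pass variant, sketched here as an alternative rather than as the method above, swaps `preg_split` for `preg_replace_callback` while keeping the same hash lookup. The function name `tagWordsCallback` is hypothetical.

```php
<?php
// Hypothetical helper: one pass over the string, one hash lookup per word.
function tagWordsCallback(string $str, array $tags): string
{
    // Lowercased tags become keys, so isset() gives a case-insensitive lookup.
    $hash = array_flip(array_map('strtolower', $tags));

    return preg_replace_callback(
        '~\w+~',
        function (array $m) use ($hash): string {
            // $m[0] is the whole word matched by \w+.
            return isset($hash[strtolower($m[0])]) ? '#' . $m[0] : $m[0];
        },
        $str
    );
}

echo tagWordsCallback("this is isolated is an example of this and that",
                      ['is', 'isolated', 'somethingElse']);
// => this #is #isolated #is an example of this and that
```

Since `\w+` is greedy, each match is a complete word, so "is" is never looked up as a fragment of "isolated" or "this".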
