
Time: 2022-07-19 22:13:01

What I want is this: suppose I searched for "goo" using a query like ...WHERE message LIKE '%goo%', and it returned a result such as "I love Google to make my searches, but I'm starting to worry about privacy". That row is returned because the word Google matches my search criteria.


How do I, based on my search string, save the entire word Google in a variable? I need this because I'm using a regular expression that highlights the searched word and displays the content before and after it, but it only works when the searched word exactly matches a word in the result. It is also poorly constructed, so it won't work well with words that are not surrounded by spaces.


This is the regular expression code



What I want is to replace $_GET['s'] with a variable that contains the whole word found by my query string.


How do I achieve this?


3 solutions



I read your discussion on this, and a more robust implementation might be in order, especially taking your need to support diacritics into account. Using a single regular expression to fix all your problems might seem tempting, but the more complicated it becomes, the harder it gets to maintain or expand upon. To quote Jamie Zawinski:


Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.


As I have problems with iconv on my local machine, I used a simpler implementation instead. Feel free to use something more complicated or robust if your situation requires it.


I use a simple regular expression in this solution to split the text into runs of alphanumeric characters (also known as "words"). The \p{L}\p{M} part of the regular expression makes sure that letters and combining marks outside ASCII, i.e. multibyte characters, are also treated as part of a word.
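To illustrate the splitting step on its own, here is a minimal sketch. The sample sentence is made up for the illustration, and the source file is assumed to be saved as UTF-8. Because the separators are captured, joining the tokens gives back the original text.

```php
<?php
// Split a sentence into word and separator tokens. PREG_SPLIT_DELIM_CAPTURE
// keeps the separators, so implode() can rebuild the original text.
$sSentence = 'Vocês, people!';   // hypothetical sample input

$aTokens = preg_split(
    '#([^a-z0-9\p{L}\p{M}]+)#iu', // one or more non-word characters
    $sSentence,
    -1,                           // no limit on the number of pieces
    PREG_SPLIT_DELIM_CAPTURE
);

// Words sit at even indexes, separators at odd indexes.
print_r($aTokens);
```

Only the word tokens need to be compared with the search term; the separator tokens pass through untouched.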


You can see this code working on IDEone.


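function stripAccents($p_sSubject) {
    $sSubject = (string) $p_sSubject;

    // Ligatures map to two characters, so handle them before strtr().
    $sSubject = str_replace('æ', 'ae', $sSubject);
    $sSubject = str_replace('Æ', 'AE', $sSubject);

    // strtr() maps byte by byte, so convert both the subject and the list of
    // accented characters from UTF-8 to single-byte ISO-8859-1 first.
    // Note: utf8_decode() is deprecated as of PHP 8.2.
    $sSubject = strtr(
          utf8_decode($sSubject)
        , utf8_decode('àáâãäåçèéêëìíîïñòóôõöøùúûüýÿÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝ')
        , 'aaaaaaceeeeiiiinoooooouuuuyyAAAAAACEEEEIIIINOOOOOOUUUUY'
    );

    return $sSubject;
}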

function emphasiseWord($p_sSubject, $p_sSearchTerm){
    // Split on runs of non-word characters, keeping the separators so the
    // text can be reassembled unchanged.
    $aSubjects = preg_split('#([^a-z0-9\p{L}\p{M}]+)#iu', $p_sSubject, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach($aSubjects as $t_iKey => $t_sSubject){
        $sSubject = stripAccents($t_sSubject);

        // Match against the accent-stripped word or the original word.
        if(stripos($sSubject, $p_sSearchTerm) !== false || mb_stripos($t_sSubject, $p_sSearchTerm) !== false){
            $aSubjects[$t_iKey] = '<strong>' . $t_sSubject . '</strong>';
        }
    }

    $sSubject = implode('', $aSubjects);

    return $sSubject;
}

/////////////////////////////// Test \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
// Each key is a search term; each value is the text to search in.
$aTest = array(
      'goo' => 'I love Google to make my searches, but I`m starting to worry about privacy.'
    , 'peo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
    , 'péo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
    , 'gen' => '"gente", "inteligente", "VAGENS", and "Gente" ...vocês da física que passam o dia protegendo...'
    , 'voce' => '...vocês da física que passam o dia protegendo...'
    , 'o' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'ø' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'ae' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'Æ' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
);

$sContent = '<dl>';
foreach($aTest as $t_sSearchTerm => $t_sSubject){
    $sContent .= '<dt>' . $t_sSearchTerm . '</dt><dd>' . emphasiseWord($t_sSubject, $t_sSearchTerm) .'</dd>';
}
$sContent .= '</dl>';

echo $sContent;



I bet it would be easier to change your regular expression to match any word containing the term. What about:
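A minimal sketch of that idea (not the answerer's original snippet): match the search term together with any letters, marks, or digits attached to it, and collect the whole words. The function name findWordsContaining and the sample sentence are hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: return every whole word that contains $sTerm.
function findWordsContaining(string $sMessage, string $sTerm): array
{
    if ($sTerm === '') {
        return array(); // an empty term would match every word
    }

    // preg_quote() keeps user input from being read as regex syntax.
    $sWordChars = '[\p{L}\p{M}\p{N}]*';
    $sPattern = '/' . $sWordChars . preg_quote($sTerm, '/') . $sWordChars . '/iu';

    preg_match_all($sPattern, $sMessage, $aMatches);

    return $aMatches[0]; // full matches only
}

$aWords = findWordsContaining('I love Google to make my searches', 'goo');
print_r($aWords);
```

The first element of the returned array is the whole word that could then be used in place of $_GET['s'] in the highlighting expression.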





I don't understand the importance of matching everything else in the search string, wouldn't this simply be enough?


<?=preg_replace('/\b\S*'.preg_quote($_GET['s'], '/').'\S*\b/i', '<strong>$0</strong>', $message);?>

As far as I can tell, you are only putting the matched word in an HTML tag, but not doing anything to the rest of the string?


The above regex works fine when you are matching whole words, captures multiple matches within a string (should there be more than one), and is case-insensitive.
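The behaviour described here can be checked with a small sketch. The wrapper name highlightTerm and the sample strings are hypothetical, and preg_quote() is included as an assumption about how untrusted input should be handled.

```php
<?php
// Hypothetical wrapper around the one-line replacement shown above.
function highlightTerm(string $sMessage, string $sTerm): string
{
    if ($sTerm === '') {
        return $sMessage; // nothing to highlight
    }

    // \b\S*term\S*\b: the term plus any adjacent non-space characters,
    // trimmed back to the nearest word boundaries.
    $sPattern = '/\b\S*' . preg_quote($sTerm, '/') . '\S*\b/i';

    return preg_replace($sPattern, '<strong>$0</strong>', $sMessage);
}

// A partial, case-insensitive match wraps the whole word.
echo highlightTerm('I love Google to make my searches', 'goo'), "\n";

// Several matches in one string are all wrapped; punctuation stays outside.
echo highlightTerm('people, People', 'peo'), "\n";
```

Note that this pattern has no u modifier and does no accent folding, so a search for voce would not highlight vocês; that case is what the longer solution above handles.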




