What I want is this: suppose I search for "goo" with a query like ... WHERE message LIKE '%goo%' and it returns a row such as "I love Google to make my searches, but I'm starting to worry about privacy". That row is displayed as a result because the word Google matches my search criteria.
Based on my search string, how do I save the whole matched word (Google) in a variable? I need this because I'm using a regular expression that highlights the searched word and displays some content before and after it, but it only works when the search term exactly matches a word in the result. It is also poorly constructed, so it doesn't work well with words that aren't surrounded by spaces.
This is the regular expression code:
<?=preg_replace('/^.*?\s(.{0,'.$size.'})(\b'.$_GET['s'].'\b)(.{0,'.$size.'})\s.*?$/',
'...$1<strong>$2</strong>$3...',$message);?>
What I want is to replace $_GET['s'] with a variable that contains the whole word found by my query.
How do I achieve this?
3 answers
#1
2
I read your discussion on this, and a more robust implementation might be in order, especially given your need to support diacritics. Using a single regular expression to fix all your problems might seem tempting, but the more complicated it becomes, the harder it is to maintain or extend. To quote Jamie Zawinski:
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
As I have problems with iconv on my local machine, I used a simpler implementation instead. Feel free to use something more complicated or robust if your situation requires it.
This solution uses a simple regular expression to split the text into runs of alphanumeric characters (that is, "words"). The \p{L}\p{M} part of the character class makes sure that letters and combining marks outside ASCII, including multibyte characters, are treated as part of a word.
You can see this code working on IDEone.
<?php
// Replace accented Latin characters with their unaccented ASCII counterparts.
// Note: the returned string is ISO-8859-1 encoded, not UTF-8.
function stripAccents($p_sSubject) {
    $sSubject = (string) $p_sSubject;
    // Ligatures expand to two characters, so handle them before strtr().
    $sSubject = str_replace('æ', 'ae', $sSubject);
    $sSubject = str_replace('Æ', 'AE', $sSubject);
    // utf8_decode() is deprecated as of PHP 8.2;
    // mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8') is the usual replacement.
    $sSubject = strtr(
        utf8_decode($sSubject)
        , utf8_decode('àáâãäåçèéêëìíîïñòóôõöøùúûüýÿÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝ')
        , 'aaaaaaceeeeiiiinoooooouuuuyyAAAAAACEEEEIIIINOOOOOOUUUUY'
    );
    return $sSubject;
}

// Wrap every word that contains the search term in <strong> tags.
function emphasiseWord($p_sSubject, $p_sSearchTerm){
    // Split into words and delimiters; the delimiters are kept so the text
    // can be reassembled unchanged. A limit of -1 means "no limit".
    $aSubjects = preg_split('#([^a-z0-9\p{L}\p{M}]+)#iu', $p_sSubject, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach($aSubjects as $t_iKey => $t_sSubject){
        $sSubject = stripAccents($t_sSubject);
        // Match against the accent-stripped word first, then the original word.
        if(stripos($sSubject, $p_sSearchTerm) !== false || mb_stripos($t_sSubject, $p_sSearchTerm) !== false){
            $aSubjects[$t_iKey] = '<strong>' . $t_sSubject . '</strong>';
        }
    }
    $sSubject = implode('', $aSubjects);
    return $sSubject;
}

/////////////////////////////// Test \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
// Each key is a search term, each value the text to search in.
$aTest = array(
    'goo' => 'I love Google to make my searches, but I`m starting to worry about privacy.'
    , 'peo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
    , 'péo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
    , 'gen' => '"gente", "inteligente", "VAGENS", and "Gente" ...vocês da física que passam o dia protegendo...'
    , 'voce' => '...vocês da física que passam o dia protegendo...'
    , 'o' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'ø' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'ae' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'Æ' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
);
$sContent = '<dl>';
foreach($aTest as $t_sSearchTerm => $t_sSubject){
    $sContent .= '<dt>' . $t_sSearchTerm . '</dt><dd>' . emphasiseWord($t_sSubject, $t_sSearchTerm) .'</dd>';
}
$sContent .= '</dl>';
echo $sContent;
?>
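If you only need the matched word itself in a variable, as the question asks, the same split step can be reused without the highlighting. The following is a minimal sketch, not part of the code above: findMatchingWords is a hypothetical helper name, it assumes UTF-8 input and the mbstring extension, and it does not do the accent stripping shown earlier.

```php
<?php
// Hypothetical helper: returns every whole word in $subject that contains $term.
function findMatchingWords(string $subject, string $term): array
{
    // Split on runs of characters that are not digits, letters or combining marks.
    // The delimiters are dropped because only the words are needed here.
    $words = preg_split('#[^0-9\p{L}\p{M}]+#u', $subject, -1, PREG_SPLIT_NO_EMPTY);

    $matches = [];
    foreach ($words as $word) {
        // Case-insensitive, multibyte-aware containment check.
        if (mb_stripos($word, $term, 0, 'UTF-8') !== false) {
            $matches[] = $word;
        }
    }
    return $matches;
}

$found = findMatchingWords('I love Google to make my searches', 'goo');
// $found is now ['Google']
```

The first element of the returned array (if any) could then stand in for $_GET['s'] in the original expression.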
#2
4
I bet it will be easier to change your regular expression so that it matches any word containing the term. What about:
<?=preg_replace('/^.*?(.{0,'.$size.'})(\b\S*'.$_GET['s'].'\S*\b)(.{0,'.$size.'}).*?$/i',
'...$1<strong>$2</strong>$3...',$message);?>
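One caveat with interpolating $_GET['s'] straight into the pattern is that any regex metacharacters in the search term become part of the expression. A hedged sketch of the same pattern with the term escaped through preg_quote follows; highlightExcerpt is a hypothetical wrapper name, and the u modifier assumes the message is UTF-8.

```php
<?php
// Hypothetical wrapper around the pattern above, with the search term escaped.
function highlightExcerpt(string $message, string $term, int $size): string
{
    // preg_quote() escapes regex metacharacters and, given '/', the delimiter too.
    $quoted = preg_quote($term, '/');
    $pattern = '/^.*?(.{0,' . $size . '})(\b\S*' . $quoted . '\S*\b)(.{0,' . $size . '}).*?$/iu';
    // If nothing matches, preg_replace() returns the message unchanged.
    return preg_replace($pattern, '...$1<strong>$2</strong>$3...', $message);
}

echo highlightExcerpt('I love Google to make my searches', 'goo', 10);
// prints: ...I love <strong>Google</strong> to make m...
```

With the escaping in place, a term such as "a.c" only matches a literal dot, so it no longer highlights "abc".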
#3
0
I don't understand why you need to match everything else in the string. Wouldn't this be enough?
<?=preg_replace('/\b\S*'.$_GET['s'].'\S*\b/i', '<strong>$0</strong>', $message);?>
As far as I can tell, you are only wrapping the matched word in an HTML tag and not doing anything to the rest of the string?
The regex above works when you only need to match whole words, highlights every match in the string (should there be more than one), and is case-insensitive.
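For text with accented characters, a variant that spells out the word characters explicitly may be easier to reason about than \b and \S. This is a sketch under assumptions rather than a drop-in fix: highlightWords is a hypothetical name, the input is assumed to be UTF-8, and the search term is escaped with preg_quote.

```php
<?php
// Hypothetical helper: wraps each whole word containing $term in <strong> tags.
function highlightWords(string $message, string $term): string
{
    // An empty term would match every word, so return the text untouched.
    if ($term === '') {
        return $message;
    }
    // Letters, combining marks and digits count as word characters.
    $wordChars = '[\p{L}\p{M}\p{N}]*';
    $pattern = '/' . $wordChars . preg_quote($term, '/') . $wordChars . '/iu';
    return preg_replace($pattern, '<strong>$0</strong>', $message);
}

echo highlightWords('...vocês da física...', 'voc');
// prints: ...<strong>vocês</strong> da física...
```

Because the word characters are listed explicitly, punctuation attached to a word is never pulled into the highlighted span.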