I am building a basic search engine using MongoDB. I have verified that the basic query works in the mongo shell, but I don't quite understand how to translate it into PHP.
Spaces in the input string signify 'and' operators, and pipe characters (|) are the 'or' operators. The input query changes, but could be something along these lines (minus the quotes!):
'o g|ra'
That would be equivalent to writing:
(o&&g)||(ra)
Basic mongo query (please note that I am not trying to translate this exact query every time; I need it to be flexible in terms of the number of $ands and $ors). I have tested this and it works fine:
db.scores.find({$or:[{Title:/o/i, Title: /g/i},{Title:/ra/i}])
The code that I have produced in PHP is this:
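if (strstr($textInput, '|') != FALSE)
{
    // Assumed setup, not shown in the original snippet:
    // split the input on '|' into the groups that are OR-ed together.
    $orArray = explode('|', $textInput);
    $stringArray = array();

    // Split each OR group on spaces into its AND terms.
    foreach ($orArray as $item)
    {
        $itemMod = explode(" ", $item);
        array_push($stringArray, $itemMod);
    }

    $masterAndQueryStack = array();
    foreach ($stringArray as $varg)
    {
        // Build one case-insensitive regex condition per AND term.
        $multiAndQuerySet = array();
        foreach ($varg as $obj)
        {
            $searchText = '/' . $obj . '/i';
            $regexObj = new MongoRegex($searchText);
            $singleQuery = array('Title' => $regexObj);
            array_push($multiAndQuerySet, $singleQuery);
        }
        // Each group is pushed as a plain list of conditions.
        array_push($masterAndQueryStack, $multiAndQuerySet);
    }

    $orAndQueryStack = array('$or' => $masterAndQueryStack);
    return $orAndQueryStack;
}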
This is the query returned by the PHP code. As you can see, the 'and' terms have been put in an array. I can't see any way of storing these without pushing them to an array; however, it seems that MongoDB's $or does not like accepting an array. I'm just not sure how to rework the search algorithm to account for this.
Array
(
    [$or] => Array
        (
            [0] => Array
                (
                    [0] => Array ( [Title] => MongoRegex Object ( [regex] => o [flags] => i ) )
                    [1] => Array ( [Title] => MongoRegex Object ( [regex] => g [flags] => i ) )
                )
            [1] => Array
                (
                    [0] => Array ( [Title] => MongoRegex Object ( [regex] => ra [flags] => i ) )
                )
        )
)
2 Answers
#1
2
To explain my comment further, here is the $and operator: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24and
You can nest this within your first $or, making:
Array
(
    [$or] => Array
        (
            [0] => Array
                (
                    [$and] => Array
                        (
                            [0] => Array ( [Title] => MongoRegex Object ( [regex] => o [flags] => i ) )
                            [1] => Array ( [Title] => MongoRegex Object ( [regex] => g [flags] => i ) )
                        )
                )
            [1] => Array
                (
                    [Title] => MongoRegex Object ( [regex] => ra [flags] => i )
                )
        )
)
Like that. You can also perform $and queries within the regex itself; there is some info about regex syntax here: http://www.regular-expressions.info/refadv.html
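As a rough illustration of the regex route: lookahead assertions let a single pattern require several substrings in any order. This is only a sketch under assumptions, not something taken from the linked page. `andPattern` is a hypothetical helper, and PHP's `preg_match` is used purely to show the matching behaviour locally. MongoDB's regex engine is PCRE-based, so the same pattern body should be usable in a `MongoRegex`, but verify that against your server version.

```php
<?php
// Hypothetical helper: build one pattern body that matches only when
// every term occurs somewhere in the subject, in any order.
function andPattern(array $terms)
{
    $pattern = '^';
    foreach ($terms as $term) {
        // Each lookahead asserts "this term appears somewhere ahead"
        // without consuming input, so stacked lookaheads act as AND.
        // Note: "." does not match newlines unless the "s" flag is set.
        $pattern .= '(?=.*' . preg_quote($term, '/') . ')';
    }
    return $pattern;
}

$body  = andPattern(array('o', 'g'));
$regex = '/' . $body . '/i';

echo $body, "\n";
var_dump(preg_match($regex, 'Dog') === 1);   // contains both "o" and "g"
var_dump(preg_match($regex, 'Door') === 1);  // contains no "g"
```

With this approach the group `o g` collapses into one regex condition, so only the `|` alternatives would still need `$or`.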
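One possible way to build the nested $or/$and structure for an arbitrary input string is sketched below. It is a minimal sketch, not the only way to do it. `buildTitleQuery` is a hypothetical name, and the sketch uses the `$regex`/`$options` operator form instead of `MongoRegex` objects so that it runs without the Mongo extension installed; with the legacy driver you could swap in `new MongoRegex('/' . $term . '/i')`. It also assumes search terms are literal text and escapes them accordingly.

```php
<?php
// Hypothetical helper: turn "o g|ra" into an $or of $and groups on Title.
function buildTitleQuery($textInput)
{
    $orClauses = array();

    // "|" separates OR groups; whitespace separates AND terms in a group.
    foreach (explode('|', $textInput) as $group) {
        $terms = preg_split('/\s+/', trim($group), -1, PREG_SPLIT_NO_EMPTY);
        if (count($terms) === 0) {
            continue; // skip empty groups such as a trailing "|"
        }

        $andClauses = array();
        foreach ($terms as $term) {
            // Assumption: terms are literal text, so regex characters are
            // escaped. Drop preg_quote() if terms are meant to be patterns.
            $andClauses[] = array(
                'Title' => array('$regex' => preg_quote($term), '$options' => 'i'),
            );
        }

        // A single term needs no $and wrapper. Several terms are wrapped so
        // that each $or element is a document rather than a bare list.
        $orClauses[] = count($andClauses) === 1
            ? $andClauses[0]
            : array('$and' => $andClauses);
    }

    if (count($orClauses) === 0) {
        return array(); // empty query document
    }
    return count($orClauses) === 1 ? $orClauses[0] : array('$or' => $orClauses);
}

print_r(buildTitleQuery('o g|ra'));
```

The returned array is intended to be passed to the collection's `find()` call in place of the hand-built query.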
#2
1
Not sure what sort of corpus of data you have to search, but there are some significant limitations with your current approach:
- case-insensitive regex matches will result in a full index scan
- you are combining multiple regex matches with $or (adding to the performance overhead)
- there is no relevance ordering for matching results
All of the above caveats may be fine if you don't have a large data set to search.
Some more performant alternatives would be:
- use an index of tags or tokenized search keywords (see related wiki page Fulltext search in Mongo)
- use a more full-featured fulltext search product (see related discussion on SO: Full text search in NoSQL databases)
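The first alternative, an index of tokenized keywords, could look roughly like the sketch below. Everything named here is an assumption for illustration: the `keywords` field, the `tokenize` helper, and the sample title are not from the question. An index on `keywords` would be created separately. Note that this matches whole words rather than substrings, so it behaves differently from the regex search.

```php
<?php
// Hypothetical helper: split text into unique lowercase keyword tokens.
// ASCII-only for simplicity; other characters are treated as separators.
function tokenize($text)
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Store the tokens next to the title when saving a document.
// "keywords" is an assumed field name; the title is made-up sample data.
$doc = array(
    'Title'    => 'The Great Gatsby',
    'keywords' => tokenize('The Great Gatsby'),
);

// AND search: every token must be present in the keywords array.
$andQuery = array('keywords' => array('$all' => tokenize('great gatsby')));

// OR search: at least one token must be present.
$orQuery = array('keywords' => array('$in' => tokenize('gatsby hamlet')));

print_r($doc);
print_r($andQuery);
print_r($orQuery);
```

Because these are exact matches on an array field, they can use an ordinary index on `keywords` instead of scanning with a case-insensitive regex.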