I am using haystack within a project using solr as the backend. I want to be able to perform a contains search, similar to the Django .filter(something__contains="...").
The __startswith option does not suit our needs because, as the name suggests, it only looks for words that start with the string.
I tried to use something like *keyword*, but Solr does not allow * to be used as the first character.
Thanks.
4 answers
#1
9
To get "contains" functionality you can use:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="back"/>
<filter class="solr.LowerCaseFilterFactory" />
as the index analyzer.
This will create ngrams for every whitespace-separated word in your field. For example:
"Index this!" => x, ex, dex, ndex, index, !, s!, is!, his!, this!
As you can see, this will expand your index greatly, but if you now enter a query like:
"nde*"
it will match "ndex", giving you a hit.
Use this approach carefully to make sure that your index doesn't get too large. If you increase minGramSize or decrease maxGramSize, the index will not grow as much, but the "contains" functionality is reduced. For instance, setting minGramSize="3" will require that you have at least 3 characters in your contains query.
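As a rough illustration of why this works, here is a small pure-Python sketch that imitates the analyzer chain above (whitespace split, back edge ngrams, lowercasing). It does not talk to Solr; the function names are made up for this example, and the prefix check only approximates what a trailing-wildcard query does against the indexed terms.

```python
def back_edge_ngrams(token, min_gram=1, max_gram=100):
    """Return the suffixes of token with length in [min_gram, max_gram], lowercased."""
    token = token.lower()
    longest = min(len(token), max_gram)
    return [token[-size:] for size in range(min_gram, longest + 1)]


def index_terms(text, min_gram=1, max_gram=100):
    """Imitate the index analyzer: split on whitespace, then expand each word."""
    terms = []
    for word in text.split():
        terms.extend(back_edge_ngrams(word, min_gram, max_gram))
    return terms


def prefix_match(terms, prefix):
    """Approximate a trailing-wildcard query such as nde* against the terms."""
    return any(term.startswith(prefix) for term in terms)


terms = index_terms("Index this!")
print(terms)
print(prefix_match(terms, "nde"))  # substring of "index" = prefix of the suffix "ndex"

# With a larger minimum gram size, short suffixes are no longer indexed,
# so a substring near the end of a word can stop matching.
print(prefix_match(index_terms("Index this!", min_gram=3), "ex"))
```

The point of the sketch is that every substring of a word is a prefix of one of its suffixes, which is why indexing the suffixes and querying with a trailing wildcard behaves like "contains".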
#2
1
You can achieve the same behavior without having to touch the solr schema. In your index, make your text field an EdgeNgramField instead of a CharField. Under the hood, this generates a schema similar to what lindstromhenrik suggested.
#3
0
I am using an expression like: .filter(something__startswith='...') .filter_or(name=''+s'...'), as it seems solr does not like an expression like '...*' on its own, but combining it with an OR will do.
#4
0
None of the answers here do a real substring search (*keyword*).
They don't find a keyword that sits in the middle of a bigger string (that is, one that is neither a prefix nor a suffix).
Using EdgeNGramFilterFactory or the EdgeNgramField in the indexes can only do a "startswith" or an "endswith" type of filtering.
The solution is to use an NgramField like this:
from haystack import indexes

class MyIndex(indexes.SearchIndex, indexes.Indexable):
    ...
    field_to_index = indexes.NgramField(model_attr='field_name')
    ...
This is very elegant, because you don't need to manually add anything to schema.xml.
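To make the difference concrete, here is a minimal pure-Python sketch comparing the terms produced by edge ngrams with those produced by full ngrams, using a plain term lookup with no wildcard. It does not use haystack or Solr; the helper names and the gram sizes (3 to 15) are assumptions chosen for illustration, not values taken from any schema.

```python
def edge_ngrams(token, min_gram, max_gram, side="front"):
    """Grams anchored at the start (front) or the end (back) of the token."""
    longest = min(len(token), max_gram)
    if side == "front":
        return {token[:size] for size in range(min_gram, longest + 1)}
    return {token[-size:] for size in range(min_gram, longest + 1)}


def all_ngrams(token, min_gram, max_gram):
    """Every contiguous substring whose length is in [min_gram, max_gram]."""
    grams = set()
    longest = min(len(token), max_gram)
    for size in range(min_gram, longest + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


word = "haystack"
keyword = "yst"  # sits in the middle: neither a prefix nor a suffix

front = edge_ngrams(word, 3, 15, side="front")
back = edge_ngrams(word, 3, 15, side="back")
full = all_ngrams(word, 3, 15)

print(keyword in front)  # exact lookup against front edge grams
print(keyword in back)   # exact lookup against back edge grams
print(keyword in full)   # exact lookup against full ngrams
```

Only the full ngram set contains the middle keyword as an indexed term, at the cost of storing many more terms per word than either edge variant.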