
Time: 2021-07-08 01:30:01

I would be grateful if somebody could help me with my problem. I have this query:


select?q=city:Frankfurt am Main~&fq=street:Gerhart-Hauptmann-Str.~

This is not working for me. I want to use fuzzy search to catch some user input mistakes.


Here is what I want:


  • All terms of Frankfurt am Main should be searched in the field city, using fuzzy search.

  • Gerhart-Hauptmann-Str. should be split into three terms, each searched with fuzzy search.

The debug output I actually get:


"debug": {
    "rawquerystring": "city:Frankfurt am Main~",
    "querystring": "city:Frankfurt am Main~",
    "parsedquery": "city:frankfurt text:am text:Main~2",
    "parsedquery_toString": "city:frankfurt text:am text:Main~2",
    "explain": {...},
    "QParser": "LuceneQParser",
    "filter_queries": [
    "parsed_filter_queries": [

I (think) I want this output:


 "debug": {
        "rawquerystring": "city:Frankfurt am Main~",
        "querystring": "city:Frankfurt am Main~",
        "parsedquery": "city:frankfurt~2 city:am~2 city:main~2",
        "parsedquery_toString": "city:frankfurt~2 city:am~2 city:main~2",
        "explain": {...},
        "QParser": "LuceneQParser",
        "filter_queries": [
        "parsed_filter_queries": [
         # My analyser converts Str. to strasse
          "street:gerhart~2 street:hauptmann~2 street:strasse~2"

The field definitions in schema.xml:


<field name="city" type="admin_name" indexed="true" stored="true" />
<field name="street" type="street_name" indexed="true" stored="true" multiValued="false"/>

<fieldType name="admin_name" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="lang/synonyms_de_admin.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="street_name" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The StartEndSynonymFilter replaces synonyms which
         are at the start or the end of a term. The types
         START_SYNONYM or END_SYNONYM will be set. -->
    <filter class="my.StartEndSynonymFilterFactory" synonyms="lang/synonyms_de_street.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

Is this somehow possible?


If you need additional information to answer, please leave a hint in a comment.


1 Solution


  1. Tokenizing on Hyphens

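Have a look at the WordDelimiterFilterFactory (WordDelimiterGraphFilterFactory in newer Solr versions), which splits tokens on intra-word delimiters such as hyphens.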
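A minimal schema.xml sketch, assuming a Solr version that ships WordDelimiterGraphFilterFactory; the type name street_name_wdf is hypothetical. WhitespaceTokenizerFactory is used instead of StandardTokenizerFactory so that the hyphenated name reaches the delimiter filter in one piece (StandardTokenizer would already break it on the hyphens). Keep in mind that, as far as I know, a fuzzy term such as Gerhart-Hauptmann-Str.~ is not run through the tokenizer at query time (only through lowercasing-style filters), so the ~ probably still has to be applied to each term in the query itself.

```xml
<!-- Sketch only: hypothetical type name, adapt filters to your setup -->
<fieldType name="street_name_wdf" class="solr.TextField">
  <analyzer type="index">
    <!-- Keep "Gerhart-Hauptmann-Str." as one token for the next filter -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Split on hyphens/punctuation: Gerhart | Hauptmann | Str -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0"/>
    <!-- Graph filters must be flattened before indexing -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Custom filter from the question, unchanged -->
    <filter class="my.StartEndSynonymFilterFactory" synonyms="lang/synonyms_de_street.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="my.StartEndSynonymFilterFactory" synonyms="lang/synonyms_de_street.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen of the Solr Admin UI is a convenient way to check which tokens this chain actually produces for a street name before reindexing.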


  2. Applying Fuzzy to every single term

DISCLAIMER: I have not yet used fuzzy search in my SOLR setups.


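You might have to be careful with tokenizing the city names and applying fuzzy search to every single token: with your example "Frankfurt am Main", fuzzy search would then also be applied to the short word "am", which is likely to match many unrelated terms. Please try grouping the terms in parentheses, e.g. city:(Frankfurt~ am Main~), and check whether this gets you the intended result; the field prefix city: then applies to all grouped terms, not only to the first one as in your current query, and "am" stays an exact match. The form (Frankfurt am Main)~, with the tilde after the closing parenthesis, is probably rejected by the standard query parser, which only accepts a ^ boost at that position.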


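However, for names (of cities or streets) I'm not sure you should be tokenizing them at all. Maybe storing each name as one case-insensitive token and applying fuzzy search to that whole token is actually what you need. Note that with quotes, "Frankfurt am Main"~ is a proximity (slop) query in the standard Lucene syntax rather than a fuzzy one; to apply fuzzy search to a single token that contains spaces, the spaces probably need to be escaped instead, e.g. Frankfurt\ am\ Main~.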

Nevertheless, you should try to get it working the way you have described it and then look at the query results. In parallel, you could set up an index in which the city and street names are stored as single tokens (for example, KeywordTokenizer with lowercasing and ASCII folding) and apply fuzzy search to them as single terms. I would guess that the results will be sharper, but it is best to try both and compare.
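A sketch of such a single-token variant for schema.xml; the type name name_exact and the fields city_exact and street_exact are hypothetical. The copyField lines keep the existing tokenized fields, so both approaches can be queried side by side on the same data.

```xml
<!-- Sketch only: every name below is illustrative -->
<fieldType name="name_exact" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer emits the whole value as ONE token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Fold accents and umlauts to ASCII -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<field name="city_exact" type="name_exact" indexed="true" stored="false"/>
<field name="street_exact" type="name_exact" indexed="true" stored="false"/>

<!-- Fill the single-token fields from the original tokenized fields -->
<copyField source="city" dest="city_exact"/>
<copyField source="street" dest="street_exact"/>
```

A fuzzy query against a whole name then needs the spaces escaped so that the parser keeps it as one term, for example city_exact:frankfurt\ am\ main~. Lucene's fuzzy search allows at most two edits for the entire value, so this catches small typos in a name but not larger differences.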

In addition, I would suggest trying the DisMax query parser (or the Extended DisMax parser, see "The Extended DisMax Query Parser" in the Solr Reference Guide) for the input, so that you don't even need to differentiate between cities and streets on the input side.

With the DisMax parser processing the input, you can let the user enter search terms very freely (for example, a single search field where cities and streets can be entered in any order and format).
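A sketch of a solrconfig.xml request handler that passes the free-text input to the Extended DisMax parser and searches both fields; the handler name /address and the mm value are assumptions to adapt.

```xml
<!-- Sketch only: handler name and defaults are illustrative -->
<requestHandler name="/address" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Parse the user input with the Extended DisMax parser -->
    <str name="defType">edismax</str>
    <!-- Match every input term against both fields -->
    <str name="qf">city street</str>
    <!-- Require all terms to match somewhere (optional, tune as needed) -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

A request such as /address?q=Frankfurt Hauptmann would then be matched against city and street without the user naming the fields. Since edismax also accepts the standard query syntax, fuzzy terms such as Frankfurt~ can still be typed in.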


