Exact phrase search using Lucene.NET

Time: 2021-06-11 03:03:20

I am having trouble searching for an exact phrase using Lucene.NET 2.0.0.4.

For example, I am searching for "scope attribute sets the variable" (including the quotes) but receive no matches, even though I have confirmed that the phrase exists in the document.

Can anyone suggest where I am going wrong? Is this even supported by Lucene.NET? As usual, the API documentation is not much help, and the few CodeProject articles I've read don't specifically touch on this.

I am using the following code to create the index:

Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", true);

Analyzer analyzer = new Lucene.Net.Analysis.SimpleAnalyzer();

IndexWriter indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer, true);

//create a document, add in a single field
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();

Lucene.Net.Documents.Field fldContent = new Lucene.Net.Documents.Field(
    "content", File.ReadAllText(@"Documents\100.txt"),
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.TOKENIZED);

doc.Add(fldContent);

//write the document to the index
indexWriter.AddDocument(doc);

I then search for a phrase using:

//state the file location of the index
Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", false);

//create an index searcher that will perform the search
IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);

QueryParser qp = new QueryParser("content", new SimpleAnalyzer());

// txtSearch.Text contains a quoted phrase such as "this is a phrase"
Query q = qp.Parse(txtSearch.Text);

//execute the query
Lucene.Net.Search.Hits hits = searcher.Search(q);

The target document is about 7 MB of plain text.

I have seen this previous question; however, I don't want a proximity search, just an exact phrase search.

2 solutions

#1


You have not enabled term positions. Creating the field as follows should solve your problem.

Lucene.Net.Documents.Field fldContent =
    new Lucene.Net.Documents.Field("content",
        File.ReadAllText(@"Documents\100.txt"),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.TOKENIZED,
        Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);
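
To make "term positions" concrete, here is a minimal sketch in plain C# of why an exact phrase match needs to know where each term occurs, not just that it occurs. It does not use Lucene.NET and says nothing about how Lucene stores positions internally; the class and method names (PositionalIndexSketch, Tokenize, BuildIndex, ContainsPhrase) are hypothetical, and the tokenizer only approximates an analyzer that lowercases text and splits it at non-letter characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only: this is not Lucene.NET code.
public static class PositionalIndexSketch
{
    // Lowercase the text and split it at every non-letter character.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // Map each term to the list of token positions at which it occurs.
    public static Dictionary<string, List<int>> BuildIndex(string text)
    {
        var index = new Dictionary<string, List<int>>();
        List<string> tokens = Tokenize(text);
        for (int pos = 0; pos < tokens.Count; pos++)
        {
            if (!index.TryGetValue(tokens[pos], out List<int> positions))
            {
                positions = new List<int>();
                index[tokens[pos]] = positions;
            }
            positions.Add(pos);
        }
        return index;
    }

    // True only when the phrase terms occur at consecutive positions.
    public static bool ContainsPhrase(Dictionary<string, List<int>> index, string phrase)
    {
        List<string> terms = Tokenize(phrase);
        if (terms.Count == 0)
        {
            return false;
        }
        if (!index.TryGetValue(terms[0], out List<int> starts))
        {
            return false;
        }
        foreach (int start in starts)
        {
            bool matched = true;
            for (int i = 1; i < terms.Count && matched; i++)
            {
                // Term i of the phrase must sit exactly i tokens after the first term.
                matched = index.TryGetValue(terms[i], out List<int> positions)
                          && positions.Contains(start + i);
            }
            if (matched)
            {
                return true;
            }
        }
        return false;
    }
}
```

With an index built from "The scope attribute sets the variable.", ContainsPhrase finds "scope attribute sets the variable" but rejects "attribute scope": both terms are present, yet not at adjacent positions in that order. An index that recorded only which terms appear could not tell the two cases apart.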

#2


Shashikant Kore's answer is correct: you need to enable term positions.

However, I would recommend not storing the text of the document in the field unless you absolutely need it returned to you in the search results. Setting the store to 'NO' might help reduce the size of your index a bit.

Lucene.Net.Documents.Field fldContent =
    new Lucene.Net.Documents.Field("content",
        File.ReadAllText(@"Documents\100.txt"),
        Lucene.Net.Documents.Field.Store.NO,
        Lucene.Net.Documents.Field.Index.TOKENIZED,
        Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);
