How do I implement tag search? With Lucene?

Time: 2022-07-16 03:07:20

I haven't used Lucene. The last time I asked (many months ago, maybe a year), people suggested Lucene. If I shouldn't use Lucene, what should I use? As an example, say there are items tagged like this:

  1. apples carrots
  2. apples
  3. carrots
  4. apple banana

If a user searches for apples, I don't care whether 1, 2 or 4 is preferred. However, something I've seen many forums do, and which I HATE, is that when a user searches for apple carrots, 2 and 3 rank highly while 1 is hard to find, even though it matches the search more closely.

I would also like to be able to search for carrots -apples, which would return only 3. I am not sure what should happen if I search for carrots banana, but as long as items like 2 and 3 rank lower than 1 when I search for apples carrots, I'll be happy.

Can Lucene do this, and where do I start? When I tried looking it up, I saw a lot of classes and tutorials talking about documents and web pages, but none were clear about what to do when I want to tag something. If not Lucene, what should I use for tagging?

2 answers

#1


14  

Lucene for .NET seems to be mature, so there is no need to use Java or Solr.

Lucene's standard query syntax supports equally weighted search terms and negation.

So if your Lucene index has a field named "tag", your query would be:

tag:apple* OR tag:carrot*

This gives each tag equal weight, and a document that has both tags ranks higher than one that has only one of them. The trailing * is a wildcard, so apple* matches both apple and apples.
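A toy C# illustration of that ranking behaviour, not Lucene's actual scoring formula (which also weighs term frequency, term rarity and field length): each item's score is simply the number of OR clauses it matches, loosely like the coordination factor in Lucene's classic scoring. The items are the four from the question, prefix matching stands in for the * wildcard, and the `MatchCount` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The four tagged items from the question: item id -> its tags.
var items = new Dictionary<int, string[]>
{
    [1] = new[] { "apples", "carrots" },
    [2] = new[] { "apples" },
    [3] = new[] { "carrots" },
    [4] = new[] { "apple", "banana" },
};

// Stand-in for "tag:apple* OR tag:carrot*": each prefix is one OR clause.
string[] prefixes = { "apple", "carrot" };

// Toy score: how many clauses match at least one of the item's tags.
int MatchCount(string[] tags) =>
    prefixes.Count(p => tags.Any(t => t.StartsWith(p, StringComparison.OrdinalIgnoreCase)));

// Keep items matching at least one clause; more matched clauses rank higher.
var ranked = items
    .Select(kv => (Id: kv.Key, Score: MatchCount(kv.Value)))
    .Where(r => r.Score > 0)
    .OrderByDescending(r => r.Score)
    .ThenBy(r => r.Id)
    .ToList();

foreach (var (id, score) in ranked)
    Console.WriteLine($"item {id}: {score} of {prefixes.Length} clauses matched");
```

Item 1 (apples carrots) comes out on top with both clauses matched, and items 2, 3 and 4 follow with one each, which is the ordering the question asks for.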

To negate a tag, use:

tag:carrot* NOT tag:apple*
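The effect of NOT can be sketched the same way in plain C#, again only as an in-memory illustration over the question's four items and not Lucene itself: a prohibited clause removes matching items from the results outright rather than lowering their rank. The `HasTag` helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The four tagged items from the question: item id -> its tags.
var items = new Dictionary<int, string[]>
{
    [1] = new[] { "apples", "carrots" },
    [2] = new[] { "apples" },
    [3] = new[] { "carrots" },
    [4] = new[] { "apple", "banana" },
};

// True if any of the item's tags starts with the prefix (like tag:prefix*).
bool HasTag(string[] tags, string prefix) =>
    tags.Any(t => t.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));

// Stand-in for "tag:carrot* NOT tag:apple*": the NOT clause removes matching
// items from the results entirely instead of merely lowering their rank.
var hits = items
    .Where(kv => HasTag(kv.Value, "carrot") && !HasTag(kv.Value, "apple"))
    .Select(kv => kv.Key)
    .OrderBy(id => id)
    .ToList();

Console.WriteLine(string.Join(", ", hits)); // prints "3"
```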

A simple example showing indexing and querying with Lucene is here.

#2


17  

Edit: You can use Lucene. Here's an explanation of how to do this in Lucene.Net. Some Lucene basics:

  • Document - the storage unit in Lucene. It is somewhat analogous to a database record.

  • Field - the search unit in Lucene, analogous to a database column. Lucene searches for text by matching a query against fields, so a field must be indexed to be searchable.

  • Token - the search atom in Lucene. Usually a word, sometimes a phrase, letter or digit.

  • Analyzer - the part of Lucene that transforms a field into tokens.
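How these four pieces fit together can be pictured with a deliberately tiny C# model. It is an illustration of the concepts only, not how Lucene actually stores its index: each document is a set of named fields, a hypothetical `Analyze` function (roughly like an analyzer that splits on whitespace and lowercases) turns the "tags" field into tokens, and an inverted index maps each token to the documents that contain it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each document is a set of named fields (here just "tags"), keyed by id.
var documents = new Dictionary<int, Dictionary<string, string>>
{
    [1] = new() { ["tags"] = "Apples Carrots" },
    [2] = new() { ["tags"] = "apples" },
    [3] = new() { ["tags"] = "carrots" },
};

// Hypothetical analyzer: lowercase the field text and split it into tokens.
IEnumerable<string> Analyze(string text) =>
    text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

// Inverted index for the "tags" field: token -> ids of documents containing it.
var index = new Dictionary<string, SortedSet<int>>();
foreach (var (id, fields) in documents)
{
    foreach (var token in Analyze(fields["tags"]))
    {
        if (!index.TryGetValue(token, out var postings))
            index[token] = postings = new SortedSet<int>();
        postings.Add(id);
    }
}

Console.WriteLine(string.Join(", ", index["apples"])); // prints "1, 2"
```

Searching a field then amounts to analyzing the query the same way and looking its tokens up in the index, which is why the same analyzer should be used for indexing and querying.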

Please read this blog post about creating and using a Lucene.net index.

I assume you are tagging blog posts. If I am totally wrong, please say so. In order to search for tags, you need to represent them as Lucene entities, namely as tokens inside a "tags" field.

One way of doing so is to assign one Lucene document per blog post. The document will have at least the following fields:

  • id: unique id of the blog post.

  • content: the text of the blog post.

  • tags: list of tags.

Indexing: whenever you add, remove or edit a tag on a post, you need to re-index that post. The Analyzer transforms the fields into their token representation.

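Document doc = new Document();
// Index the id untokenized so an edited post can later be found and replaced by id.
doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("content", text, Field.Store.YES, Field.Index.TOKENIZED));
// tags: the post's tags as one space-separated string, e.g. "apples carrots"
doc.Add(new Field("tags", tags, Field.Store.YES, Field.Index.TOKENIZED));
// Deletes any earlier copy with the same id, then adds this one.
writer.UpdateDocument(new Term("id", i.ToString()), doc);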

The remaining part is retrieval. For this, you need to create a QueryParser and pass it a query string, like this:

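// "tags" is the default field; parse with the same analyzer used for indexing.
QueryParser qp = new QueryParser("tags", new StandardAnalyzer());
Query q = qp.Parse(s);
Hits hits = searcher.Search(q); // searcher: an IndexSearcher opened on the index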

The syntax you need for s will be:

tags:apples tags:carrots

to search for apples or carrots (QueryParser's default operator is OR), and

tags:carrots NOT tags:apples

to search for carrots but not apples.

See the Lucene Query Parser Syntax for details on constructing s.
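The question wants users to type something like carrots -apples directly. A minimal sketch, assuming every tag is a single word of letters or digits: a hypothetical `BuildTagQuery` helper maps each word to a tags: clause and a leading - to Lucene's - (prohibit) operator, which the query parser syntax documents as excluding matching documents, with the same effect as NOT. The resulting string is what you would pass as s.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: turns user input such as "carrots -apples" into a
// query string for the "tags" field. Assumes every tag is a single word of
// letters or digits; any other word is skipped so it cannot inject query syntax.
string BuildTagQuery(string userInput)
{
    var clauses = new List<string>();
    foreach (var word in userInput.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        bool exclude = word.StartsWith('-');
        string tag = exclude ? word.Substring(1) : word;
        if (!Regex.IsMatch(tag, @"\A[A-Za-z0-9]+\z"))
            continue; // empty or unsafe word: skip it
        // A leading '-' becomes Lucene's prohibit operator, equivalent to NOT.
        clauses.Add(exclude ? $"-tags:{tag}" : $"tags:{tag}");
    }
    return string.Join(" ", clauses);
}

Console.WriteLine(BuildTagQuery("apples carrots"));  // prints "tags:apples tags:carrots"
Console.WriteLine(BuildTagQuery("carrots -apples")); // prints "tags:carrots -tags:apples"
```

Because "tags" is the QueryParser's default field, unqualified terms are already searched in it, so the main value of a helper like this is restricting what user input can reach the parser.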
