How to find related items by tag in Lucene.NET

Date: 2021-01-12 03:17:46

My indexed documents have a field containing a pipe-delimited set of ids:

a845497737704e8ab439dd410e7f1328|
0a2d7192f75148cca89b6df58fcf2e54|
204fce58c936434598f7bd7eccf11771

(ignore line breaks)

This field represents a list of tags. The list may contain 0 to n tag IDs.

When users of my site view a particular document, I want to display a list of related documents. This list of related documents must be determined by tags:

  • Only documents with at least one matching tag should appear in the "related documents" list.

  • Documents with the most matching tags should appear at the top of the "related documents" list.


I was thinking of using a WildcardQuery for this, but queries starting with '*' are not allowed.


Any suggestions?

4 answers

#1


Setting aside for a minute the possible uses of Lucene for this task (which I am not overly familiar with) - consider checking out the LinkDatabase.

Sitecore will, behind the scenes, track all your references to and from items. And since your multiple tags are indeed (I assume) selected from a meta hierarchy of tags represented as Sitecore Items somewhere - the LinkDatabase would be able to tell you all items referencing it.

In pseudocode, this would become:

for each ID in tags
  get all documents referencing this tag
  for each document found
    if master-list contains document: increase its usage-count
    else: add document to master-list with usage-count 1
sort master-list by usage-count descending
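
The counting loop above can be sketched in a few lines of Python. This is only an illustration of the idea, not Sitecore or Lucene.NET code: `docs_by_tag` is a hypothetical in-memory mapping standing in for the LinkDatabase lookup, and `rank_related` is a name made up for this example. It also assumes each document is listed at most once per tag.

```python
from collections import Counter


def rank_related(current_doc, tags, docs_by_tag):
    """Return ids of documents sharing at least one tag, most shared tags first."""
    counts = Counter()
    for tag in tags:
        # docs_by_tag is a stand-in for "get all documents referencing this tag"
        for doc in docs_by_tag.get(tag, ()):
            if doc != current_doc:  # do not list the document being viewed
                counts[doc] += 1
    # Sort by usage count descending; ties are broken by id for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [doc for doc, _ in ordered]
```

Documents that share no tag never enter the counter, so the "at least one matching tag" requirement is met without an extra filter.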

Forgive me for not being more precise; I am not able to mock up a fully working example at this stage.

You can find an article about the LinkDatabase here http://larsnielsen.blogspirit.com/tag/XSLT. Be aware that if you're tagging documents using a TreeListEx field, there is a known flaw in earlier versions of Sitecore. Documented here: http://www.cassidy.dk/blog/sitecore/2008/12/treelistex-not-registering-links-in.html

#2


Your pipe-delimited set of ids should really have been separated into individual fields when the documents were indexed. This way, you could simply do a query for the desired tag, sorting by relevance descending.

#3


You can have the same field multiple times in a document. In this case, you would add multiple "tag" fields at index time by splitting on |. Then, when you search, you just have to search on the "tag" field.
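
As a rough sketch of the splitting step, the Python below turns the pipe-delimited value into one (field, value) pair per tag. The helper names and the field name "tag" are assumptions for illustration; in Lucene.NET each pair would correspond to adding one more field with the same name to the document, ideally indexed as a single untokenized term so that each id matches exactly.

```python
def split_tags(raw):
    """Split a pipe-delimited tag field into individual tag ids."""
    # Strip whitespace and line breaks around each id and drop empty pieces,
    # so an empty field yields an empty list (the "0 tags" case).
    return [part.strip() for part in raw.split("|") if part.strip()]


def to_index_fields(raw, field_name="tag"):
    """Return one (field name, value) pair per tag, ready to add to a document."""
    return [(field_name, tag) for tag in split_tags(raw)]
```

For example, `to_index_fields("a|b")` gives two pairs that both use the field name "tag", which is the "same field multiple times" layout described above.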

#4


Try this query on the tag field:

+(tag1 OR tag2 OR ... tagN) 

where tag1, ..., tagN are the tags of the document being viewed.

This query returns documents with at least one matching tag. Scoring then brings the documents with the most matching tags to the top, because a document's final score is the sum of the scores of its matching clauses.

Also, be aware that if you search for documents similar to the tags of Doc1, Doc1 itself will come out at the top of the search results, so handle this case accordingly.
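
One way to assemble such a query string, including a clause that excludes the document being viewed, is sketched below in Python. The function name and the field names "tag" and "id" are assumptions, not something the thread specifies. The sketch also assumes the tag ids are plain hex strings that need no escaping in Lucene query syntax, and that each tag was indexed as its own term.

```python
def related_query(tags, current_id, tag_field="tag", id_field="id"):
    """Build a Lucene query string matching any of the tags, minus the current document."""
    if not tags:
        return None  # a document without tags has no related documents
    clauses = " OR ".join(tags)
    # "+" makes the tag group required; "-" excludes the document being viewed.
    return f"+{tag_field}:({clauses}) -{id_field}:{current_id}"
```

Excluding the current document in the query is one option; filtering it out of the results afterwards works as well.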
