
Time: 2021-01-12 03:17:46

My indexed documents have a field containing a pipe-delimited set of ids:



(ignore line breaks)


This field represents a list of tags. The list may contain 0 to n tag Ids.


When users of my site view a particular document, I want to display a list of related documents. This list of related documents must be determined by tags:


  • Only documents with at least one matching tag should appear in the "related documents" list.

  • Documents with the most matching tags should appear at the top of the "related documents" list.

I was thinking of using a WildcardQuery for this but queries starting with '*' are not allowed.


Any suggestions?

4 Answers


Setting aside for a minute the possible uses of Lucene for this task (which I am not overly familiar with) - consider checking out the LinkDatabase.

Behind the scenes, Sitecore tracks all references to and from items. Since your tags are (I assume) selected from a meta hierarchy of tags represented as Sitecore items somewhere, the LinkDatabase can tell you all items that reference a given tag.

As a rough pseudocode mock-up, this would become:


for each ID in tags
  get all documents referencing this tag
  for each document found
    if master-list contains document: increase its usage-count
    else: add document to master-list with usage-count 1
sort master-list by usage-count descending
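
For what it's worth, the counting loop above could be sketched in Python roughly as follows. This is only an illustration, not Sitecore code: `referrers_by_tag` is a hypothetical in-memory mapping standing in for whatever the LinkDatabase returns for each tag, and `related_by_tags` and its `exclude` parameter are names made up for this sketch.

```python
from collections import Counter


def related_by_tags(tag_ids, referrers_by_tag, exclude=None):
    """Rank documents by how many of the given tags they reference.

    referrers_by_tag is a hypothetical stand-in for a LinkDatabase lookup:
    it maps a tag id to the documents that reference that tag.
    """
    usage = Counter()
    for tag_id in tag_ids:
        # set() so a document listed twice under one tag is counted once
        for doc in set(referrers_by_tag.get(tag_id, ())):
            usage[doc] += 1
    # Optionally drop the document being viewed (assumption: it references its own tags)
    usage.pop(exclude, None)
    # Highest usage-count first; ties broken by document id for a stable order
    return sorted(usage.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = {
        "t1": ["docA", "docB"],
        "t2": ["docB", "docC"],
        "t3": ["docA", "docB"],
    }
    for doc, count in related_by_tags(["t1", "t2", "t3"], sample, exclude="docB"):
        print(doc, count)
```

Documents that reference none of the tags never enter the counter, so the "at least one matching tag" requirement is met without an extra check.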

Forgive me for not being more precise; I am unable to mock up a fully working example at this stage.


You can find an article about the LinkDatabase here: http://larsnielsen.blogspirit.com/tag/XSLT. Be aware that if you're tagging documents using a TreeListEx field, there is a known flaw in earlier versions of Sitecore, documented here: http://www.cassidy.dk/blog/sitecore/2008/12/treelistex-not-registering-links-in.html



Your pipe-delimited set of ids should really have been separated into individual fields when the documents were indexed. This way, you could simply do a query for the desired tag, sorting by relevance descending.



You can have the same field multiple times in a document. In this case, you would add multiple "tag" fields at index time by splitting on |. Then, when you search, you just have to search on the "tag" field.
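
As a minimal sketch of the splitting step, assuming the raw field value arrives as a single string: the Python below turns it into one value per tag, which is the shape you would then feed to the indexer as repeated "tag" fields. The function names and the `(field_name, value)` tuple format are made up for illustration; the actual call that adds a field to a document depends on your Lucene version and is not shown.

```python
def split_tags(raw):
    """Split a pipe-delimited id string into individual tag ids."""
    # strip() also discards stray whitespace or line breaks around each id
    return [part.strip() for part in raw.split("|") if part.strip()]


def to_tag_fields(raw, field_name="tag"):
    """Return one (field_name, value) pair per tag id."""
    return [(field_name, tag_id) for tag_id in split_tags(raw)]


if __name__ == "__main__":
    print(to_tag_fields("id1|id2|\nid3"))
```

A document with no tags simply yields an empty list, so nothing is indexed under "tag" for it.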



Try this query on the tag field.


+(tag1 OR tag2 OR ... tagN) 

where tag1, ..., tagN are the tags of the document being viewed.

This query returns documents with at least one matching tag. Scoring then takes care of the ordering: the final score is the sum of the scores of the individual matching clauses, so documents that match more tags generally rank higher.


Also, you need to realize that if you search for documents similar to the tags of Doc1, Doc1 itself will come out at the top of the search results, so handle this case accordingly.
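
To make the two points above concrete, here is a small Python sketch that builds the query string and then drops the source document from the results. It rests on a few assumptions: the field is named "tag", hits come back as `(doc_id, score)` pairs, and the tag ids contain no characters that are special in Lucene query syntax (if yours do, they would need escaping, which this sketch does not attempt). `build_tag_query` and `drop_current` are hypothetical helper names.

```python
def build_tag_query(tag_ids, field="tag"):
    """Build '+(tag:a OR tag:b ...)' or return None when there are no tags."""
    if not tag_ids:
        return None  # a document without tags has no related documents
    clauses = " OR ".join(f"{field}:{tag_id}" for tag_id in tag_ids)
    return f"+({clauses})"


def drop_current(hits, current_id):
    """Remove the viewed document from (doc_id, score) hits, keeping their order."""
    return [(doc_id, score) for doc_id, score in hits if doc_id != current_id]


if __name__ == "__main__":
    print(build_tag_query(["tag1", "tag2", "tag3"]))
    print(drop_current([("Doc1", 3.0), ("Doc2", 2.0)], "Doc1"))
```

Filtering after the search keeps the query itself simple; excluding the document inside the query is another option if your index has a suitable id field.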



