Solr: atomic update of custom stored and indexed metadata wipes the full-text index

Time: 2021-01-01 03:11:17

I use bin/post to index all my files in /documents (a mounted volume). Indexing works, and full-text search works fine.

I then do an atomic update of a specific metadata field that I added to the schema BEFORE posting the docs. That works too.

Next, I run a full-text search to find the document whose metadata was updated, and it DOESN'T work anymore. The updated metadata is there, but the full-text index for that document seems to have disappeared.

If I do a full re-index, it overwrites the metadata I added to the doc and resets it to the default value, even though the metadata field I added is both stored and indexed.

I'm not sure what to do. This means that every reindexing will reset my added metadata, which is not great.

1 solution

#1


Under the hood, the update reconstructs the document from its stored fields, applies the changes, and writes the result back to disk as a whole new document. At the Lucene level there is no "document update"; it is a higher-level concept. That's how the search indexes stay fast in this architecture.

So your full-text field, which is not stored, does not show up in the reconstructed document and is not carried over into the "updated document". Its index entries for that document are lost.
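
To make the mechanism concrete, here is a toy model in Python. It is not Solr or Lucene code, just a minimal sketch that mimics the "rebuild from stored fields" behavior described above. The class `ToyIndex` and the field names `content` and `status` are made up for illustration.

```python
# Toy model of an atomic update. This is NOT Solr code; it only mimics the
# "rebuild the document from stored fields" behavior described in the answer.

class ToyIndex:
    def __init__(self, stored_fields):
        self.stored_fields = set(stored_fields)  # fields whose values are kept
        self.stored = {}    # doc id -> {field: value}, stored fields only
        self.inverted = {}  # doc id -> set of (field, term), for all fields

    def add(self, doc):
        """Index a whole document, replacing any previous version of it."""
        doc_id = doc["id"]
        self.stored[doc_id] = {
            f: v for f, v in doc.items() if f in self.stored_fields
        }
        self.inverted[doc_id] = {
            (f, term) for f, v in doc.items() for term in str(v).lower().split()
        }

    def atomic_update(self, doc_id, changes):
        """Rebuild the doc from stored fields, apply changes, re-add it."""
        rebuilt = dict(self.stored[doc_id])  # non-stored fields are absent
        rebuilt.update(changes)
        self.add(rebuilt)

    def search(self, field, term):
        wanted = (field, term.lower())
        return sorted(d for d, terms in self.inverted.items() if wanted in terms)


# "content" is indexed but not stored; "status" is the hypothetical metadata.
index = ToyIndex(stored_fields={"id", "status"})
index.add({"id": "doc1", "content": "quarterly sales report", "status": "new"})
hits_before = index.search("content", "sales")

index.atomic_update("doc1", {"status": "reviewed"})
hits_after = index.search("content", "sales")
status_hits = index.search("status", "reviewed")
```

In this model the document is found by its content before the update and by its new status afterwards, but no longer by its content, which matches the symptom in the question.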

If you have such a mix of stored and non-stored fields, you have to merge your updates outside of Solr, starting from the original full content, and then send the complete document again.
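
One way to picture "merging outside of Solr" is the sketch below. It assumes you keep the custom metadata in your own store (here a plain dict keyed by document id) and that you have the extracted text of each file available on the client side. The function name `build_full_docs` and the fields `content` and `status` are hypothetical.

```python
import json


def build_full_docs(extracted_docs, metadata_by_id):
    """Merge custom metadata into freshly extracted documents.

    Each result is a complete document, so re-adding it replaces the old
    version without losing either the full text or the metadata.
    """
    merged = []
    for doc in extracted_docs:
        full = dict(doc)                                  # keep all content
        full.update(metadata_by_id.get(doc["id"], {}))    # add our metadata
        merged.append(full)
    return merged


# Hypothetical inputs: extracted text plus metadata kept outside Solr.
extracted = [{"id": "/documents/a.txt", "content": "quarterly sales report"}]
metadata = {"/documents/a.txt": {"status": "reviewed"}}

# JSON array of full documents, in the shape Solr's JSON update format takes.
payload = json.dumps(build_full_docs(extracted, metadata))
```

The resulting `payload` could then be POSTed to the collection's `/update` handler with `Content-Type: application/json`. Because every reindex goes through the same merge, the metadata is no longer reset to its default.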

Alternatively, depending on your use case, if you only need to return those updated values with the results, you could inject them with a custom SearchComponent, use ExternalFileField, or something similar. The user mailing list could be a good place to ask about the possible options.
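
As a rough illustration of the ExternalFileField route: that field type reads its values from a plain text file of `key=value` lines kept alongside the index rather than from the indexed documents, so reindexing does not touch them. To my knowledge it only holds numeric (float) values and is meant for function queries rather than searching, so it fits only if the metadata can be expressed as a number. The helper name `render_external_file` and the ids below are made up; check the Solr reference guide for the exact file name and location your version expects.

```python
def render_external_file(values_by_id):
    """Render one key=value line per document, sorted by document key.

    Values are written as floats, since this sketch assumes the external
    field is numeric.
    """
    return "".join(
        f"{doc_id}={float(value)}\n"
        for doc_id, value in sorted(values_by_id.items())
    )


# Hypothetical metadata: a numeric flag per document id.
lines = render_external_file({"doc2": 0, "doc1": 3})
```

Here `lines` is the text you would write to the external file for the field.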
