I have an index named LocationIndex in Solr with the following fields:
<fields>
<field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
<field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
<!-- and some more fields -->
</fields>
<uniqueKey>solr_id</uniqueKey>
But now I want to change the schema so that the unique key is a composite of two existing fields, solr_id and solr_ver, something like the following:
<fields>
<field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
<field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
<field name="composite-id" type="string" stored="true" required="true" indexed="true"/>
<!-- and some more fields -->
</fields>
<uniqueKey>solr_ver-solr_id</uniqueKey>
After searching, I found that this is possible by adding the following to the schema (ref: Solr Composite Unique key from existing fields in schema):
<updateRequestProcessorChain name="composite-id">
<processor class="solr.CloneFieldUpdateProcessorFactory">
<str name="source">docid_s</str>
<str name="source">userid_s</str>
<str name="dest">id</str>
</processor>
<processor class="solr.ConcatFieldUpdateProcessorFactory">
<str name="fieldName">id</str>
<str name="delimiter">--</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
So I changed my schema, and it now looks like this:
<updateRequestProcessorChain name="composite-id">
<processor class="solr.CloneFieldUpdateProcessorFactory">
<str name="source">solr_ver</str>
<str name="source">solr_id</str>
<str name="dest">id</str>
</processor>
<processor class="solr.ConcatFieldUpdateProcessorFactory">
<str name="fieldName">id</str>
<str name="delimiter">-</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<fields>
<field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
<field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
<field name="id" type="string" stored="true" required="true" indexed="true"/>
<!-- and some more fields -->
</fields>
<uniqueKey>id</uniqueKey>
But adding a document gives me this error:
org.apache.solr.client.solrj.SolrServerException: Server at http://localhost:8983/solr/LocationIndex returned non ok status:400, message:Document [null] missing required field: id
I don't understand what schema changes are required to make this work as desired.
The documents I add contain the fields solr_ver and solr_id. How and where will Solr create the id field by combining these two fields, as in solr_ver-solr_id?
EDIT:
This link explains how to refer to this chain, but I can't understand how it would be used in the schema. Where should I make the changes?
3 Solutions
#1
10
It looks like your updateRequestProcessorChain is defined correctly and should work. However, you need to add it to the solrconfig.xml file, not schema.xml. The additional link you provided shows how to modify solrconfig.xml and attach your updateRequestProcessorChain to the current /update request handler for your Solr instance.
So do the following:
- Move your <updateRequestProcessorChain> to your solrconfig.xml file.
- Update the <requestHandler name="/update" class="solr.UpdateRequestHandler"> entry in your solrconfig.xml file so that it looks like the following:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">composite-id</str>
  </lst>
</requestHandler>
This should then execute your defined update chain and populate the id field when new documents are added to the index.
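Putting both steps together, the relevant part of solrconfig.xml might look like the sketch below. It reuses the field names from the question (solr_ver, solr_id, id) and the chain name composite-id; treat it as an illustration under those assumptions rather than a drop-in config, and check the processor class names against your Solr version.

```xml
<!-- solrconfig.xml (sketch): the chain lives here, not in schema.xml -->
<updateRequestProcessorChain name="composite-id">
  <!-- Copy the values of both source fields into the id field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">solr_ver</str>
    <str name="source">solr_id</str>
    <str name="dest">id</str>
  </processor>
  <!-- Join the copied values into one string using the delimiter -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
    <str name="delimiter">-</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- RunUpdateProcessorFactory must come last; it performs the actual update -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Make the /update handler run the chain for every request -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">composite-id</str>
  </lst>
</requestHandler>
```

schema.xml then only needs the id field definition and <uniqueKey>id</uniqueKey>. Clients keep sending solr_ver and solr_id, and the chain fills in id before the document is checked against the schema.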
#2
4
The solution described above may have some limitations: what if "dest" exceeds the maximum length because the concatenated fields are too long? There is another solution that uses MD5Signature (a class capable of generating a signature String from the concatenation of a group of specified document fields; a 128-bit hash used for exact duplicate detection). Note that the example below, taken from the linked thread, configures Lookup3Signature as the signatureClass.
<!-- An example dedup update processor that creates the "id" field on the fly
based on the hash code of some other fields. This example has
overwriteDupes set to false since we are using the id field as the
signatureField and Solr will maintain uniqueness based on that anyway. -->
<updateRequestProcessorChain name="dedupe">
<processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
<bool name="enabled">true</bool>
<bool name="overwriteDupes">false</bool>
<str name="signatureField">id</str>
<str name="fields">name,features,cat</str>
<str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
From here: http://lucene.472066.n3.nabble.com/Solr-duplicates-detection-td506230.html
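Adapted to the fields in the question, the chain might look like the sketch below. The field list solr_ver,solr_id and the chain name composite-id-hash are illustrative assumptions and do not come from the linked thread; MD5Signature is used here to match the description above.

```xml
<!-- solrconfig.xml (sketch): derive id from a hash of solr_ver and solr_id -->
<updateRequestProcessorChain name="composite-id-hash">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- id is the uniqueKey, so Solr already replaces documents with the same id -->
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">id</str>
    <!-- Assumed field list: the two fields that should form the composite key -->
    <str name="fields">solr_ver,solr_id</str>
    <str name="signatureClass">org.apache.solr.update.processor.MD5Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

As with the first answer, this chain belongs in solrconfig.xml and has to be referenced from the /update handler through update.chain. The trade-off is that the resulting id is a fixed-length hash rather than a readable solr_ver-solr_id string.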
#3
2
I'd like to add this as a comment, but it's impossible to get the creds these days... anyway, here is a better link: https://wiki.apache.org/solr/Deduplication