I want to index Wikipedia's XML files into Solr.
But I am getting an error and Solr is unable to index them. Solr expects a specific format for XML files, so I changed the schema.xml and data-config.xml files to match the tags in the Wikipedia files.
Still, it is unable to index the files. What I actually want to index is Wikipedia itself, which is a single 30 GB XML file.
How would I go about indexing all wikipedia files into Solr?
1 Solution
#1
1
The DataImportHandler documentation has an example section that covers exactly this: indexing Wikipedia.
Basically, you use the DataImportHandler and some XPath to pull the metadata you care about out of the Wikipedia XML, and put it into flat Solr fields.
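To make that concrete, here is a rough sketch of what a data-config.xml for a Wikipedia dump could look like, along the lines of the example in the DataImportHandler documentation. The dump path is a placeholder, and the column names (id, title, revision, user, text, timestamp) are assumptions: each one has to match a field you have declared in your own schema.xml. It also assumes the DataImportHandler request handler is already registered in solrconfig.xml.

```xml
<dataConfig>
  <!-- Reads the dump straight from the local file system -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- stream="true" processes the file page by page instead of loading
         all 30 GB into memory; forEach emits one Solr document per <page> -->
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/path/to/wikipedia-pages-articles.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <!-- Each xpath flattens one nested element into one Solr field -->
      <field column="id"        xpath="/mediawiki/page/id" />
      <field column="title"     xpath="/mediawiki/page/title" />
      <field column="revision"  xpath="/mediawiki/page/revision/id" />
      <field column="user"      xpath="/mediawiki/page/revision/contributor/username" />
      <field column="text"      xpath="/mediawiki/page/revision/text" />
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp"
             dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss'Z'" />
      <!-- Optional: skip pages whose text is only a redirect -->
      <field column="$skipDoc"  regex="^#REDIRECT .*" replaceWith="true" sourceColName="text" />
    </entity>
  </document>
</dataConfig>
```

With something like this in place, a full-import command against the DataImportHandler endpoint starts the indexing run. If the import still fails, the usual suspects are a column name with no matching field in schema.xml, or a forEach path that does not match the root element of the dump.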