Lucene 4.7 -- Creating an Index

The latest Lucene release differs considerably from earlier versions in syntax, class names, and class conventions.

0. Preparation:

1). Official Lucene API: http://lucene.apache.org/core/4_7_0/index.html

2). Download for the common JARs I use: http://download.csdn.net/detail/yangxy81118/8062269

3). JARs used in this post

lucene-analyzers-common-4.7.0.jar
lucene-analyzers-smartcn-4.7.0.jar
lucene-core-4.7.0.jar
lucene-queryparser-4.7.0.jar

This post covers index creation first.

1. Overall steps

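// a. Prepare the data to index from the data source
List<ResultVOFromDB> resultList = getKeyWords();
// b. Create the IndexWriter
indexWriter = getIndexWriter();
// c. Build the index from that data
addDoc(indexWriter, resultList);
// Close the writer when done so that the added documents are committed
indexWriter.close();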

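a. Prepare the data to index from the data source

Not much to say here. I use a generic ResultVOFromDB class to represent a VO object fetched from the data source; for example, rows selected from a database naturally come back as a List of VOs.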
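
The post does not show ResultVOFromDB or getKeyWords(), so here is a minimal sketch of what they might look like. The four field names are inferred from how createDoc reads them later in this post; the constructor, the KeywordSource holder class, and the sample values are assumptions made for illustration only, and a real getKeyWords() would query your own data source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical VO: field names match what createDoc reads (vo.name, vo.id, vo.url, vo.keywords).
class ResultVOFromDB {
    public String name;
    public String id;
    public String url;
    public String[] keywords;

    // Convenience constructor (an assumption, not shown in the original post).
    ResultVOFromDB(String name, String id, String url, String[] keywords) {
        this.name = name;
        this.id = id;
        this.url = url;
        this.keywords = keywords;
    }
}

// Hypothetical holder class for the data-source step.
class KeywordSource {
    // Stand-in for getKeyWords(): returns hard-coded rows instead of querying a database.
    static List<ResultVOFromDB> getKeyWords() {
        List<ResultVOFromDB> rows = new ArrayList<ResultVOFromDB>();
        rows.add(new ResultVOFromDB(
                "CSS tutorial",
                "1001",
                "http://example.com/items/1001",
                new String[] {"css学习", "前端"}));
        return rows;
    }
}
```

Swapping the hard-coded row for a JDBC or ORM query leaves the rest of the indexing code unchanged, since it only depends on the List of VOs.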

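b. Create the IndexWriter

This part changed a lot: much of the syntax found in older references no longer works, having been removed, deprecated, or discouraged.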

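    // Imports needed by this method (they go at the top of the enclosing class file)
    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    private IndexWriter getIndexWriter() throws IOException {
        // indexBuild holds the path of the index directory (defined elsewhere in the class)
        Directory dir = FSDirectory.open(new File(indexBuild));
        // Version arguments now show up almost everywhere
        // Chinese analyzer: 4.7.0 does not seem to work well with third-party analyzers such as Paoding,
        // possibly because Apache has consolidated its own built-in analyzers
        Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_47);
        // IndexWriterConfig wraps the pile of parameters that early versions passed to IndexWriter directly
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);
        IndexWriter writer = new IndexWriter(dir, config);
        return writer;
    }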

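c. Build the index from the data

4.7.0 wraps Field further. The official demo no longer recommends the old pattern of calling new Field with a list of arguments (the syntax still exists, but parts of it are deprecated). See the Field subclasses in the org.apache.lucene.document package for details.

For example, StringField: http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/document/StringField.html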

A field that is indexed but not tokenized: the entire String value is indexed as a single token. For example this might be used for a 'country' field or an 'id' field, or any field that you intend to use for sorting or access through the field cache.

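The English is straightforward: a StringField is indexed, so it can be searched, but it is not tokenized. That suits values such as an ID or a country name, which either match as a whole or do not match at all.

The code below also uses StoredField (stored only, not indexed) and TextField (indexed and tokenized).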
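
To make the "single token" versus "tokenized" distinction concrete, here is a toy illustration that does not use Lucene at all. FieldKindsToy and its methods are invented for this sketch, and splitting on whitespace merely stands in for a real analyzer (SmartChineseAnalyzer segments Chinese text quite differently), so treat it as a mental model rather than a description of Lucene internals.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (NOT Lucene): maps each term to the ids of the documents containing it.
class FieldKindsToy {
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // Roughly what StringField does: the whole value becomes one term.
    public void indexAsSingleToken(int docId, String value) {
        Set<Integer> ids = postings.get(value);
        if (ids == null) {
            ids = new TreeSet<Integer>();
            postings.put(value, ids);
        }
        ids.add(docId);
    }

    // Roughly what TextField does: the value is split into terms first.
    // Whitespace splitting is a stand-in for a real analyzer.
    public void indexTokenized(int docId, String value) {
        for (String term : value.trim().split("\\s+")) {
            indexAsSingleToken(docId, term);
        }
    }

    // Exact term lookup; returns an empty set when the term was never indexed.
    public Set<Integer> lookup(String term) {
        Set<Integer> ids = postings.get(term);
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }
}
```

If "css 学习" is indexed with indexAsSingleToken, looking up "css" finds nothing and only the full string matches; indexed with indexTokenized, both "css" and "学习" find the document.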

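    // Imports needed by these methods (they go at the top of the enclosing class file)
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;

    private void addDoc(IndexWriter indexWriter, List<ResultVOFromDB> resultList) throws IOException {
        for (ResultVOFromDB vo : resultList) {
            Document doc = createDoc(vo);
            indexWriter.addDocument(doc);
        }
    }

    private Document createDoc(ResultVOFromDB vo) throws UnsupportedEncodingException {
        Document doc = new Document();
        // Suppose the search results list shows each product's name, ID, and link,
        // so the name, id, and url columns are read from the database
        doc.add(new StringField("name", vo.name, Field.Store.YES));
        doc.add(new StringField("id", vo.id, Field.Store.YES));
        doc.add(new StoredField("url", vo.url));

        // keywords are like the custom tags on a blog post: there are several of them, and each one
        // is indexed and tokenized, so "css学习" is split into "css" and "学习"
        String[] keys = vo.keywords;
        for (int i = 0; i < keys.length; i++) {
            doc.add(new TextField("keyword", keys[i], Field.Store.YES));
        }

        return doc;
    }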
