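In some special application scenarios we need to ignore the effect of tf (term frequency) and df (document frequency) on scoring. This can be done as follows:
1. Implement your own similarity class: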
public class MySimilarity extends DefaultSimilarity {
    // Ignore term frequency: every matching term gets the same tf factor.
    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    // Ignore document frequency: this replaces the default
    // log(numDocs/(docFreq+1)) + 1 with a constant.
    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }
}
2. When creating the index, specify the similarity in IndexWriterConfig:
Analyzer analyzer = new MyAnalyzer(0);
MySimilarity sim = new MySimilarity();
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48, analyzer);
iwc.setOpenMode(OpenMode.CREATE);
iwc.setSimilarity(sim);
IndexWriter writer = new IndexWriter(indexDir, iwc);
3. When searching, specify the same similarity:
MySimilarity sim = new MySimilarity();
IndexSearcher searcher = new IndexSearcher(reader);
searcher.setSimilarity(sim);
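To see what overriding tf and idf changes, the following is a minimal standalone sketch that does not use Lucene at all. It assumes the classic defaults of DefaultSimilarity (tf as the square root of the frequency, idf as log(numDocs/(docFreq+1)) + 1) and a simplified per-term weight of tf * idf^2 that leaves out norms, boosts, coord and queryNorm. The class name TfIdfDemo and all of its methods are hypothetical names used only for illustration.

```java
public class TfIdfDemo {
    // Assumed default tf of DefaultSimilarity: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed default idf of DefaultSimilarity: log(numDocs/(docFreq+1)) + 1.
    static float defaultIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Constant versions, mirroring the overrides in MySimilarity.
    static float constantTf(float freq) {
        return 1.0f;
    }

    static float constantIdf(long docFreq, long numDocs) {
        return 1.0f;
    }

    // Simplified per-term weight: tf * idf^2.
    // Norms, boosts, coord and queryNorm are deliberately ignored here.
    static float weight(float tf, float idf) {
        return tf * idf * idf;
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        // Two hypothetical terms: a rare one seen often in the document,
        // and a common one seen once.
        float[] freqs = {9.0f, 1.0f};
        long[] docFreqs = {5, 800};

        for (int i = 0; i < freqs.length; i++) {
            float withDefaults = weight(defaultTf(freqs[i]),
                                        defaultIdf(docFreqs[i], numDocs));
            float withConstants = weight(constantTf(freqs[i]),
                                         constantIdf(docFreqs[i], numDocs));
            System.out.println("term " + i
                    + ": default weight = " + withDefaults
                    + ", constant weight = " + withConstants);
        }
    }
}
```

With the constant versions every matching term contributes the same weight, so in this simplified model documents are ranked only by how many query terms they match. In a real index the remaining factors (such as length norms) still apply unless they are overridden as well.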