I don't know what changed; things were working relatively well with our Lucene implementation. But now the number of files in the index directory just keeps growing. It started with _0 files, then _1 files appeared, then _2 and _3 files. I pass false for the IndexWriter constructor's 'create' parameter if there are already files in that directory when it starts:
indexWriter = new IndexWriter(azureDirectory, analyzer, (azureDirectory.ListAll().Length == 0), IndexWriter.MaxFieldLength.UNLIMITED);
if (indexWriter != null)
{
    // Set the number of segments to save in memory before writing to disk.
    indexWriter.MergeFactor = 1000;
    indexWriter.UseCompoundFile = false;
    indexWriter.SetRAMBufferSizeMB(800);
    ...
    indexWriter.Dispose();
    indexWriter = null;
}
Maybe it's related to the UseCompoundFile flag?
Every couple of minutes, I create a new IndexWriter, process 10,000 documents, then dispose the IndexWriter. The index works, but the growing number of files is very bad, because I'm using AzureDirectory which copies every file out of Azure into a cache directory before starting the Lucene write.
Thanks.
1 solution
#1
This is the normal behavior. If you want a single index segment you have some options:
- Use compound files
- Use a low MergeFactor if you use LogMergePolicy, which is the default policy for Lucene 3.0 (LogMergePolicy rejects values below 2, so 2 is the lowest usable setting). Note that the MergeFactor property you set on the IndexWriter is just a convenience that forwards to mergePolicy.MergeFactor, as long as mergePolicy is an instance of LogMergePolicy.
- Run an optimization after each update to your index
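The options above can be combined in one writer session. The following is a minimal sketch, assuming Lucene.Net 3.0.3; it uses a RAMDirectory in place of AzureDirectory so it has no Azure dependency, and the class name, the field name "id", the session count, and the MergeFactor of 10 are illustrative choices, not values from the question.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical demo class; not part of the original question's code.
public static class SegmentCountSketch
{
    public static void Main()
    {
        // Assumption: RAMDirectory stands in for AzureDirectory.
        RAMDirectory directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // Simulate several indexing sessions, each opening and disposing a writer,
        // as the question does every couple of minutes.
        for (int session = 0; session < 3; session++)
        {
            // Create a new index only when the directory is still empty.
            bool create = directory.ListAll().Length == 0;

            using (var writer = new IndexWriter(directory, analyzer, create,
                                                IndexWriter.MaxFieldLength.UNLIMITED))
            {
                // Option 1: pack each segment's files into a single compound file.
                writer.UseCompoundFile = true;

                // Option 2: a low merge factor, so segments are merged more eagerly.
                writer.MergeFactor = 10;

                var doc = new Document();
                doc.Add(new Field("id", session.ToString(),
                                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.AddDocument(doc);

                // Option 3: merge everything down to one segment before closing.
                // This rewrites the whole index, so it gets slower as the index grows.
                writer.Optimize();
            }
        }

        // List what is left in the directory after the last session.
        foreach (string file in directory.ListAll())
        {
            Console.WriteLine(file);
        }
    }
}
```

Any one of the three options reduces the file count on its own; with AzureDirectory, the cost of Optimize() has to be weighed against the cost of copying many small files per session.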
Low merge factors and optimizing after every update can seriously hurt the performance of your app; how much depends on the type of indexing you do.
See this link, which documents some of the effects of MergeFactor: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/index/LogMergePolicy.html#setMergeFactor%28%29
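To see why a MergeFactor of 1000 lets segments pile up, here is a toy model of level-based merging. It is a deliberate simplification and not Lucene's actual LogMergePolicy (which groups segments by size): it assumes every flush produces one equal-sized segment, and that mergeFactor segments at the same level are merged into one segment at the next level. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: illustrates the idea of a merge factor, not Lucene's real policy.
public static class MergeFactorToyModel
{
    // Returns how many segments remain after `flushes` equal-sized flushes.
    public static int SegmentsAfter(int flushes, int mergeFactor)
    {
        if (mergeFactor < 2)
        {
            throw new ArgumentException("mergeFactor cannot be less than 2");
        }

        // countAtLevel[i] is the number of segments currently at merge level i.
        var countAtLevel = new List<int> { 0 };

        for (int f = 0; f < flushes; f++)
        {
            // Each flush adds one new level-0 segment.
            countAtLevel[0]++;

            // Cascade: whenever a level holds mergeFactor segments,
            // replace them with a single segment one level up.
            for (int level = 0;
                 level < countAtLevel.Count && countAtLevel[level] == mergeFactor;
                 level++)
            {
                countAtLevel[level] = 0;
                if (level + 1 == countAtLevel.Count)
                {
                    countAtLevel.Add(0);
                }
                countAtLevel[level + 1]++;
            }
        }

        int total = 0;
        foreach (int count in countAtLevel)
        {
            total += count;
        }
        return total;
    }

    public static void Main()
    {
        foreach (int mergeFactor in new[] { 2, 10, 1000 })
        {
            Console.WriteLine("MergeFactor {0}: {1} segments after 50 flushes",
                              mergeFactor, SegmentsAfter(50, mergeFactor));
        }
        // In this model: 3 segments for 2, 5 segments for 10, 50 segments for 1000.
    }
}
```

In this model nothing is merged until 1000 segments have accumulated, and with UseCompoundFile set to false each segment is stored as several separate files, which matches the growing _0, _1, _2, _3 file sets described in the question.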