Lucene.NET parallel indexing: I need a custom solution. Can anyone help?

I'm using Lucene.NET.

I have two threads, each indexing different content (using a different algorithm, although they might try to index the same document). Both write to the same index through a single IndexWriter instance.

I also have a web application that occasionally needs to write to the index. (It obviously cannot use the same IndexWriter instance.)

My problem is that the web application cannot write to the index while the two threads are running their indexing operations, and they always are!

How do I manage this more efficiently?

Thanks

2 solutions

#1

I'm not very familiar with how Lucene.NET supports threading, but based on your description, you may want to create a "work queue" that other threads post work to, and use a single thread to pick up the work from the queue and add it to the index with an IndexWriter. This way no thread is ever starved of the opportunity to get its changes added to the index.
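The work-queue idea can be sketched roughly as follows. This is only an illustration in Python, not Lucene.NET code: `FakeIndexWriter`, `add_document`, `consume` and `run_demo` are hypothetical names, and the fake writer just records documents in a list where a real IndexWriter would be used. The point is the shape: any number of producers post to one queue, and one consumer thread is the only code that touches the writer.

```python
import queue
import threading


class FakeIndexWriter:
    """Hypothetical stand-in for an IndexWriter; it only records documents."""

    def __init__(self):
        self.documents = []

    def add_document(self, doc):
        self.documents.append(doc)


_STOP = object()  # sentinel that tells the consumer to exit


def consume(work, writer):
    """The only thread that touches the writer: drain the queue in arrival order."""
    while True:
        doc = work.get()
        if doc is _STOP:
            break
        writer.add_document(doc)


def produce(work, source, count):
    """A producer (background indexer or web request) only posts work."""
    for i in range(count):
        work.put(f"{source}-{i}")


def run_demo():
    work = queue.Queue()
    writer = FakeIndexWriter()
    consumer = threading.Thread(target=consume, args=(work, writer))
    consumer.start()

    producers = [
        threading.Thread(target=produce, args=(work, "indexer-a", 3)),
        threading.Thread(target=produce, args=(work, "indexer-b", 3)),
        threading.Thread(target=produce, args=(work, "web", 1)),
    ]
    for t in producers:
        t.start()
    for t in producers:
        t.join()

    work.put(_STOP)  # all producers are done, so stop the consumer
    consumer.join()
    return writer.documents
```

Because the queue is first-in, first-out, a request from the web application waits only behind the items already queued, not behind the whole background indexing run.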

I suspect that Lucene has to use internal locks on its full text indexes anyways, so having more than one thread writing to the index is probably not an effective way to scale your code.

Finally, having multiple threads writing to a single mutable object is often a way to introduce subtle and difficult-to-fix concurrency problems into a codebase. I generally try to avoid having multiple writers; multiple readers, on the other hand, can be quite useful.

#2

If you don't want to use LBushkin's idea of a work queue, the other approach is to use the same IndexWriter instance in the web application as the background threads are using. You haven't explained where the 2 indexing threads are - if they are in the same process/appdomain as the web application, it should be feasible to use the same instance. If not, then you have to use the equivalent of the work queue as mentioned by LBushkin, or an adapted version of it as follows: Add a third thread to the indexing process whose job is to listen to indexing requests from the web application. You can use e.g. Named Pipes for this (especially easy if you're using .NET 3.5). The web application sends indexing requests to the third thread, which uses the same IndexWriter as the other existing threads to update the index.

This is essentially the same idea as LBushkin's (the 3rd thread is a work queue consumer) but may require less additional coding.
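A rough sketch of that third "listener" thread, again in Python rather than .NET and with hypothetical names throughout (`FakeIndexWriter`, `serve_requests`, `background_indexing`, `run_demo`). It assumes Python's `multiprocessing.connection` module as the transport, which by default uses a named pipe on Windows and a Unix domain socket elsewhere; a .NET version would use its own named-pipe classes instead. For brevity the "web application" side runs in the same process here, and the listener handles a single connection.

```python
import threading
from multiprocessing.connection import Client, Listener


class FakeIndexWriter:
    """Hypothetical stand-in for the one shared IndexWriter."""

    def __init__(self):
        self._lock = threading.Lock()
        self.documents = []

    def add_document(self, doc):
        # The fake guards its own list so several threads can share it.
        with self._lock:
            self.documents.append(doc)


def serve_requests(listener, writer):
    """Third thread: accept one client and index whatever it sends."""
    with listener.accept() as conn:
        while True:
            try:
                payload = conn.recv_bytes()
            except EOFError:  # the client closed its end
                break
            writer.add_document(payload.decode("utf-8"))
            conn.send_bytes(b"ok")


def background_indexing(writer, count):
    """Stands in for one of the existing indexing threads."""
    for i in range(count):
        writer.add_document(f"bg-{i}")


def run_demo():
    writer = FakeIndexWriter()
    listener = Listener()  # picks an arbitrary address for the default family
    server = threading.Thread(target=serve_requests, args=(listener, writer))
    background = threading.Thread(target=background_indexing, args=(writer, 3))
    server.start()
    background.start()

    # The "web application" side: connect, send one request, wait for the ack.
    with Client(listener.address) as conn:
        conn.send_bytes("web-doc".encode("utf-8"))
        ack = conn.recv_bytes()

    server.join()
    background.join()
    listener.close()
    return sorted(writer.documents), ack
```

The web application never opens the index itself; it only sends requests, so the indexing process stays the sole owner of the writer.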

Update: Named Pipes can be used between processes on different machines. You just need to be aware of firewall issues which may arise in certain network topologies.
