When using Lucene.Net with ASP.NET, I can imagine that one web request could trigger an update to the index while another web request is performing a search. Does Lucene.Net have built-in support for managing this kind of concurrent access, or do I have to manage it myself to avoid "being used by another process" errors?
EDIT: After reading the docs and experimenting, this is what I think I've learned: there are two issues, thread safety and concurrency. Multithreading is "safe" in that you can't do anything bad to the index. But it's safe at the cost of only one object holding the lock on the index at a time; a second object that comes along will throw an exception. So you can't leave a search open and expect a writer in another thread to be able to update the index, and if a thread is busy updating the index, trying to create a searcher will fail.
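For example, something like this (a rough sketch of the failure mode, assuming the Lucene.Net 3.0.3-style API; the path is made up):

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

class WriteLockDemo
{
    static void Main()
    {
        var dir = FSDirectory.Open(new DirectoryInfo(@"C:\MyApp\LuceneIndex")); // made-up path
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var first = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            try
            {
                // A second writer on the same directory can't obtain write.lock,
                // so it times out and throws instead of waiting.
                new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
            }
            catch (LockObtainFailedException)
            {
                // The application has to coordinate access itself.
            }
        }
    }
}
```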
Also, searchers see the index as it was at the time they opened it, so if you keep them around and update the index, they won't see the updates.
I wanted my searchers to see the latest updates.
My design, and it seems to be working so far, is that my writers and searchers share a lock, so that they don't fail - they just wait - until the current write or search is done.
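Something along these lines (a simplified sketch of the idea, not my exact code; Lucene.Net 3.0.3-style API, and the class name and path are illustrative):

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Writers and searchers share one lock, so neither fails - each just waits its turn.
public static class SearchIndex
{
    private static readonly object IndexLock = new object();
    private static readonly FSDirectory Dir =
        FSDirectory.Open(new DirectoryInfo(@"C:\MyApp\LuceneIndex")); // illustrative path

    public static void AddDocument(Document doc)
    {
        lock (IndexLock)   // wait until the current search or write is done
        {
            using (var writer = new IndexWriter(Dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED))
            {
                writer.AddDocument(doc);
                writer.Commit();
            }
        }
    }

    public static List<Document> Search(Query query, int maxHits)
    {
        lock (IndexLock)   // wait until the current write is done
        {
            // Opening the searcher per request means it always sees the latest commit.
            using (var searcher = new IndexSearcher(Dir, true))   // true = read-only
            {
                var hits = searcher.Search(query, maxHits).ScoreDocs;
                var results = new List<Document>(hits.Length);
                foreach (var hit in hits)
                    results.Add(searcher.Doc(hit.Doc));
                return results;
            }
        }
    }
}
```

A ReaderWriterLockSlim would be a natural refinement, so that searches can run in parallel and only writes are exclusive.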
3 Answers
#1
According to this page,
Indexing and searching are not only thread safe, but process safe. What this means is that:
- Multiple index searchers can read the lucene index files at the same time.
- An index writer or reader can edit the lucene index files while searches are ongoing.
- Multiple index writers or readers can try to edit the lucene index files at the same time (it's important for the index writer/reader to be closed so it will release the file lock). However, the query parser is not thread safe, so each thread using the index should have its own query parser.
The index writer, however, is thread safe, so you can update the index while people are searching it. However, you then have to make sure that the threads with open index searchers close them and open new ones to get the newly updated data.
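For example, a rough sketch (assuming the Lucene.Net 3.0.3-style IndexReader.Reopen API; a real implementation would also need to keep the old reader alive until in-flight searches are finished with it):

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Sketch: refresh a cached searcher only when the index has actually changed.
public class SearcherCache
{
    private readonly object _sync = new object();
    private IndexReader _reader;
    private IndexSearcher _searcher;

    public SearcherCache(Lucene.Net.Store.Directory dir)
    {
        _reader = IndexReader.Open(dir, true);    // read-only reader
        _searcher = new IndexSearcher(_reader);
    }

    public IndexSearcher GetSearcher()
    {
        lock (_sync)
        {
            // Reopen returns a new reader if the index changed, otherwise the same instance.
            IndexReader newReader = _reader.Reopen();
            if (newReader != _reader)
            {
                // NOTE: disposing here assumes no search is still using the old reader;
                // real code needs reference counting (what later versions' SearcherManager does).
                _searcher.Dispose();
                _reader.Dispose();
                _reader = newReader;
                _searcher = new IndexSearcher(_reader);
            }
            return _searcher;
        }
    }
}
```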
#2
You may have issues if your indexing thread adds a new document that triggers a merge of some index segments: the merged segments are deleted and a new segment is created. The problem is that your index searcher loaded all the segments when it was opened, so it has "pointers" to the segments that existed at that time. If the index writer then does a segment merge and deletes a segment, your index searcher will still think that segment file exists and will fail with a "file not found" error. What you really need to do is separate your writable index from your searchable index, either by using SOLR or by doing your own index snapshot replication similar to what SOLR does. I have built a very similar system to SOLR using .NET and Lucene.NET on Windows, using NTFS hard links for efficient snapshot replication. I can give you more info if you are interested.
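Very roughly, the snapshot step can look like this (an illustrative sketch, not the actual production code; names and paths are made up):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Snapshot replication with NTFS hard links: no data is copied, the snapshot
// and the live index share the same underlying file contents.
public static class IndexSnapshot
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern bool CreateHardLink(
        string newFileName, string existingFileName, IntPtr securityAttributes);

    // Call this right after IndexWriter.Commit(), then open searchers on the returned folder.
    public static string Create(string liveIndexPath, string snapshotRoot)
    {
        string snapshotPath = Path.Combine(snapshotRoot, DateTime.UtcNow.Ticks.ToString());
        Directory.CreateDirectory(snapshotPath);

        foreach (string file in Directory.GetFiles(liveIndexPath))
        {
            string name = Path.GetFileName(file);
            if (name == "write.lock") continue;   // don't carry the writer's lock file over

            if (!CreateHardLink(Path.Combine(snapshotPath, name), file, IntPtr.Zero))
                throw new IOException("CreateHardLink failed for " + name +
                                      " (error " + Marshal.GetLastWin32Error() + ")");
        }
        return snapshotPath;
    }
}
```

Searchers opened on the snapshot keep working even if the live index merges away the segments they reference, because the hard links keep the underlying files alive.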
#3
You don't have so much of a problem with that as with managing concurrent writes to the index. I've had an easier path going with SOLR, which abstracts most of those details away for me since it runs as a server.