Using Lucene.Net thread-safely from an ASP.NET web application

Date: 2022-04-07 21:00:30

So I've been doing some research on the best way to implement Lucene.Net index searching and writing from within a web application. I set out with the following requirements:

  • Need to allow concurrent searching and accessing of the index (queries run in parallel)
  • There will be multiple indexes
  • Having an index search be completely up-to-date ("real-time") is NOT a requirement
  • Run jobs to update the indexes on some frequency (the frequency is different for each index)
  • Obviously, I would like to do all of this in a way that follows Lucene "best practices" and can perform and scale well

I found some helpful resources, and a couple of good questions here on SO like this one.

Following that post as guidance, I decided to try a singleton pattern with a concurrent dictionary of a wrapper built to manage an index.

To make things simpler, I'll pretend that I am only managing one index, in which case the wrapper can become the singleton. This ends up looking like this:

public sealed class SingleIndexManager
{
    private const string IndexDirectory = "C:\\IndexDirectory\\";
    private const string IndexName = "test-index";
    private static readonly Version _version = Version.LUCENE_29;

    #region Singleton Behavior
    private static volatile SingleIndexManager _instance;
    private static object syncRoot = new Object();

    public static SingleIndexManager Instance
    {
        get
        {
            if (_instance == null)
            {
                lock (syncRoot)
                {
                    if (_instance == null)
                        _instance = new SingleIndexManager();
                }
            }

            return _instance;
        }
    }
    #endregion

    private IndexWriter _writer;
    private IndexSearcher _searcher;

    private int _activeSearches = 0;
    private int _activeWrites = 0;

    private SingleIndexManager()
    {
        lock(syncRoot)
        {
            _writer = CreateWriter(); //hidden for sake of brevity
            _searcher = new IndexSearcher(_writer.GetReader());
        }
    }

    public List<Document> Search(Func<IndexSearcher,List<Document>> searchMethod)
    {
        lock(syncRoot)
        {
            if(_searcher != null && !_searcher.GetIndexReader().IsCurrent() && _activeSearches == 0)
            {
                _searcher.Close();
                _searcher = null;
            }
            if(_searcher == null)
            {
                _searcher = new IndexSearcher((_writer ?? (_writer = CreateWriter())).GetReader());
            }
        }
        List<Document> results;
        Interlocked.Increment(ref _activeSearches);
        try
        {
            results = searchMethod(_searcher);
        } 
        finally
        {
            Interlocked.Decrement(ref _activeSearches);
        }
        return results;
    }

    public void Write(List<Document> docs)
    {
        lock(syncRoot)
        {
            if(_writer == null)
            {
                _writer = CreateWriter();
            }
        }
        try
        {
            Interlocked.Increment(ref _activeWrites);
            foreach (Document document in docs)
            {
                _writer.AddDocument(document, new StandardAnalyzer(_version));
            }

        } 
        finally
        {
            lock(syncRoot)
            {
                int writers = Interlocked.Decrement(ref _activeWrites);
                if(writers == 0)
                {
                    _writer.Close();
                    _writer = null;
                }
            }
        }
    }
}

Theoretically, this is supposed to give me a thread-safe singleton instance for an index (here named "test-index") with two publicly exposed methods, Search() and Write(), which can be called from within an ASP.NET web application with no concerns regarding thread safety. (If this is incorrect, please let me know.)

There is one thing that is giving me a little bit of trouble right now:

How do I gracefully close these instances on Application_End in the Global.asax.cs file so that if I want to restart my web application in IIS, I am not going to get a bunch of write.lock failures, etc?

All I can think of so far is:

public void Close()
{
    lock(syncRoot)
    {
        _searcher.Close();
        _searcher.Dispose();
        _searcher = null;

        _writer.Close();
        _writer.Dispose();
        _writer = null;
    }
}

and calling that in Application_End, but if I have any active searchers or writers, is this going to result in a corrupt index?

Any help or suggestions are much appreciated. Thanks.

3 Answers

#1


11  

Lucene.NET is very thread safe. I can say for sure that all of the methods on the IndexWriter and IndexReader classes are thread-safe and you can use them without having to worry about synchronization. You can get rid of all of your code that involves synchronizing around instances of these classes.

That said, the bigger problem is using Lucene.NET from ASP.NET. ASP.NET recycles the application pool for a number of reasons; while it is shutting down one application domain, it brings up another one to handle new requests to the site.

If you try to access the same physical files (assuming you are using the file-system based FSDirectory) with a different IndexWriter/IndexReader, then you'll get an error as the lock on the files hasn't been released by the application domain that hasn't been shut down yet.

To that end, the recommended best practice is to control the process that is handling the access to Lucene.NET; this usually means creating a service in which you'd expose your operations via Remoting or WCF (preferably the latter).

It's more work this way (as you'd have to create all of the abstractions to represent your operations), but you gain the following benefits:

  • The service process will always be up, which means that the clients (the ASP.NET application) won't have to worry about contending for the files that FSDirectory requires. They simply have to call the service.

  • You're abstracting your search operations at a higher level. You aren't accessing Lucene.NET directly; rather, you're defining the operations and the types that those operations require. Once you have that abstracted away, if you decide to move from Lucene.NET to some other search mechanism (say RavenDB), it's just a matter of changing the implementation of the contract.
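As a rough illustration of the service approach described above, a WCF contract could look something like the sketch below. The names IIndexService, SearchHit, Search, and Reindex are hypothetical and do not come from the original answer; only the ServiceContract, OperationContract, DataContract, and DataMember attributes are real WCF types. The point is that nothing in the contract mentions Lucene.NET.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

// Hypothetical result type: plain data, no Lucene.NET types leak to callers.
[DataContract]
public class SearchHit
{
    [DataMember] public string Id { get; set; }
    [DataMember] public float Score { get; set; }
}

// Hypothetical contract that the ASP.NET application would call.
// The service process behind it is the only owner of the index files.
[ServiceContract]
public interface IIndexService
{
    // Runs a query against the named index and returns at most maxResults hits.
    [OperationContract]
    List<SearchHit> Search(string indexName, string queryText, int maxResults);

    // Asks the service to rebuild or refresh the named index.
    [OperationContract]
    void Reindex(string indexName);
}
```

Swapping Lucene.NET for another engine would then mean writing a new class that implements IIndexService, with no change on the web application side.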

#2


3  

  • Opening an IndexWriter may be a heavy operation. You can reuse it.
  • There's a lock in Write(...) to ensure transactional behavior: all documents are added and written to disk before the method returns. The call to Commit() can be a lengthy operation (it may cause segment merges). You can move it to a background thread if you want (which introduces scenarios where some of the added documents are written in one commit and some in another).
  • There's no need for an unconditional lock in your Search(...) method. You can check whether you have a _searcher instance and use it. It is set to null in Write(...) to force a new searcher.
  • I'm not sure about your use of a searchMethod; it looks like something a collector is better suited for.


public sealed class SingleIndexManager {
    private static readonly Version _version = Version.LUCENE_29;
    private readonly IndexWriter _writer;
    private volatile IndexSearcher _searcher;
    private readonly Object _searcherLock = new Object();

    private SingleIndexManager() {
        _writer = null; // TODO
    }

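    public List<Document> Search(Func<IndexSearcher, List<Document>> searchMethod) {
        // Fast path: reuse the current searcher without taking a lock.
        var searcher = _searcher;
        if (searcher == null) {
            lock (_searcherLock) {
                // Re-read under the lock so a searcher created by another
                // thread is picked up instead of leaving the local null.
                searcher = _searcher;
                if (searcher == null) {
                    var reader = _writer.GetReader();
                    _searcher = searcher = new IndexSearcher(reader);
                }
            }
        }

        return searchMethod(searcher);
    }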

    public void Write(List<Document> docs) {
        lock (_writer) {
            foreach (var document in docs) {
                _writer.AddDocument(document, new StandardAnalyzer(_version));
            }

            _writer.Commit();
            _searcher = null;
        }
    }
}
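The class above does not cover the Application_End case from the question. One possible way to handle it, sketched here under assumptions, is to put the shared object behind a reader/writer lock so that shutdown waits for in-flight searches instead of closing the index underneath them. ShutdownGate is a hypothetical helper, not a Lucene.Net type; it relies only on System.Threading.ReaderWriterLockSlim, and the close action is supplied by the caller.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lets many callers use a resource in parallel,
// while Close() waits until all of them have finished.
public sealed class ShutdownGate<T> where T : class
{
    private readonly ReaderWriterLockSlim _gate = new ReaderWriterLockSlim();
    private readonly Action<T> _close;
    private T _resource;

    public ShutdownGate(T resource, Action<T> close)
    {
        _resource = resource;
        _close = close;
    }

    // Read lock: any number of threads may be inside Use() at once.
    // Not re-entrant: do not call Use() from inside Use() on the same thread.
    public TResult Use<TResult>(Func<T, TResult> action)
    {
        _gate.EnterReadLock();
        try
        {
            if (_resource == null)
                throw new ObjectDisposedException(typeof(T).Name);
            return action(_resource);
        }
        finally
        {
            _gate.ExitReadLock();
        }
    }

    // Write lock: granted only after every active Use() call has returned.
    public void Close()
    {
        _gate.EnterWriteLock();
        try
        {
            if (_resource == null) return; // already closed
            _close(_resource);
            _resource = null;
        }
        finally
        {
            _gate.ExitWriteLock();
        }
    }
}
```

With something like this, the body of Search(...) would run inside Use(...), and Application_End would call Close() with a delegate that closes the IndexWriter, which is what releases write.lock. Close() blocks until running searches finish, so a long query delays shutdown.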

#3


1  

You can also disable the application pool's overlapped-recycle setting in IIS to avoid Lucene write.lock issues when one app pool is shutting down (but still holding the write.lock) and IIS is preparing another one for new requests.
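For reference, this is the "Disable Overlapped Recycle" option in the application pool's advanced settings in IIS Manager. If you would rather script it, a minimal sketch using the Microsoft.Web.Administration API might look like the following. The class name, method name, and pool name are hypothetical, and the code is assumed to run with administrative rights on the web server.

```csharp
using System;
using Microsoft.Web.Administration;

public static class AppPoolConfig
{
    // Turns off overlapped recycling for one application pool, so the old
    // worker process exits before the replacement is started.
    public static void DisableOverlappedRecycle(string poolName)
    {
        using (var server = new ServerManager())
        {
            ApplicationPool pool = server.ApplicationPools[poolName];
            if (pool == null)
                throw new ArgumentException("No such application pool: " + poolName);

            pool.Recycling.DisallowOverlappingRotation = true;

            // Nothing is written to applicationHost.config until this call.
            server.CommitChanges();
        }
    }
}
```

The trade-off is that requests arriving during a recycle may have to wait for the new worker process to start, since the old one no longer overlaps with it.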
