I'm building an ASP.NET MVC site where I plan to use Lucene.Net. I've envisioned a way to structure the usage of Lucene, but I'm not sure whether my planned architecture is sound and efficient.
My Plan:
- On the Application_Start event in Global.asax: I check for the existence of the index on the file system. If it doesn't exist, I create it and fill it with documents extracted from the database.
- When new content is submitted: I create an IndexWriter, fill up a document, write to the index, and finally dispose of the IndexWriter. IndexWriters are not reused, as I can't imagine a good way to do that in an ASP.NET MVC application.
- When content is edited: I repeat the same process as when new content is submitted, except that I first delete the old content and then add the edits.
- When a user searches for content: I check HttpRuntime.Cache to see if a user has already searched for this term in the last 5 minutes. If they have, I return those results; otherwise, I create an IndexReader, build and run a query, put the results in HttpRuntime.Cache, return them to the user, and finally dispose of the IndexReader. Once again, IndexReaders aren't reused.
My Questions:
- Is that a good structure - how can I improve it?
- Are there any performance/efficiency problems I should be aware of?
- Also, is not reusing the IndexReaders and IndexWriters a huge code smell?
2 Answers
#1
15
The answer to all three of your questions is the same: reuse your readers (and possibly your writers). You can use a singleton pattern to do this (i.e. declare your reader/writer as public static). Lucene's FAQ tells you the same thing: share your readers, because the first query is reaaalllyyyy slow. Lucene handles all the locking for you, so there is really no reason why you shouldn't have a shared reader.
It's probably easiest to just keep your writer around and (using the near-real-time, or NRT, model) get the readers from that. If it's rare that you are writing to the index, or if you don't have a huge need for speed, then it's probably OK to open your writer each time instead. That is what I do.
Edit: added a code sample:
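// Shared for the lifetime of the application; IndexWriter is thread-safe.
// The constructor arguments assume Lucene.Net 3.x; myDir is a Lucene Directory.
public static readonly IndexWriter writer = new IndexWriter(
    myDir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);

public JsonResult SearchForStuff(string query)
{
    // Near-real-time reader: it sees changes made through the shared writer.
    using (IndexReader reader = writer.GetReader())
    using (IndexSearcher search = new IndexSearcher(reader))
    {
        // do the search and return the results as JSON
    }
}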
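For the write side of the question (new content and edited content), here is a minimal sketch under the same shared-writer approach, assuming Lucene.Net 3.0.3. The SearchIndex class, the index path, and the "id" and "body" field names are hypothetical and only there for illustration. It relies on IndexWriter.UpdateDocument, which deletes any document matching the given term and then adds the new one, so the "delete the old content, then add the edits" step becomes a single call.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class SearchIndex
{
    // Hypothetical index location; adjust to your application.
    private static readonly FSDirectory Dir =
        FSDirectory.Open(new DirectoryInfo(@"C:\indexes\site"));

    // One writer for the whole application. IndexWriter is thread-safe.
    public static readonly IndexWriter Writer = new IndexWriter(
        Dir,
        new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);

    // Handles both new and edited content: UpdateDocument removes any
    // document whose "id" term matches, then adds the new document.
    public static void Save(string id, string body)
    {
        var doc = new Document();
        // The id is stored untokenized so it can be matched exactly by a Term.
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("body", body, Field.Store.YES, Field.Index.ANALYZED));

        Writer.UpdateDocument(new Term("id", id), doc);
        Writer.Commit(); // flush the change to disk
    }
}
```

Committing on every save is the simple choice; if writes become frequent, committing less often and relying on readers obtained from the writer is a possible alternative.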
#2
13
I would probably skip the caching -- Lucene is very, very efficient. Perhaps so efficient that it is faster to search again than to cache the results.
The full index build on Application_Start feels a bit off to me -- it should probably be run in its own thread so as not to block other expensive startup activities.
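A rough sketch of what moving the build off the startup thread could look like, assuming Lucene.Net 3.0.3 and .NET 4.0 or later. IndexBootstrapper, EnsureIndexAsync, and the rebuildFromDatabase callback are hypothetical names; the callback stands in for whatever code extracts documents from the database.

```csharp
using System;
using System.Threading.Tasks;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexBootstrapper
{
    // Call from Application_Start. Returns immediately; the check and the
    // (possibly long) rebuild run on a thread-pool thread.
    public static Task EnsureIndexAsync(Directory dir, Action<Directory> rebuildFromDatabase)
    {
        return Task.Factory.StartNew(() =>
        {
            // Only rebuild when no index is present in the directory.
            if (!IndexReader.IndexExists(dir))
            {
                rebuildFromDatabase(dir);
            }
        });
    }
}
```

Two things to keep in mind with this approach: searches that arrive before the build finishes will see an empty or partial index, and background work in ASP.NET can be cut short if the application pool recycles.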