I'm developing an ASP.NET MVC 3 app which will have a few hundred videos. I want to create a search system based on tags and other parameters, such as the type of user who uploaded the video, the date of the video, the video category, etc.
I have been looking around, and Lucene.NET seems like a really good tool for full-text search, but I don't know if it's the best solution for my project... I have read the tutorials, and they recommend keeping the search index to a minimum, but also that you should NOT hit your database to retrieve extra data that is not stored in the search index...
How is this possible?
Let's take an example: I have a video row (as a concept; in reality it is held in several SQL tables) with columns for the video id, video name, video file name, full path, user id, user type, tags, creation date, video category, video subcategory, video location, etc. If I want to create a Lucene search index, I think I will have to put all of that information in there so that I can later query on every parameter, right?
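A minimal sketch of that flattening step, written in Python rather than C# and without Lucene itself: the row shapes, field names and the function `build_search_document` are all hypothetical. It joins the rows that belong to one video into a single flat record, which is roughly what one Lucene `Document` with one field per key would hold.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical shapes for rows that live in separate SQL tables.
@dataclass
class VideoRow:
    video_id: int
    name: str
    file_name: str
    user_id: int
    category_id: int
    created: date

@dataclass
class UserRow:
    user_id: int
    user_type: str

def build_search_document(video, users, categories, tags_by_video):
    """Join everything known about one video into a single flat document.

    users: user_id -> UserRow; categories: category_id -> category name;
    tags_by_video: video_id -> list of tag strings.
    """
    return {
        "id": str(video.video_id),   # key used to update/delete this document later
        "name": video.name,
        "file_name": video.file_name,
        "user_type": users[video.user_id].user_type,
        "category": categories[video.category_id],
        "tags": sorted(tags_by_video.get(video.video_id, [])),
        "created": video.created.isoformat(),  # sortable YYYY-MM-DD string
    }
```

Each key becomes one searchable field, so a query on tags, user type or category never needs a join at search time.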
This seems to me like a duplicate of the SQL database, with the added overhead of adding, editing and removing documents in the Lucene search index. Is this the standard scenario when using Lucene? All the Lucene examples I have seen are based on just a post id, post title and post body.
What do you think? Can you shed some light on this?
Yes, if you want to query multiple fields (including things like tags) from within Lucene, you'll need to make that data available to Lucene. It might sound like duplication, but it is not redundant duplication: it restructures the data into a very different layout, one indexed for search.
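To illustrate what "a very different layout" means, here is a toy inverted index in Python. It is not Lucene (whose on-disk format is far more elaborate), and the class name `FieldIndex` is hypothetical. The key point is that it is keyed by (field, term) rather than by row, so "which videos have tag X and user type Y" becomes a set intersection instead of a table scan. Re-adding a document under an existing id replaces it, loosely mirroring how Lucene.NET's `IndexWriter.UpdateDocument` deletes by a term and re-adds.

```python
from collections import defaultdict

class FieldIndex:
    """Toy inverted index: (field, term) -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> {doc_id, ...}
        self.docs = {}                    # doc_id -> fields, kept so we can delete/update

    def _terms(self, value):
        # Lists (e.g. tags) contribute each element; strings are split on whitespace.
        values = value if isinstance(value, list) else str(value).split()
        return {v.lower() for v in values}

    def add(self, doc_id, fields):
        self.delete(doc_id)  # adding an existing id acts as an update
        self.docs[doc_id] = fields
        for field, value in fields.items():
            for term in self._terms(value):
                self.postings[(field, term)].add(doc_id)

    def delete(self, doc_id):
        fields = self.docs.pop(doc_id, None)
        if fields is None:
            return
        for field, value in fields.items():
            for term in self._terms(value):
                self.postings[(field, term)].discard(doc_id)

    def search(self, **criteria):
        """AND together exact-term matches, e.g. search(tags="cats", user_type="pro")."""
        result = None
        for field, term in criteria.items():
            ids = self.postings.get((field, term.lower()), set())
            result = set(ids) if result is None else result & ids
        return result or set()
```

The add/edit/remove overhead the question worries about is the `add`/`delete` bookkeeping here; in practice it is usually triggered wherever the application already saves a video.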
It should work fine; it is pretty much how search works here on * (which uses Lucene.NET to perform the search).
It should be noted, however, that a few hundred videos is not a large data set: frankly, you could do it any way you like and it would take about the same amount of time. A complex SQL query should work, as should full-text search in the database (that is how *'s search used to work), as should filtering objects in memory (at the few-hundred level, you could trivially cache all the data, excluding the video frames themselves, in memory).
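As a sketch of the in-memory option, assuming the video metadata has already been loaded into a list (how and when that cache is filled and refreshed from SQL is left out), plain filtering in Python might look like this. `VideoInfo` and `filter_videos` are hypothetical names. A linear scan over a few hundred objects is cheap, which is the point being made above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class VideoInfo:
    video_id: int
    name: str
    user_type: str
    category: str
    created: date
    tags: frozenset = frozenset()

def filter_videos(videos, tags=(), user_type=None, category=None, since=None):
    """Return videos matching all given criteria; None/empty means 'don't filter'."""
    wanted = {t.lower() for t in tags}
    return [
        v for v in videos
        if wanted <= {t.lower() for t in v.tags}          # must carry every requested tag
        and (user_type is None or v.user_type == user_type)
        and (category is None or v.category == category)
        and (since is None or v.created >= since)
    ]
```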