Improving the performance of location-based search with Lucene

Time: 2023-02-06 03:04:14

I'm using Lucene for a job search portal built on .NET, and I am facing performance issues in the following use case: when doing a job search, the user can select a job location (for example: Atlanta, GA) and a radial distance (say, 50 miles). The time required to return job search results from Lucene is quite high.

FYI, we maintain a SQL Server 2005 database in which we store US and Canadian cities with their state, longitude, and latitude (about 1 million records in total).

Is there any way I can improve the performance of this location-based job search?

3 answers

#1


Basically, you have two types of search parameters: textual and spatial. You can probably use one type to filter the results you got from the other. For example, for someone looking for a .NET developer job near Atlanta, GA, you could either first retrieve all the .NET developer jobs and filter for location, or retrieve all jobs around Atlanta and filter for .NET developer ones. I believe the first should be faster.

You can also store the job locations directly in Lucene, and incorporate them in the search. A rough draft is:

Indexing:
1. When you receive a new 'wanted' ad, find its geo-location using the database.
2. Store the location as a Lucene field in the ad's document.

Retrieval:
1. Retrieve all jobs according to textual matches.
2. Use geometrical calculations for finding distances between the user's place and the job location.
3. Filter jobs according to distance.
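To make retrieval steps 2 and 3 concrete, here is a minimal sketch of the distance calculation and filter. It is written in Python rather than .NET for brevity, and it assumes each text-search hit carries the latitude and longitude stored at indexing time. The names `haversine_miles` and `filter_by_radius`, the `lat`/`lon` keys, and the sample coordinates are all illustrative assumptions, not part of Lucene or of the original setup.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def filter_by_radius(jobs, user_lat, user_lon, radius_miles):
    """Keep only the jobs whose stored location lies within radius_miles of the user."""
    return [
        job for job in jobs
        if haversine_miles(user_lat, user_lon, job["lat"], job["lon"]) <= radius_miles
    ]


# Hypothetical text-search hits; coordinates are approximate city centers.
text_hits = [
    {"title": ".NET developer", "city": "Atlanta, GA", "lat": 33.749, "lon": -84.388},
    {"title": ".NET developer", "city": "Marietta, GA", "lat": 33.953, "lon": -84.550},
    {"title": ".NET developer", "city": "Savannah, GA", "lat": 32.081, "lon": -81.091},
]

# Keep only the hits within 50 miles of central Atlanta.
nearby = filter_by_radius(text_hits, 33.749, -84.388, 50)
```

Because the filter only runs over the documents that already matched the text query, its cost grows with the number of text hits rather than with the size of the whole index, which is the reason for suggesting the text-first order.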

Lucene in Action has an example of spatial search similar in spirit. A second edition is in the making. Also, check out Sujit Pal's suggestions for spatial search with Lucene and Patrick O'Leary's framework. There are also Locallucene and LocalSolr, but I do not know how mature they are.

#2


My index size is about 4 MB. I am using the following code to build the query for the nearest cities:

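    // htNearestCities is presumably a Hashtable keyed by city name;
    // element [1] of each string[] value is used as the state.
    foreach (string city in htNearestCities.Keys)
    {
        // Require both the city and its state to match (logical AND).
        cityStateQuery = new BooleanQuery();
        queryCity = queryParserCity.Parse("\"" + city + "\"");
        queryState = queryParserState.Parse("\"" + ((string[])htNearestCities[city])[1] + "\"");
        cityStateQuery.Add(queryCity, BooleanClause.Occur.MUST);
        cityStateQuery.Add(queryState, BooleanClause.Occur.MUST);

        // Any one of the nearby city/state pairs may match (logical OR).
        findLocationQuery.Add(cityStateQuery, BooleanClause.Occur.SHOULD);
    }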

#3


You may ultimately want to have Lucene handle the spatial search by indexing tiles. But if you're certain that the Lucene query is what's slow, rather than the lookup of the nearby cities, then start by indexing the state and city together. This is much like indexing multiple columns in a relational database: a 'state:city' field with values like 'GA:Atlanta'. Then the intersection isn't done at query time.
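As a rough illustration of the combined-field idea, the sketch below uses a toy in-memory inverted index in Python as a stand-in for a single untokenized Lucene field. The helper names `location_key` and `find_jobs` and the sample documents are hypothetical. The point is only that each nearby city becomes one term lookup instead of a city clause intersected with a state clause.

```python
from collections import defaultdict


def location_key(state, city):
    """Combine state and city into one untokenized term, e.g. 'GA:Atlanta'."""
    return f"{state}:{city}"


# Hypothetical job documents: (doc id, city, state).
jobs = [
    (1, "Atlanta", "GA"),
    (2, "Marietta", "GA"),
    (3, "Atlanta", "TX"),
]

# Toy inverted index standing in for a single 'state:city' keyword field.
location_index = defaultdict(set)
for doc_id, city, state in jobs:
    location_index[location_key(state, city)].add(doc_id)


def find_jobs(nearest_cities):
    """Union the postings of each nearby 'state:city' term (an OR of single terms)."""
    hits = set()
    for city, state in nearest_cities:
        # .get avoids creating empty postings for cities that have no jobs.
        hits |= location_index.get(location_key(state, city), set())
    return hits


matches = find_jobs([("Atlanta", "GA"), ("Marietta", "GA")])
```

In the real index, this would presumably translate to one SHOULD clause per nearby city against a field indexed without tokenization, replacing the nested MUST pair per city. Treat that mapping as an assumption to verify against your Lucene.NET version.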
