My Lucene index has latitude and longitude fields indexed as follows:
doc.Add(new Field("latitude", latitude.ToString() , Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("longitude", longitude.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));
I want to retrieve a set of documents from this index whose latitude and longitude values are in a given range.
As you know, latitude and longitude can be negative. How do I correctly store signed decimal numbers in Lucene? Would the approach below give correct results, or is there another way to do this?
Term lowerLatitude = new Term("latitude", bounds.South.ToString() );
Term upperLatitude = new Term("latitude", bounds.North.ToString());
RangeQuery latitudeRangeQuery = new RangeQuery(lowerLatitude, upperLatitude, true);
findLocationQuery.Add(latitudeRangeQuery, BooleanClause.Occur.SHOULD);
Term lowerLongitude = new Term("longitude", bounds.West.ToString());
Term upperLongitude = new Term("longitude", bounds.East.ToString());
RangeQuery longitudeRangeQuery = new RangeQuery(lowerLongitude, upperLongitude, true);
findLocationQuery.Add(longitudeRangeQuery, BooleanClause.Occur.SHOULD);
I would also like to know how Lucene's ConstantScoreRangeQuery is better than the RangeQuery class.
I am facing another problem in this context: one of the documents in the index has the following 3 cities:
Lyons, IL
Oak Brook, IL
San Francisco, CA
If I give "Lyons, IL" as input, this record comes up. But if I give "San Francisco, CA" as input, it does not.
However, if I store the cities for this document as follows:
San Francisco, CA
Lyons, IL
Oak Brook, IL
and then give "San Francisco, CA" as input, this record shows up in the search results.
What I want is that if I type any of the 3 cities as input, I get this document in the search results.
Please help me achieve this.
Thanks.
3 solutions
#1
Following up on skaffman's suggestion, you can use the same tile coordinate system used by all the popular map apps. Choose whatever zoom level is granular enough for your needs, and don't forget to pad with leading zeros.
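As a rough sketch of that idea, the snippet below computes tile indices with the standard Web Mercator ("slippy map") formula and pads them to a fixed width so they sort correctly as strings. The class name `TileKey`, its method names, and the choice of zoom level are assumptions made for illustration; they are not part of Lucene.

```csharp
using System;
using System.Globalization;

public static class TileKey
{
    // Web Mercator is only defined up to roughly +/-85.0511 degrees latitude.
    private const double MaxLatitude = 85.05112878;

    // Converts a latitude/longitude in degrees to tile indices at the given zoom level.
    public static void ToTile(double lat, double lon, int zoom, out int x, out int y)
    {
        double n = Math.Pow(2, zoom); // number of tiles along each axis
        lat = Math.Max(-MaxLatitude, Math.Min(MaxLatitude, lat));
        double latRad = lat * Math.PI / 180.0;

        double fx = (lon + 180.0) / 360.0 * n;
        double fy = (1.0 - Math.Log(Math.Tan(latRad) + 1.0 / Math.Cos(latRad)) / Math.PI) / 2.0 * n;

        // Clamp so that lon = 180 or the southern limit still maps to a valid tile.
        int max = (int)n - 1;
        x = Math.Max(0, Math.Min(max, (int)Math.Floor(fx)));
        y = Math.Max(0, Math.Min(max, (int)Math.Floor(fy)));
    }

    // Left-pads a tile index with zeros to the widest index possible at this zoom level,
    // so that string order matches numeric order.
    public static string Pad(int tileIndex, int zoom)
    {
        int width = ((1 << zoom) - 1).ToString(CultureInfo.InvariantCulture).Length;
        return tileIndex.ToString(CultureInfo.InvariantCulture).PadLeft(width, '0');
    }
}
```

The padded strings can be indexed in untokenized fields and used as range bounds. Note that the tile y index grows southward, so the northern edge of a bounding box gives the lower bound of the y range.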
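Regarding RangeQuery, it is slower than ConstantScoreRangeQuery and limits how many distinct terms the range may cover: it rewrites to a BooleanQuery with one clause per matching term, so a wide range can throw a TooManyClauses exception (the default limit is 1024 clauses).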
Regarding the city-state problem, we can only speculate. But the first things to check are that the indexed terms and the parsed query are what you expect them to be.
#2
I think the best way is to convert/normalize the coordinates as suggested in the previous post. This article does exactly this. It's actually quite nice object-oriented code.
Regarding your second problem, I would assume you have some sort of Analyzer problem. Are you using the same Analyzer for indexing and querying? Which tokenizers do you use?
I recommend using Luke to inspect your generated index to see which tokens are actually searchable.
--Hardy
#3
One option here is to convert the coordinates into a system that doesn't have negative numbers. For example, I had a similar problem with a Google Maps webapp for the UK, and I stored UK Easting/Northing fields (which are 0 to 7 digits long) in Lucene alongside the lat/long values. By formatting these eastings/northings with left-padded zeroes, I could do Lucene range queries.
Is there a similar coordinate system for the US?
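A region-independent way to get the same effect, sketched here under stated assumptions, is to shift latitude by 90 and longitude by 180 so that both are non-negative, scale to an integer, and left-pad to a fixed width. The class name `CoordinateKey`, the six-decimal precision, and the 9-digit width are illustrative choices, not anything Lucene prescribes.

```csharp
using System;
using System.Globalization;

public static class CoordinateKey
{
    // Assumption: 6 decimal places of precision is enough for this index.
    private const int Scale = 1000000;

    // Shifts the value so it is non-negative, scales it to an integer and
    // formats it with leading zeros. The largest possible value is
    // 360 * 1,000,000, which has 9 digits, so 9 is a safe fixed width.
    private static string Encode(double degrees, double offset)
    {
        long shifted = (long)Math.Round((degrees + offset) * Scale);
        return shifted.ToString("D9", CultureInfo.InvariantCulture);
    }

    // Latitude ranges from -90 to 90, so adding 90 makes it non-negative.
    public static string EncodeLatitude(double latitude)
    {
        return Encode(latitude, 90.0);
    }

    // Longitude ranges from -180 to 180, so adding 180 makes it non-negative.
    public static string EncodeLongitude(double longitude)
    {
        return Encode(longitude, 180.0);
    }
}
```

The same encoding would have to be applied both when indexing the fields and when building the lower and upper terms of the range query; otherwise the string comparison inside the range query will not line up with numeric order.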