I have a MySQL table with more than 20 million rows. Is there a way to build a Hibernate Criteria query that returns the rows nearest to a given latitude and longitude?
Using Criteria would be great because I need to apply more filters (price, category, etc.).
Finally, is it possible to get the rows ordered by distance, or are there too many rows for that?
1 Answer
#1
Plan A With a large number of rows, INDEX(lat) is a non-starter, performance-wise, even when restricting to a stripe: AND lat BETWEEN 65 AND 69. INDEX(lat, lng) is no better because the optimizer would not use both columns, even with AND lng BETWEEN...
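The lat BETWEEN and lng BETWEEN limits have to come from somewhere. As a rough sketch (not part of the original answer), they could be derived from a search radius like this. The class name, the 111.32 km-per-degree constant and the spherical-Earth approximation are all assumptions made for illustration, and the code ignores the poles and the 180-degree meridian.

```java
class BoundingBox {
    // Assumed approximation: kilometres per degree of latitude on a spherical Earth.
    static final double KM_PER_DEGREE = 111.32;

    /** Returns {minLat, maxLat, minLng, maxLng} for a box reaching radiusKm from the point. */
    static double[] around(double lat, double lng, double radiusKm) {
        double dLat = radiusKm / KM_PER_DEGREE;
        // A degree of longitude shrinks by cos(latitude) away from the equator,
        // so the longitude range must widen to cover the same distance.
        double dLng = radiusKm / (KM_PER_DEGREE * Math.cos(Math.toRadians(lat)));
        return new double[] { lat - dLat, lat + dLat, lng - dLng, lng + dLng };
    }
}
```

With Hibernate's legacy Criteria API, the four values could then be passed to Restrictions.between("lat", box[0], box[1]) and Restrictions.between("lng", box[2], box[3]) alongside the price and category restrictions, assuming the entity maps properties named lat and lng.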
Plan B Your next choice involves lat and lng, plus a subquery. MySQL version 5.6 would be beneficial. It looks something like this (after adding INDEX(lat, lng, id)):
SELECT ... FROM (
SELECT id FROM tbl
WHERE lat BETWEEN...
AND lng BETWEEN... ) x
JOIN tbl USING (id)
WHERE ...;
For various reasons, Plan B is only slightly better than Plan A.
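On the question of ordering by distance: once a bounding box has cut the candidates down to a manageable number, one option is to sort them in the application. The sketch below is an assumption about how that could look, not something the answer prescribes. It uses the haversine formula with a mean Earth radius of 6371 km, and it represents each candidate as a hypothetical {lat, lng} pair rather than a mapped entity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NearestFirst {
    // Assumed mean Earth radius; the Earth is treated as a sphere.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in km between two points given in degrees (haversine formula). */
    static double haversineKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns a copy of the candidates ({lat, lng} pairs) sorted nearest-first. */
    static List<double[]> sortByDistance(List<double[]> candidates, double lat, double lng) {
        List<double[]> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (double[] p) -> haversineKm(lat, lng, p[0], p[1])));
        return sorted;
    }
}
```

Sorting this way only makes sense after the box has filtered the rows; computing the distance for all 20 million rows would defeat the purpose of the index.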
Plan C With millions of rows, you will need my pizza parlor algorithm. This involves a Stored Procedure that repeatedly probes the table, looking for enough rows. It also involves PARTITIONing to get a crude 2D index. The link has reference code that includes filtering on things like category.
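To give a feel for "repeatedly probe the table, looking for enough rows", here is a deliberately simplified sketch. It is not the author's stored procedure or reference code, it leaves out the partitioning entirely, and the doubling strategy is an assumption. countInBox is a hypothetical stand-in for a cheap COUNT(*) over a square box with the given half-width in degrees.

```java
import java.util.function.DoubleToIntFunction;

class ExpandingProbe {
    /**
     * Doubles the half-width of the search box until the counter reports at
     * least `wanted` rows, or until `maxHalfWidth` is reached.
     * Returns the half-width to use for the final, real query.
     */
    static double findHalfWidth(DoubleToIntFunction countInBox, int wanted,
                                double startHalfWidth, double maxHalfWidth) {
        double halfWidth = startHalfWidth;
        while (halfWidth < maxHalfWidth && countInBox.applyAsInt(halfWidth) < wanted) {
            // Not enough rows yet: widen the box, but never past the cap.
            halfWidth = Math.min(halfWidth * 2, maxHalfWidth);
        }
        return halfWidth;
    }
}
```

The point of probing is that each attempt touches only about as many rows as the caller asked for, which is why the cost does not have to grow with the size of the table.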
Plans A and B are O(sqrt(N)); Plan C is O(1). That is, for Plans A and B, if you quadruple the number of rows, you double the time taken. Plan C does not get slower as you increase N.