Distance between two coordinates: how can I simplify this and/or use a different technique?

Time: 2021-03-03 15:22:31

I need to write a query that finds all locations within a given range (in miles) of a provided location.

The table is like this:

id  |  name  |  lat  |  lng 

So I have been doing some research and found this MySQL presentation.

I have tested it on a table with around 100 rows, and there will be many more, so it must be scalable.

I tried something more simple like this first:

-- just some test data; in practice these values would come from user input
set @orig_lat=55.857807; set @orig_lng=-4.242511; set @dist=10;

SELECT *, 3956 * 2 * ASIN(
          SQRT( POWER(SIN((orig.lat - abs(dest.lat)) * pi()/180 / 2), 2) 
              + COS(orig.lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)  
              * POWER(SIN((orig.lng - dest.lng) * pi()/180 / 2), 2) )) 
          AS distance
  FROM locations dest, locations orig
 WHERE orig.id = '1'
HAVING distance < 1
 ORDER BY distance;

This returned results in around 50 ms, which is pretty good. However, it would slow down dramatically as the number of rows increases.

EXPLAIN shows it only uses the PRIMARY key, as expected.

Then, after reading the article linked above, I tried something like this:

-- defining variables: when this is made into a stored procedure,
-- the values will be fetched with a SELECT query.
set @mylon = -4.242511;
set @mylat = 55.857807;
set @dist = 0.5;

-- calculate lon and lat for the rectangle:
set @lon1 = @mylon-@dist/abs(cos(radians(@mylat))*69);
set @lon2 = @mylon+@dist/abs(cos(radians(@mylat))*69);
set @lat1 = @mylat-(@dist/69); 
set @lat2 = @mylat+(@dist/69);

-- run the query:

SELECT *, 3956 * 2 * ASIN(
          SQRT( POWER(SIN((@mylat - abs(dest.lat)) * pi()/180 / 2) ,2)
              + COS(@mylat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
              * POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) ))
          AS distance
  FROM locations dest
 WHERE dest.lng BETWEEN @lon1 AND @lon2
   AND dest.lat BETWEEN @lat1 AND @lat2
HAVING distance < @dist
 ORDER BY distance;
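To make the logic of the query above easier to follow, here is a minimal Python sketch of the same two-stage idea (a cheap rectangle test, then the exact haversine check) run over an in-memory list instead of a table. The function names and the `rows` structure are hypothetical; 3956 and 69 are the constants the query itself uses.

```python
import math

EARTH_RADIUS_MILES = 3956  # same radius constant as the query
MILES_PER_DEGREE = 69      # same rounded constant as the @lat1/@lon1 assignments


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def find_nearby(rows, my_lat, my_lng, dist):
    """Return (distance, row) pairs closer than `dist` miles, nearest first."""
    # Stage 1: the rectangle test (what BETWEEN on lat/lng does in SQL).
    # Like the query, this divides by cos(lat), so it breaks down at the poles.
    dlat = dist / MILES_PER_DEGREE
    dlng = dist / abs(math.cos(math.radians(my_lat)) * MILES_PER_DEGREE)
    candidates = [
        r for r in rows
        if my_lat - dlat <= r["lat"] <= my_lat + dlat
        and my_lng - dlng <= r["lng"] <= my_lng + dlng
    ]
    # Stage 2: exact distance, computed only for rows inside the rectangle.
    hits = []
    for r in candidates:
        d = haversine_miles(my_lat, my_lng, r["lat"], r["lng"])
        if d < dist:
            hits.append((d, r))
    hits.sort(key=lambda pair: pair[0])  # ORDER BY distance
    return hits
```

In the database, the rectangle test is the part an index on lat or lng can serve; the trigonometry then only runs on the rows that survive it.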

This query takes around 240 ms. That is not too bad, but it is slower than the first one, although I can imagine it working out faster with a much larger number of rows. However, an EXPLAIN shows the possible keys as lat, lng or PRIMARY, and PRIMARY is the one used.

How can I do this better?

I know I could store the lat/lng as a POINT, but I haven't found much documentation showing whether that is faster, or whether it is accurate.

Any other ideas would be happily accepted!

Thanks very much!

-Stefan


UPDATE:

As Jonathan Leffler pointed out, I had made a few mistakes which I hadn't noticed:

I had only put abs() on one of the lat values, and I was also using an id search in the WHERE clause of the second query when there was no need. The first query was purely experimental; the second one is more likely to go into production.

After these changes, EXPLAIN shows that the index on the lng column is now used, and the average response time is around 180 ms, which is an improvement.

5 answers

#1


2  

Any other ideas would be happily accepted!

If you want speed (and simplicity) you'll want some decent geospatial support from your database. This introduces geospatial datatypes, geospatial indexes and (a lot of) functions for processing / building / analyzing geospatial data.

MySQL implements part of the OpenGIS specification, although (the last time I checked) the implementation is / was very rough around the edges and immature (not useful for any real work).

PostGIS on PostgreSQL would make this trivially easy and readable:

(this finds all points in tableb that are within 1000 meters of the point in tablea with id 123; the distance is in meters when the_geom is a geography column or uses a meter-based projection)

select 
    myvalue
from 
    tablea, tableb
where 
    st_dwithin(tablea.the_geom, tableb.the_geom, 1000)
and
    tablea.id = 123

#2


2  

The first query ignores the parameters you set: it uses 1 instead of @dist for the distance, and uses the table alias orig instead of the parameters @orig_lat and @orig_lng.

You then have the query doing a Cartesian product between the table and itself, which is seldom a good idea if you can avoid it. You get away with it because of the filter condition orig.id = 1, which means that only one row from orig is joined with each of the rows in dest (including the point with dest.id = 1; you should probably add a condition AND orig.id != dest.id). You also have a HAVING clause but no GROUP BY clause, which is indicative of problems: this HAVING clause does not involve any aggregates, whereas a HAVING clause is (primarily) for comparing aggregate values.

Unless my memory is failing me, COS(ABS(x)) === COS(x), so you might be able to simplify things by dropping the ABS(). Failing that, it is not clear why one latitude needs the ABS and the other does not - symmetry is crucial in matters of spherical trigonometry.
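Both points can be checked numerically. Below is a minimal Python sketch, assuming the haversine formula with the 3956-mile radius from the question; `question_formula` is a hypothetical name for a direct transcription of the first query's expression, ABS() included, and `haversine` is the symmetric form with no ABS() at all.

```python
import math

R_MILES = 3956  # radius used in the question


def haversine(lat1, lng1, lat2, lng2):
    """Symmetric form: both latitudes are treated identically, no ABS()."""
    dlat = math.radians(lat1 - lat2)
    dlng = math.radians(lng1 - lng2)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlng / 2) ** 2)
    return 2 * R_MILES * math.asin(math.sqrt(a))


def question_formula(orig_lat, orig_lng, dest_lat, dest_lng):
    """Transcription of the first query, with ABS() on dest.lat only."""
    a = (math.sin(math.radians(orig_lat - abs(dest_lat)) / 2) ** 2
         + math.cos(math.radians(orig_lat)) * math.cos(math.radians(abs(dest_lat)))
         * math.sin(math.radians(orig_lng - dest_lng) / 2) ** 2)
    return 2 * R_MILES * math.asin(math.sqrt(a))
```

Inside COS() the ABS() is harmless, but inside the first SIN() term it is not: for two nearby points in the southern hemisphere, the latitude difference becomes roughly the sum of the latitudes and the "distance" jumps to thousands of miles.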

You have a dose of magic numbers: the value 69 is presumably the number of miles in a degree (of longitude, at the equator), and 3956 is the radius of the earth in miles.
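The two constants are related: 69 is, rounded, the length of one degree of arc on a great circle of radius 3956 miles. A short Python sketch that derives one from the other instead of hard-coding both (the names are hypothetical):

```python
import math

EARTH_RADIUS_MILES = 3956  # the 3956 in the distance expression

# Length of one degree of arc on a great circle: a degree of latitude anywhere,
# or a degree of longitude at the equator. Rounds to the 69 used for the box.
MILES_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_MILES / 360


def miles_per_degree_of_longitude(lat_deg):
    """Parallels shrink by cos(latitude) away from the equator."""
    return MILES_PER_DEGREE * math.cos(math.radians(lat_deg))
```

At the question's latitude (about 55.86°N), a degree of longitude is only about 39 miles, which is why the query divides by cos(radians(@mylat)) when widening the longitude range.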

I'm suspicious of the bounding box calculation when the given position is close to a pole. In the extreme case, you might need to allow any longitude at all.
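A minimal Python sketch of a pole-aware box, assuming a spherical earth of radius 3956 miles. When the search circle reaches a pole it widens the longitude range to all of [-180, 180]; otherwise it uses the exact spherical half-width asin(sin r / cos lat) rather than r / cos lat. `bounding_box` is a hypothetical helper, and crossing the 180° meridian is deliberately not handled.

```python
import math

EARTH_RADIUS_MILES = 3956


def bounding_box(lat, lng, dist_miles):
    """Return (lat_min, lat_max, lng_min, lng_max) in degrees."""
    r = dist_miles / EARTH_RADIUS_MILES  # angular radius in radians
    r_deg = math.degrees(r)
    lat_min, lat_max = lat - r_deg, lat + r_deg
    if lat_max >= 90.0 or lat_min <= -90.0:
        # The circle contains a pole, so any longitude can qualify.
        return max(lat_min, -90.0), min(lat_max, 90.0), -180.0, 180.0
    # |lat| + r < 90 degrees here, so sin(r) < cos(lat); the min() only
    # guards against floating-point rounding right at that limit.
    ratio = min(1.0, math.sin(r) / math.cos(math.radians(lat)))
    dlng = math.degrees(math.asin(ratio))
    # Note: lng - dlng / lng + dlng may fall outside [-180, 180] (not handled).
    return lat_min, lat_max, lng - dlng, lng + dlng
```

The returned bounds would then feed the BETWEEN conditions, exactly as @lat1, @lat2, @lon1 and @lon2 do in the question.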

The condition dest.id = 1 in the second query is odd; I believe it should be omitted, but its presence should speed things up, because only one row matches that condition. So the extra time taken is puzzling. But using the primary key index is appropriate as written.

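You should move the condition in the HAVING clause into the WHERE clause (in MySQL this means repeating the distance expression there, because a column alias cannot be referenced in a WHERE clause).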

But I'm not sure this is really helping...

#3


1  

The NGS Online Inverse Geodesic Calculator is the traditional reference tool for calculating the distance between any two locations on the earth ellipsoid:

http://www.ngs.noaa.gov/cgi-bin/Inv_Fwd/inverse2.prl

But the above calculator is still problematic. Especially between two nearly antipodal locations, the computed distance can be off by some tens of kilometres! The origin of the numerical trouble was identified long ago by Thaddeus Vincenty (page 92):

http://www.ngs.noaa.gov/PUBS_LIB/inverse.pdf

In any case, it is preferable to use the reliable and very accurate online calculator by Charles Karney:

http://geographiclib.sourceforge.net/cgi-bin/Geod

#4


0  

Some thoughts on improving performance. These wouldn't simplify things from a maintainability standpoint (they make things more complex), but they could help with scalability.

  1. Since you know the radius, you can add conditions for the bounding box, which may allow the db to optimize the query and eliminate some rows without having to do the trig calculations.

  2. You could pre-calculate some of the trig values of the lat/lon of stored locations and store them in the table. This would shift some of the performance cost to insert time, but if queries outnumber inserts, that is a good trade-off. See this answer for an idea of this approach:

    Query to get records based on Radius in SQLite?

  3. You could look at something like geohashing.
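A minimal Python sketch of point 2, assuming the spherical law of cosines rather than the haversine formula, because it needs only stored sines and cosines. The dictionary keys stand in for hypothetical extra columns `sin_lat`, `cos_lat`, `sin_lng` and `cos_lng`; this is not necessarily the exact scheme used in the linked answer.

```python
import math

EARTH_RADIUS_MILES = 3956


def precompute(lat, lng):
    """Values to store in (hypothetical) extra columns when a row is inserted."""
    phi, lmb = math.radians(lat), math.radians(lng)
    return {"sin_lat": math.sin(phi), "cos_lat": math.cos(phi),
            "sin_lng": math.sin(lmb), "cos_lng": math.cos(lmb)}


def cos_angle(row, origin):
    """Cosine of the central angle via the spherical law of cosines.

    cos(lng1 - lng2) is expanded to cos1*cos2 + sin1*sin2, so each row
    needs only multiplications and additions, no trig calls.
    """
    return (origin["sin_lat"] * row["sin_lat"]
            + origin["cos_lat"] * row["cos_lat"]
            * (origin["cos_lng"] * row["cos_lng"] + origin["sin_lng"] * row["sin_lng"]))


def within(row, origin, dist_miles):
    """distance < d  <=>  cos(angle) > cos(d / R), for d / R below pi."""
    threshold = math.cos(dist_miles / EARTH_RADIUS_MILES)  # once per query in SQL
    return cos_angle(row, origin) > threshold


def distance_miles(row, origin):
    c = max(-1.0, min(1.0, cos_angle(row, origin)))  # clamp rounding noise before acos
    return EARTH_RADIUS_MILES * math.acos(c)
```

The filter in `within` needs no ACOS at all; the inverse cosine is only needed for rows whose actual distance you want to report.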

When used in a database, the structure of geohashed data has two advantages. … Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes.

You could search SO for some ideas on how to implement: https://*.com/search?q=geohash
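For reference, here is a minimal Python sketch of standard geohash encoding: it alternately bisects the longitude and latitude ranges and packs every 5 bits into one base-32 character. `geohash_encode` is a hypothetical name. A real proximity search would also have to check the neighbouring cells, since two close points on either side of a cell edge can have different prefixes.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lng, precision=9):
    """Encode a point as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars = []
    bits = 0
    n_bits = 0
    use_lng = True  # the first bit always refines longitude
    while len(chars) < precision:
        rng, value = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lng = not use_lng
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)
```

Stored in an indexed column, a prefix comparison such as LIKE 'u4pru%' then selects one cell, and the prefix length controls the cell size.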

#5


0  

If you're only interested in rather small distances, you can approximate the geographical grid by a rectangular grid.

SELECT *, SQRT(POWER(RADIANS(@mylat - dest.lat), 2) +
               POWER(RADIANS(@mylon - dest.lng)*COS(RADIANS(@mylat)), 2)
              )*@radiusOfEarth AS approximateDistance
…

You could make this even more efficient by storing radians instead of (or in addition to) degrees in your database. If your queries may cross the 180° meridian, some extra care would be necessary there, but many applications don't have to deal with those locations. You could also try to change POWER(x, 2) to x*x, which might be computed faster.
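A Python sketch of the flat-grid approximation above, next to the haversine formula for comparison. It assumes @radiusOfEarth is in miles (3956, as in the question) and uses x*x in place of POWER(x, 2); both function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3956  # stands in for @radiusOfEarth (assumed to be in miles)


def approx_distance(my_lat, my_lng, lat, lng):
    """Flat-grid approximation: scale the longitude difference by cos(origin latitude)."""
    dy = math.radians(my_lat - lat)
    dx = math.radians(my_lng - lng) * math.cos(math.radians(my_lat))
    return math.sqrt(dx * dx + dy * dy) * EARTH_RADIUS_MILES  # x*x instead of POWER(x, 2)


def haversine(lat1, lng1, lat2, lng2):
    """Great-circle distance on the same sphere, for comparison."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

For points well under a mile apart, as in the question, the two results agree to far better than 0.1%; the error grows with distance and latitude, which is why this is only suitable for "rather small distances".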
