Database/SQL: How to store longitude/latitude data?

Date: 2022-08-08 16:03:46

Performance question ...

I have a database of houses that have geolocation data (longitude & latitude).

What I want to do is find the best way to store the location data in MySQL (v5.0.24a) using the InnoDB engine, so that I can run many queries that return all home records between latitudes x1 and x2 and longitudes y1 and y2.

Right now, my database schema is:

---------------------
Homes   
---------------------
geolat - Float (10,6)
geolng - Float (10,6)
---------------------

And my query is:

SELECT ... 
WHERE geolat BETWEEN x1 AND x2
AND geolng BETWEEN y1 AND y2
  • Is what I described above (FLOAT(10,6), with latitude and longitude in separate columns) the best way to store latitude and longitude data in MySQL? If not, what is? MySQL offers FLOAT, DECIMAL, and even spatial data types.
  • Is this the best way to write the SQL from a performance standpoint? If not, what is?
  • Does using a different MySQL database engine make sense?
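
A minimal sketch of the schema and bounding-box query above, runnable without a MySQL server. It uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite has no FLOAT(10,6), so REAL is used), and the table name, index name, and sample rows are hypothetical. The composite (geolat, geolng) index is one common choice for this query shape, not something the question specifies.

```python
import sqlite3

# Stand-in for the MySQL "Homes" table; SQLite's REAL replaces FLOAT(10,6).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (id INTEGER PRIMARY KEY, geolat REAL, geolng REAL)")
# Hypothetical composite index: range-scan on geolat, test geolng from the index.
conn.execute("CREATE INDEX idx_homes_lat_lng ON homes (geolat, geolng)")

# Hypothetical sample rows (roughly San Francisco, Los Angeles, Oakland).
conn.executemany(
    "INSERT INTO homes (id, geolat, geolng) VALUES (?, ?, ?)",
    [
        (1, 37.774929, -122.419416),
        (2, 34.052235, -118.243683),
        (3, 37.804363, -122.271111),
    ],
)


def homes_in_box(lat_min, lat_max, lng_min, lng_max):
    """Return ids of homes inside the box; BETWEEN includes both bounds."""
    rows = conn.execute(
        "SELECT id FROM homes "
        "WHERE geolat BETWEEN ? AND ? AND geolng BETWEEN ? AND ? "
        "ORDER BY id",
        (lat_min, lat_max, lng_min, lng_max),
    )
    return [row[0] for row in rows]


print(homes_in_box(37.0, 38.0, -123.0, -122.0))  # [1, 3]
```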

UPDATE: Still Unanswered

I have 3 different answers below. One person says to use FLOAT, one says to use INT, and one says to use spatial types.

So I used the MySQL "EXPLAIN" statement to measure SQL execution speed. It appears that there is absolutely no difference in SQL execution (result-set fetching) between using INT and FLOAT for the longitude and latitude data types.

It also appears that using BETWEEN is SIGNIFICANTLY faster than using the ">" and "<" operators: it's nearly 3x faster.
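
BETWEEN is inclusive at both ends, so `geolat BETWEEN a AND b` selects the same rows as `geolat >= a AND geolat <= b`, while `>` and `<` exclude the endpoints. When timing the two forms, comparing BETWEEN against the `>=` / `<=` version keeps both queries returning the same rows. A small sqlite3 check with hypothetical values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (id INTEGER PRIMARY KEY, geolat REAL)")
# Hypothetical rows, including values exactly on the bounds 37.0 and 38.0.
conn.executemany("INSERT INTO homes (id, geolat) VALUES (?, ?)",
                 [(1, 37.0), (2, 37.5), (3, 38.0), (4, 39.0)])


def ids(where, params):
    """Run a WHERE clause against the table and return matching ids in order."""
    sql = "SELECT id FROM homes WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


between = ids("geolat BETWEEN ? AND ?", (37.0, 38.0))
inclusive = ids("geolat >= ? AND geolat <= ?", (37.0, 38.0))
strict = ids("geolat > ? AND geolat < ?", (37.0, 38.0))

print(between)    # [1, 2, 3]
print(inclusive)  # [1, 2, 3]
print(strict)     # [2]  (the rows on the bounds are excluded)
```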

With that said, I am still uncertain what the performance impact of using spatial types would be, since it's unclear to me whether they are supported in the version of MySQL I'm running (v5.0.24) and, if they are, how to enable them.

Any help would be greatly appreciated.

9 answers

#1


29  

float(10,6) is just fine.

Any other convoluted storage schemes will require more translation in and out, and floating-point math is plenty fast.

#2


11  

I know you're asking about MySQL, but if spatial data is important to your business, you might want to reconsider. PostgreSQL + PostGIS are also free software, and they have a great reputation for managing spatial and geographic data efficiently. Many people use PostgreSQL only because of PostGIS.

I don't know much about the MySQL spatial system though, so perhaps it works well enough for your use-case.

#3


6  

The problem with using any other data type than "spatial" here is that your kind of "rectangular selection" can (usually, this depends on how bright your DBMS is - and MySQL certainly isn't generally the brightest) only be optimised in one single dimension.

The system can pick either the longitude index or the latitude index and use it to reduce the set of rows to inspect. After that, it has two options: (a) fetch all the rows found, scan them, and test the "other dimension", or (b) do the same index lookup on the "other dimension" and then match the two result sets to see which rows appear in both. The latter option may not be implemented as such in your particular DBMS engine.
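
Options (a) and (b) can be sketched in memory with two sorted single-column "indexes" and binary search. This is a toy illustration with hypothetical rows, not how MySQL or any other engine actually implements index access.

```python
import bisect

# Hypothetical rows: (id, lat, lng).
ROWS = [(1, 37.77, -122.42), (2, 34.05, -118.24), (3, 37.80, -122.27),
        (4, 37.50, -100.00), (5, 40.71, -122.30)]
BY_ID = {rid: (lat, lng) for rid, lat, lng in ROWS}  # the "table"


def build_index(pairs):
    """A single-column index: sorted keys plus the matching row ids."""
    pairs = sorted(pairs)
    return [key for key, _ in pairs], [rid for _, rid in pairs]


LAT_INDEX = build_index((lat, rid) for rid, lat, _ in ROWS)
LNG_INDEX = build_index((lng, rid) for rid, _, lng in ROWS)


def range_ids(index, lo, hi):
    """Ids whose key lies in [lo, hi], located by binary search."""
    keys, ids = index
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return ids[start:end]


def strategy_a(lat_lo, lat_hi, lng_lo, lng_hi):
    """(a) Range-scan the latitude index, then fetch each row and test longitude."""
    candidates = range_ids(LAT_INDEX, lat_lo, lat_hi)
    return sorted(rid for rid in candidates if lng_lo <= BY_ID[rid][1] <= lng_hi)


def strategy_b(lat_lo, lat_hi, lng_lo, lng_hi):
    """(b) Range-scan both indexes and keep the ids present in both results."""
    lat_hits = set(range_ids(LAT_INDEX, lat_lo, lat_hi))
    lng_hits = set(range_ids(LNG_INDEX, lng_lo, lng_hi))
    return sorted(lat_hits & lng_hits)


print(strategy_a(37.0, 38.0, -123.0, -122.0))  # [1, 3]
print(strategy_b(37.0, 38.0, -123.0, -122.0))  # [1, 3]
```

Option (a) touches one index plus the base rows it finds; option (b) touches two indexes and only needs the base rows after the intersection is known.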

Spatial indexes sort of do the latter "automatically", so I think it's safe to say that a spatial index will give the best performance in any case. But it may also be that it doesn't significantly outperform the other solutions and just isn't worth the bother. This depends on all sorts of things, such as the volume and distribution of your actual data.

It is certainly true that float (tree) indexes are by necessity slower than integer indexes, because comparing floats with '>' usually takes longer than comparing integers. But I would be surprised if this effect were actually noticeable.

#4


5  

I would store it as integers (INT, 4 bytes) in units of 1/1,000,000th of a degree. That would give you a resolution of a few inches.
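
A minimal sketch of this integer scheme, assuming rounding to the nearest microdegree and roughly 111,320 m per degree of latitude (both assumptions, not from the answer). It checks that ±180 degrees fits in a signed 4-byte INT and shows the size of one stored unit.

```python
SCALE = 1_000_000                  # 1 stored unit = 1/1,000,000 degree
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
METERS_PER_DEGREE_LAT = 111_320    # approximate length of one degree of latitude


def to_micro(degrees):
    """Encode degrees as an integer number of microdegrees (nearest unit)."""
    return round(degrees * SCALE)


def from_micro(units):
    """Decode microdegrees back to (floating-point) degrees."""
    return units / SCALE


# The widest values, longitude -180 and +180, must fit in a signed 4-byte INT.
assert INT32_MIN <= to_micro(-180.0) and to_micro(180.0) <= INT32_MAX

print(to_micro(-122.419416))          # -122419416
print(METERS_PER_DEGREE_LAT / SCALE)  # 0.11132 meters per unit (about 4.4 inches)
```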

I don't think there is any intrinsic spatial datatype in MySQL.

#5


4  

Float (10,6)

Where is latitude or longitude 5555.123456?

Don't you mean Float(9,6) instead?
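
The digit arithmetic behind this comment: with M total digits and D of them after the decimal point, an (M,D) column allows at most M-D digits before the point. That is the rule for DECIMAL(M,D), and MySQL applies the same M,D limit to its nonstandard FLOAT(M,D) syntax. A small sketch using Python's decimal module:

```python
from decimal import Decimal


def max_magnitude(precision, scale):
    """Largest absolute value with `precision` digits, `scale` after the point."""
    # e.g. (10, 6): 9999999999 shifted 6 places -> 9999.999999
    return Decimal(10**precision - 1).scaleb(-scale)


def fits(value, precision, scale):
    """True if `value` is within the (precision, scale) range."""
    return abs(Decimal(str(value))) <= max_magnitude(precision, scale)


for p, s in [(10, 6), (9, 6), (8, 6)]:
    print(f"({p},{s}) holds up to ±{max_magnitude(p, s)}")
# (10,6) holds up to ±9999.999999
# (9,6) holds up to ±999.999999
# (8,6) holds up to ±99.999999

print(fits(180, 9, 6), fits(180, 8, 6), fits(-90, 8, 6))  # True False True
```

Under this rule, longitude (±180) needs at least (9,6) and latitude (±90) at least (8,6) to keep six decimal places.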

#6


4  

Google uses float(10,6) in their "Store locator" example. That's enough for me to go with that.

https://*.com/a/5994082/1094271

Also, starting with MySQL 5.6.x, spatial extension support is much better and comparable to PostGIS in features and performance.

#7


1  

I found this answer useful; maybe it can help you too: Problem Storing Latitude and Longitude values in MySQL database

#8


1  

I have the exact same schema (float(10,6)) and query (selecting inside a rectangle), and I found that switching the database engine from InnoDB to MyISAM doubled the speed of a "point in rectangle" look-up in a table with 780,000 records.

Additionally, I converted all lng/lat values to Cartesian integers (x, y) and created a two-column index on (x, y), and the same look-up went from ~27 ms to 1.3 ms.
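
The answer does not say which conversion it used for the "Cartesian integers". This sketch assumes a simple equirectangular projection around a hypothetical reference latitude, rounded to whole meters, and stores the result in SQLite (standing in for MySQL) with a two-column (x, y) index. The rectangle query only relies on x depending on longitude alone and y on latitude alone, each in increasing order; the projection is not accurate for distances far from the reference latitude.

```python
import math
import sqlite3

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (approximation)
REF_LAT_DEG = 37.0          # hypothetical reference latitude for the projection


def to_xy(lat, lng):
    """Equirectangular projection to integer meters (an assumed conversion)."""
    x = EARTH_RADIUS_M * math.radians(lng) * math.cos(math.radians(REF_LAT_DEG))
    y = EARTH_RADIUS_M * math.radians(lat)
    return round(x), round(y)  # rounding moves points by at most half a meter


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
conn.execute("CREATE INDEX idx_homes_xy ON homes (x, y)")  # two-column index

# Hypothetical sample points: id -> (lat, lng).
points = {
    1: (37.774929, -122.419416),
    2: (34.052235, -118.243683),
    3: (37.804363, -122.271111),
}
conn.executemany(
    "INSERT INTO homes (id, x, y) VALUES (?, ?, ?)",
    [(rid, *to_xy(lat, lng)) for rid, (lat, lng) in points.items()],
)


def homes_in_box(lat_min, lat_max, lng_min, lng_max):
    """Map the lat/lng box to an x/y box, then query the integer columns."""
    x_min, y_min = to_xy(lat_min, lng_min)
    x_max, y_max = to_xy(lat_max, lng_max)
    rows = conn.execute(
        "SELECT id FROM homes WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? "
        "ORDER BY id",
        (x_min, x_max, y_min, y_max),
    )
    return [row[0] for row in rows]


print(homes_in_box(37.0, 38.0, -123.0, -122.0))  # [1, 3]
```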

#9


0  

It really depends on how you are using the data. But as a gross oversimplification, decimal is faster but less accurate in approximations. More info here:

http://msdn.microsoft.com/en-us/library/aa223970(SQL.80).aspx
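
To put a number on accuracy for the FLOAT(10,6) schema in the question: MySQL's FLOAT is a 4-byte single-precision value, while DECIMAL stores the decimal digits exactly. This sketch emulates single precision in Python with struct (an approximation of what the server stores, not MySQL itself) and computes the worst-case rounding error for a value of a given magnitude.

```python
import math
import struct
from decimal import Decimal


def as_float32(x):
    """Round-trip a value through IEEE 754 single precision (4 bytes)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_max_error(x):
    """Worst-case rounding error when storing abs(x) as a float32 (half an ulp)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 25)     # float32 has a 24-bit significand


lng = -122.419416
error = abs(as_float32(lng) - lng)
print(error <= float32_max_error(lng))  # True
print(float32_max_error(lng))           # 3.814697265625e-06 (degrees)
print(Decimal("-122.419416"))           # -122.419416, exact as DECIMAL(9,6)
```

At the equator, where a degree spans about 111 km, 2**-18 degrees is roughly 0.4 m; for magnitudes of 128 degrees or more the bound doubles.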

Also, the standard for GPS coordinates is specified in ISO 6709:

http://en.wikipedia.org/wiki/ISO_6709