Background:
I need to store the following data in a database:
- OSM nodes with tags;
- OSM edges with weights (an edge connects two nodes and is extracted from a 'way' in an .osm file).
Nodes that form the edges of a 'way' should carry the same tags as that way, i.e. every node in the node set of a 'way' that is a highway should have a 'highway' tag.
I need this structure to easily generate a graph based on various filters, e.g. a graph consisting only of nodes and edges which are highways, or a 'foot paths' graph, etc.
Problem:
I had not heard of spatial indexes before, so I simply parsed an .osm file into a MySQL database:
- all nodes go into a 'nodes' table (with the respective coordinate columns) - OK, about 9,000,000 rows in my case:
INSERT INTO nodes VALUES (node_id, lat, lon);  -- pseudocode
- all ways go into an 'edges' table (one way usually produces several edges) - OK, about 9,000,000 rows as well:
INSERT INTO edges VALUES (edge_id, from_node_id, to_node_id);  -- pseudocode
- add tags to nodes, calculate weights for edges - Problem:
Here is the problematic PHP script:
// Walk every edge and append its tags to both endpoint nodes, one row at a time.
$query = mysql_query('SELECT * FROM edges');
$i = 0;
while ($res = mysql_fetch_object($query)) {
    $i++;
    echo "$i\n"; // progress counter
    // Fetch the current tags of the start node.
    $node1 = mysql_query('SELECT * FROM nodes WHERE id='.$res->from);
    $node1 = mysql_fetch_object($node1);
    $tag1 = $node1->tags;
    // Fetch the current tags of the end node.
    $node2 = mysql_query('SELECT * FROM nodes WHERE id='.$res->to);
    $node2 = mysql_fetch_object($node2);
    $tag2 = $node2->tags;
    // Write the concatenated tags back to both nodes.
    mysql_query('UPDATE nodes SET tags="'.$tag1.$res->tags.'" WHERE nodes.id='.$res->from);
    mysql_query('UPDATE nodes SET tags="'.$tag2.$res->tags.'" WHERE nodes.id='.$res->to);
}
Nohup shows one line of output from 'echo "$i\n"' every 55-60 seconds, which works out to roughly 17 years for an 'edges' table of more than 9,000,000 rows, as in my case.
Htop shows a /usr/bin/mysqld process which takes 40-60% of CPU.
The same problem exists for the script which tries to calculate the weight (the distance) of an edge (select all edges, take an edge, then select two nodes of this edge from 'nodes' table, then calculate the distance, then update the edges table).
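The post does not say which distance formula is used for the edge weight. As a minimal sketch, assuming the weight is the great-circle distance between the two endpoint nodes, the haversine formula could be written like this (the function name and the Earth radius constant are illustrative choices, not taken from the original script):

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# One degree of longitude along the equator.
one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)
```

The formula itself is cheap; the cost in the script described above comes from the per-edge round trips to the database, not from the arithmetic.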
Question:
How can I make these SQL updates faster? Should I tweak any MySQL config settings? Or should I use PostgreSQL with the PostGIS extension? Should I use another structure for my data? Or should I somehow make use of a spatial index?
2 answers
#1
3
If I understand you correctly, there are two things to discuss.
First, your idea of putting the highway tag on the start and end nodes. A node can have more than one edge connected to it, so where would the tag from the second edge go? Or from the third or fourth, if the node is a crossing? The highway tag is stored in the edges table in the first place because, from a relational point of view, that is where it belongs.
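To illustrate why the tag can stay on the edge: a filtered graph can be built directly from the edges table without copying anything onto the nodes. The following is only a sketch using Python's built-in sqlite3 module as a stand-in database; the `weight` and `tags` columns, the single-tag-per-edge layout, and the `build_graph` helper are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

def build_graph(conn, tag):
    """Return {node_id: [(neighbour_id, weight), ...]} for edges whose tag equals `tag`."""
    graph = defaultdict(list)
    rows = conn.execute(
        "SELECT from_node_id, to_node_id, weight FROM edges "
        "WHERE tags = ? ORDER BY edge_id",
        (tag,),
    )
    for a, b, w in rows:
        graph[a].append((b, w))
        graph[b].append((a, w))  # treat every edge as bidirectional
    return dict(graph)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (edge_id INTEGER PRIMARY KEY, from_node_id INTEGER,
                        to_node_id INTEGER, weight REAL, tags TEXT);
    INSERT INTO edges VALUES (1, 10, 11, 120.0, 'highway'),
                             (2, 11, 12,  80.0, 'highway'),
                             (3, 12, 13,  45.0, 'footway');
""")
highway_graph = build_graph(conn, "highway")
```

Node 12 touches both a highway and a footway here, yet it appears in each filtered graph only through the edges that match, so no per-node tag is needed.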
Second, fetching the whole table and processing it outside the database is not the right way. Taking care of this whole process is exactly what a relational database is good at.
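As a rough sketch of what "letting the database do it" can mean for the edge weights: one UPDATE statement covers every edge, instead of a client-side loop issuing several queries per row. The example uses Python's sqlite3 module so it runs anywhere; the table layout follows the question's pseudocode plus an assumed `weight` column, and `approx_dist_m` is a deliberately simple equirectangular stand-in for whatever distance function is really wanted (in PostGIS a built-in spatial function would normally take its place).

```python
import math
import sqlite3

def approx_dist_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of the distance in metres (adequate for short edges)."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(dx, dy)

conn = sqlite3.connect(":memory:")
conn.create_function("approx_dist_m", 4, approx_dist_m)  # expose the helper to SQL
conn.executescript("""
    CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, lat REAL, lon REAL);
    CREATE TABLE edges (edge_id INTEGER PRIMARY KEY, from_node_id INTEGER,
                        to_node_id INTEGER, weight REAL);
    INSERT INTO nodes VALUES (1, 0.0, 0.0), (2, 0.0, 1.0), (3, 1.0, 1.0);
    INSERT INTO edges (edge_id, from_node_id, to_node_id) VALUES (1, 1, 2), (2, 2, 3);
""")

# A single statement updates every edge; the primary key on nodes serves both lookups.
conn.execute("""
    UPDATE edges SET weight = (
        SELECT approx_dist_m(a.lat, a.lon, b.lat, b.lon)
        FROM nodes AS a, nodes AS b
        WHERE a.node_id = edges.from_node_id AND b.node_id = edges.to_node_id
    )
""")
weights = [w for (w,) in conn.execute("SELECT weight FROM edges ORDER BY edge_id")]
```

Both lookups go through the primary key of `nodes`. If the id column in the real table has no index, every per-node query has to scan millions of rows, which may be worth checking before anything else.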
I have not worked with MySQL, but from what I have heard I fully agree that you will probably have a lot more fun after migrating to PostGIS, since it has much better spatial capabilities (even if you don't need any spatial capabilities for this particular task).
So if we ignore the first problem and, just to show the concept, assume that only two edges are connected to each node and that each node has two tag fields, tag1 and tag2, then it could look something like this in PostGIS:
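-- "from" and "to" are reserved words in PostgreSQL, so the column names must be quoted
UPDATE nodes SET tag1 = edges.tags FROM edges WHERE nodes.id = edges."from";
UPDATE nodes SET tag2 = edges.tags FROM edges WHERE nodes.id = edges."to";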
If you disable the indexes, that should be very fast.
Again, if I have understood you right.
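On the first point (a node touched by more than two edges), one possible way around fixed tag1/tag2 columns is to aggregate the distinct tags of all edges that touch a node into a single value. This is only a sketch under assumed column names, run against SQLite through Python's sqlite3 module; `group_concat` is the SQLite aggregate, and PostgreSQL (`string_agg`) and MySQL (`GROUP_CONCAT`) offer comparable ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, tags TEXT);
    CREATE TABLE edges (edge_id INTEGER PRIMARY KEY, from_node_id INTEGER,
                        to_node_id INTEGER, tags TEXT);
    INSERT INTO nodes (node_id) VALUES (1), (2), (3), (4);
    INSERT INTO edges VALUES (1, 1, 2, 'highway'),
                             (2, 2, 3, 'highway'),
                             (3, 2, 4, 'footway');
""")

# For each node, collect the distinct tags of every edge that starts or ends there.
conn.execute("""
    UPDATE nodes SET tags = (
        SELECT group_concat(DISTINCT edges.tags)
        FROM edges
        WHERE edges.from_node_id = nodes.node_id
           OR edges.to_node_id = nodes.node_id
    )
""")

# The concatenation order is not guaranteed, so compare as sets.
tags_by_node = {
    node_id: set(tags.split(","))
    for node_id, tags in conn.execute("SELECT node_id, tags FROM nodes")
}
```

Node 2 is a crossing of two highway edges and one footway edge and ends up with both tags, each listed once.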
#2
3
PostgreSQL
OpenStreetMap itself uses PostgreSQL, so I guess that's recommended.
See: http://wiki.openstreetmap.org/wiki/PostgreSQL
You can see OSM's database schema at: http://wiki.openstreetmap.org/wiki/Database_Schema
So you can use the same fields, field types and indexes that OSM uses, for maximum compatibility.
MySQL
If you want to import .osm files into a MySQL database, have a look at:
http://wiki.openstreetmap.org/wiki/OsmDB.pm
Here you will find Perl code that creates the MySQL tables, parses an OSM file and imports it into your MySQL database.
Making it faster
If you are updating in bulk, you don't need to update the indexes after every update.
You can just disable the indexes, do all your updates and re-enable the index.
I'm guessing that should be a whole lot faster.
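The disable/update/re-enable pattern can be sketched as follows, using SQLite through Python's sqlite3 module as a stand-in. The index name `idx_nodes_tags` is hypothetical, and SQLite has no "disable" switch, so the sketch drops and recreates the index instead. In MySQL the closest equivalent is `ALTER TABLE ... DISABLE KEYS` / `ENABLE KEYS`, which as far as I know only affects non-unique indexes on MyISAM tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, tags TEXT);
    CREATE INDEX idx_nodes_tags ON nodes (tags);
""")
conn.executemany("INSERT INTO nodes VALUES (?, '')", [(i,) for i in range(1000)])
conn.commit()

# 1. Drop the secondary index on the column that is about to be rewritten.
conn.execute("DROP INDEX idx_nodes_tags")

# 2. Run the bulk update inside a single transaction (committed when the block exits).
with conn:
    conn.execute("UPDATE nodes SET tags = 'highway' WHERE node_id % 2 = 0")

# 3. Rebuild the index once, after all rows have been written.
conn.execute("CREATE INDEX idx_nodes_tags ON nodes (tags)")

highway_count = conn.execute(
    "SELECT COUNT(*) FROM nodes WHERE tags = 'highway'"
).fetchone()[0]
```

Only the index on the updated column is dropped here; the primary key used to locate rows stays in place, since removing it would make the lookups themselves slower.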
Good luck