
Time: 2022-09-16 14:21:51


I need to store the following data in a database:


  • OSM nodes with tags;


  • OSM edges with weights (an edge connects two nodes extracted from a 'way' in an .osm file).


Nodes that form the edges of a 'way' should carry the same tags as that way, i.e. every node in a way that is a highway should have a 'highway' tag.


I need this structure to easily generate a graph based on various filters, e.g. a graph consisting only of nodes and edges which are highways, or a 'foot paths' graph, etc.



I had not heard of spatial indexes before, so I just parsed an .osm file into a MySQL database:


  • all nodes to a 'nodes' table (with respective coordinate columns) - OK, about 9,000,000 rows in my case:

INSERT INTO nodes VALUES (node_id, lat, lon);  -- pseudocode

  • all ways to an 'edges' table (one way usually yields several edges) - OK, about 9,000,000 rows as well:

INSERT INTO edges VALUES (edge_id, from_node_id, to_node_id);  -- pseudocode

  • add tags to nodes, calculate weights for edges - Problem:

Here is the problematic PHP script:


$i = 0; // progress counter printed on every iteration
$query = mysql_query('SELECT * FROM edges');
while ($res = mysql_fetch_object($query)) {
    echo "$i\n";
    $i++;

    // Look up both end nodes of the edge, one query each
    $node1 = mysql_query('SELECT * FROM nodes WHERE id='.$res->from);
    $node1 = mysql_fetch_object($node1);
    $tag1 = $node1->tags;
    $node2 = mysql_query('SELECT * FROM nodes WHERE id='.$res->to);
    $node2 = mysql_fetch_object($node2);
    $tag2 = $node2->tags;

    // Append the edge's tags to each end node, one UPDATE each
    mysql_query('UPDATE nodes SET tags="'.$tag1.$res->tags.'" WHERE nodes.id='.$res->from);
    mysql_query('UPDATE nodes SET tags="'.$tag2.$res->tags.'" WHERE nodes.id='.$res->to);
}

Nohup shows the output of 'echo "$i\n"' only once every 55-60 seconds. At that rate the script could take more than 17 years to finish if the 'edges' table has more than 9,000,000 rows, as in my case.

Htop shows a /usr/bin/mysqld process which takes 40-60% of CPU.


The same problem exists for the script which tries to calculate the weight (the distance) of an edge (select all edges, take an edge, then select two nodes of this edge from 'nodes' table, then calculate the distance, then update the edges table).



How can I make these SQL updates faster? Should I tweak any MySQL config settings? Or should I use PostgreSQL with the PostGIS extension? Should I use another structure for my data? Or should I somehow utilize a spatial index?


2 solutions



If I understand you right, there are two things to discuss.


First, your idea of putting the highway tag on the start and stop nodes. A node can have more than one edge connected to it, so where do you put the tag from the second edge? Or the third or fourth, if it is a crossing? The reason the highway tag is put in the edges table in the first place is that, from a relational point of view, that is where it belongs.
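To illustrate the point about keeping tags on edges, here is a minimal sketch of building a filtered graph straight from the edges table. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in so that it runs without a server; the `tags` column holding a single value and the `build_graph` helper are assumptions made for the example, not part of the schema described in the question.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stands in for the real database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (
        id INTEGER PRIMARY KEY,
        from_node_id INTEGER NOT NULL,
        to_node_id INTEGER NOT NULL,
        tags TEXT NOT NULL              -- e.g. 'highway' or 'footway'
    );
    INSERT INTO edges VALUES
        (1, 10, 11, 'highway'),
        (2, 11, 12, 'highway'),
        (3, 11, 20, 'footway');         -- node 11 is a crossing
""")


def build_graph(conn, tag):
    """Return an undirected adjacency list of the edges carrying `tag`."""
    graph = defaultdict(set)
    rows = conn.execute(
        "SELECT from_node_id, to_node_id FROM edges WHERE tags = ?", (tag,)
    )
    for a, b in rows:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


highway_graph = build_graph(conn, "highway")
footway_graph = build_graph(conn, "footway")
```

Node 11 ends up in both graphs without needing any tag of its own, which is why the filter works on edges alone.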


Second, fetching the whole table and processing it row by row outside the database is not the right way. Taking care of this whole process is exactly what a relational database is good at.


I have not worked with MySQL, and I fully agree that you would probably have a much easier time after migrating to PostGIS, since from what I have heard PostGIS has far better spatial capabilities (even though you don't need any spatial capabilities for this particular task).


So if we ignore the first problem and, just to show the concept, say that only two edges are connected to each node and that each node has two tag fields, tag1 and tag2, then it could look something like this in PostGIS:

UPDATE nodes SET tag1 = edges.tags FROM edges WHERE nodes.id = edges."from";
UPDATE nodes SET tag2 = edges.tags FROM edges WHERE nodes.id = edges."to";

If you disable the indexes first, that should be very fast.
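The same set-based idea should carry over to the edge weights mentioned in the question. Below is a rough sketch, again using Python's sqlite3 with an in-memory database as a stand-in. It assumes the weight is the great-circle (haversine) distance in metres, which the question does not actually specify, and the `haversine_m` function and `weight` column are names made up for the example. PostGIS has built-in distance functions, so there you would not need to register one yourself.

```python
import math
import sqlite3


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so one statement can do all the work.
conn.create_function("haversine_m", 4, haversine_m)
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL);
    CREATE TABLE edges (
        id INTEGER PRIMARY KEY,
        from_node_id INTEGER,
        to_node_id INTEGER,
        weight REAL
    );
    INSERT INTO nodes VALUES (1, 0.0, 0.0), (2, 0.0, 1.0), (3, 1.0, 1.0);
    INSERT INTO edges (id, from_node_id, to_node_id) VALUES (1, 1, 2), (2, 2, 3);
""")

# One UPDATE for the whole table instead of two SELECTs and one UPDATE per edge.
with conn:
    conn.execute("""
        UPDATE edges SET weight = (
            SELECT haversine_m(a.lat, a.lon, b.lat, b.lon)
            FROM nodes AS a, nodes AS b
            WHERE a.id = edges.from_node_id AND b.id = edges.to_node_id
        )
    """)

weights = dict(conn.execute("SELECT id, weight FROM edges"))
```

The lookups stay fast only because `nodes.id` is the primary key; without an index on that column every edge would trigger a scan of the whole nodes table.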


Again, if I have understood you right.




OpenStreetMap itself uses PostgreSQL, so I guess that's recommended.
See: http://wiki.openstreetmap.org/wiki/PostgreSQL

You can see OSM's database schema at: http://wiki.openstreetmap.org/wiki/Database_Schema


So you can use the same fields, field types and indexes that OSM uses for maximum compatibility.


If you want to import .osm files into a MySQL database, have a look at:
Here you will find Perl code that will create MySQL tables, parse an OSM file and import it into your MySQL database.


Making it faster
If you are updating in bulk, you don't need to update the indexes after every update.
You can just disable the indexes, do all your updates and then re-enable the indexes.
I'm guessing that should be a whole lot faster.
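As a rough illustration of that pattern, here is a sketch using Python's sqlite3 with an in-memory database as a stand-in. SQLite has no "disable" switch, so the sketch drops and recreates the index instead; in MySQL the closest equivalent I know of is `ALTER TABLE ... DISABLE KEYS` / `ENABLE KEYS`, which only affects non-unique indexes on MyISAM tables. The table layout and the `idx_nodes_tags` index name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, tags TEXT NOT NULL DEFAULT '');
    CREATE INDEX idx_nodes_tags ON nodes (tags);
""")
conn.executemany("INSERT INTO nodes (id) VALUES (?)", [(i,) for i in range(1, 1001)])
conn.commit()

# Hypothetical batch: tag every odd-numbered node as a highway.
updates = [("highway", i) for i in range(1, 1001, 2)]  # (new tags, node id)

with conn:  # commits once at the end instead of once per row
    # Drop the index on the column being rewritten so it is not maintained per row.
    conn.execute("DROP INDEX idx_nodes_tags")
    conn.executemany("UPDATE nodes SET tags = ? WHERE id = ?", updates)
    # Rebuild the index in one pass after all rows are updated.
    conn.execute("CREATE INDEX idx_nodes_tags ON nodes (tags)")

tagged = conn.execute(
    "SELECT COUNT(*) FROM nodes WHERE tags = 'highway'"
).fetchone()[0]
index_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND name = 'idx_nodes_tags'"
).fetchone()[0] == 1
```

Note that only the index on the updated column is dropped. The primary key on `id` has to stay, since the `WHERE id = ?` lookups depend on it.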


Good luck


