I'm trying to store an unweighted, directed graph of over 5 GB in a MySQL database in a way that is efficient for finding shortest paths. Currently it is stored in a single table with a source column and a targets column (comma-separated), but I get the feeling this isn't the way to go, so I am planning to convert it to a table of vertices and a table of edges.
I've got two questions:
- What is the best way of storing the graph?
- What shortest path algorithm should I use?
2 Solutions
#1
2
You should have two tables: one for nodes and one for edges. The edges table should have source_node_id and dest_node_id columns. That way you can easily query the edges table for all the outgoing neighbours of a node, which is what Dijkstra's algorithm needs at each step.
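A minimal sketch of that two-table layout and the neighbour query follows. It uses Python's standard `sqlite3` module with an in-memory database purely so the example runs without a server; the DDL would need small adjustments for MySQL. Apart from `source_node_id` and `dest_node_id`, every name here (`nodes`, `edges`, `name`, `outgoing_neighbours`) is an assumption made for illustration.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL so the sketch runs without a server.
# Names other than source_node_id / dest_node_id are assumptions.
SCHEMA = """
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE edges (
    source_node_id INTEGER NOT NULL REFERENCES nodes(id),
    dest_node_id   INTEGER NOT NULL REFERENCES nodes(id),
    PRIMARY KEY (source_node_id, dest_node_id)
);
"""


def build_example_graph():
    """Create the two tables and load a tiny directed graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO nodes (id, name) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    )
    conn.executemany(
        "INSERT INTO edges (source_node_id, dest_node_id) VALUES (?, ?)",
        [(1, 2), (1, 3), (2, 4), (3, 4)],
    )
    conn.commit()
    return conn


def outgoing_neighbours(conn, node_id):
    """Return the ids reachable from node_id over one outgoing edge."""
    rows = conn.execute(
        "SELECT dest_node_id FROM edges "
        "WHERE source_node_id = ? ORDER BY dest_node_id",
        (node_id,),
    )
    return [row[0] for row in rows]


conn = build_example_graph()
print(outgoing_neighbours(conn, 1))  # prints [2, 3]
```

The composite primary key starts with `source_node_id`, so the neighbour lookup can use the key's index instead of scanning the whole edges table, which matters for a graph of this size.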
For a simple explanation of Dijkstra's algorithm, see: http://www.sce.carleton.ca/faculty/chinneck/po/Chapter8.pdf
#2
1
Another very efficient way to store dense graphs (it is not so efficient for sparse graphs) is to use an adjacency matrix. Here is a link which explains it:
Storing graphs using adjacency matrix
Now, to store a matrix in a MySQL database, you have to use the row id as the vertex id for the rows (assuming you number your vertices 1, 2, ...). The columns can be named after the vertices or use the vertex ids again. You can keep a separate table that maps vertex names to ids.
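To make the row/column idea concrete, here is a small in-memory sketch of an adjacency matrix with a name-to-id map. It is not database code; the vertex names and the helpers `build_matrix` and `has_edge` are hypothetical and only illustrate how ids index the matrix.

```python
# Hypothetical vertex names; ids start at 1, as the answer assumes.
names = ["a", "b", "c", "d"]
name_to_id = {name: i for i, name in enumerate(names, start=1)}


def build_matrix(n, edges):
    """Return an (n+1) x (n+1) 0/1 matrix.

    Row and column 0 are left unused so that 1-based vertex ids
    can index the matrix directly.
    """
    matrix = [[0] * (n + 1) for _ in range(n + 1)]
    for src, dst in edges:
        matrix[src][dst] = 1  # directed edge src -> dst
    return matrix


def has_edge(matrix, src_name, dst_name):
    """Look up an edge by vertex name through the name-to-id map."""
    return matrix[name_to_id[src_name]][name_to_id[dst_name]] == 1


matrix = build_matrix(len(names), [(1, 2), (1, 3), (2, 4), (3, 4)])
print(has_edge(matrix, "a", "b"))  # prints True
print(has_edge(matrix, "b", "a"))  # prints False (the graph is directed)
```

The matrix needs one cell per ordered pair of vertices regardless of how many edges exist, which is why it only pays off for dense graphs.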
One problem you will face is the maximum number of columns per table. If your matrix is too big, you may have to split the columns across multiple tables. If you have an indexing or hashing scheme that immediately tells you which table holds the column for a given node, your queries should still be relatively fast.
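One possible indexing scheme is plain integer division on the destination vertex id, sketched below. The chunk size `COLUMNS_PER_TABLE`, the `adjacency_<n>` table names, the `v<id>` column names, and the `id` row column are all assumptions for illustration; the `%s` placeholder follows the parameter style commonly used by MySQL drivers for Python.

```python
COLUMNS_PER_TABLE = 1000  # assumed chunk size, not a MySQL constant


def locate(dest_id, columns_per_table=COLUMNS_PER_TABLE):
    """Map a 1-based destination vertex id to (table_name, column_name)."""
    if dest_id < 1:
        raise ValueError("vertex ids start at 1")
    chunk = (dest_id - 1) // columns_per_table
    return f"adjacency_{chunk}", f"v{dest_id}"


def edge_lookup_sql(source_id, dest_id):
    """Build the query that reads the matrix cell (source_id, dest_id)."""
    table, column = locate(dest_id)
    # Table and column names are generated from integers, so formatting
    # them into the statement is safe; the row id stays a bound parameter.
    return f"SELECT {column} FROM {table} WHERE id = %s", (source_id,)


print(locate(1))     # prints ('adjacency_0', 'v1')
print(locate(1001))  # prints ('adjacency_1', 'v1001')
print(edge_lookup_sql(7, 2500))
```

Because the table name is computed from the id alone, no extra lookup query is needed before reading a cell.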
As for the shortest path, as others have mentioned, Dijkstra's algorithm is the standard choice.
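Because the graph in the question is unweighted, every edge costs the same, and in that case Dijkstra's algorithm visits vertices in the same order as a plain breadth-first search. The sketch below therefore uses BFS rather than a priority queue. The `neighbours` callable is an assumption: here it reads from a small dictionary, but it could equally wrap a query against an edges table.

```python
from collections import deque


def shortest_path(neighbours, start, goal):
    """Breadth-first search on an unweighted, directed graph.

    neighbours(v) must return the vertices reachable from v over one edge.
    Returns the list of vertices from start to goal, or None if unreachable.
    """
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in neighbours(current):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None


# Hypothetical example graph as an adjacency list.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(shortest_path(lambda v: graph.get(v, []), 1, 5))  # prints [1, 2, 4, 5]
```

Each vertex is expanded at most once, so the number of neighbour lookups is bounded by the number of vertices reachable from the start.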