This is possibly a problem with possibly no optimal solution. Suppose I have a directed graph, have no clue whether it has any cycles or not(cycle detection will be one of the aspects of this problem). Given a set of vertices(possibly millions of vertices) I need to count all the distinct paths(paths with no duplicate vertices) between all the unique pairs of the given graph. How would I go about tackling this situation?
这可能是一个没有最佳解决方案的问题。假设我有一个有向图,不知道它是否有任何周期(周期检测将是这个问题的一个方面)。给定一组顶点(可能有数百万个顶点),我需要计算给定图形的所有唯一对之间的所有不同路径(没有重复顶点的路径)。我该如何解决这个问题呢?
Lets look at a brute-force way to do this:
让我们看一看蛮力的方法:
-
Compute all possible pairs from the graph.
从图中计算所有可能的对。
-
For each pair of the graph use DFS to get all the paths from Source to Destination.
对于每一对图形,使用DFS从源到目标的所有路径。
- Assuming the pairs are represented in a hash table, put the path count as the value against this pair.
- 假设这些对在哈希表中表示,将路径计数作为对这对的值。
- Repeat for the rest of the pairs.
- 对其余的结对重复。
Can people point out what are some of the things that can go wrong with this? Lets think of the problem in this manner, what are the computational challenges of finding all the distinct paths between all the cities of the planet and if one would even attempt to solve this problem, where would one start?
人们能指出哪些事情会出问题吗?让我们以这种方式来思考问题,找出地球上所有城市之间的所有不同路径的计算挑战,如果有人试图解决这个问题,从哪里开始?
Edit: Some of the algorithms presented produce results in O(n!) factorial time. This is somewhat of a daunting computation for a single machine with limited resources to handle. Does anyone know of map-reduce techniques to break down the problem of graph traversal into smaller chunks?
编辑:一些算法产生的结果在O(n!)阶乘时间。对于一个资源有限的单一机器来说,这有点令人畏惧。有没有人知道mapreduce技术将图形遍历的问题分解成更小的块?
3 个解决方案
#1
3
The Floyd-Warshall generalization that gets a rough approximation of paths is this:
Floyd-Warshall泛化得到一个大致的路径近似:
procedure FloydWarshall ()
for k := 1 to n
for i := 1 to n
for j := 1 to n
path[i][j] = path[i][j] + path[i][k]*path[k][j]
Here is a very rough idea on how to scale this. Disclaimer - This isn't concrete -it's very hand wavy, but hopefully it helps more than confuses. It helps to understand how the algorithm works. It starts on an adjacency matrix of the graph. In each iteration k, we're saying the number of paths from i to j is equal to the number of paths in prior iterations from i to j plus the number of paths from i to j via k.
这里有一个非常粗略的概念,关于如何缩放。免责声明- - -这不是具体的- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -它有助于理解算法的工作原理。它从图的邻接矩阵开始。在每个迭代k中,我们说从i到j的路径的数量等于在之前的迭代中从i到j的路径的数量加上从i到j的路径的数量。
Thus Break the graph in to n arbitrary sub-adjacency matricies of size k x k, compute above on each of them. Now you have the number of paths and start combining submatricies by recomputing part of the above on the combined matrix. I.e, we only need to recompute n/2 values for k when recombining. I found this, which looks like a similar direction http://theory.stanford.edu/~oldham/publications/generalized/asgsp.pdf.
因此,将图形分解为k×k大小的任意次邻接矩阵,在每个子邻接矩阵上计算。现在你已经有了路径的数量,并开始将这些子矩阵结合起来,再将上面的部分重新计算在合并的矩阵上。我。当重新组合时,我们只需要重新计算k的n/2值。我发现了这个,它看起来类似于一个类似的方向:http://y.stanford.edu/ ~oldham/publications/ general/asgsp.pdf。
#2
3
Have you thought how many such paths can exist?
你想过有多少这样的路径可以存在吗?
In such a graph with V vertices, you have about V^2 different pairs. Let's imagine a worst case scenario where you have a full graph (one where edges exist between every pair). Paths can have anywhere between 1 edge and V-1 edges, because you do not allow duplicate vertices in a path.
在这样一个有V顶点的图中,你有关于V的两个不同的对。让我们想象一个最坏的情况,你有一个完整的图(每对之间有一个边)。路径可以在1条边和V-1边之间,因为在路径中不允许重复的顶点。
Between each pair of vertices:
每对顶点之间:
- There is only one path with 1 edge;
- 只有一条有1条边的路径;
- There are V-2 paths with 2 edges: using any intermediate vertex that isn't the origin or destination;
- 有2条边的V-2路径:使用任何不是原点或终点的中间顶点;
- There are (V-2)(V-3) paths with 3 edges: using any two distinct intermediate vertices;
- 有3条边的V-2 (V-2)(V-3)路径:使用任意两个不同的中间顶点;
- There are (V-2)(V-3)(V-4) paths with 4 edges;
- 有(V-2)(V-3)(V-4)路径4条边;
- There are prod{k=1 -> n-1}(V-k-1) paths with n edges.
- 有prod{k=1 -> n-1}(V-k-1)路径的n条边。
Given that, we know that there are at most V^2*sum{i=1 -> V-1}(prod{k=1 -> i-1}(V-k-1)) total paths for a graph with V vertices.
鉴于此,我们知道有最多V ^ 2 *总和{ i = 1 - >与它们}(prod {张k = 1 - > }(V-k-1))总路径图与V顶点。
The total number of paths is therefore P(V) = V^2*sum{i=1 -> V-1}(prod{k=1 -> i-1}(V-k-1)) = V^2*sum{i=1 -> V-1}(O(V^V)) = O(V^3*V^V) = O(V!).
因此路径P的总数(V)= V ^ 2 *总和{ i = 1 - >与它们}(prod {张k = 1 - > }(V-k-1))= V ^ 2 *总和{ i = 1 - >与它们}(O(V ^))= O(V ^ 3 * V ^)= O(V)。
Now imagine an ideal world where you can compute each path in constant time. Your algorithm would take O(1*V!) = O(V!) time to run, which is impratical.
现在想象一个理想的世界,你可以在恒定时间内计算每条路径。你的算法会用O(1*V!) = O(V!)时间来运行,这是不实用的。
There might be an algorithm that can count the paths without enumerating them, and thus get a more efficient algorithm.
可能有一种算法可以计算路径,而不用枚举它们,从而得到一个更有效的算法。
#3
0
This SO page describes a DFS method for printing all non-cyclic paths between any two vertices--it also includes java code. You can modify it to find all such paths for all pairs of of vertices, but it's not the most efficient way of counting all paths between all vertices.
这个页面描述了一个DFS方法,用于打印任意两个顶点之间的所有非循环路径——它还包括java代码。你可以修改它来为所有的顶点配对找到所有的路径,但这并不是计算所有顶点之间所有路径的最有效的方法。
#1
3
The Floyd-Warshall generalization that gets a rough approximation of paths is this:
Floyd-Warshall泛化得到一个大致的路径近似:
procedure FloydWarshall ()
for k := 1 to n
for i := 1 to n
for j := 1 to n
path[i][j] = path[i][j] + path[i][k]*path[k][j]
Here is a very rough idea on how to scale this. Disclaimer - This isn't concrete -it's very hand wavy, but hopefully it helps more than confuses. It helps to understand how the algorithm works. It starts on an adjacency matrix of the graph. In each iteration k, we're saying the number of paths from i to j is equal to the number of paths in prior iterations from i to j plus the number of paths from i to j via k.
这里有一个非常粗略的概念,关于如何缩放。免责声明- - -这不是具体的- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -它有助于理解算法的工作原理。它从图的邻接矩阵开始。在每个迭代k中,我们说从i到j的路径的数量等于在之前的迭代中从i到j的路径的数量加上从i到j的路径的数量。
Thus Break the graph in to n arbitrary sub-adjacency matricies of size k x k, compute above on each of them. Now you have the number of paths and start combining submatricies by recomputing part of the above on the combined matrix. I.e, we only need to recompute n/2 values for k when recombining. I found this, which looks like a similar direction http://theory.stanford.edu/~oldham/publications/generalized/asgsp.pdf.
因此,将图形分解为k×k大小的任意次邻接矩阵,在每个子邻接矩阵上计算。现在你已经有了路径的数量,并开始将这些子矩阵结合起来,再将上面的部分重新计算在合并的矩阵上。我。当重新组合时,我们只需要重新计算k的n/2值。我发现了这个,它看起来类似于一个类似的方向:http://y.stanford.edu/ ~oldham/publications/ general/asgsp.pdf。
#2
3
Have you thought how many such paths can exist?
你想过有多少这样的路径可以存在吗?
In such a graph with V vertices, you have about V^2 different pairs. Let's imagine a worst case scenario where you have a full graph (one where edges exist between every pair). Paths can have anywhere between 1 edge and V-1 edges, because you do not allow duplicate vertices in a path.
在这样一个有V顶点的图中,你有关于V的两个不同的对。让我们想象一个最坏的情况,你有一个完整的图(每对之间有一个边)。路径可以在1条边和V-1边之间,因为在路径中不允许重复的顶点。
Between each pair of vertices:
每对顶点之间:
- There is only one path with 1 edge;
- 只有一条有1条边的路径;
- There are V-2 paths with 2 edges: using any intermediate vertex that isn't the origin or destination;
- 有2条边的V-2路径:使用任何不是原点或终点的中间顶点;
- There are (V-2)(V-3) paths with 3 edges: using any two distinct intermediate vertices;
- 有3条边的V-2 (V-2)(V-3)路径:使用任意两个不同的中间顶点;
- There are (V-2)(V-3)(V-4) paths with 4 edges;
- 有(V-2)(V-3)(V-4)路径4条边;
- There are prod{k=1 -> n-1}(V-k-1) paths with n edges.
- 有prod{k=1 -> n-1}(V-k-1)路径的n条边。
Given that, we know that there are at most V^2*sum{i=1 -> V-1}(prod{k=1 -> i-1}(V-k-1)) total paths for a graph with V vertices.
鉴于此,我们知道有最多V ^ 2 *总和{ i = 1 - >与它们}(prod {张k = 1 - > }(V-k-1))总路径图与V顶点。
The total number of paths is therefore P(V) = V^2*sum{i=1 -> V-1}(prod{k=1 -> i-1}(V-k-1)) = V^2*sum{i=1 -> V-1}(O(V^V)) = O(V^3*V^V) = O(V!).
因此路径P的总数(V)= V ^ 2 *总和{ i = 1 - >与它们}(prod {张k = 1 - > }(V-k-1))= V ^ 2 *总和{ i = 1 - >与它们}(O(V ^))= O(V ^ 3 * V ^)= O(V)。
Now imagine an ideal world where you can compute each path in constant time. Your algorithm would take O(1*V!) = O(V!) time to run, which is impratical.
现在想象一个理想的世界,你可以在恒定时间内计算每条路径。你的算法会用O(1*V!) = O(V!)时间来运行,这是不实用的。
There might be an algorithm that can count the paths without enumerating them, and thus get a more efficient algorithm.
可能有一种算法可以计算路径,而不用枚举它们,从而得到一个更有效的算法。
#3
0
This SO page describes a DFS method for printing all non-cyclic paths between any two vertices--it also includes java code. You can modify it to find all such paths for all pairs of of vertices, but it's not the most efficient way of counting all paths between all vertices.
这个页面描述了一个DFS方法,用于打印任意两个顶点之间的所有非循环路径——它还包括java代码。你可以修改它来为所有的顶点配对找到所有的路径,但这并不是计算所有顶点之间所有路径的最有效的方法。