
Time: 2023-02-02 11:09:12

How can one neatly represent a graph in Python? (Starting from scratch i.e. no libraries!)
What data structure (e.g. dicts/tuples/dict(tuples)) will be fast but also memory efficient?
One must be able to do various graph operations on it.

As pointed out, the various graph representations might help. How does one go about implementing them in Python?

As for the libraries, this question has quite good answers.


4 answers



Even though this is a somewhat old question, I thought I'd give a practical answer for anyone stumbling across this.


Let's say you get your input data for your connections as a list of tuples like so:


[('A', 'B'), ('B', 'C'), ('B', 'D'), ('C', 'D'), ('E', 'F'), ('F', 'C')]

The data structure I've found to be most useful and efficient for graphs in Python is a dict of sets. This will be the underlying structure for our Graph class. You also have to know if these connections are arcs (directed, connect one way) or edges (undirected, connect both ways). We'll handle that by adding a directed parameter to the Graph.__init__ method. We'll also add some other helpful methods.


from collections import defaultdict

class Graph(object):
    """ Graph data structure, undirected by default. """

    def __init__(self, connections, directed=False):
        self._graph = defaultdict(set)
        self._directed = directed
        self.add_connections(connections)

    def add_connections(self, connections):
        """ Add connections (list of tuple pairs) to graph """

        for node1, node2 in connections:
            self.add(node1, node2)

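    def add(self, node1, node2):
        """ Add connection between node1 and node2 """

        self._graph[node1].add(node2)
        if not self._directed:
            # Undirected: store the reverse direction as well
            self._graph[node2].add(node1)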

    def remove(self, node):
        """ Remove all references to node """

        for n, cxns in self._graph.items():
            try:
                cxns.remove(node)
            except KeyError:
                pass
        try:
            del self._graph[node]
        except KeyError:
            pass

    def is_connected(self, node1, node2):
        """ Is node1 directly connected to node2 """

        return node1 in self._graph and node2 in self._graph[node1]

    def find_path(self, node1, node2, path=None):
        """ Find any path between node1 and node2 (may not be shortest) """

        path = (path or []) + [node1]
        if node1 == node2:
            return path
        if node1 not in self._graph:
            return None
        for node in self._graph[node1]:
            if node not in path:
                new_path = self.find_path(node, node2, path)
                if new_path:
                    return new_path
        return None

    def __str__(self):
        return '{}({})'.format(self.__class__.__name__, dict(self._graph))

I'll leave it as an "exercise for the reader" to create a find_shortest_path and other methods.
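For anyone who wants a starting point for that exercise, here is one possible sketch using breadth-first search. It is not part of the Graph class above: the standalone function find_shortest_path and its signature are assumptions made for illustration, and it works directly on a plain dict of sets. "Shortest" here means fewest edges, since the connections carry no weights.

```python
from collections import deque


def find_shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    if start == goal:
        return [start]
    # Map each visited node to the node it was reached from.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# The undirected graph built from the connections list above.
graph = {
    'A': {'B'},
    'B': {'A', 'C', 'D'},
    'C': {'B', 'D', 'F'},
    'D': {'B', 'C'},
    'E': {'F'},
    'F': {'C', 'E'},
}
print(find_shortest_path(graph, 'A', 'E'))  # ['A', 'B', 'C', 'F', 'E']
```

Because breadth-first search explores nodes in order of distance from the start, the first time it reaches the goal it has found a path with the minimum number of edges.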


Let's see this in action though...


>>> from pprint import pprint
>>> connections = [('A', 'B'), ('B', 'C'), ('B', 'D'),
...                ('C', 'D'), ('E', 'F'), ('F', 'C')]
>>> g = Graph(connections, directed=True)
>>> pprint(g._graph)
{'A': {'B'},
 'B': {'D', 'C'},
 'C': {'D'},
 'E': {'F'},
 'F': {'C'}}

>>> g = Graph(connections)  # undirected
>>> pprint(g._graph)
{'A': {'B'},
 'B': {'D', 'A', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'B'},
 'E': {'F'},
 'F': {'E', 'C'}}

>>> g.add('E', 'D')
>>> pprint(g._graph)
{'A': {'B'},
 'B': {'D', 'A', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'E', 'B'},
 'E': {'D', 'F'},
 'F': {'E', 'C'}}

>>> g.remove('A')
>>> pprint(g._graph)
{'B': {'D', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'E', 'B'},
 'E': {'D', 'F'},
 'F': {'E', 'C'}}

>>> g.add('G', 'B')
>>> pprint(g._graph)
{'B': {'D', 'G', 'C'},
 'C': {'D', 'F', 'B'},
 'D': {'C', 'E', 'B'},
 'E': {'D', 'F'},
 'F': {'E', 'C'},
 'G': {'B'}}

>>> g.find_path('G', 'E')
['G', 'B', 'D', 'C', 'F', 'E']



NetworkX is an awesome Python graph library. You'll be hard pressed to find something you need that it doesn't already do.




First, the choice between the classical adjacency-list and adjacency-matrix representations depends on the purpose, that is, on what you want to do with the representation. The well-known graph problems and algorithms are tied to that choice, and the abstract representation you pick largely dictates how it should be implemented.
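To make the list vs. matrix distinction concrete, here is a minimal sketch that builds both representations from the same small, made-up undirected edge list. The variable names (edges, index, adjacency_list, adjacency_matrix) are only illustrative.

```python
edges = [('A', 'B'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

# Give every node a stable position so it can be used as a matrix index.
nodes = sorted({name for edge in edges for name in edge})
index = {name: i for i, name in enumerate(nodes)}

# Adjacency list: each node maps to the set of its neighbours.
adjacency_list = {name: set() for name in nodes}
for u, v in edges:
    adjacency_list[u].add(v)
    adjacency_list[v].add(u)

# Adjacency matrix: a square list of lists, 1 where an edge exists.
size = len(nodes)
adjacency_matrix = [[0] * size for _ in range(size)]
for u, v in edges:
    adjacency_matrix[index[u]][index[v]] = 1
    adjacency_matrix[index[v]][index[u]] = 1

print(sorted(adjacency_list['B']))               # ['A', 'C', 'D']
print(adjacency_matrix[index['A']][index['B']])  # 1
print(adjacency_matrix[index['A']][index['C']])  # 0
```

The list form makes it cheap to iterate over a node's neighbours and only stores edges that exist. The matrix form answers "is there an edge between i and j?" by direct indexing but always needs size * size cells.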


Second, the question is whether the vertices and edges should be expressed only in terms of existence, or whether they carry some extra information.
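As a rough illustration of that second point, the sketch below contrasts an existence-only structure with one where edges and vertices carry extra data. The weights, labels, and the helper path_weight are invented for the example.

```python
# Existence only: a dict of sets says which edges are present, nothing more.
plain = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}

# Extra information on edges: a dict of dicts, where the inner value
# holds the edge attribute (here a made-up weight).
weighted = {
    'A': {'B': 7},
    'B': {'A': 7, 'C': 2},
    'C': {'B': 2},
}

# Extra information on vertices can live in a separate dict keyed by node.
labels = {'A': 'start', 'B': 'hub', 'C': 'end'}


def path_weight(graph, path):
    """Sum the edge weights along consecutive nodes of path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


print('B' in plain['A'])                       # True
print(path_weight(weighted, ['A', 'B', 'C']))  # 9
print(labels['B'])                             # hub
```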


From the point of view of Python's built-in data types, any value contained elsewhere is expressed as a (hidden) reference to the target object. If it is a variable (i.e. a named reference), then the name and the reference are stored in an internal, dictionary-like namespace. If you do not need names, the reference can be stored in your own container; here a Python list will probably always be used for the list as an abstraction.


A Python list is implemented as a dynamic array of references; a Python tuple is implemented as a static array of references with constant content (the references themselves cannot be changed). Because of that, both can be indexed easily, so a list of lists can also be used to implement matrices.


Another way to represent matrices is with the arrays implemented by the standard module array, which are more constrained with respect to the stored type: the values must be homogeneous. The elements store the values directly, whereas a list stores references to the value objects. This makes an array more memory efficient, and access to the values can also be faster.
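As a sketch of what that could look like, the example below stores a 4x4 adjacency matrix in a single flat array from the standard array module, in row-major order. The helpers set_edge and has_edge and the choice of typecode 'b' (one signed byte per cell) are assumptions for illustration.

```python
from array import array

size = 4
# Flat, row-major storage: cell (i, j) lives at position i * size + j.
# Typecode 'b' stores each element directly as one signed byte.
matrix = array('b', [0]) * (size * size)


def set_edge(matrix, size, i, j, value=1):
    """Mark the directed edge i -> j in the flat matrix."""
    matrix[i * size + j] = value


def has_edge(matrix, size, i, j):
    """Return True if the directed edge i -> j is present."""
    return matrix[i * size + j] != 0


set_edge(matrix, size, 0, 1)
set_edge(matrix, size, 1, 0)
print(has_edge(matrix, size, 0, 1))  # True
print(has_edge(matrix, size, 0, 2))  # False
print(matrix.itemsize)               # 1
```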


Sometimes you may find an even more restricted representation useful, such as bytearray.
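For instance, if each cell only needs to hold a small integer (0 to 255), a bytearray can serve as a flat adjacency matrix. This is just a sketch; the helper names add_edge and neighbours are made up for the example.

```python
size = 4
# bytearray(n) creates n zero-filled bytes, one byte per matrix cell.
matrix = bytearray(size * size)


def add_edge(matrix, size, i, j):
    """Mark an undirected edge between nodes i and j."""
    matrix[i * size + j] = 1
    matrix[j * size + i] = 1


def neighbours(matrix, size, i):
    """Return the indices of all nodes adjacent to node i."""
    row = matrix[i * size:(i + 1) * size]
    return [j for j, flag in enumerate(row) if flag]


add_edge(matrix, size, 0, 1)
add_edge(matrix, size, 1, 3)
print(neighbours(matrix, size, 1))  # [0, 3]
print(len(matrix))                  # 16
```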




There are two excellent graph libraries, NetworkX and igraph. You can find the source code of both on GitHub, so you can always see how the functions are written. I prefer NetworkX because it is easier to understand.
Read their code to see how they implement the functions. You will get multiple ideas and can then choose how you want to build a graph using data structures.




