Networkx: Calculating shortest paths on a graph and storing them in a Pandas DataFrame

Time: 2021-11-15 16:57:19

I have a pandas DataFrame as shown below. The frame has many more columns, but they are not relevant to the task. The column id holds the sentence ID, while the columns e1 and e2 contain entities (= words) of the sentence, with their relationship given in column r.

id     e1        e2          r
10     a-5       b-17        A 
10     b-17      a-5         N
17     c-1       a-23        N
17     a-23      c-1         N
17     d-30      g-2         N
17     g-20      d-30        B

I also created a graph for each sentence. Each graph is built from a list of edges that looks somewhat like this:

[('wordB-5', 'wordA-1'), ('wordC-8', 'wordA-1'), ...]

All of those edges are stored in one list of lists, graph_edges. Each element of that list contains all the edges of one sentence, so graph_edges[0] holds the edges of sentence 0, and so on.

Now I want to perform operations like these:

graph = nx.Graph(graph_edges[i])
shortest_path = nx.shortest_path(graph, source="e1", target="e2")
result_length = len(shortest_path)
result_path = shortest_path

For each row in the data frame, I'd like to calculate the shortest path (from the entity in e1 to the entity in e2) and save all of the results in new columns of the DataFrame, but I have no idea how to do that.

I tried using constructions such as this one

e1 = DF["e1"].tolist()
e2 = DF["e2"].tolist()
for id in Df["sentenceID"]:
    graph = nx.Graph(graph_edges[id])
    shortest_path = nx.shortest_path(graph,source=e1, target=e2)
result_length = len(shortest_path)
result_path = shortest_path

to create the data but it says the target is not in the graph.

The desired new data frame:

id     e1        e2          r     length     path
10     a-5       b-17        A       4         ..
10     b-17      a-5         N       4         ..
17     c-1       a-23        N       3         ..
17     a-23      c-1         N       3         ..
17     d-30      g-2         N       7         ..
17     g-20      d-30        B       7         ..

2 Solutions

#1


2  

Here's one way to do what you are trying to do, in three distinct steps so that it is easier to follow along.

  • Step 1: From a list of edges, build the networkx graph object.

  • Step 2: Create a data frame with 2 columns (for each row in this DF, we want the shortest distance and path from the entity in e1 to the entity in e2).

  • Step 3: Row by row for the DF, calculate the shortest path and its length. Store them in the DF as new columns.

Step 1: Build the graph and add edges, one by one

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

elist = [[('a-5', 'b-17'), ('b-17', 'c-1')], #sentence 1
         [('c-1', 'a-23'), ('a-23', 'c-1')], #sentence 2
         [('b-17', 'g-2'), ('g-20', 'c-1')]] #sentence 3

graph = nx.Graph()

for sentence_edges in elist:
    for fromnode, tonode in sentence_edges:
        graph.add_edge(fromnode, tonode)

nx.draw(graph, with_labels=True, node_color='lightblue')
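
As a possible shorthand for the nested loop above, each sentence's edge list can be handed to add_edges_from in one call. This is only a sketch that reuses the sample elist from this answer; the drawing call is left out so that it runs without matplotlib.

```python
import networkx as nx

# Same sample edge lists as above, one inner list per sentence
elist = [[('a-5', 'b-17'), ('b-17', 'c-1')],   # sentence 1
         [('c-1', 'a-23'), ('a-23', 'c-1')],   # sentence 2
         [('b-17', 'g-2'), ('g-20', 'c-1')]]   # sentence 3

graph = nx.Graph()
for sentence_edges in elist:
    # add_edges_from accepts any iterable of (u, v) pairs;
    # in an undirected Graph, ('a-23', 'c-1') duplicates ('c-1', 'a-23')
    graph.add_edges_from(sentence_edges)

print(graph.number_of_nodes(), graph.number_of_edges())
```

Note that this merges all sentences into a single graph, exactly like the loop above; if every sentence needs its own graph, one nx.Graph per inner list would be required instead.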

Step 2: Create a data frame of desired distances

#Create a data frame to store distances from the element in column e1 to e2
DF = pd.DataFrame({"e1":['c-1', 'a-23', 'c-1', 'g-2'],
             "e2":['b-17', 'a-5', 'g-20', 'g-20']})
DF

Step 3: Calculate Shortest path and length, and store in the data frame

This is the final step. Calculate shortest paths and store them.

pathlist, len_list = [], [] #placeholders

for row in DF.itertuples():
    so, tar = row[1], row[2]
    path = nx.shortest_path(graph, source=so, target=tar)
    length=nx.shortest_path_length(graph,source=so, target=tar)
    pathlist.append(path)
    len_list.append(length)

#Add these lists as new columns in the DF
DF['length'] = len_list
DF['path'] = pathlist
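
One detail worth keeping in mind: nx.shortest_path_length counts edges, while len() of the returned path counts nodes, so the two differ by one. The sketch below, which rebuilds the sample graph and DF from the steps above so that it runs on its own, derives the length from the path instead of searching the graph twice. Treat it as an illustrative variant, not a replacement for the loop above.

```python
import networkx as nx
import pandas as pd

# Sample graph and data frame from Steps 1 and 2
graph = nx.Graph([('a-5', 'b-17'), ('b-17', 'c-1'), ('c-1', 'a-23'),
                  ('b-17', 'g-2'), ('g-20', 'c-1')])
DF = pd.DataFrame({"e1": ['c-1', 'a-23', 'c-1', 'g-2'],
                   "e2": ['b-17', 'a-5', 'g-20', 'g-20']})

# One shortest path per row, pairing the e1 and e2 columns
paths = [nx.shortest_path(graph, source=s, target=t)
         for s, t in zip(DF["e1"], DF["e2"])]

DF["path"] = paths
# Number of edges on the path = number of nodes minus one
DF["length"] = [len(p) - 1 for p in paths]
print(DF)
```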

This produces the desired data frame, with the length and path stored as new columns.

Hope this helps you.

#2


1  

For anyone who's interested in the solution (thanks to Ram Narasimhan):

pathlist, len_list = [], []
sources, targets = DF["e1"].tolist(), DF["e2"].tolist()
sentence_ids = DF["id"].tolist()

for sid, s, t in zip(sentence_ids, sources, targets):
    graph = nx.Graph(graph_edges[sid])  # build the graph of this sentence
    try:
        path = nx.shortest_path(graph, source=s, target=t)
        length = nx.shortest_path_length(graph, source=s, target=t)
    except nx.NetworkXNoPath:
        # the two entities are not connected in this sentence graph
        path = "No Path"
        length = "No Pathlength"
    pathlist.append(path)
    len_list.append(length)

# Add these lists as new columns in the DF
DF['length'] = len_list
DF['path'] = pathlist
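
The "target is not in the graph" message from the question is a different failure from a missing path: recent networkx versions raise nx.NodeNotFound when an endpoint is absent from the graph, and nx.NetworkXNoPath when both exist but are not connected. A hedged sketch that catches both, builds each sentence graph only once, and stores None instead of a string is shown below. The sample graph_edges dictionary (keyed by sentence id), the node names x-1, y-2 and z-9, and the helper safe_shortest_path are made up for illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample data: edge lists keyed by sentence id
graph_edges = {
    10: [("a-5", "x-1"), ("x-1", "b-17")],
    17: [("c-1", "a-23"), ("d-30", "y-2")],
}
DF = pd.DataFrame({
    "id": [10, 10, 17, 17, 17],
    "e1": ["a-5", "b-17", "c-1", "c-1", "c-1"],
    "e2": ["b-17", "a-5", "a-23", "d-30", "z-9"],
})

# Build every sentence graph once instead of once per row
graphs = {sid: nx.Graph(edges) for sid, edges in graph_edges.items()}

def safe_shortest_path(graph, source, target):
    """Return the shortest path as a list of nodes, or None if there is none."""
    try:
        return nx.shortest_path(graph, source=source, target=target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # unreachable target, or an entity missing from the sentence graph
        return None

paths = [safe_shortest_path(graphs[sid], s, t)
         for sid, s, t in zip(DF["id"], DF["e1"], DF["e2"])]

DF["path"] = paths
# Edge count of each path; None (stored as NaN) where no path exists
DF["length"] = [len(p) - 1 if p is not None else None for p in paths]
print(DF)
```

Keeping missing results as None/NaN rather than strings leaves the length column numeric, which makes later filtering or aggregation simpler.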
