I have a pandas DataFrame as shown below. It has many more columns, but they are not relevant to the task. The column id holds the sentence ID, while the columns e1 and e2 contain entities (= words) of that sentence, with their relationship in the column r.


id     e1        e2          r
10     a-5       b-17        A 
10     b-17      a-5         N
17     c-1       a-23        N
17     a-23      c-1         N
17     d-30      g-2         N
17     g-20      d-30        B

I also created a graph for each sentence. Each graph is built from a list of edges that looks something like this:


[('wordB-5', 'wordA-1'), ('wordC-8', 'wordA-1'), ...]


All of those edge lists are stored in one list (a list of lists). Each element of that list contains all the edges of one sentence, meaning graph_edges[0] has the edges of sentence 0, and so on.


Now I want to perform operations like these:


graph = nx.Graph(graph_edges[i])
shortest_path = nx.shortest_path(graph, source="e1", target="e2")
result_length = len(shortest_path)
result_path = shortest_path

For each row in the DataFrame, I'd like to calculate the shortest path (from the entity in e1 to the entity in e2) and save all of the results in new columns of the DataFrame, but I have no idea how to do that.


I tried using constructions such as these


e1 = DF["e1"].tolist()
e2 = DF["e2"].tolist()
for id in DF["sentenceID"]:
    graph = nx.Graph(graph_edges[id])
    shortest_path = nx.shortest_path(graph,source=e1, target=e2)
result_length = len(shortest_path)
result_path = shortest_path

to create the data, but it fails with an error saying the target is not in the graph.


The desired new DataFrame:

id     e1        e2          r     length     path
10     a-5       b-17        A       4         ..
10     b-17      a-5         N       4         ..
17     c-1       a-23        N       3         ..
17     a-23      c-1         N       3         ..
17     d-30      g-2         N       7         ..
17     g-20      d-30        B       7         ..

2 Answers



Here's one way to do what you are trying to do, in three distinct steps so that it is easier to follow along.


  • Step 1: From a list of edges, build the networkx graph object.

  • Step 2: Create a data frame with 2 columns, e1 and e2. (For each row in this DF, we want the shortest path and its length from the entity in e1 to the entity in e2.)

  • Step 3: Go through the DF row by row and calculate the shortest path and its length. Store them in the DF as new columns.

Step 1: Build the graph and add edges, one by one

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

elist = [[('a-5', 'b-17'), ('b-17', 'c-1')], #sentence 1
         [('c-1', 'a-23'), ('a-23', 'c-1')], #sentence 2
         [('b-17', 'g-2'), ('g-20', 'c-1')]] #sentence 3

graph = nx.Graph()

for sentence_edges in elist:
    for fromnode, tonode in sentence_edges:
        graph.add_edge(fromnode, tonode)

nx.draw(graph, with_labels=True, node_color='lightblue')
plt.show()  # display the figure when run as a script
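
Note that the loop above merges the edges of all sentences into one graph. If each sentence should keep its own graph, as described in the question, one possible sketch is to build one graph per edge list. The dictionary name graphs_by_sentence is an assumption made for illustration and is not part of the original code; here the key is simply the position of the sentence in elist.

```python
import networkx as nx

# Sample edge lists, one inner list per sentence (same data as elist above).
elist = [[('a-5', 'b-17'), ('b-17', 'c-1')],
         [('c-1', 'a-23'), ('a-23', 'c-1')],
         [('b-17', 'g-2'), ('g-20', 'c-1')]]

# Hypothetical mapping: position in elist -> graph of that sentence.
graphs_by_sentence = {i: nx.Graph(edges) for i, edges in enumerate(elist)}

# Each graph only contains the nodes of its own sentence.
print(sorted(graphs_by_sentence[0].nodes()))    # ['a-5', 'b-17', 'c-1']

# In an undirected graph, ('c-1', 'a-23') and ('a-23', 'c-1') are the same edge.
print(graphs_by_sentence[1].number_of_edges())  # 1
```

With separate graphs, a path query for a row has to use the graph of that row's sentence; an entity that only occurs in another sentence is not a node of that graph.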


Step 2: Create a data frame of desired distances

#Create a data frame to store distances from the element in column e1 to e2
DF = pd.DataFrame({"e1":['c-1', 'a-23', 'c-1', 'g-2'],
             "e2":['b-17', 'a-5', 'g-20', 'g-20']})


Step 3: Calculate the shortest path and its length, and store them in the data frame

This is the final step. Calculate shortest paths and store them.


pathlist, len_list = [], []  # one entry per row of DF

for row in DF.itertuples():
    so, tar = row[1], row[2]  # row[0] is the index; row[1] is e1, row[2] is e2
    path = nx.shortest_path(graph, source=so, target=tar)
    length = nx.shortest_path_length(graph, source=so, target=tar)
    pathlist.append(path)
    len_list.append(length)

# Add these lists as new columns in the DF
DF['length'] = len_list
DF['path'] = pathlist

This produces the desired data frame, with the new length and path columns.
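
As a quick check, the three steps can be rerun end to end. The sketch below assumes the sample elist and DF from this answer and uses a list comprehension over zip(DF["e1"], DF["e2"]) instead of itertuples, which is only a stylistic alternative. The length is taken as len(path) - 1, the number of edges, which matches nx.shortest_path_length on an unweighted graph; len(path), as used in the question, would count nodes instead.

```python
import networkx as nx
import pandas as pd

elist = [[('a-5', 'b-17'), ('b-17', 'c-1')],
         [('c-1', 'a-23'), ('a-23', 'c-1')],
         [('b-17', 'g-2'), ('g-20', 'c-1')]]

# One combined graph, as in Step 1.
graph = nx.Graph()
for sentence_edges in elist:
    graph.add_edges_from(sentence_edges)

DF = pd.DataFrame({"e1": ['c-1', 'a-23', 'c-1', 'g-2'],
                   "e2": ['b-17', 'a-5', 'g-20', 'g-20']})

# One shortest path per row, from the entity in e1 to the entity in e2.
paths = [nx.shortest_path(graph, source=s, target=t)
         for s, t in zip(DF["e1"], DF["e2"])]

DF["length"] = [len(p) - 1 for p in paths]  # number of edges on each path
DF["path"] = paths

print(DF["length"].tolist())  # [1, 3, 1, 3]
print(DF["path"].tolist()[1])  # ['a-23', 'c-1', 'b-17', 'a-5']
```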



Hope this helps you.




For anyone who's interested in the solution (thanks to Ram Narasimhan):


pathlist, len_list = [], []
so, tar = DF["e1"].tolist(), DF["e2"].tolist()
ids = DF["id"].tolist()

for i, s, t in zip(ids, so, tar):
    graph = nx.Graph(graph_edges[i])  # construct the graph of sentence i
    try:
        path = nx.shortest_path(graph, source=s, target=t)
        length = nx.shortest_path_length(graph, source=s, target=t)
    except nx.NetworkXNoPath:
        path = "No Path"
        length = "No Pathlength"
    pathlist.append(path)
    len_list.append(length)

# Add these lists as new columns in the DF
DF['length'] = len_list
DF['path'] = pathlist
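
The error from the question ("target is not in the graph") is a different failure than a missing path: in recent networkx versions, a source or target that is not a node of the graph raises nx.NodeNotFound, while nx.NetworkXNoPath is raised when both nodes exist but are not connected. A possible variant, sketched here under assumptions, catches both and builds each sentence graph only once. The sample graph_edges dictionary (keyed by sentence id), the sample rows, and the helper safe_shortest_path are hypothetical and only serve as illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample data: sentence id -> edge list of that sentence.
graph_edges = {10: [('a-5', 'x-1'), ('x-1', 'b-17')],
               17: [('c-1', 'a-23'), ('d-30', 'g-20')]}

DF = pd.DataFrame({"id": [10, 17, 17, 17],
                   "e1": ['a-5', 'c-1', 'c-1', 'd-30'],
                   "e2": ['b-17', 'a-23', 'g-20', 'g-2']})

# Build each sentence graph once instead of once per row.
graphs = {sid: nx.Graph(edges) for sid, edges in graph_edges.items()}

def safe_shortest_path(graph, source, target):
    """Return (path, length), or (None, None) if no path can be computed."""
    try:
        path = nx.shortest_path(graph, source=source, target=target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # NetworkXNoPath: both nodes exist but are not connected.
        # NodeNotFound: source or target is not a node of this graph.
        return None, None
    return path, len(path) - 1  # length = number of edges on the path

results = [safe_shortest_path(graphs[sid], s, t)
           for sid, s, t in zip(DF["id"], DF["e1"], DF["e2"])]

DF["path"] = [path for path, _ in results]
DF["length"] = [length for _, length in results]
print(DF)
```

Returning None instead of strings such as "No Path" is a design choice: it lets pandas treat the missing lengths as missing values rather than mixing text into a numeric column.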



