How to run GraphX with Python / pyspark?

I am attempting to run Spark GraphX with Python using pyspark. My installation appears correct, as I am able to run the pyspark tutorials and the (Java) GraphX tutorials just fine. Presumably, since GraphX is part of Spark, pyspark should be able to interface with it, correct?

Here are the tutorials for pyspark: http://spark.apache.org/docs/0.9.0/quick-start.html http://spark.apache.org/docs/0.9.0/python-programming-guide.html

Here are the ones for GraphX: http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html

Can anyone convert the GraphX tutorial to be in Python?

3 Solutions

#1

It looks like the Python bindings for GraphX are delayed until at least Spark ~~1.4~~ ~~1.5~~ ∞. They are waiting behind the Java API.

You can track the status at SPARK-3789, "Python bindings for GraphX", in the ASF JIRA.

#2

You should look at GraphFrames (https://github.com/graphframes/graphframes), which wraps GraphX algorithms under the DataFrames API and provides a Python interface.

Here is a quick example from http://graphframes.github.io/quick-start.html, slightly modified so that it works.

First, start pyspark with the graphframes package loaded:

pyspark --packages graphframes:graphframes:0.1.0-spark1.6

Python code:

from graphframes import *

# Create a Vertex DataFrame with unique ID column "id"
v = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

# Query: Get in-degree of each vertex.
g.inDegrees.show()

# Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()

# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
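
If you want to sanity-check what those queries should return before involving Spark at all, here is a minimal plain-Python sketch over the same toy graph. It only uses the standard library, so nothing here is the GraphFrames API. The `pagerank` helper is a hypothetical textbook power-iteration version with an assumed `reset_probability` of 0.15; GraphX uses its own variant and scaling, so expect the ordering of vertices to be comparable, not the exact numbers.

```python
from collections import Counter

# Same toy graph as in the GraphFrames example above.
vertices = [("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30)]
edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]

# In-degree: number of edges pointing at each vertex.
# Vertices with no incoming edges (here "a") simply do not appear.
in_degrees = Counter(dst for _src, dst, _rel in edges)

# Number of "follow" edges.
follow_count = sum(1 for _src, _dst, rel in edges if rel == "follow")


def pagerank(vertices, edges, reset_probability=0.15, max_iter=20):
    """Textbook power-iteration PageRank (illustrative, not GraphX's variant).

    Assumes every vertex has at least one outgoing edge, which holds for
    this toy graph; dangling vertices are not handled.
    """
    ids = [v[0] for v in vertices]
    out_degree = Counter(src for src, _dst, _rel in edges)
    rank = {i: 1.0 / len(ids) for i in ids}
    for _ in range(max_iter):
        # Each vertex splits its current rank evenly over its out-edges.
        incoming = {i: 0.0 for i in ids}
        for src, dst, _rel in edges:
            incoming[dst] += rank[src] / out_degree[src]
        rank = {
            i: reset_probability / len(ids)
            + (1 - reset_probability) * incoming[i]
            for i in ids
        }
    return rank


ranks = pagerank(vertices, edges)

print(dict(in_degrees))  # {'b': 2, 'c': 1}
print(follow_count)      # 2
for vertex_id in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex_id, round(ranks[vertex_id], 4))
```

The in-degree and "follow" counts are what I would expect `g.inDegrees` and the filtered `count()` to report for this graph. The PageRank part is only meant to show which vertex should come out on top.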

#3

GraphX in Spark 0.9.0 doesn't have a Python API yet. It's expected in upcoming releases.
