I am retrieving data from Neo4j using the Bolt driver in Python. The returned result should be stored as an RDD (or at least as a CSV). I can see the returned results, but I am unable to store them as an RDD, a DataFrame, or even a CSV.
Here is how I am seeing the result:
session = driver.session()  # driver: a Bolt driver created with neo4j GraphDatabase.driver(...)
result = session.run('MATCH (n) RETURN n.hobby, id(n)')
session.close()
How can I store this data in an RDD or a CSV file?
2 Answers
#1
I deleted the old post and reposted the same question, but I haven't received any pointers. So I am posting my approach in case it helps others.
# Imports needed by the snippet (sc is assumed to be an existing SparkContext)
from pandas import DataFrame
from pyspark.sql import SQLContext

# Run the query; keep the session open until the result has been consumed
session = driver.session()
result = session.run('MATCH (n:Hobby) RETURN n.hobby AS hobby, id(n) AS id LIMIT 10')

# Pull the keys (column names) from the first record without consuming it
keys = result.peek().keys()

# Read all the property values and store them in a list of rows
values = []
for record in result:
    rec = []
    for key in keys:
        rec.append(record[key])
    values.append(rec)
session.close()

# Convert the list of rows into a pandas DataFrame
df = DataFrame(values, columns=keys)
print(df)

# Convert the pandas DataFrame to a Spark DataFrame
sqlCtx = SQLContext(sc)
spark_df = sqlCtx.createDataFrame(df)
spark_df.show()

# Convert the Spark DataFrame to a Spark RDD of tuples (via the Spark DataFrame)
rdd = spark_df.rdd.map(tuple)
print(rdd.take(10))
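To cover the "or at least a CSV" part of the question, the intermediate DataFrames can also be written straight to disk. A minimal sketch; the output names hobbies.csv and hobbies_spark are just illustrations:

# Write the pandas DataFrame to CSV (index=False drops the row-index column)
df.to_csv('hobbies.csv', index=False)

# Or write the Spark DataFrame out as CSV (built-in writer in Spark 2.0+)
spark_df.write.csv('hobbies_spark', header=True)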
Any suggestions to improve the efficiency are highly appreciated.
#2
Instead of going from Python to Spark, why not use the Neo4j Spark connector? I think this would keep Python from becoming a bottleneck if you were moving a lot of data. You can put your Cypher query inside the Spark session and save the result as an RDD.
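For reference, usage from pyspark would look roughly like the sketch below. This is an assumption-laden illustration: it presumes a connector version that exposes a Spark DataSource, the connector jar on the Spark classpath (e.g. via --packages), and placeholder connection details; option names may differ between connector versions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('neo4j-read').getOrCreate()

# Hypothetical read: the URL, credentials, and option names are placeholders
df = (spark.read.format('org.neo4j.spark.DataSource')
      .option('url', 'bolt://localhost:7687')
      .option('authentication.basic.username', 'neo4j')
      .option('authentication.basic.password', 'secret')
      .option('query', 'MATCH (n:Hobby) RETURN n.hobby AS hobby, id(n) AS id')
      .load())

rdd = df.rdd.map(tuple)  # drop down to an RDD if one is specifically needed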
There has been talk on the Neo4j Slack group about a pyspark implementation, which will hopefully be available later this fall. I know the ability to query Neo4j from pyspark and sparkr would be very useful.