How to optimize this neo4j query?

Date: 2021-06-23 18:02:45

The query loads the 1 million ratings from the GroupLens dataset. I have already created nodes for users and movies, and now I am merging in the rating relationships between them.

LOAD CSV FROM "file:///ratings.csv" AS row FIELDTERMINATOR ';'
MERGE (u:User {userID: toInt(row[0])})
MERGE (m:Movie {movieID: toInt(row[1])})
MERGE (u)-[r:RATING {value: toInt(row[3])}]->(m)

This query takes a very long time when the JVM is allocated 2GB of RAM (laptop, 4GB total), although it runs reasonably fast with 4-6GB of RAM (desktop). Also, I have indexes on User and Movie nodes on their respective IDs.

The profile of this query looks like this-

The number of db hits looks excessive, and I think I can optimize this query.

(Follow-up question): How could I run that optimized Cypher query in neo4j-shell? Is this the correct syntax -

start [CYPHER_QUERY] ;

1 solution

#1


Try USING PERIODIC COMMIT. http://neo4j.com/docs/stable/query-periodic-commit.html

Also, consider using CREATE instead of MERGE on the last line to create the relationship, since I'm assuming ratings aren't repeated in your .csv file.
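Combining both suggestions, the import might look like this. This is a sketch: the batch size of 10,000 is an arbitrary choice (PERIODIC COMMIT defaults to 1,000 rows if you omit it), and the column positions and `toInt` calls are carried over from the question as-is.

```cypher
// Commit every 10,000 rows so the whole import doesn't have to
// fit into a single transaction (batch size chosen arbitrarily).
USING PERIODIC COMMIT 10000
LOAD CSV FROM "file:///ratings.csv" AS row FIELDTERMINATOR ';'
MERGE (u:User {userID: toInt(row[0])})
MERGE (m:Movie {movieID: toInt(row[1])})
// CREATE skips the uniqueness check that MERGE performs on every row,
// assuming each rating appears only once in the file.
CREATE (u)-[r:RATING {value: toInt(row[3])}]->(m)
```

MERGE on a relationship has to scan the existing relationships between the two matched nodes for a matching pattern on every row, which is where many of the db hits come from; CREATE avoids that entirely when the input contains no duplicates.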
