Errors when importing a large amount of data into Neo4j from Node.js

Date: 2022-11-12 18:05:14

I am trying to import a huge amount of data into Neo4j from a file, using simple Node.js code with not much complexity.

The thing is that I have 386213 lines, or 'nodes', to create, but after executing (and waiting 3 hours) I only see more or less half of them. I think some of the queries get lost along the way, but I do not know why...

I am using the npm node-neo4j package for the connection.

Here is my Node.js code:

    var neo4j = require('neo4j');
    var readline = require("readline");
    var fs = require("fs");

    var db = new neo4j.GraphDatabase('http://neo4j:Gemitis26@localhost:7474');

    // Stream the input file one line at a time.
    var rl = readline.createInterface({
       input: fs.createReadStream('C:/Users/RRamos/Documents/Projects/test-neo4j/Files/kaggle_songs.txt')
    });

    var i = 1;

    rl.on('line', function (line) {
        var str = line.split(" ");
        db.cypher({
            // Note: the placeholder must not be wrapped in quotes ('{line1}'),
            // or it is sent as a literal string instead of a parameter.
            query: "CREATE (:Song {id: {line1}, num_id: {line2}})",
            params: {
                line1: str[0],
                line2: str[1],
            },
        }, callback);
        console.log(i + " " + "CREATE (:Song {id: '" + str[0] + "', num_id: " + str[1] + "})");
        i = i + 1;
    });

    function callback(err, results) {
        if (err) throw err;
    }

1 Solution

#1

Making 386213 separate Cypher REST queries (in separate transactions) is probably the slowest possible way to create such a large number of nodes.

There are at least 3 better ways (in order of increasing performance):

  1. Create multiple nodes at a time by sending as a parameter an array containing the data for multiple nodes. For example, you can create 8 nodes by sending this array parameter: [['a', 1],['b', 2],['c', 3],['d', 4],['e', 5],['f', 6],['g', 7],['h', 8]], and using this query (a Node.js sketch of this batching approach follows the list):

    UNWIND {data} AS d
    CREATE (:Song {id: d[0], num_id: d[1]})
    
  2. You can use the LOAD CSV clause to create the nodes. Since your input file seems to use a space to separate node property values, this might work for you (a sketch of running this statement from Node.js also follows the list):

    LOAD CSV FROM 'file:///C:/Users/RRamos/Documents/Projects/test-neo4j/Files/kaggle_songs.txt' AS line
    FIELDTERMINATOR ' '
    CREATE (:Song {id: line[0], num_id: line[1]})
    
  3. For even better performance, you could use the Import tool, which is a command line tool for initializing a new DB.
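Below is a minimal Node.js sketch of the batching approach from point 1, assuming the same node-neo4j setup, file path, and credentials as in the question. The BATCH_SIZE value and the flush helper are illustrative choices, not part of the original answer:

    var neo4j = require('neo4j');
    var readline = require('readline');
    var fs = require('fs');

    var db = new neo4j.GraphDatabase('http://neo4j:Gemitis26@localhost:7474');

    var rl = readline.createInterface({
        input: fs.createReadStream('C:/Users/RRamos/Documents/Projects/test-neo4j/Files/kaggle_songs.txt')
    });

    var BATCH_SIZE = 1000; // arbitrary batch size; tune for your data
    var batch = [];

    function flush() {
        if (batch.length === 0) return;
        db.cypher({
            // One query creates every node in the current batch.
            query: 'UNWIND {data} AS d CREATE (:Song {id: d[0], num_id: d[1]})',
            params: { data: batch }
        }, function (err) {
            if (err) console.error(err);
        });
        batch = [];
    }

    rl.on('line', function (line) {
        var str = line.split(' ');
        batch.push([str[0], str[1]]);
        if (batch.length >= BATCH_SIZE) flush();
    });

    // Send any remaining rows once the whole file has been read.
    rl.on('close', flush);

Note that this sketch fires each batch without waiting for the previous one to finish; for very large files you may want to pause the stream until each batch's callback returns, so only one transaction is in flight at a time.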

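If you prefer to trigger the LOAD CSV statement from point 2 programmatically rather than from the Neo4j browser, a hedged sketch using the same node-neo4j driver could look like the following; depending on your server configuration, the file may need to be readable by the Neo4j server itself:

    var neo4j = require('neo4j');

    var db = new neo4j.GraphDatabase('http://neo4j:Gemitis26@localhost:7474');

    // Run the whole import as a single LOAD CSV statement on the server.
    db.cypher({
        query: "LOAD CSV FROM 'file:///C:/Users/RRamos/Documents/Projects/test-neo4j/Files/kaggle_songs.txt' AS line " +
               "FIELDTERMINATOR ' ' " +
               "CREATE (:Song {id: line[0], num_id: line[1]})"
    }, function (err, results) {
        if (err) throw err;
        console.log('LOAD CSV finished');
    });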