Building an array of objects from CSV files parsed in Node

Time: 2022-11-19 21:34:12

I have multiple CSV files named as follows:

  • model1A
  • model1B
  • model2A
  • model2B

where each CSV holds an array, e.g. model1A = [1, 1, 1]

I want to parse these CSVs and create a single array containing all these models, where each element in the array is an object corresponding to one particular model, e.g.

finalArray = [ 
  { 
    "model" :   "model1",
    "A"     :   [1, 1, 1],
    "B"     :   [2, 2, 2]
  },
  { 
    "model" :   "model2",
    "A"     :   [3, 3, 3],
    "B"     :   [4, 4, 4]
  }
]

The code I have so far is:

var csv = require('csv');
var fs = require('fs');
var parser = csv.parse();
var util = require('util');
var junk = require('junk');
var _ = require('lodash');
var models = [];


fs.readdir(__dirname+'/data', function(err, files) {
    var model = {};
    _.forEach(files, function(n, key) {

        console.log('Analysing file: ' + n);
        var modelName;
        var modelNum;
        var modelParam;


        modelNum = n.match(/\d+/)[0];
        modelName = 'model' + modelNum;
        modelParam = (n.substring(0, n.indexOf('.'))).replace(modelName,'');

        model.model = modelName;
        model[modelParam] = [];
        models.push(model);

        //if (Object.keys(model).length === 3) {
        //    models.push(model);
        //    model = {};
        //}


        fs.createReadStream(__dirname+'/data/'+n).pipe(csv.parse()).pipe(csv.transform(function(row) {
            model[modelParam].push(row);

        })).on('readable', function(){
            while(this.read()){}
        }).on('end', function() {
            console.log('finished reading file ' + n);
            if (key === (files.length - 1)) {
                fs.writeFile('result.json', JSON.stringify(models), function (err) {
                    if (err) throw err;
                    console.log(models.length + ' model(s) parsed');
                    console.log('done');
                });
            }

        }).on('error', function(error) {
            console.log(error);
        });    
    });
});

I know one of my issues is that I am pushing the model to the array too soon, resulting in a final array of the form below, where model1 is being overwritten by model2:

[ { model: 'model2', A: [], B: [] },
  { model: 'model2', A: [], B: [] },
  { model: 'model2', A: [], B: [] },
  { model: 'model2', A: [], B: [] } ]

That's why I tried this code:

if (Object.keys(model).length === 3) {
  models.push(model);
  model = {};
}

but of course this couldn't work, because fs.createReadStream is async and I am clearing the model with model = {} before the stream has finished reading.

I'm at the stage now where I feel I'm going around in circles and just making things worse. I wanted to create something more generic, however, now I would be delighted to get it working for the case presented here and then I can look at improving it.

Any help would be really appreciated!


Update 1

Following saquib khan's suggestion, moving var model = {} inside the loop has brought me closer to my goal, but it's still not right. Below is the current result:

[
    {
        "model": "model1",
        "A": [
            [
                "1"
            ],
            [
                "2"
            ],
            [
                "3"
            ],
            [
                "4"
            ]
        ]
    },
    {
        "model": "model1",
        "B": [
            [
                "1"
            ],
            [
                "2"
            ],
            [
                "3"
            ],
            [
                "4"
            ]
        ]
    },
    {
        "model": "model2",
        "A": [
            [
                "1"
            ],
            [
                "2"
            ],
            [
                "3"
            ],
            [
                "4"
            ]
        ]
    },
    {
        "model": "model2",
        "B": [
            [
                "1"
            ],
            [
                "2"
            ],
            [
                "3"
            ],
            [
                "4"
            ]
        ]
    }
]

Update 2

Following Denys Denysiuk's suggestion as well, the result is closer to what I want, but still falls just short:

[
    {
        "model": "model1",
        "A": [
            "1",
            "2",
            "3",
            "4"
        ]
    },
    {
        "model": "model1",
        "B": [
            "1",
            "2",
            "3",
            "4"
        ]
    },
    {
        "model": "model2",
        "A": [
            "1",
            "2",
            "3",
            "4"
        ]
    },
    {
        "model": "model2",
        "B": [
            "1",
            "2",
            "3",
            "4"
        ]
    }
]

This would work, if I could just somehow iterate over that final array of objects, merging objects with a matching model name. I'm currently looking through the lodash docs to see if I can figure something out. I will post back here if I do.
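
One way that merge step could look, sketched in plain JavaScript rather than lodash so it has no dependencies. The helper name mergeByModel and the sample partials array are made up for illustration; the input is shaped like the Update 2 result above, and the values are left as the strings the CSV parser produced.

```javascript
// Hypothetical helper: merge partial objects that share the same `model` name.
function mergeByModel(partials) {
  var byName = Object.create(null); // model name -> merged object
  var merged = [];                  // keeps models in first-seen order

  partials.forEach(function (partial) {
    var target = byName[partial.model];
    if (!target) {
      target = { model: partial.model };
      byName[partial.model] = target;
      merged.push(target);
    }
    // Copy every parameter key (A, B, ...) onto the merged object.
    Object.keys(partial).forEach(function (k) {
      if (k !== 'model') {
        target[k] = partial[k];
      }
    });
  });

  return merged;
}

// Sample input shaped like the result shown in Update 2.
var partials = [
  { model: 'model1', A: ['1', '2', '3', '4'] },
  { model: 'model1', B: ['1', '2', '3', '4'] },
  { model: 'model2', A: ['1', '2', '3', '4'] },
  { model: 'model2', B: ['1', '2', '3', '4'] }
];

var finalArray = mergeByModel(partials);
console.log(JSON.stringify(finalArray, null, 2));
```

If two partial objects for the same model carry the same parameter key, the later one wins in this sketch.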

3 solutions

#1


2  

Try this out:

fs.readdir(__dirname+'/data', function(err, files) {

    _.forEach(files, function(n, key) {

        console.log('Analysing file: ' + n);            

        var modelNum = n.match(/\d+/)[0];
        var modelName = 'model' + modelNum;
        var modelParam = (n.substring(0, n.indexOf('.'))).replace(modelName,'');

        var model = {};
        var isNewModel = true;
        for(var i = 0; i < models.length; i++) {
            if(models[i].model == modelName) {
               model = models[i];
               isNewModel = false;
               break;
            }
        }
        if(isNewModel) {
            model.model = modelName;
            models.push(model);
        }

        model[modelParam] = [];

        fs.createReadStream(__dirname+'/data/'+n).pipe(csv.parse()).pipe(csv.transform(function(row) {
            model[modelParam].push(row[0]);

        })).on('readable', function(){
            while(this.read()){}
        }).on('end', function() {
            console.log('finished reading file ' + n);
            if (key === (files.length - 1)) {
                fs.writeFile('result.json', JSON.stringify(models), function (err) {
                    if (err) throw err;
                    console.log(models.length + ' model(s) parsed');
                    console.log('done');
                });
            }

        }).on('error', function(error) {
            console.log(error);
        });    
    });
});

#2


3  

There is a very small coding error in your code.

var model = {}; should be inside the forEach loop.

Try the code below:

var csv = require('csv');
var fs = require('fs');
var parser = csv.parse();
var util = require('util');
var junk = require('junk');
var _ = require('lodash');
var models = [];


fs.readdir(__dirname+'/data', function(err, files) {

    _.forEach(files, function(n, key) {

        console.log('Analysing file: ' + n);
        var model = {};
        var modelName;
        var modelNum;
        var modelParam;


        modelNum = n.match(/\d+/)[0];
        modelName = 'model' + modelNum;
        modelParam = (n.substring(0, n.indexOf('.'))).replace(modelName,'');

        model.model = modelName;
        model[modelParam] = [];
        models.push(model);

        //if (Object.keys(model).length === 3) {
        //    models.push(model);
        //    model = {};
        //}


        fs.createReadStream(__dirname+'/data/'+n).pipe(csv.parse()).pipe(csv.transform(function(row) {
            model[modelParam].push(row);

        })).on('readable', function(){
            while(this.read()){}
        }).on('end', function() {
            console.log('finished reading file ' + n);
            if (key === (files.length - 1)) {
                fs.writeFile('result.json', JSON.stringify(models), function (err) {
                    if (err) throw err;
                    console.log(models.length + ' model(s) parsed');
                    console.log('done');
                });
            }

        }).on('error', function(error) {
            console.log(error);
        });    
    });
});
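
The effect of where var model = {} is declared can be reproduced without any file I/O. The sketch below is only an illustration: the file names are assumed, and the two functions sharedObject and freshObject are hypothetical names used to contrast the two placements.

```javascript
var files = ['model1A.csv', 'model1B.csv', 'model2A.csv', 'model2B.csv']; // assumed names

// Declared once outside the loop: every push stores a reference to the SAME object,
// so the last assignment to model.model shows up in every array element.
function sharedObject(files) {
  var models = [];
  var model = {};
  files.forEach(function (n) {
    model.model = 'model' + n.match(/\d+/)[0];
    models.push(model);
  });
  return models;
}

// Declared inside the loop: each iteration creates its own object.
function freshObject(files) {
  var models = [];
  files.forEach(function (n) {
    var model = {};
    model.model = 'model' + n.match(/\d+/)[0];
    models.push(model);
  });
  return models;
}

function names(models) {
  return models.map(function (m) { return m.model; });
}

console.log(names(sharedObject(files))); // [ 'model2', 'model2', 'model2', 'model2' ]
console.log(names(freshObject(files)));  // [ 'model1', 'model1', 'model2', 'model2' ]
```

The second variant still yields one object per file rather than one per model, which matches the result reported in Update 1.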

#3


1  

Node.js is event-driven, so maybe you could base your code on the events module: https://nodejs.org/api/events.html

Your problem seems to be that you are overwriting previous entries in your array, so maybe you should move on to the next step (reading the next CSV?) only once the previous one has finished writing everything it needs to.

You can add this logic to your code with events.
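
As a rough sketch of that idea, an EventEmitter can count finished files and signal once all of them are done, instead of relying on the key === (files.length - 1) check, which assumes the last file in the list is also the last to finish. The createTracker helper, the event names fileDone and allDone, and the file names are all made up for illustration, and the stream endings are simulated so the example runs on its own.

```javascript
var EventEmitter = require('events');

// Hypothetical tracker: emits 'allDone' once every expected file has reported in.
function createTracker(expectedCount) {
  var tracker = new EventEmitter();
  var finished = 0;
  tracker.on('fileDone', function () {
    finished += 1;
    if (finished === expectedCount) {
      tracker.emit('allDone');
    }
  });
  return tracker;
}

var files = ['model1A.csv', 'model1B.csv', 'model2A.csv', 'model2B.csv']; // assumed names
var tracker = createTracker(files.length);
var log = [];

tracker.on('allDone', function () {
  // In the real script, fs.writeFile('result.json', ...) would go here.
  log.push('write result.json');
});

// Simulate streams ending in a different order than they were started.
['model2B.csv', 'model1A.csv', 'model2A.csv', 'model1B.csv'].forEach(function (n) {
  log.push('finished ' + n);
  tracker.emit('fileDone', n);
});

console.log(log);
```

In the actual script, tracker.emit('fileDone', n) would be called from each stream's 'end' handler, so the write happens exactly once regardless of the order in which the files finish.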
