Large CSV to JSON/Object in Node.js

Date: 2022-05-06 22:02:19

I am trying to do something that seems like it should not only be fairly simple to accomplish, but also a common enough task that straightforward packages would be available for it. I wish to take a large CSV file (an export from a relational database table) and convert it to an array of JavaScript objects. Furthermore, I would like to export it to a .json file fixture.

Example CSV:

a,b,c,d
1,2,3,4
5,6,7,8
...

Desired JSON:

[
{"a": 1,"b": 2,"c": 3,"d": 4},
{"a": 5,"b": 6,"c": 7,"d": 8},
...
]

I've tried several Node CSV parsers, streamers, and self-proclaimed CSV-to-JSON libraries, but I can't seem to get the result I want, or when I can, it only works with smaller files. My file is nearly 1 GB with ~40m rows (which would create 40m objects). I expect it will require streaming the input and/or output to avoid memory problems.

Here are the packages I've tried:

I'm using Node 0.10.6 and would like a recommendation on how to accomplish this easily. Rolling my own might be best, but I'm not sure where to begin with Node's streaming features, especially since the API changed in 0.10.x.

6 Answers

#1 (3 votes)

While this is far from a complete answer, you may be able to base your solution on https://github.com/dominictarr/event-stream . Here is an adapted example from the readme:

    var es = require('event-stream')

    es.pipeline(                          // connect streams together with `pipe`
      process.openStdin(),                // open stdin
      es.split(),                         // split the stream to break on newlines
      es.map(function (data, callback) {  // turn this async function into a stream
        // deal with one line of CSV data
        callback(null, JSON.stringify(parseCSVLine(data)))
      }),
      process.stdout
    )

After that, I expect you have a bunch of stringified JSON objects, one per line. This then needs to be converted to an array, which you may be able to do by appending a comma to the end of every line (removing it on the last line), and then adding [ and ] to the beginning and end of the file.

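For illustration, here is one way that wrapping step could look inline, as a rough sketch. The bracket/comma logic is my own addition, not from the event-stream readme; it assumes the file ends with exactly one trailing newline, and that parseCSVLine returns null for the header row (as sketched after the next paragraph):

    var es = require('event-stream')
    var first = true

    es.pipeline(
      process.openStdin(),
      es.split(),
      es.map(function (data, callback) {
        if (data === '') return callback(null, '\n]\n')  // final empty chunk closes the array
        var row = parseCSVLine(data)
        if (row === null) return callback()              // header row: emit nothing
        var prefix = first ? '[\n' : ',\n'               // open the array, then separate rows
        first = false
        callback(null, prefix + JSON.stringify(row))
      }),
      process.stdout
    )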

The parseCSVLine function must be configured to assign the CSV values to the right object properties. This can be done fairly easily after parsing the first line of the file.

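A minimal parseCSVLine sketch, assuming plain comma-separated values with no quoting or escaping (the numeric coercion matches the example data; adjust as needed):

    var keys = null                          // populated from the header row
    function parseCSVLine (line) {
      var values = line.split(',')           // naive split; no quoted-field support
      if (keys === null) {
        keys = values                        // first row is the header
        return null
      }
      var obj = {}
      keys.forEach(function (key, i) {
        obj[key] = Number(values[i])         // coerce to numbers, per the example
      })
      return obj
    }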

I do notice the library is not tested on 0.10 (at least not with Travis), so beware. Maybe run npm test on the source yourself.

#2 (7 votes)

Check out the Node.js csvtojson module, which can be used as a library, a command-line tool, or a web server plugin: https://www.npmjs.org/package/csvtojson. The source code can be found at https://github.com/Keyang/node-csvtojson

Or install it from the npm registry:

npm install -g csvtojson

It supports CSV data of any size, custom field types, nested JSON, and a bunch of other features.

Example

var Converter = require("csvtojson").core.Converter;

// constructResult: false turns off building the final result in memory,
// enabling the stream feature; toArrayString streams out a normal JSON array.
var csvConverter = new Converter({ constructResult: false, toArrayString: true });

var readStream = require("fs").createReadStream("inputData.csv");
var writeStream = require("fs").createWriteStream("outputData.json");

readStream.pipe(csvConverter).pipe(writeStream);

You can also use it as a CLI tool:

csvtojson myCSVFile.csv
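
Assuming the CLI writes its output to stdout, a shell redirect should produce the .json fixture directly:

csvtojson myCSVFile.csv > myCSVFile.json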

#3 (3 votes)

I found an easier way to read CSV data using csvtojson.

Here's the code:

var Converter = require("csvtojson").Converter;
var converter = new Converter({});
converter.fromFile("sample.csv", function (err, result) {
  if (err) throw err;
  // wrap the first five parsed rows; the stringify/parse round-trip
  // below is just a deep clone and could be dropped
  var csvData = JSON.stringify([
    { resultdata: result[0] },
    { resultdata: result[1] },
    { resultdata: result[2] },
    { resultdata: result[3] },
    { resultdata: result[4] }
  ]);
  csvData = JSON.parse(csvData);
  console.log(csvData);
});

Or you can simply do this:

var Converter = require("csvtojson").Converter;
var converter = new Converter({});
converter.fromFile("sample.csv", function (err, result) {
  if (err) throw err;
  console.log(result);
});

Here's the result from the first snippet:

[ { resultdata: 
     { 'Header 1': 'A_1',
       'Header 2': 'B_1',
       'Header 3': 'C_1',
       'Header 4': 'D_1',
       'Header 5': 'E_1' } },
  { resultdata: 
     { 'Header 1': 'A_2',
       'Header 2': 'B_2',
       'Header 3': 'C_2',
       'Header 4': 'D_2',
       'Header 5': 'E_2' } },
  { resultdata: 
     { 'Header 1': 'A_3',
       'Header 2': 'B_3',
       'Header 3': 'C_3',
       'Header 4': 'D_3',
       'Header 5': 'E_3' } },
  { resultdata: 
     { 'Header 1': 'A_4',
       'Header 2': 'B_4',
       'Header 3': 'C_4',
       'Header 4': 'D_4',
       'Header 5': 'E_4' } },
  { resultdata: 
     { 'Header 1': 'A_5',
       'Header 2': 'B_5',
       'Header 3': 'C_5',
       'Header 4': 'D_5',
       'Header 5': 'E_5' } } ]
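
Since the question also asks for a .json file fixture, here is a small variation that writes the full parsed array out with fs. Note that this buffers everything in memory, so it only suits files far smaller than the ~1 GB one in the question:

var fs = require("fs");
var Converter = require("csvtojson").Converter;
var converter = new Converter({});
converter.fromFile("sample.csv", function (err, result) {
  if (err) throw err;
  // result is the full array of row objects; this holds it all in memory
  fs.writeFileSync("sample.json", JSON.stringify(result, null, 2));
});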

The source of this code can be found at https://www.npmjs.com/package/csvtojson#installation

I hope this gives you some ideas.

#4 (0 votes)

I recommend implementing the logic yourself. Node.js is actually pretty good at these kinds of tasks.

The following solution uses streams, so it won't blow up your memory.

Install Dependencies

npm install through2 split2 --save

Code

import fs from 'fs'
import through2 from 'through2'
import split2 from 'split2'

const parseCSV = () => {
  let templateKeys = []
  let parseHeadline = true
  return through2.obj((data, enc, cb) => {
    if (parseHeadline) {
      // first line: remember the column names as object keys
      // (note: this sample splits on ';'; use ',' for the CSV in the question)
      templateKeys = data
        .toString()
        .split(';')
      parseHeadline = false
      return cb(null, null)
    }
    const entries = data
      .toString()
      .split(';')
    const obj = {}
    templateKeys.forEach((el, index) => {
      obj[el] = entries[index]
    })
    return cb(null, obj)
  })
}

const processRecord = () => {
  return through2.obj(function (data, enc, cb) {
    // Implement your own processing
    // logic here, e.g.:
    MyDB
      .insert(data)
      .then(() => cb())
      .catch(cb)
  })
}

fs.createReadStream('<yourFilePath>')
  // Read line by line
  .pipe(split2())
  // Parse each CSV line into an object
  .pipe(parseCSV())
  // Process your records
  .pipe(processRecord())
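
If the goal is the .json fixture from the question rather than database inserts, a sketch of a replacement for processRecord could stream the objects out as one JSON array (toJSONArray is my own name here, reusing parseCSV and split2 from above):

const toJSONArray = () => {
  let first = true
  return through2.obj(function (data, enc, cb) {
    // open the array before the first record, then separate records with commas
    this.push(first ? '[\n' : ',\n')
    first = false
    cb(null, JSON.stringify(data))
  }, function (cb) {
    // flush: close the array once the input ends
    this.push('\n]\n')
    cb()
  })
}

fs.createReadStream('<yourFilePath>')
  .pipe(split2())
  .pipe(parseCSV())
  .pipe(toJSONArray())
  .pipe(fs.createWriteStream('output.json'))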

For more information on this topic, see Stefan Baumgartner's excellent tutorial.

#5 (0 votes)

You can use streams so that you can process big files. Here is what you need to do; this should work just fine.

npm i --save csv2json fs-extra   # install the modules

const csv2json = require('csv2json');
const fs = require('fs-extra');

const source = fs.createReadStream(__dirname + '/data.csv');
const output = fs.createWriteStream(__dirname + '/result.json');

source
  .pipe(csv2json())
  .pipe(output);

#6 (0 votes)

Hmm... lots of solutions, I'll add one more with scramjet:

$ npm install --save scramjet

And then

process.stdin.pipe(
    new (require("scramjet").StringStream)("utf-8")
)
    .CSVParse()
    .toJSONArray()
    .pipe(process.stdout)

This will produce exactly the output you described, in a streamed way.
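
A file-to-file variant might look like this sketch, which just swaps stdin/stdout for file streams while keeping the same chain (the file names are placeholders):

const fs = require("fs")
const { StringStream } = require("scramjet")

// read the CSV from disk, parse it, and stream a JSON array back to disk
fs.createReadStream("data.csv")
    .pipe(new StringStream("utf-8"))
    .CSVParse()
    .toJSONArray()
    .pipe(fs.createWriteStream("data.json"))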
