Node.js: each line in a separate file

Date: 2021-12-21 22:37:09

I want to split a file so that each line ends up in a separate file. The initial file is really big. I ended up with the code below:

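var fs = require('fs');
var split = require('split'); // third-party module that splits a stream into lines

var file = 'foo.txt'; // placeholder path of the big input file
var fileCounter = -1;

// Open the next numbered output file: data/part0.txt, data/part1.txt, ...
function getWritable() {
      fileCounter++;
      var writable = fs.createWriteStream('data/part'+ fileCounter + '.txt', {flags:'w'});
      return writable;
}

var readable = fs.createReadStream(file).pipe(split());
readable.on('data', function (line) {
    // Resume reading once the line has been written out
    var flag = getWritable().write(line, function() {
      readable.resume();
    });
    // write() returned false: pause until the write callback fires
    if (!flag) {
      readable.pause();
    }
});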

It works, but it is ugly. Is there a more Node-idiomatic way to do this? Maybe with piping and without pause/resume.

NB: this is not a question about lines, files, etc. The question is about streams, and I am only using this problem to illustrate it.

2 solutions

#1


You can use Node's built-in readline module.

var fs = require('fs');
var readline = require('readline');
var fileCounter = -1;

var file = "foo.txt";
readline.createInterface({
    input: fs.createReadStream(file),
    terminal: false
}).on('line', function(line) {
   // Increment first so the files are numbered from part0.txt, not part-1.txt
   fileCounter++;
   var writable = fs.createWriteStream('data/part'+ fileCounter + '.txt', {flags:'w'});
   // end() writes the line and then closes the file
   writable.end(line);
});

Note that, at least in older Node versions, this can lose the last line of the file if there is no newline at the end, so make sure your last line of data is followed by a newline.

Also note that the docs, at the time of writing, mark readline as Stability index 2, meaning:

Stability: 2 - Unstable. The API is in the process of settling, but has not yet had sufficient real-world testing to be considered stable. Backwards-compatibility will be maintained if reasonable.
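
As a variation on the readline approach above, here is a minimal sketch that avoids manual pause/resume by iterating over the lines with for await. It assumes a Node.js version recent enough to support async iteration over a readline interface. The function name splitLines and its outDir parameter are hypothetical, chosen for illustration only. Each write is awaited before the next one starts, so only one output file is being written at any moment.

```javascript
const fs = require('fs');
const path = require('path');
const readline = require('readline');

// Hypothetical helper: writes each line of inputFile to outDir/part<N>.txt
// and resolves with the number of files written.
async function splitLines(inputFile, outDir) {
  // Create the output directory before the interface starts reading
  await fs.promises.mkdir(outDir, { recursive: true });

  const rl = readline.createInterface({
    input: fs.createReadStream(inputFile),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let fileCounter = 0;
  for await (const line of rl) {
    // The loop does not move on until this file has been written
    const target = path.join(outDir, 'part' + fileCounter + '.txt');
    await fs.promises.writeFile(target, line);
    fileCounter++;
  }
  return fileCounter;
}
```

It could be called as splitLines('foo.txt', 'data').then(function (n) { console.log(n + ' files written'); }).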

#2


How about the following? Have you tried it? Pause and resume logic isn't really needed here.

var split = require('split');
var fs = require('fs');
var fileCounter = -1;

var file = 'foo.txt'; // placeholder path of the input file
var readable = fs.createReadStream(file).pipe(split());
readable.on('data', function (line) {
    fileCounter++;
    var writable = fs.createWriteStream('data/part'+ fileCounter + '.txt', {flags:'w'});
    // end() writes the line and then closes the file
    writable.end(line);
});

Piping dynamically would be hard...

EDIT: You could create a writable (and therefore pipe()-able) object that would, for each chunk it receives, do the "create file, open it, write the data, close it" work, but it:

  • wouldn't be reusable

  • wouldn't follow the KISS principle

  • would require special, specific logic for file naming (for example, its constructor would accept a string pattern with a placeholder for the number, etc.)

I really don't recommend that path, or you're going to spend ages implementing a module that isn't really reusable. That said, it would make a good exercise in implementing a writable.
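
For anyone who wants to try that exercise, a minimal sketch of such a writable might look like the following. The class name LinePerFileWriter and the "%d" placeholder convention are assumptions made for illustration; they do not come from any library. The stream calls back only once the file has been written, which is what lets pipe() handle backpressure without explicit pause/resume.

```javascript
const fs = require('fs');
const { Writable } = require('stream');

// Hypothetical class: every chunk written to it becomes its own file.
// pattern is a file name in which "%d" (an assumed convention) is
// replaced by the running file number.
class LinePerFileWriter extends Writable {
  constructor(pattern) {
    // objectMode: one chunk corresponds to one line, as emitted by split()
    super({ objectMode: true });
    this.pattern = pattern;
    this.counter = 0;
  }

  _write(line, encoding, callback) {
    const name = this.pattern.replace('%d', String(this.counter));
    this.counter++;
    // The next chunk is not delivered until callback has been called
    fs.writeFile(name, line, callback);
  }
}
```

With the split module from the answer above, it could be used as fs.createReadStream(file).pipe(split()).pipe(new LinePerFileWriter('data/part%d.txt')).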
