Node.js: How do I read a stream into a buffer?

Time: 2021-11-30 02:30:59

I wrote a pretty simple function that downloads an image from a given URL, resizes it, and uploads it to S3 (using 'gm' and 'knox'). I have no idea whether I'm reading the stream into a buffer correctly. (Everything is working, but is it the correct way?)

Also, I want to understand something about the event loop: how do I know that one invocation of the function won't leak anything, or change the 'buf' variable of another invocation that is already running? (Or is this scenario impossible because the callbacks are anonymous functions?)

var http = require('http');
var https = require('https');
var s3 = require('./s3');
var gm = require('gm');

module.exports.processImageUrl = function(imageUrl, filename, callback) {
    // Pick the right client for the protocol.
    var client = http;
    if (imageUrl.substr(0, 5) == 'https') { client = https; }

    client.get(imageUrl, function(res) {
        if (res.statusCode != 200) {
            return callback(new Error('HTTP Response code ' + res.statusCode));
        }

        // Resize the downloaded image and stream the result out as JPEG.
        gm(res)
            .geometry(1024, 768, '>')
            .stream('jpg', function(err, stdout, stderr) {
                if (!err) {
                    // Accumulate the resized image into a single Buffer.
                    var buf = new Buffer(0);
                    stdout.on('data', function(d) {
                        buf = Buffer.concat([buf, d]);
                    });

                    stdout.on('end', function() {
                        var headers = {
                            'Content-Length': buf.length
                            , 'Content-Type': 'Image/jpeg'
                            , 'x-amz-acl': 'public-read'
                        };

                        s3.putBuffer(buf, '/img/d/' + filename + '.jpg', headers, function(err, res) {
                            if (err) {
                                return callback(err);
                            } else {
                                return callback(null, res.client._httpMessage.url);
                            }
                        });
                    });
                } else {
                    callback(err);
                }
            });
    }).on('error', function(err) {
        callback(err);
    });
};

5 solutions

#1


56  

Overall I don't see anything that would break in your code.

Two suggestions:

The way you are combining Buffer objects is suboptimal, because it has to copy all of the pre-existing data on every 'data' event. It would be better to put the chunks in an array and concat them all at the end.

var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
  var buf = Buffer.concat(bufs);
});

For performance, I would look into whether the S3 library you are using supports streams. Ideally you wouldn't need to create one large buffer at all, and could instead pass the stdout stream directly to the S3 library.

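A minimal sketch of what that could look like, assuming your ./s3 wrapper exposes a putStream-style method (knox's own putStream exists but still requires a 'Content-Length' header, which is exactly what the update below works around):

gm(res)
    .geometry(1024, 768, '>')
    .stream('jpg', function(err, stdout, stderr) {
        if (err) return callback(err);

        // Hypothetical streaming upload: no intermediate Buffer at all.
        // Only works if the S3 wrapper can upload without knowing the
        // length up front (e.g. via a chunked/multipart upload).
        s3.putStream(stdout, '/img/d/' + filename + '.jpg', {
            'Content-Type': 'image/jpeg',
            'x-amz-acl': 'public-read'
        }, function(err, s3res) {
            if (err) return callback(err);
            callback(null, s3res);
        });
    });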

As for the second part of your question, that isn't possible. When a function is called, it is allocated its own private context, and everything defined inside it is only accessible from other code defined inside that function.

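A tiny illustration of that scoping, using the modern Buffer.alloc/Buffer.from APIs (the names here are made up for the example):

function makeAccumulator() {
    var buf = Buffer.alloc(0);              // private to this invocation
    return function append(chunk) {
        buf = Buffer.concat([buf, chunk]);  // only this closure sees this buf
        return buf.length;
    };
}

var a = makeAccumulator();
var b = makeAccumulator();
a(Buffer.from('hello'));  // 5
b(Buffer.from('hi'));     // 2 -- a's buf is untouched

Each call to processImageUrl gets its own 'buf' in exactly the same way.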

Update

Dumping the file to the filesystem would probably mean less memory usage per request, but file IO can be pretty slow so it might not be worth it. I'd say that you shouldn't optimize too much until you can profile and stress-test this function. If the garbage collector is doing its job you may be overoptimizing.

With all that said, there are better ways anyway, so don't use files. Since all you want is the length, you can calculate that without needing to append all of the buffers together, so then you don't need to allocate a new Buffer at all.

var pause_stream = require('pause-stream');

// Your other code.

var bufs = [];
stdout.on('data', function(d){ bufs.push(d); });
stdout.on('end', function(){
  var contentLength = bufs.reduce(function(sum, buf){
    return sum + buf.length;
  }, 0);

  // Create a stream that will emit your chunks when resumed.
  var stream = pause_stream();
  stream.pause();
  while (bufs.length) stream.write(bufs.shift());
  stream.end();

  var headers = {
      'Content-Length': contentLength,
      // ...
  };

  s3.putStream(stream, ....);
});

#2


3  

A related project is node-stream-buffer. Description: "Readable and Writable Streams that use backing Buffers".

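As a rough sketch (assuming the package is installed from npm as stream-buffers and exposes WritableStreamBuffer with a getContents() method, as its README describes), it could replace the manual accumulation like this:

var streamBuffers = require('stream-buffers');

var writable = new streamBuffers.WritableStreamBuffer();

writable.on('finish', function() {
    var buf = writable.getContents();  // one Buffer containing everything written
    // ... hand buf to s3.putBuffer as in the question
});

stdout.pipe(writable);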

#3


1  

I suggest keeping an array of buffers and concatenating them into the resulting buffer only once, at the end. It's easy to do manually, or you could use node-buffers.

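Doing it by hand could look something like this (the helper name is illustrative, not from any library):

// Collect an entire readable stream into a single Buffer.
function streamToBuffer(stream, callback) {
    var chunks = [];
    stream.on('data', function(chunk) { chunks.push(chunk); });
    stream.on('error', callback);
    stream.on('end', function() {
        callback(null, Buffer.concat(chunks));  // one copy, done once at the end
    });
}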

#4


1  

I just want to post my solution. The previous answers were pretty helpful for my research. I use length-stream to get the size of the stream, but the problem here is that its callback fires near the end of the stream, so I also use stream-cache to cache the stream and pipe it to the res object once I know the content-length. If an error occurs on the stream, it is simply passed to the callback.

var StreamCache = require('stream-cache');
var lengthStream = require('length-stream');

var _streamFile = function(res, stream, cb) {
    var cache = new StreamCache();

    // length-stream invokes this callback with the total byte count once the
    // whole stream has passed through, so the Content-Length header can be set
    // before the cached copy is piped to the response.
    var lstream = lengthStream(function(length) {
        res.header("Content-Length", length);
        cache.pipe(res);
    });

    stream.on('error', function(err) {
        return cb(err);
    });

    stream.on('end', function() {
        return cb(null, true);
    });

    // Measure the stream and cache it at the same time; the cached copy is
    // replayed to res once the length is known.
    return stream.pipe(lstream).pipe(cache);
};

#5


1  

You can easily do this using node-fetch if you are pulling from http(s) URIs.

Adapted from the readme:

const fetch = require('node-fetch');

fetch('https://assets-cdn.github.com/images/modules/logos_page/Octocat.png')
    .then(res => res.buffer())
    .then(buffer => console.log(buffer));
