I am writing an API using node.js with express. Part of the API will allow users to POST large payloads of binary data (perhaps hundreds of MB) to be stored in the server database.
As it stands now, the express request handler does not get called until the entire upload is ready and stored in memory on the server (req.body). Then it has to be saved to a database. There are two things I don't like about this. The first is that it requires a lot of server memory to hold all that binary data at once. The second is that many databases like MongoDB and S3 allow for streaming so you don't really need to have all the data in place before you start writing it, so there's no reason to wait around.
So my question is, can node (through express or some other way) be configured to start streaming to the database before the entire request has come in?
1 Answer
#1
3
After further research, I have found that the native "http" module does in fact support streaming in the way I mentioned. I'm not sure if express supports this. I would guess that it does, but in the case of an upload you probably cannot use the bodyParser middleware since that probably blocks until the entire request body is received.
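For what it's worth, the same thing should work inside an express route handler as long as no body-parsing middleware is mounted for that route, since express hands you the raw http.IncomingMessage. A minimal sketch (the /upload path and the byte-counting body are just placeholders for whatever datastore write you would do):

var express = require('express');
var app = express();

// No bodyParser on this route, so req has not been consumed yet and still emits stream events.
app.post('/upload', function(req, res) {
    var received = 0;
    req.on('data', function(chunk) {
        received += chunk.length; // stream the chunk into your database here instead
    });
    req.on('end', function() {
        res.end('got ' + received + ' bytes');
    });
});

app.listen(8000);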
Anyway, here is some code that shows how you can stream an incoming request to MongoDB's GridFS:
var http = require('http');
var mongo = require('mongodb');

var db = new mongo.Db('somedb', new mongo.Server("localhost", 27017), { safe: true });

db.open(function(err) {
    if (err)
        console.log(err);

    http.createServer(function(req, res) {
        var numToSave = 0;      // chunks handed to GridFS but not yet confirmed written
        var endCalled = false;  // whether the request has finished arriving

        new mongo.GridStore(db, new mongo.ObjectID(), "w", { root: "fs", filename: "test" }).open(function(err, gridStore) {
            if (err)
                console.log(err);

            gridStore.chunkSize = 1024 * 256;

            // Write each chunk to GridFS as soon as it arrives, instead of buffering the whole body.
            req.on("data", function(chunk) {
                numToSave++;
                gridStore.write(chunk, function(err, gridStore) {
                    if (err)
                        console.log(err);
                    numToSave--;
                    // Finish only when the request has ended and every pending write has completed.
                    if (numToSave === 0 && endCalled)
                        finishUp(gridStore, res);
                });
            });

            req.on("end", function() {
                endCalled = true;
                console.log("end called");
                if (numToSave === 0)
                    finishUp(gridStore, res);
            });
        });
    }).listen(8000);
});
function finishUp(gridStore, res) {
    // Closing flushes the GridFS file; only send the response once that has finished.
    gridStore.close(function(err) {
        if (err)
            console.log(err);
        console.log("finishing up");
        res.end();
    });
}
The gist is that the req object is actually a stream that emits "data" and "end" events. Every time a "data" event occurs, you write that chunk to mongo. When the "end" event occurs, you close the GridStore and send the response.
There is some yuckiness related to coordinating all the different async activities: you don't want to close the GridStore before every chunk has actually been written out. I achieve this with a counter and a boolean, but there might be a better way using some library.
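For example, newer versions of the node mongodb driver (2.1+) expose GridFS as a real writable stream via GridFSBucket, in which case you can just pipe the request in and let pipe() do the bookkeeping. A rough sketch, assuming a newer driver and an already-connected db handle like the one above:

var mongodb = require('mongodb');

// Assumes `db` is an already-open Db instance and the driver is 2.1 or newer.
function saveUpload(db, req, res) {
    var bucket = new mongodb.GridFSBucket(db, { bucketName: 'fs' });
    var uploadStream = bucket.openUploadStream('test');

    // pipe() handles backpressure, and 'finish' only fires after every chunk has been flushed.
    req.pipe(uploadStream)
        .on('error', function(err) {
            console.log(err);
            res.statusCode = 500;
            res.end();
        })
        .on('finish', function() {
            res.end();
        });
}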