I want to download a zip file from the internet and unzip it in memory without saving to a temporary file. How can I do this?
Here is what I tried:
var url = 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip';
var request = require('request'), fs = require('fs'), zlib = require('zlib');
request.get(url, function(err, res, file) {
  if(err) throw err;
  zlib.unzip(file, function(err, txt) {
    if(err) throw err;
    console.log(txt.toString()); //outputs nothing
  });
});
[EDIT] As suggested, I tried using the adm-zip library, and I still cannot make this work:
var ZipEntry = require('adm-zip/zipEntry');
request.get(url, function(err, res, zipFile) {
  if(err) throw err;
  var zip = new ZipEntry();
  zip.setCompressedData(new Buffer(zipFile.toString('utf-8')));
  var text = zip.getData();
  console.log(text.toString()); // fails
});
5 Answers
#1
56
You need a library that can handle buffers. The latest version of adm-zip will do:
npm install git://github.com/cthackers/adm-zip.git
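To see why buffers matter here, consider what happens when binary data passes through a string. The sketch below uses made-up bytes (not a real archive) and only core Node APIs. It mirrors the `new Buffer(zipFile.toString('utf-8'))` round trip from the question, and it is the reason the second answer passes `encoding: null` to `request`.

```javascript
// Bytes that resemble the start of a zip file, followed by values that are
// not valid UTF-8 (0xff, 0xfe, 0x80). Purely illustrative data.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x80, 0x00]);

// What effectively happens when a binary body is handed over as a string
// and then turned back into a Buffer.
const roundTripped = Buffer.from(original.toString('utf8'), 'utf8');

// The invalid bytes were replaced during decoding, so the data no longer matches.
console.log(original.equals(roundTripped)); // false
```

Once the bytes have been decoded as text, the original archive cannot be recovered, so the body has to stay a Buffer from download to unzip.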
My solution uses the http.get method, since it returns Buffer chunks.
Code:
var file_url = 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip';
var AdmZip = require('adm-zip');
var http = require('http');
var url = require('url');
var options = {
  host: url.parse(file_url).host,
  port: 80,
  path: url.parse(file_url).pathname
};
http.get(options, function(res) {
  var data = [], dataLen = 0;
  // Collect every chunk and keep track of the total size.
  res.on('data', function(chunk) {
    data.push(chunk);
    dataLen += chunk.length;
  }).on('end', function() {
    // Copy all chunks into one Buffer of the final size.
    // (Buffer.alloc replaces the deprecated new Buffer(size).)
    var buf = Buffer.alloc(dataLen);
    for (var i = 0, len = data.length, pos = 0; i < len; i++) {
      data[i].copy(buf, pos);
      pos += data[i].length;
    }
    // adm-zip reads the archive straight from the Buffer.
    var zip = new AdmZip(buf);
    var zipEntries = zip.getEntries();
    console.log(zipEntries.length);
    for (var j = 0; j < zipEntries.length; j++)
      console.log(zip.readAsText(zipEntries[j]));
  });
});
The idea is to create an array of buffers and concatenate them into a new one at the end. This is due to the fact that buffers cannot be resized.
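On Node versions that provide `Buffer.concat`, the collect-then-join step can be written more compactly. This is a minimal sketch of that step alone. `joinChunks` and the sample chunks are illustrative stand-ins for what the `'data'` handler would collect.

```javascript
// Joins the chunks collected from 'data' events into one Buffer.
function joinChunks(chunks) {
  return Buffer.concat(chunks);
}

// Stand-in chunks; in a real download these would come from res.on('data').
const chunks = [Buffer.from('PK'), Buffer.from([0x03, 0x04]), Buffer.from('payload')];
const whole = joinChunks(chunks);

console.log(whole.length); // 11
```

`Buffer.concat` allocates the final Buffer once and copies each chunk into it, which is the same thing the manual loop does.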
#2
5
Sadly, you can't pipe the response stream into the unzip job the way Node's zlib lib allows you to; you have to cache the data and wait for the end of the response. For big files I suggest piping the response to an fs stream, otherwise you will fill up your memory in a blink!
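The difference comes down to the formats: Node's zlib module decompresses gzip and deflate streams, while a .zip file is an archive container with its own headers. A small sketch that tells the two apart by their leading bytes, using only core modules (`sniffFormat` is a hypothetical helper name):

```javascript
const zlib = require('zlib');

// Classifies a buffer by its leading magic bytes.
function sniffFormat(buf) {
  // Zip local file header signature "PK\x03\x04", read as a little-endian integer.
  if (buf.length >= 4 && buf.readUInt32LE(0) === 0x04034b50) return 'zip';
  // Gzip streams start with 0x1f 0x8b.
  if (buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b) return 'gzip';
  return 'unknown';
}

// A gzip stream is something zlib can handle directly.
const gz = zlib.gzipSync(Buffer.from('hello'));
console.log(sniffFormat(gz));                // gzip
console.log(zlib.gunzipSync(gz).toString()); // hello

// A buffer starting with the zip signature needs a zip library instead.
console.log(sniffFormat(Buffer.from([0x50, 0x4b, 0x03, 0x04]))); // zip
```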
I don't completely understand what you are trying to do, but IMHO this is the best approach. You should keep your data in memory only for as long as you really need it, and then stream it to the csv parser.
If you want to keep all your data in memory, you can replace the csv parser method fromPath with from, which takes a buffer instead, and have getData return the unzipped data directly.
You can use AdmZip (as @mihai said) instead of node-zip; just note that AdmZip is not yet published on npm, so you need:
$ npm install git://github.com/cthackers/adm-zip.git
N.B. Assumption: the zip file contains only one file
var request = require('request'),
    fs = require('fs'),
    csv = require('csv'),
    NodeZip = require('node-zip')
// Downloads the zip to a temp file, unzips it, and writes each entry to its own temp file.
function getData(tmpFolder, url, callback) {
  var tempZipFilePath = tmpFolder + new Date().getTime() + Math.random()
  var tempZipFileStream = fs.createWriteStream(tempZipFilePath)
  request.get({
    url: url,
    encoding: null // keep the body binary instead of decoding it to a string
  }).pipe(tempZipFileStream)
  // Wait until the temp file is fully written before reading it back.
  tempZipFileStream.on('close', function () {
    fs.readFile(tempZipFilePath, 'base64', function (err, zipContent) {
      if (err) return callback(err)
      var zip = new NodeZip(zipContent, { base64: true })
      Object.keys(zip.files).forEach(function (filename) {
        var tempFilePath = tmpFolder + new Date().getTime() + Math.random()
        var unzipped = zip.files[filename].data
        fs.writeFile(tempFilePath, unzipped, function (err) {
          callback(err, tempFilePath)
        })
      })
    })
  })
}
getData('/tmp/', 'http://bdn-ak.bloomberg.com/precanned/Comdty_Calendar_Spread_Option_20120428.txt.zip', function (err, path) {
  if (err) {
    return console.error('error: %s', err.message)
  }
  var metadata = []
  csv().fromPath(path, {
    delimiter: '|',
    columns: true
  }).transform(function (data){
    // do things with your data
    if (data.NAME[0] === '#') {
      metadata.push(data.NAME)
    } else {
      return data
    }
  }).on('data', function (data, index) {
    console.log('#%d %s', index, JSON.stringify(data, null, ' '))
  }).on('end',function (count) {
    console.log('Metadata: %s', JSON.stringify(metadata, null, ' '))
    console.log('Number of lines: %d', count)
  }).on('error', function (error) {
    console.error('csv parsing error: %s', error.message)
  })
})
#3
4
If you're on macOS or Linux, you can use the unzip command to unzip from stdin.
In this example I'm reading the zip file from the filesystem into a Buffer object, but it works with a downloaded file as well:
// Get a Buffer with the zip content
var fs = require("fs")
, zip = fs.readFileSync(__dirname + "/test.zip");
// Now the actual unzipping:
var spawn = require('child_process').spawn
, fileToExtract = "test.js"
// -p tells unzip to extract to stdout
, unzip = spawn("unzip", ["-p", "/dev/stdin", fileToExtract ])
;
// Write the Buffer to stdin
unzip.stdin.write(zip);
// Handle errors
unzip.stderr.on('data', function (data) {
console.log("There has been an error: ", data.toString("utf-8"));
});
// Handle the unzipped stdout
unzip.stdout.on('data', function (data) {
console.log("Unzipped file: ", data.toString("utf-8"));
});
unzip.stdin.end();
Which is actually just the node version of:
cat test.zip | unzip -p /dev/stdin test.js
EDIT: It's worth noting that this will not work if the input zip is too big to be read in one chunk from stdin. If you need to read bigger files, and your zip file contains only one file, you can use funzip instead of unzip:
var unzip = spawn("funzip");
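funzip can work on a stream because it only extracts the first entry, and that entry's local file header sits at the very start of the archive. A rough pure-Node sketch of the same idea follows. `extractFirstEntry` is a hypothetical helper, not part of any library. It assumes the entry sizes are recorded in the local header (general-purpose flag bit 3 is clear) and that the entry is either stored or deflated, and it skips CRC checking.

```javascript
const zlib = require('zlib');

// Extracts the first entry of a zip held in memory by reading only its
// local file header (fixed 30-byte layout followed by name, extra field, data).
function extractFirstEntry(zipBuf) {
  if (zipBuf.length < 30 || zipBuf.readUInt32LE(0) !== 0x04034b50) {
    throw new Error('buffer does not start with a zip local file header');
  }
  const flags = zipBuf.readUInt16LE(6);
  if (flags & 0x08) {
    // Sizes live in a trailing data descriptor; a real zip library is needed.
    throw new Error('entry sizes are not stored in the local header');
  }
  const method = zipBuf.readUInt16LE(8);          // 0 = stored, 8 = deflate
  const compressedSize = zipBuf.readUInt32LE(18);
  const nameLength = zipBuf.readUInt16LE(26);
  const extraLength = zipBuf.readUInt16LE(28);
  const name = zipBuf.toString('utf8', 30, 30 + nameLength);
  const start = 30 + nameLength + extraLength;
  const raw = zipBuf.subarray(start, start + compressedSize);

  if (method === 0) return { name: name, data: raw };
  if (method === 8) return { name: name, data: zlib.inflateRawSync(raw) };
  throw new Error('unsupported compression method ' + method);
}
```

Calling `extractFirstEntry(buf).data.toString()` on a downloaded Buffer would yield the text of the first file, provided the assumptions above hold. For anything beyond that, a zip library remains the safer choice.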
If your zip file contains multiple files (and the file you want isn't the first one), I'm afraid you're out of luck. unzip needs to seek in the .zip file, since zip files are just a container and unzip may be extracting just the last file in it. In that case you have to save the file temporarily (node-temp comes in handy).
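The need to seek comes from the zip layout: the index of all entries (the central directory) is located through an "End of Central Directory" record at the end of the file. With the whole archive in a Buffer, that record can be found by scanning backwards for its signature. A minimal sketch, where `findEndOfCentralDirectory` is a hypothetical helper and zip64 archives are ignored:

```javascript
// Finds the End of Central Directory record by scanning backwards for its
// signature "PK\x05\x06". Returns null if none is found.
function findEndOfCentralDirectory(zipBuf) {
  const MIN_EOCD_SIZE = 22; // record size when the archive comment is empty
  for (let i = zipBuf.length - MIN_EOCD_SIZE; i >= 0; i--) {
    if (zipBuf.readUInt32LE(i) === 0x06054b50) {
      return {
        offset: i,
        totalEntries: zipBuf.readUInt16LE(i + 10),
        centralDirectoryOffset: zipBuf.readUInt32LE(i + 16)
      };
    }
  }
  return null;
}

// The smallest valid zip is an empty archive: just the 22-byte record.
const emptyZip = Buffer.alloc(22);
emptyZip.writeUInt32LE(0x06054b50, 0);

// Reports the record at offset 0 with zero entries.
console.log(findEndOfCentralDirectory(emptyZip));
```

A tool reading from a pipe cannot jump to the end like this, which is why the answer falls back to a temporary file for multi-file archives.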
#4
1
Two days ago the module node-zip was released, which is a wrapper for the JavaScript-only version of Zip: JSZip.
var NodeZip = require('node-zip')
, zip = new NodeZip(zipBuffer.toString("base64"), { base64: true })
, unzipped = zip.files["your-text-file.txt"].data;
#5
-3
var fs = require('fs');
var unzip = require('unzip');
// unzip a.zip to the current directory
fs.createReadStream('./path/a.zip').pipe(unzip.Extract({ path: './path/' }));
I used the unzip module, and it worked.