I've got a big 1.8 GB XML file with all of its contents on one line.
The main structure of the file is this:
<xml>
<mutatieoverzicht>
<mutatiebericht> ... </mutatiebericht>
<mutatiebericht> ... </mutatiebericht>
...
</mutatieoverzicht>
</xml>
But all of it is on a single line :)
I want to parse the file and perform some actions on the mutatiebericht elements (storing them in a database). Because loading and parsing the whole document at once takes too much memory and is terribly slow, I was thinking of parsing the file line by line. But the original file has only one line.
So my first step would be to traverse the file and create a new file with every mutatiebericht element on its own line.
I can load the file in Node.js and do things with the content, but I am lost on how to split the string of contents.
1 Answer
#1
You could use the xml-stream module. It parses an XML file as a stream and emits events at the start and end of each element, so the whole document never has to be in memory at once. It would look something like this (untested):
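var fs = require('fs');
var XmlStream = require('xml-stream'); // npm install xml-stream

var stream = fs.createReadStream(pathtoyourfile); // pathtoyourfile: path to the large XML file
var xml = new XmlStream(stream);
// Fires once for every completed <mutatiebericht> element
xml.on('endElement: mutatiebericht', function(item) {
  console.log(item); // item contains your parsed element
});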
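If you would still rather do the first step from the question (one mutatiebericht per line) without an extra module, here is a rough sketch using only Node's built-in fs streams. It is not an XML parser: it assumes the opening tag is written exactly as <mutatiebericht> (no attributes), that the elements are not nested, and that the closing tag never appears inside a comment or CDATA section. The names createElementSplitter and splitFile are made up for this example.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of any library): buffers incoming text and
// returns every complete <tagName>...</tagName> element found so far.
function createElementSplitter(tagName) {
  const openTag = '<' + tagName + '>';
  const closeTag = '</' + tagName + '>';
  let buffer = '';

  return function push(chunk) {
    buffer += chunk;
    const elements = [];
    for (;;) {
      const start = buffer.indexOf(openTag);
      if (start === -1) {
        // No opening tag yet: keep only a tail that could hold a partial one.
        buffer = buffer.slice(-(openTag.length - 1));
        break;
      }
      const end = buffer.indexOf(closeTag, start);
      if (end === -1) {
        // Element is incomplete: drop what precedes it, wait for more data.
        buffer = buffer.slice(start);
        break;
      }
      const stop = end + closeTag.length;
      elements.push(buffer.slice(start, stop));
      buffer = buffer.slice(stop);
    }
    return elements;
  };
}

// Hypothetical wrapper: writes each element of inputPath on its own line
// in outputPath, reading the input chunk by chunk.
function splitFile(inputPath, outputPath, tagName) {
  const push = createElementSplitter(tagName);
  // The utf8 encoding keeps multi-byte characters intact across chunks.
  const input = fs.createReadStream(inputPath, { encoding: 'utf8' });
  const output = fs.createWriteStream(outputPath);

  input.on('data', function (chunk) {
    const elements = push(chunk);
    if (elements.length === 0) return;
    // Pause reading while the writer is saturated (simple backpressure).
    if (!output.write(elements.join('\n') + '\n')) {
      input.pause();
      output.once('drain', function () { input.resume(); });
    }
  });
  input.on('end', function () { output.end(); });
}
```

You would call it as splitFile(pathtoyourfile, 'out.txt', 'mutatiebericht'), where 'out.txt' is just a placeholder name. If the elements carry attributes or can be nested, a real streaming parser such as xml-stream is the safer choice.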