How do I decode the meaning of memory data and debug a memory leak in node.js?

Time: 2021-09-02 21:25:37

I've got an RSS-to-MongoDB reader/scraper that runs through a data set larger than my system has memory for. As I loop through the data, the system slows down. I'm reasonably sure that's because I'm running out of memory.

I've added some debug info and have made a few changes, but I don't know how to read the information given in the debug output.

Here's a debug output sample (from before it gets deadly):

 100 items 
 Memory: { rss: 11104256,        // what is RSS?
           vsize: 57507840,      // what is VSIZE?
           heapTotal: 4732352,   // heapTotal?
           heapUsed: 3407624 }   // heapUsed?
 200 items
 Memory: { rss: 12533760,
           vsize: 57880576,
           heapTotal: 6136320,
           heapUsed: 3541984 }
                                 // what key numbers do I watch for?
                                 // when do I reach 'situation critical'? 
                                 // how do I free up memory to prevent problems?

Also, if it helps and for better illustration, I've included a sample of the code. One change I've made already is moving all of the require statements outside of the GrabRss function.

var http    = require('http');
var sys     = require('sys');
var xml2js  = require('xml2js');
var util    = require('util');
var Db      = require('../lib/mongodb').Db,
    Conn    = require('../lib/mongodb').Connection,
    Server  = require('../lib/mongodb').Server,
    // BSON = require('../lib/mongodb').BSONPure;
    BSON    = require('../lib/mongodb').BSONNative;

GrabRss = function(grab, start) {           
    var options = {
        host: 'www.example.com',
        port: 80,
        path: '/rss/'+grab+'/'+start
    };

    var data;
    var items;
    var checked = 0;
    var len = 0;

    GotResponse = function(res) {
        var ResponseBody = "";
        res.on('data', DoChunk);
        res.on('end', EndResponse);

        function DoChunk(chunk){
            ResponseBody += chunk;
        }
        function EndResponse() {
            //console.log(ResponseBody);
            var parser = new xml2js.Parser();
            parser.addListener('end', GotRSSObject);
            parser.parseString(ResponseBody);
        }
    }

    GotError = function(e) {
        console.log("Got error: " + e.message);
    }

    GotRSSObject = function(r){
        items = r.item;
        //console.log(sys.inspect(r));

        var db = new Db('rss', new Server('localhost', 27017, {}), {native_parser:false});
        db.open(function(err, db){
             db.collection('items', function(err, col) {
                len = items.length;
                if (len === 0) {
                    process.exit(0);
                }
                for (i in items) {
                    SaveMovie(items[i], col);
                }
             });
        });
    }

    SaveMovie = function(i, c) {
        c.update({'id': i.id}, {$set: i}, {upsert: true, safe: true}, function(err){
            if (err) console.warn(err.message);
            if (++checked >= len) {
                if (checked < 5000) {
                    data = null;   // added since asking (delete has no effect on var-declared variables)
                    items = null;  // added since asking

                    console.log(start+checked);
                    console.log('Memory: '+util.inspect(process.memoryUsage()));
                    GrabRss(50, start+checked);
                } else {
                    console.log(len);
                    process.exit(0);
                }
            } else if (checked % 10 == 0) {
                console.log(start+checked);
            }
        });
    }
    http.get(options, GotResponse).on('error', GotError);

}
GrabRss(50, 0);

1 answer

#1


8  

After reading through this code, I do see that items in GotRSSObject is declared as a global, because there is no var prefacing it.

Aside from that, I see no other obvious memory leaks. A good basic technique is to add some more print statements to see where the memory is being allocated and then to check where you would expect that memory to be cleaned up by asserting that the variables == null.
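
The print-statement approach can be sketched as a small helper that records heapUsed before and after a step. measureHeap is a hypothetical name, it only works for synchronous steps, and the delta is noisy because V8 may collect garbage in the middle of the measurement.

```javascript
// Hypothetical helper: run fn and report how much heapUsed changed around it.
function measureHeap(label, fn) {
    var before = process.memoryUsage().heapUsed;
    var result = fn();
    var after = process.memoryUsage().heapUsed;
    console.log(label + ': heapUsed changed by ' + (after - before) + ' bytes');
    return { result: result, delta: after - before };
}

// Example step: build an array of 100000 small objects.
var measured = measureHeap('build array', function () {
    var items = [];
    for (var i = 0; i < 100000; i++) {
        items.push({ id: i, title: 'item ' + i });
    }
    return items;
});
```

Wrapping each stage of the scraper (parse, save, next fetch) this way should show which stage keeps growing from one batch to the next.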

The problem with memory in node.js and V8 is that garbage collection is not guaranteed to run at any particular time, and afaik you can't force it in normal operation (starting node with --expose-gc makes a global.gc() function available, but that is meant for debugging). You'll want to limit the amount of data you're working with so that it easily fits in memory, and provide some error handling (perhaps with setTimeout or process.nextTick) to wait until memory has been cleaned up.
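
One way to keep the working set small is to process the feed in fixed-size batches and start the next batch only after the previous one has finished. The sketch below assumes a hypothetical processBatch function standing in for the fetch, parse, and save step. It uses setImmediate to yield to the event loop between batches, so each batch's data is out of scope before the next one is built.

```javascript
// Hypothetical stand-in for "fetch one page of the feed and save it".
function processBatch(start, size, done) {
    var batch = [];
    for (var i = 0; i < size; i++) {
        batch.push({ id: start + i });
    }
    // ... save the batch here, then report how many items were handled ...
    done(null, batch.length);
}

// Process `total` items, at most `batchSize` at a time, one batch after another.
function runBatches(total, batchSize, finished) {
    var processed = 0;

    function next() {
        if (processed >= total) {
            return finished(null, processed);
        }
        var size = Math.min(batchSize, total - processed);
        processBatch(processed, size, function (err, count) {
            if (err) {
                return finished(err);
            }
            processed += count;
            // Yield to the event loop before starting the next batch.
            setImmediate(next);
        });
    }

    next();
}

runBatches(500, 50, function (err, processed) {
    if (err) throw err;
    console.log('processed ' + processed + ' items');
});
```

Because only one batch is alive at a time, peak memory should depend on the batch size rather than on the size of the whole data set.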

A word of advice about nextTick: it's a very, very fast call. Node.js is single-threaded and runs on an event loop, as everyone knows. Using nextTick queues the function to run almost immediately (in current Node versions, right after the current operation completes and before any pending I/O), so make sure you don't call it too often; otherwise you'll find yourself wasting cycles.
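
To see how soon a nextTick callback fires, here is a small ordering sketch. It assumes a current Node.js version, where process.nextTick callbacks run right after the current operation finishes and ahead of setImmediate callbacks.

```javascript
var order = [];

// Queued for the check phase of the event loop.
setImmediate(function () { order.push('setImmediate'); });

// Queued to run as soon as the current script finishes, before the loop continues.
process.nextTick(function () { order.push('nextTick'); });

order.push('synchronous');

// Registered after the first setImmediate, so it runs after it.
setImmediate(function () {
    console.log(order.join(' -> '));
    // prints "synchronous -> nextTick -> setImmediate"
});
```

A function that keeps rescheduling itself with nextTick never lets the loop reach its I/O phase, so setImmediate or setTimeout is probably the safer choice for pausing between batches.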

And regarding rss, vsize, heapTotal, heapUsed: vsize is the total size of the virtual memory your process is using, and rss (resident set size) is how much of that is in actual physical RAM rather than swap. heapTotal and heapUsed describe V8's own heap (how much it has reserved and how much is currently in use), which you don't control directly. You'll mostly be concerned with vsize, but you can also get more detailed information with top, or with Activity Monitor on OS X (does anyone know of good process visualization tools on *nix systems?).
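
The raw byte counts are easier to follow when printed in megabytes. The sketch below uses two hypothetical helpers, toMB and memoryReport. Current Node.js versions no longer report vsize, as far as I know, so it only reads rss, heapTotal, and heapUsed.

```javascript
// Convert a byte count to a string in megabytes with one decimal place.
function toMB(bytes) {
    return (bytes / 1024 / 1024).toFixed(1) + ' MB';
}

// Summarize process.memoryUsage() in readable units.
function memoryReport() {
    var m = process.memoryUsage();
    return {
        rss: toMB(m.rss),
        heapTotal: toMB(m.heapTotal),
        heapUsed: toMB(m.heapUsed),
        heapUsedPercent: Math.round(100 * m.heapUsed / m.heapTotal) + '%'
    };
}

console.log(memoryReport());
console.log(toMB(11104256)); // the rss value from the question's first sample: "10.6 MB"
```

Logging this once per batch gives a trend line: an rss figure that keeps climbing across batches while heapUsed stays flat would point at memory held outside the JavaScript heap, such as buffers.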
