How to query the amount of allocated memory on Linux (and OSX)?

Time: 2021-02-25 16:58:44

While this might look like a duplicate of other questions, let me explain why it's not.

I am looking to get a specific part of my application to degrade gracefully when a certain memory limit has been reached. I could have used a criterion based on the remaining available physical memory, but that wouldn't be safe, because the OS could start paging out memory used by my application before that criterion is reached; my application would then think there is still some physical memory left and keep allocating, and so on. For the same reason, I can't use the amount of physical memory currently used by the process: as soon as the OS started swapping me out, I would keep allocating while the OS pages memory out, so that number would stop growing.

For this reason, I chose a criteria based on the amount of memory allocated by my application, i.e. very close to virtual memory size.

This question (How to determine CPU and memory consumption from inside a process?) provides great ways of querying the amount of virtual memory used by the current process, which I THOUGHT was what I needed.

On Windows, I'm using GetProcessMemoryInfo() and the PrivateUsage field, which works great.

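For reference, this is roughly the query I mean (a minimal sketch using the PROCESS_MEMORY_COUNTERS_EX structure; error handling reduced to a bare return):

#include <windows.h>
#include <psapi.h>   /* link against psapi.lib */

/* Private (non-shared) committed memory of the current process, in bytes. */
size_t getPrivateUsage()
{
    PROCESS_MEMORY_COUNTERS_EX pmc;
    if (!GetProcessMemoryInfo(GetCurrentProcess(),
                              (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc)))
        return 0;
    return (size_t)pmc.PrivateUsage;
}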

On Linux, I tried several things (listed below) that did not work. The reason why virtual memory usage does not work for me is because of something that happens with OpenCL context creation on NVidia hardware on Linux. The driver reserves a region of the virtual memory space big enough to hold all RAM, all swap and all video memory. My guess is it does so for unified address space and everything. But it also means that the process reports using enormous amounts of memory. On my system for instance, top will report 23.3 Gb in the VIRT column (12 Gb of RAM, 6 Gb of swap, 2 Gb of video memory, which gives 20 Gb reserved by the NVidia driver).

On OSX, by using task_info() and the virtual_size field, I also get a bigger than expected number (a few Gb for an app that takes not even close to 1 Gb on Windows), but not as big as Linux.

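And the corresponding sketch on OSX (using the MACH_TASK_BASIC_INFO flavor; the older TASK_BASIC_INFO also exposes a virtual_size field):

#include <mach/mach.h>
#include <stddef.h>

/* Virtual size of the current task, in bytes. */
size_t getVirtualSize()
{
    struct mach_task_basic_info info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO,
                  (task_info_t)&info, &count) != KERN_SUCCESS)
        return 0;
    return (size_t)info.virtual_size;
}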

So here is the big question: how can I get the amount of memory allocated by my application? I know that this is a somewhat vague question (what does "allocated memory" mean?), but I'm flexible:

  • I would prefer to include the application static data, code section and everything, but I can live without.
  • I would prefer to include the memory allocated for stacks, but I can live without.
  • I would prefer to include the memory used by shared libraries, but I can live without.
  • I don't really care for mmap stuff, I can do with or without at that point.
  • Etc.

What is really important is that the number grows with dynamic allocation (new, malloc, anything) and shrinks when the memory is released (which I know can be implementation-dependent).

Things I have tried

Here are a couple of solutions I have tried and/or thought of but that would not work for me.

  1. Read from /proc/self/status

    This is the approach suggested by how-to-determine-cpu-and-memory-consumption-from-inside-a-process. However, as stated above, this returns the amount of virtual memory, which does not work for me. (Minimal sketches of approaches 1 to 4 are shown right after this list.)

  2. Read from /proc/self/statm

    Very slightly worse: according to http://kernelnewbies.kernelnewbies.narkive.com/iG9xCmwB/proc-pid-statm-doesnt-match-with-status, which refers to the Linux kernel code, the only difference between those two values is that the second one does not subtract reserved_vm from the amount of virtual memory. I would have HOPED that reserved_vm would include the memory reserved by the OpenCL driver, but it does not.

  3. Use mallinfo() and the uordblks field

    This does not seem to include all the allocations (I'm guessing the allocations made with new are missing), since for a +2 Gb growth in virtual memory space (after doing some memory-heavy work and still holding the memory), I'm only seeing about 0.1 Gb growth in the number returned by mallinfo().

  4. Read the [heap] section size from /proc/self/smaps

    This value started at around 336,760 Kb and peaked at 1,019,496 Kb for work that grew the virtual memory space by +2 Gb, and then it never goes back down, so I'm not sure I can really rely on this number...

  5. Monitor all memory allocations in my application

    Yes, in an ideal world, I would have control over everybody who allocates memory. However, this is a legacy application, using tons of different allocators: some mallocs, some news, some OS-specific routines, etc. There are some plug-ins that could do whatever they want, and they could be compiled with a different compiler, etc. So while it would be great to really control memory this way, it does not work in my context.

  6. Read the virtual memory size before and after the OpenCL context initialization

    While this could be a "hacky" way to solve the problem (and I might have to fall back to it), I would really wish for a more reliable way to query memory, because the OpenCL context could be initialized somewhere out of my control, and other similar but non-OpenCL-specific issues could creep in without my knowing about it.

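For reference, here are minimal sketches of the queries behind approaches 1 to 4 above (function names are mine, error handling is stripped down, and mallinfo() is glibc-specific):

#include <malloc.h>   /* mallinfo(), glibc-specific */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Approach 1: VmSize line of /proc/self/status, in kB. */
size_t vmSizeKbFromStatus()
{
    FILE* f = fopen("/proc/self/status", "r");
    if (!f)
        return 0;
    size_t kb = 0;
    char line[256];
    while (fgets(line, sizeof(line), f))
    {
        if (strncmp(line, "VmSize:", 7) == 0)
        {
            sscanf(line + 7, "%zu", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Approach 2: first field of /proc/self/statm is the total program size, in pages. */
size_t vmSizeBytesFromStatm()
{
    FILE* f = fopen("/proc/self/statm", "r");
    if (!f)
        return 0;
    size_t pages = 0;
    if (fscanf(f, "%zu", &pages) != 1)
        pages = 0;
    fclose(f);
    return pages * (size_t)sysconf(_SC_PAGESIZE);
}

/* Approach 3: bytes currently allocated through malloc and friends. */
size_t heapInUseFromMallinfo()
{
    struct mallinfo mi = mallinfo();
    return (size_t)mi.uordblks;
}

/* Approach 4: size of the [heap] mapping as reported by /proc/self/smaps, in kB. */
size_t heapKbFromSmaps()
{
    FILE* f = fopen("/proc/self/smaps", "r");
    if (!f)
        return 0;
    size_t kb = 0;
    int in_heap = 0;
    char line[512];
    while (fgets(line, sizeof(line), f))
    {
        if (strstr(line, "[heap]"))
            in_heap = 1;
        else if (in_heap && strncmp(line, "Size:", 5) == 0)
        {
            sscanf(line + 5, "%zu", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}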

So that's pretty much all I've got. There is one more thing I have not tried yet, because it only works on OSX: the approach described in Why does mstats and malloc_zone_statistics not show recovered memory after free?, i.e. using malloc_get_all_zones() and malloc_zone_statistics(). But I suspect it might have the same problem as mallinfo(), i.e. not taking all allocations into account.

So, can anyone suggest a way to query the memory usage (as vague a term as this is, see above for precision) of a given process on Linux (and also on OSX, even if it's a different method)?

3 Answers

#1

You can try and use information returned by getrusage():

#include <sys/time.h>
#include <sys/resource.h>

int getrusage(int who, struct rusage *usage);

struct rusage {
    struct timeval ru_utime; /* user CPU time used */
    struct timeval ru_stime; /* system CPU time used */
    long   ru_maxrss;        /* maximum resident set size */
    long   ru_ixrss;         /* integral shared memory size */
    long   ru_idrss;         /* integral unshared data size */
    long   ru_isrss;         /* integral unshared stack size */
    long   ru_minflt;        /* page reclaims (soft page faults) */
    long   ru_majflt;        /* page faults (hard page faults) */
    long   ru_nswap;         /* swaps */
    long   ru_inblock;       /* block input operations */
    long   ru_oublock;       /* block output operations */
    long   ru_msgsnd;        /* IPC messages sent */
    long   ru_msgrcv;        /* IPC messages received */
    long   ru_nsignals;      /* signals received */
    long   ru_nvcsw;         /* voluntary context switches */
    long   ru_nivcsw;        /* involuntary context switches */
};

If the memory information does not fit your purpose, observing the page fault counts can help monitor memory stress, which is what you intend to detect.

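A minimal usage sketch (note that ru_maxrss is reported in kilobytes on Linux and in bytes on OS X):

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Print peak resident set size and major page fault count for this process. */
void reportMemoryPressure()
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("max RSS: %ld, major page faults: %ld\n",
               ru.ru_maxrss, ru.ru_majflt);
}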

#2

Have you tried a shared library interposer on Linux for item (5) above? As long as your application is not statically linking the malloc functions, you can interpose a new function between your program and the C library's malloc. I've used this tactic many times to collect stats on memory usage.

It does require setting LD_PRELOAD before running the program, but no source or binary changes. It is an ideal answer in many cases.

Here is an example of a malloc interposer:

http://www.drdobbs.com/building-library-interposers-for-fun-and/184404926

You probably will also want to do calloc and free. Calls to new generally end up as a call to malloc so C++ is covered as well.

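A bare-bones sketch of such an interposer (illustrative only: a real one has to handle the dlsym bootstrap carefully, also cover calloc/realloc, be thread-safe, and keep a per-pointer size table if free is to subtract):

/* memtrack.c -- build: gcc -shared -fPIC -o libmemtrack.so memtrack.c -ldl
 *               run:   LD_PRELOAD=./libmemtrack.so ./your_app             */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

static void* (*real_malloc)(size_t) = NULL;
static void  (*real_free)(void*)    = NULL;
static size_t g_allocated = 0;   /* rough running total, not thread-safe */

void* malloc(size_t size)
{
    if (!real_malloc)   /* lazy lookup of the real libc malloc */
        real_malloc = (void* (*)(size_t))dlsym(RTLD_NEXT, "malloc");
    void* p = real_malloc(size);
    if (p)
        g_allocated += size;
    return p;
}

void free(void* p)
{
    if (!real_free)
        real_free = (void (*)(void*))dlsym(RTLD_NEXT, "free");
    /* Subtracting the freed size would need a side table keyed on p
       (or glibc's malloc_usable_size()). */
    real_free(p);
}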

OS X seems to have similar capabilities but I have not tried it.

http://tlrobinson.net/blog/2007/12/overriding-library-functions-in-mac-os-x-the-easy-way-dyld_insert_libraries/

--Matt

#3

Here is what I ended up using. I scan /proc/self/maps and sum the sizes of all the address ranges meeting my criteria, which are:

  • Only include ranges from inode 0 (i.e. no devices, no mapped file, etc.)
  • Only include ranges that are at least one of readable, writable or executable
  • Only include private memory
    • In my experiments I did not see instances of shared memory from inode 0. Maybe with inter-process shared memory...?

Here is the code for my solution:

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

size_t getValue()
{
    FILE* file = fopen("/proc/self/maps", "r");
    if (!file)
    {
        assert(0);
        return 0;
    }

    size_t value = 0;

    char line[1024];
    while (fgets(line, 1024, file) != NULL)
    {
        /* One line per mapping: "start-end perms offset major:minor inode [path]" */
        ptrdiff_t start_address, end_address;
        char perms[4];
        ptrdiff_t offset;
        unsigned int dev_major, dev_minor;   /* unsigned to match %x */
        unsigned long int inode;
        const int nb_scanned = sscanf(
            line, "%16tx-%16tx %c%c%c%c %16tx %02x:%02x %lu",
            &start_address, &end_address,
            &perms[0], &perms[1], &perms[2], &perms[3],
            &offset, &dev_major, &dev_minor, &inode
            );
        if (10 != nb_scanned)
        {
            assert(0);
            continue;
        }

        /* Anonymous (inode 0), accessible, private mappings only. */
        if ((inode == 0) &&
            (perms[0] != '-' || perms[1] != '-' || perms[2] != '-') &&
            (perms[3] == 'p'))
        {
            assert(dev_major == 0);
            assert(dev_minor == 0);
            value += (end_address - start_address);
        }
    }

    fclose(file);

    return value;
}
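A quick, hypothetical way to exercise it (exact numbers will vary; a 100 MB malloc goes through mmap on glibc, so the value should drop back after free()):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    printf("before:      %zu bytes\n", getValue());
    char* big = malloc(100u * 1024u * 1024u);
    if (big)
        memset(big, 1, 100u * 1024u * 1024u);   /* touch the pages */
    printf("after alloc: %zu bytes\n", getValue());
    free(big);
    printf("after free:  %zu bytes\n", getValue());
    return 0;
}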

Since this is looping through all the lines in /proc/self/maps, querying memory that way is significantly slower than using "Virtual Memory currently used by current process" from How to determine CPU and memory consumption from inside a process?.

However, it provides an answer much closer to what I need.
