Python os.fork OSError: [Errno 12] Cannot allocate memory (but memory is not the issue)

I have a similar problem to this one: Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"

I have a daemon process that runs OK for a few minutes and then fails to run shell programs via popen2.Popen3(). It spawns 20 threads. Memory does not appear to be the issue; this is the only program running on the machine, which has 2G of RAM, and it's using less than 400M. I've been logging ru_maxrss and this is only 50M (before and after OSError is raised).

ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15962
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15962
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I've also been watching free -m and ls /proc/$pid/fd | wc -l while it is running, and neither of these seems to indicate resource exhaustion. Here's typical free -m while running:

             total       used       free     shared    buffers     cached
Mem:          2003        374       1628          0         46        154
-/+ buffers/cache:        173       1830
Swap:          283          0        283

... and the fd count is around 90-100.

The host is Ubuntu 12.04 (server jeos - minimal vm), Python 2.7.3, running on a VMWare host.

So I'm wondering: what do I do next to diagnose why this is failing? Are there some more resource stats I can gather? Do I need to get down to the level of strace?

2 Answers

#1

Hypothesis: if your VM is 32-bit, you may be running out of address space.

Not memory: address space. Let me explain: in Linux, many things (IO, video card, memory-mapped files) use up address space without necessarily consuming a corresponding amount of main memory.

Here's an explanation of related issues:

(look for "Kernel virtual address space exhaustion on the X86 platform" section, use dmesg to test if that's the situation)

An ENOMEM error returned by mmap may very well mean "not enough address space", not just "not enough memory", although I'm not sure how to diagnose this in CPython. If any process running on your system has some big files mmapped, well..
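
One rough way to look at address space rather than resident memory from inside Python is to compare the interpreter's pointer width with the Vm* fields the kernel reports in /proc/<pid>/status. The sketch below is only an illustration of that idea, not a confirmed diagnosis for this case; the helper names (pointer_bits, parse_status, read_vm_fields) are made up for the example, and it assumes a Linux /proc filesystem.

```python
import struct


def pointer_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def parse_status(text):
    """Extract the Vm* fields (values in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Vm"):
            parts = value.split()
            # Lines look like "VmSize:   120000 kB"; keep the number only.
            if parts and parts[0].isdigit():
                fields[key] = int(parts[0])
    return fields


def read_vm_fields(pid="self"):
    """Read the Vm* fields for a process (Linux only; hypothetical helper)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_status(f.read())
```

Logging read_vm_fields() next to ru_maxrss would show whether VmSize (mapped address space) is far larger than VmRSS (resident memory). If pointer_bits() returns 32, a VmSize approaching the usual limit of about 3 GB of user address space would be consistent with the hypothesis above.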

#2

Check whether you have run out of space on your disk drive; that was the problem in my case.

bravo@by1-dotbravo-01:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              16G   16G     0 100% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sdb              296G  162G  119G  58% /home
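
The same check can be made from inside the daemon, so a full filesystem shows up in its own logs. This is a minimal sketch using os.statvfs; the function names and the 100 MiB threshold are arbitrary choices for illustration, not values taken from the case above.

```python
import os


def disk_usage(path="/"):
    """Return (total_bytes, available_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    # f_bavail counts blocks available to unprivileged users,
    # which corresponds to the "Avail" column of df.
    avail = st.f_bavail * st.f_frsize
    return total, avail


def is_nearly_full(path="/", min_free_bytes=100 * 1024 * 1024):
    """True if fewer than min_free_bytes remain (threshold is hypothetical)."""
    _, avail = disk_usage(path)
    return avail < min_free_bytes
```

Calling is_nearly_full("/") just before spawning a child process, and logging the result when OSError is raised, would help rule this cause in or out.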
