My app was killed by the oom-killer. It is running on Ubuntu 11.10 from a live USB with no swap, and the PC has 1 GB of RAM. The only app running (other than all the built-in Ubuntu stuff) is my program flasherav. Note that /tmp is memory-mapped and at the time of the crash held about 200 MB of files (so it was taking up ~200 MB of RAM).
I'm trying to understand how to analyze the oom-killer log so that I can see where exactly all the memory is being used, i.e. what the different chunks are that add up to ~1 GB and caused the oom-killer to kick in. Once I understand that, I can work on reducing the offender's usage so the app will run on a machine with 1 GB of RAM. My specific questions are:
To try to analyze the situation, I summed up the "total_vm" column and only got 609342 KB (which, when added to the 200 MB in /tmp, is still only ~809 MB). Maybe I'm wrong about what the "total_vm" column is: does it include allocated-but-unused memory plus shared memory? If so, shouldn't it far overstate the memory actually in use (and therefore I shouldn't be out of memory)? Are there other chunks of memory in use that aren't accounted for in the list below?
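For reference, summing those columns can be scripted (a minimal Python sketch, assuming the process-table format of the dmesg output below; it reads the log on stdin, e.g. dmesg | python oom_sum.py, where oom_sum.py is whatever you name it):

    import re
    import sys

    # Each process-table row in the oom-killer dump looks like:
    # [11686.043266] [ 1084]     0  1084      662        1   0       0             0 upstart-udev-br
    # columns: [pid] uid tgid total_vm rss cpu oom_adj oom_score_adj name
    row = re.compile(r'\[\s*\d+\]\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s+\d+\s+-?\d+\s+-?\d+\s+\S+')

    total_vm = rss = 0
    for line in sys.stdin:
        m = row.search(line)
        if m:
            total_vm += int(m.group(1))
            rss += int(m.group(2))

    # Both columns are counted in 4 kB pages, not kB.
    print("total_vm: %d pages (%d kB)" % (total_vm, total_vm * 4))
    print("rss:      %d pages (%d kB)" % (rss, rss * 4))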
[11686.040460] flasherav invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[11686.040467] flasherav cpuset=/ mems_allowed=0
[11686.040472] Pid: 2859, comm: flasherav Not tainted 3.0.0-12-generic #20-Ubuntu
[11686.040476] Call Trace:
[11686.040488] [<c10e1c15>] dump_header.isra.7+0x85/0xc0
[11686.040493] [<c10e1e6c>] oom_kill_process+0x5c/0x80
[11686.040498] [<c10e225f>] out_of_memory+0xbf/0x1d0
[11686.040503] [<c10e6123>] __alloc_pages_nodemask+0x6c3/0x6e0
[11686.040509] [<c10e78d3>] ? __do_page_cache_readahead+0xe3/0x170
[11686.040514] [<c10e0fc8>] filemap_fault+0x218/0x390
[11686.040519] [<c1001c24>] ? __switch_to+0x94/0x1a0
[11686.040525] [<c10fb5ee>] __do_fault+0x3e/0x4b0
[11686.040530] [<c1069971>] ? enqueue_hrtimer+0x21/0x80
[11686.040535] [<c10fec2c>] handle_pte_fault+0xec/0x220
[11686.040540] [<c10fee68>] handle_mm_fault+0x108/0x210
[11686.040546] [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040551] [<c152fb5b>] do_page_fault+0x15b/0x4a0
[11686.040555] [<c1069a90>] ? update_rmtp+0x80/0x80
[11686.040560] [<c106a7b6>] ? hrtimer_start_range_ns+0x26/0x30
[11686.040565] [<c106aeaf>] ? sys_nanosleep+0x4f/0x60
[11686.040569] [<c152fa00>] ? vmalloc_fault+0xee/0xee
[11686.040574] [<c152cfcf>] error_code+0x67/0x6c
[11686.040580] [<c1520000>] ? reserve_backup_gdb.isra.11+0x26d/0x2c0
[11686.040583] Mem-Info:
[11686.040585] DMA per-cpu:
[11686.040588] CPU 0: hi: 0, btch: 1 usd: 0
[11686.040592] CPU 1: hi: 0, btch: 1 usd: 0
[11686.040594] Normal per-cpu:
[11686.040597] CPU 0: hi: 186, btch: 31 usd: 5
[11686.040600] CPU 1: hi: 186, btch: 31 usd: 30
[11686.040603] HighMem per-cpu:
[11686.040605] CPU 0: hi: 42, btch: 7 usd: 7
[11686.040608] CPU 1: hi: 42, btch: 7 usd: 22
[11686.040613] active_anon:113150 inactive_anon:113378 isolated_anon:0
[11686.040615] active_file:86 inactive_file:1964 isolated_file:0
[11686.040616] unevictable:0 dirty:0 writeback:0 unstable:0
[11686.040618] free:13274 slab_reclaimable:2239 slab_unreclaimable:2594
[11686.040619] mapped:1387 shmem:4380 pagetables:1375 bounce:0
[11686.040627] DMA free:4776kB min:784kB low:980kB high:1176kB active_anon:5116kB inactive_anon:5472kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:80kB slab_unreclaimable:168kB kernel_stack:96kB pagetables:64kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:6 all_unreclaimable? yes
[11686.040634] lowmem_reserve[]: 0 865 1000 1000
[11686.040644] Normal free:48212kB min:44012kB low:55012kB high:66016kB active_anon:383196kB inactive_anon:383704kB active_file:344kB inactive_file:7884kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:5548kB shmem:17520kB slab_reclaimable:8876kB slab_unreclaimable:10208kB kernel_stack:1960kB pagetables:3976kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:930 all_unreclaimable? yes
[11686.040652] lowmem_reserve[]: 0 0 1078 1078
[11686.040662] HighMem free:108kB min:132kB low:1844kB high:3560kB active_anon:64288kB inactive_anon:64336kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:138072kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1460kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:61 all_unreclaimable? yes
[11686.040669] lowmem_reserve[]: 0 0 0 0
[11686.040675] DMA: 20*4kB 24*8kB 34*16kB 26*32kB 19*64kB 13*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4784kB
[11686.040690] Normal: 819*4kB 607*8kB 357*16kB 176*32kB 99*64kB 49*128kB 23*256kB 4*512kB 0*1024kB 0*2048kB 2*4096kB = 48212kB
[11686.040704] HighMem: 16*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80kB
[11686.040718] 14680 total pagecache pages
[11686.040721] 8202 pages in swap cache
[11686.040724] Swap cache stats: add 2191074, delete 2182872, find 1247325/1327415
[11686.040727] Free swap = 0kB
[11686.040729] Total swap = 524284kB
[11686.043240] 262100 pages RAM
[11686.043244] 34790 pages HighMem
[11686.043246] 5610 pages reserved
[11686.043248] 2335 pages shared
[11686.043250] 240875 pages non-shared
[11686.043253] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[11686.043266] [ 1084] 0 1084 662 1 0 0 0 upstart-udev-br
[11686.043271] [ 1094] 0 1094 743 79 0 -17 -1000 udevd
[11686.043276] [ 1104] 101 1104 7232 42 0 0 0 rsyslogd
[11686.043281] [ 1149] 103 1149 1066 188 1 0 0 dbus-daemon
[11686.043286] [ 1165] 0 1165 1716 66 0 0 0 modem-manager
[11686.043291] [ 1220] 106 1220 861 42 0 0 0 avahi-daemon
[11686.043296] [ 1221] 106 1221 829 0 1 0 0 avahi-daemon
[11686.043301] [ 1255] 0 1255 6880 117 0 0 0 NetworkManager
[11686.043306] [ 1308] 0 1308 5988 144 0 0 0 polkitd
[11686.043311] [ 1334] 0 1334 723 85 0 -17 -1000 udevd
[11686.043316] [ 1335] 0 1335 730 108 0 -17 -1000 udevd
[11686.043320] [ 1375] 0 1375 663 37 0 0 0 upstart-socket-
[11686.043325] [ 1464] 0 1464 1333 120 1 0 0 login
[11686.043330] [ 1467] 0 1467 1333 135 1 0 0 login
[11686.043335] [ 1486] 0 1486 1333 135 1 0 0 login
[11686.043339] [ 1487] 0 1487 1333 136 1 0 0 login
[11686.043344] [ 1493] 0 1493 1333 134 1 0 0 login
[11686.043349] [ 1528] 0 1528 496 45 0 0 0 acpid
[11686.043354] [ 1529] 0 1529 607 46 1 0 0 cron
[11686.043359] [ 1549] 0 1549 10660 100 0 0 0 lightdm
[11686.043363] [ 1550] 0 1550 570 28 0 0 0 atd
[11686.043368] [ 1584] 0 1584 855 35 0 0 0 irqbalance
[11686.043373] [ 1703] 0 1703 17939 9653 0 0 0 Xorg
[11686.043378] [ 1874] 0 1874 7013 174 0 0 0 console-kit-dae
[11686.043382] [ 1958] 0 1958 1124 52 1 0 0 bluetoothd
[11686.043388] [ 2048] 999 2048 2435 641 1 0 0 bash
[11686.043392] [ 2049] 999 2049 2435 595 0 0 0 bash
[11686.043397] [ 2050] 999 2050 2435 587 1 0 0 bash
[11686.043402] [ 2051] 999 2051 2435 634 1 0 0 bash
[11686.043406] [ 2054] 999 2054 2435 569 0 0 0 bash
[11686.043411] [ 2155] 0 2155 1333 128 0 0 0 login
[11686.043416] [ 2222] 0 2222 684 67 1 0 0 dhclient
[11686.043420] [ 2240] 999 2240 2435 415 0 0 0 bash
[11686.043425] [ 2244] 0 2244 3631 58 0 0 0 accounts-daemon
[11686.043430] [ 2258] 999 2258 11683 277 0 0 0 gnome-session
[11686.043435] [ 2407] 999 2407 964 24 0 0 0 ssh-agent
[11686.043440] [ 2410] 999 2410 937 53 0 0 0 dbus-launch
[11686.043444] [ 2411] 999 2411 1319 300 1 0 0 dbus-daemon
[11686.043449] [ 2413] 999 2413 2287 88 0 0 0 gvfsd
[11686.043454] [ 2418] 999 2418 7867 123 1 0 0 gvfs-fuse-daemo
[11686.043459] [ 2427] 999 2427 32720 804 0 0 0 gnome-settings-
[11686.043463] [ 2437] 999 2437 10750 124 0 0 0 gnome-keyring-d
[11686.043468] [ 2442] 999 2442 2321 244 1 0 0 gconfd-2
[11686.043473] [ 2447] 0 2447 6490 156 0 0 0 upowerd
[11686.043478] [ 2467] 999 2467 7590 87 0 0 0 dconf-service
[11686.043482] [ 2529] 999 2529 11807 211 0 0 0 gsd-printer
[11686.043487] [ 2531] 999 2531 12162 587 0 0 0 metacity
[11686.043492] [ 2535] 999 2535 19175 960 0 0 0 unity-2d-panel
[11686.043496] [ 2536] 999 2536 19408 1012 0 0 0 unity-2d-launch
[11686.043502] [ 2539] 999 2539 16154 1120 1 0 0 nautilus
[11686.043506] [ 2540] 999 2540 17888 534 0 0 0 nm-applet
[11686.043511] [ 2541] 999 2541 7005 253 0 0 0 polkit-gnome-au
[11686.043516] [ 2544] 999 2544 8930 430 0 0 0 bamfdaemon
[11686.043521] [ 2545] 999 2545 11217 442 1 0 0 bluetooth-apple
[11686.043525] [ 2547] 999 2547 510 16 0 0 0 sh
[11686.043530] [ 2548] 999 2548 11205 301 1 0 0 gnome-fallback-
[11686.043535] [ 2565] 999 2565 6614 179 1 0 0 gvfs-gdu-volume
[11686.043539] [ 2567] 0 2567 5812 164 1 0 0 udisks-daemon
[11686.043544] [ 2571] 0 2571 1580 69 0 0 0 udisks-daemon
[11686.043549] [ 2579] 999 2579 16354 1035 0 0 0 unity-panel-ser
[11686.043554] [ 2602] 0 2602 1188 47 0 0 0 sudo
[11686.043559] [ 2603] 0 2603 374634 181503 0 0 0 flasherav
[11686.043564] [ 2607] 999 2607 12673 189 0 0 0 indicator-appli
[11686.043569] [ 2609] 999 2609 19313 311 1 0 0 indicator-datet
[11686.043573] [ 2611] 999 2611 15738 225 0 0 0 indicator-messa
[11686.043578] [ 2615] 999 2615 17433 237 1 0 0 indicator-sessi
[11686.043583] [ 2627] 999 2627 2393 132 0 0 0 gvfsd-trash
[11686.043588] [ 2640] 999 2640 1933 85 0 0 0 geoclue-master
[11686.043592] [ 2650] 0 2650 2498 1136 1 0 0 mount.ntfs
[11686.043598] [ 2657] 999 2657 6624 128 1 0 0 telepathy-indic
[11686.043602] [ 2659] 999 2659 2246 112 0 0 0 mission-control
[11686.043607] [ 2662] 999 2662 5431 346 1 0 0 gdu-notificatio
[11686.043612] [ 2664] 0 2664 3716 2392 0 0 0 mount.ntfs
[11686.043617] [ 2679] 999 2679 12453 197 1 0 0 zeitgeist-datah
[11686.043621] [ 2685] 999 2685 5196 1581 1 0 0 zeitgeist-daemo
[11686.043626] [ 2934] 999 2934 16305 710 0 0 0 gnome-terminal
[11686.043631] [ 2938] 999 2938 553 0 0 0 0 gnome-pty-helpe
[11686.043636] [ 2939] 999 2939 1814 406 0 0 0 bash
[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB
4 Answers
#1 (score 23)
Memory management in Linux is a bit tricky to understand, and I can't say I fully understand it yet, but I'll try to share a little bit of my experience and knowledge.
Short answer to your question: yes, there is other memory in use besides what's in that list.
What's shown in your list is the applications running in userspace. On top of the memory they use, the kernel uses memory for itself and its modules, and it also keeps a lower limit of free memory that you can't go under. Once you've reached that level, it will try to free up resources, and when it can't do that anymore, you end up with an OOM condition.
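That lower limit is derived from the min_free_kbytes watermark, which you can inspect directly (a trivial sketch; /proc/sys/vm/min_free_kbytes is the standard procfs location):

    # Print the kernel's free-memory floor; reclaim kicks in as free
    # memory approaches this value, and OOM follows if reclaim fails.
    with open('/proc/sys/vm/min_free_kbytes') as f:
        print('min_free_kbytes =', f.read().strip())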
From the last line of your list you can read that the kernel reports a total-vm usage of 1498536 kB (~1.5 GB) for flasherav, where total-vm includes both physical RAM and swap space. You stated you don't have any swap, but the kernel seems to think otherwise, since your swap space is reported as full (Total swap = 524284kB, Free swap = 0kB), and it reports a total-vm size of ~1.5 GB.
Another thing that can complicate matters further is memory fragmentation. You can hit the OOM killer when the kernel tries to allocate, say, 4096 kB of contiguous memory but no free blocks of that size are available.
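The per-order lines in the log above ("Normal: 819*4kB 607*8kB ... = 48212kB") show exactly that: how many free blocks exist at each contiguous size. A small sketch of how to read them, assuming the count*size kB format from the log:

    import re

    # One per-order free-list line copied from the log above:
    line = ("Normal: 819*4kB 607*8kB 357*16kB 176*32kB 99*64kB 49*128kB "
            "23*256kB 4*512kB 0*1024kB 0*2048kB 2*4096kB")

    free = [(int(c), int(s)) for c, s in re.findall(r'(\d+)\*(\d+)kB', line)]

    def can_allocate(size_kb):
        # The buddy allocator can split larger blocks, so a request succeeds
        # if any block of the requested size *or bigger* is still free.
        return any(count > 0 for count, size in free if size >= size_kb)

    for size_kb in (4, 1024, 8192):
        print("%5d kB contiguous: %s" % (size_kb, can_allocate(size_kb)))
    # 8192 kB fails here even though ~48 MB is free in total.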
That alone probably won't help you solve the actual problem. I don't know whether it's normal for your program to require that amount of memory, but I would recommend trying a static code analyzer like cppcheck to check for memory leaks or file descriptor leaks. You could also try running it under Valgrind (e.g. valgrind --leak-check=full) to get more information about memory usage.
#2 (score 12)
The sum of total_vm is 847170 and the sum of rss is 214726. Both values are counted in 4 kB pages, which means that when the oom-killer ran, you had used 214726 * 4 kB = 858904 kB of physical memory and swap space.
Since your physical memory is 1 GB and ~200 MB of it was used for the memory-mapped /tmp, it's reasonable for the oom-killer to be invoked once 858904 kB was in use.
The rss for process 2603 is 181503 pages, which means 181503 * 4 kB = 726012 kB of rss, equal to the sum of anon-rss and file-rss:
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB
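You can check that arithmetic straight from the numbers in the log (plain Python, values copied from above):

    rss_pages = 181503           # rss for pid 2603 in the process table
    anon_rss_kb = 721784         # anon-rss from the kill line
    file_rss_kb = 4228           # file-rss from the kill line

    print(rss_pages * 4)               # 726012 kB
    print(anon_rss_kb + file_rss_kb)   # 726012 kB -- they match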
#3 (score 5)
This webpage has an explanation and a solution.
The solution is:
To fix this problem, the behavior of the kernel has to be changed so that it no longer overcommits memory for application requests. Finally, I included the values below in the /etc/sysctl.conf file, so they get applied automatically on start-up:
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
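With vm.overcommit_memory = 2 the kernel enforces CommitLimit = SwapTotal + RAM * overcommit_ratio / 100, so allocations beyond that limit fail with ENOMEM up front instead of succeeding and being OOM-killed later. A quick sketch of what the limit works out to with the numbers from the log above:

    ram_kb = 262100 * 4    # "262100 pages RAM" from the log, 4 kB pages
    swap_kb = 524284       # "Total swap = 524284kB" from the log
    overcommit_ratio = 80

    commit_limit_kb = swap_kb + ram_kb * overcommit_ratio // 100
    print("CommitLimit = %d kB (~%.2f GB)" % (commit_limit_kb, commit_limit_kb / 1048576.0))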
#4 (score 0)
I'm running hundreds of VMs, so it is pretty hard for us to dig through every log on a VM where a process got killed.
For this kind of situation, something like the sketch below helps.
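A minimal sketch of that idea, assuming the "Killed process ..." line format shown in the log above; it scans dmesg on a machine and summarizes every oom-killer kill:

    import re
    import subprocess

    killed = re.compile(
        r'Killed process (\d+) \((\S+)\) '
        r'total-vm:(\d+)kB, anon-rss:(\d+)kB, file-rss:(\d+)kB')

    log = subprocess.check_output(['dmesg']).decode('utf-8', 'replace')
    for pid, name, total_vm, anon_kb, file_kb in killed.findall(log):
        print("pid %s (%s): total-vm %s kB, rss %d kB"
              % (pid, name, total_vm, int(anon_kb) + int(file_kb)))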