当out of memory发生时,out_of_memory函数会选择一个内核认为犯有分配过多内存 “罪行”的进程,并杀死该进程。
这就有很大的几率腾出较多的空闲页,然后再跳转回重试内存分配的操作。
这里我们不讨论out_of_memory 函数流程 选择要牺牲进程的策略方法。
我们仅讨论out of memory发生时,内核输出信息的含义。
1. OOM 信息
以下是一段典型的out of memory 内核输出信息:
<4>[12345.342532] systemd-journal invoked oom-killer: gfp_mask=0x800d0, order=0, oom_score_adj=0
<4>[12345.351216] CPU: 1 PID: 1371 Comm: systemd-journal Tainted: G O 3.14.31-00017-g40fab71 #1
<4>[12345.360695] Backtrace:
<4>[12345.363263] [<c0012fcc>] (dump_backtrace) from [<c00131a4>] (show_stack+0x20/0x24)
<4>[12345.371192] r6:00000000 r5:ffffffff r4:00000000 r3:bd943631
<4>[12345.377136] [<c0013184>] (show_stack) from [<c07bbe78>] (dump_stack+0x7c/0xc8)
<4>[12345.384710] [<c07bbdfc>] (dump_stack) from [<c07ba7e4>] (dump_header.isra.14+0x74/0x188)
<4>[12345.393184] r6:000800d0 r5:00000000 r4:e8088000 r3:00000002
<4>[12345.399126] [<c07ba770>] (dump_header.isra.14) from [<c00f8a28>] (oom_kill_process+0x230/0x3e0)
<4>[12345.408234] r10:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:000800d0 r4:e9bb79c0
<4>[12345.416462] [<c00f87f8>] (oom_kill_process) from [<c00f90c8>] (out_of_memory+0x2f4/0x354)
<4>[12345.425024] r10:00000000 r9:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:c0b89d08
<4>[12345.433249] r4:c0b89aa8
<4>[12345.435903] [<c00f8dd4>] (out_of_memory) from [<c00fd6c8>] (__alloc_pages_nodemask+0x93c/0x988)
<4>[12345.445011] r10:00000000 r9:c0c38fc0 r8:c0b871d8 r7:e8088000 r6:c0c39bc0 r5:00000000
<4>[12345.453234] r4:000800d0
<4>[12345.455887] [<c00fcd8c>] (__alloc_pages_nodemask) from [<c00fd734>] (__get_free_pages+0x20/0x3c)
<4>[12345.465087] r10:e97d36a8 r9:00000063 r8:e8089f6c r7:00000063 r6:b6f79f68 r5:e97d36a8
<4>[12345.473311] r4:00000000
<4>[12345.475965] [<c00fd714>] (__get_free_pages) from [<c0196878>] (proc_pid_readlink+0x68/0x110)
<4>[12345.484808] [<c0196810>] (proc_pid_readlink) from [<c013dcb8>] (SyS_readlinkat+0xf0/0x104)
<4>[12345.493461] r7:bea40520 r6:ffffff9c r5:00004000 r4:00000000
<4>[12345.499402] [<c013dbc8>] (SyS_readlinkat) from [<c000eee0>] (ret_fast_syscall+0x0/0x34)
<4>[12345.507785] r10:00000000 r9:e8088000 r8:c000f148 r7:0000014c r6:00000063 r5:b6f79f68
<4>[12345.516011] r4:00000064
<4>[12345.518663] Mem-info:
<4>[12345.521049] Normal per-cpu:
<4>[12345.523969] CPU 0: hi: 42, btch: 7 usd: 23
<4>[12345.528979] CPU 1: hi: 42, btch: 7 usd: 25
<4>[12345.534004] HighMem per-cpu:
<4>[12345.537013] CPU 0: hi: 186, btch: 31 usd: 27
<4>[12345.542199] CPU 1: hi: 186, btch: 31 usd: 29
<4>[12345.547247] active_anon:21860 inactive_anon:14790 isolated_anon:0
<4>[12345.547247] active_file:41585 inactive_file:10422 isolated_file:0
<4>[12345.547247] unevictable:0 dirty:9 writeback:205 unstable:0
<4>[12345.547247] free:285748 slab_reclaimable:2100 slab_unreclaimable:26286
<4>[12345.547247] mapped:26079 shmem:14857 pagetables:687 bounce:0
<4>[12345.547247] free_cma:57779
<4>[12345.581839] Normal free:233460kB min:2488kB low:3108kB high:3732kB active_anon:17312kB
inactive_anon:10824kB active_file:128kB inactive_file:4kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:774144kB managed:387568kB mlocked:0kB dirty:16kB writeback:76kB
mapped:3296kB shmem:10840kB slab_reclaimable:8400kB slab_unreclaimable:105144kB kernel_stack:1168kB
pagetables:2748kB unstable:0kB bounce:0kB free_cma:231116kB writeback_tmp:0kB pages_scanned:1648
all_unreclaimable? yes
<4>[12345.627014] lowmem_reserve[]: 0 10168 10168
<4>[12345.631565] HighMem free:909036kB min:512kB low:2604kB high:4696kB active_anon:70632kB
inactive_anon:48336kB active_file:166212kB inactive_file:41684kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1301504kB managed:1301504kB mlocked:0kB dirty:20kB writeback:744kB
mapped:101020kB shmem:48588kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
<4>[12345.675614] lowmem_reserve[]: 0 0 0
<4>[12345.679437] Normal: 1165*4kB (MRC) 1122*8kB (RC) 1119*16kB (RC) 1118*32kB (C) 1068*64kB (RC)
748*128kB (C) 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB 0*8192kB = 233460kB
<4>[12345.695797] HighMem: 99*4kB (M) 1148*8kB (UM) 1314*16kB (UM) 880*32kB (UM) 327*64kB (M)
87*128kB (M) 34*256kB (M) 38*512kB (M) 12*1024kB (M) 10*2048kB (M) 3*4096kB (M) 91*8192kB (UMR) = 909516kB
<4>[12345.714293] 66770 total pagecache pages
<4>[12345.718309] 0 pages in swap cache
<4>[12345.724832] Swap cache stats: add 0, delete 0, find 0/0
<4>[12345.730308] Free swap = 0kB
<4>[12345.733412] Total swap = 0kB
<4>[12345.747245] 520192 pages of RAM
<4>[12345.750577] 286253 free pages
<4>[12345.753778] 97924 reserved pages
<4>[12345.757258] 28061 slab pages
<4>[12345.760574] 115601 pages shared
<4>[12345.764283] 0 pages swap cached
<6>[12345.767572] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
<6>[12345.775906] [ 1366] 0 1366 459 125 3 0 0 sh
<6>[12345.785861] [ 1367] 0 1367 665 235 4 0 0 propertyd
<6>[12345.794802] [ 1368] 0 1368 26553 8835 58 0 0 seed
<6>[12345.803296] [ 1371] 0 1371 1648 772 5 0 0 systemd-journal
<6>[12345.812792] [ 1375] 0 1375 750 300 4 0 -1000 systemd-udevd
<6>[12345.822449] [ 2416] 1040 2416 3852 510 7 0 0 secd
<6>[12345.831341] [ 2419] 0 2419 6678 923 9 0 0 storagemanagerd
<6>[12345.840944] [ 2420] 0 2420 1267 497 5 0 0 connmand
<6>[12345.849566] [ 2422] 0 2422 4484 687 8 0 0 uuid
<6>[12345.857843] [ 2424] 0 2424 1161 358 5 0 0 connman-vpnd
<6>[12345.867271] [ 2427] 1000 2427 1593 461 6 0 0 logboxd
<6>[12345.875846] [ 2432] 0 2432 9483 1718 15 0 0 cmns
<6>[12345.884104] [ 2451] 81 2451 1355 474 4 0 -900 dbus-daemon
<6>[12345.893018] [ 2532] 0 2532 11794 246 10 0 0 adbd
<6>[12345.901304] [ 2535] 0 2535 1502 347 5 0 0 wpa_supplicant
<6>[12345.910473] [ 2536] 0 2536 12820 866 12 0 0 udisksd
<6>[12345.919119] [ 2537] 0 2537 1898 527 6 0 0 tyid
<6>[12345.927361] [ 2540] 0 2540 10076 2157 16 0 0 datamanagerd
<6>[12345.936349] [ 2554] 0 2554 5983 574 7 0 0 connectivityser
<6>[12345.945635] [ 2558] 0 2558 10604 5388 21 0 0 weston
<6>[12345.964101] [ 2589] 0 2589 14597 1917 17 0 0 pagemanagerd
<6>[12345.973272] [ 2590] 0 2590 3832 515 7 0 0 amt
<6>[12345.981730] [ 2593] 0 2593 6176 1343 12 0 0 weston-desktop-
<6>[12345.991046] [ 2599] 0 2599 7185 761 12 0 0 scim-launcher
<6>[12346.098925] [ 5580] 0 5580 458 116 3 0 0 sh
<6>[12346.107065] [ 5581] 0 5581 492 175 3 0 0 gzip
<3>[12346.115335] Out of memory: Kill process 5575 thread_x score 481 or sacrifice child
<3>[12346.124212] Killed process 5575 thread_x total-vm:106212kB, anon-rss:18036kB, file-rss:2704kB
2 OOM信息分析
2.1
<4>[12345.342532] systemd-journal invoked oom-killer: gfp_mask=0x800d0, order=0, oom_score_adj=0systemd-joural: 当前进程为systemd-journal,请求分配页面时,引发了oom-killer
gfp_mask=0x800d0: 是alloc_page的GFP标志,对于当前场景,代表___GFP_RECLAIMABLE | ___GFP_HIGH | ___GFP_IO | ___GFP_FS
order=0 : 表示alloc_page的order为0, 也就是说仅请求1^0=1个页面,
oom_score_adj=0: 表明这个进程被杀的几率, oom_score_adj取值0(never kill)~1000(always kill)
<4>[12345.351216] CPU: 1 PID: 1371 Comm: systemd-journal Tainted: G O 3.14.31-00017-g40fab71 #1
<4>[12345.360695] Backtrace:
<4>[12345.363263] [<c0012fcc>] (dump_backtrace) from [<c00131a4>] (show_stack+0x20/0x24)
<4>[12345.371192] r6:00000000 r5:ffffffff r4:00000000 r3:bd943631
<4>[12345.377136] [<c0013184>] (show_stack) from [<c07bbe78>] (dump_stack+0x7c/0xc8)
<4>[12345.384710] [<c07bbdfc>] (dump_stack) from [<c07ba7e4>] (dump_header.isra.14+0x74/0x188)
<4>[12345.393184] r6:000800d0 r5:00000000 r4:e8088000 r3:00000002
<4>[12345.399126] [<c07ba770>] (dump_header.isra.14) from [<c00f8a28>] (oom_kill_process+0x230/0x3e0)
<4>[12345.408234] r10:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:000800d0 r4:e9bb79c0
<4>[12345.416462] [<c00f87f8>] (oom_kill_process) from [<c00f90c8>] (out_of_memory+0x2f4/0x354)
<4>[12345.425024] r10:00000000 r9:00000000 r8:000800d0 r7:00000000 r6:c0b89aa8 r5:c0b89d08
<4>[12345.433249] r4:c0b89aa8
<4>[12345.435903] [<c00f8dd4>] (out_of_memory) from [<c00fd6c8>] (__alloc_pages_nodemask+0x93c/0x988)
<4>[12345.445011] r10:00000000 r9:c0c38fc0 r8:c0b871d8 r7:e8088000 r6:c0c39bc0 r5:00000000
<4>[12345.453234] r4:000800d0
<4>[12345.455887] [<c00fcd8c>] (__alloc_pages_nodemask) from [<c00fd734>] (__get_free_pages+0x20/0x3c)
<4>[12345.465087] r10:e97d36a8 r9:00000063 r8:e8089f6c r7:00000063 r6:b6f79f68 r5:e97d36a8
<4>[12345.473311] r4:00000000
<4>[12345.475965] [<c00fd714>] (__get_free_pages) from [<c0196878>] (proc_pid_readlink+0x68/0x110)
<4>[12345.484808] [<c0196810>] (proc_pid_readlink) from [<c013dcb8>] (SyS_readlinkat+0xf0/0x104)
<4>[12345.493461] r7:bea40520 r6:ffffff9c r5:00004000 r4:00000000
<4>[12345.499402] [<c013dbc8>] (SyS_readlinkat) from [<c000eee0>] (ret_fast_syscall+0x0/0x34)
<4>[12345.507785] r10:00000000 r9:e8088000 r8:c000f148 r7:0000014c r6:00000063 r5:b6f79f68
<4>[12345.516011] r4:00000064
dump_header->dump_stack的输出的引发OOM的调用函数栈,从ret_fast_syscall开始dump_backtrace。
通过这段输出,可以推测systemd-journal调用readlink系统调用时,引发的一次分页操作,导致了OOM。
2.2
<4>[12345.518663] Mem-info:dump_header->show_mem输出当前系统内存信息。
<4>[12345.521049] Normal per-cpu:
<4>[12345.523969] CPU 0: hi: 42, btch: 7 usd: 23
<4>[12345.528979] CPU 1: hi: 42, btch: 7 usd: 25
<4>[12345.534004] HighMem per-cpu:
<4>[12345.537013] CPU 0: hi: 186, btch: 31 usd: 27
<4>[12345.542199] CPU 1: hi: 186, btch: 31 usd: 29
<4>[12345.547247] active_anon:21860 inactive_anon:14790 isolated_anon:0
<4>[12345.547247] active_file:41585 inactive_file:10422 isolated_file:0
<4>[12345.547247] unevictable:0 dirty:9 writeback:205 unstable:0
<4>[12345.547247] free:285748 slab_reclaimable:2100 slab_unreclaimable:26286
<4>[12345.547247] mapped:26079 shmem:14857 pagetables:687 bounce:0
<4>[12345.547247] free_cma:57779
<4>[12345.581839] Normal free:233460kB min:2488kB low:3108kB high:3732kB active_anon:17312kB
inactive_anon:10824kB active_file:128kB inactive_file:4kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:774144kB managed:387568kB mlocked:0kB dirty:16kB writeback:76kB
mapped:3296kB shmem:10840kB slab_reclaimable:8400kB slab_unreclaimable:105144kB kernel_stack:1168kB
pagetables:2748kB unstable:0kB bounce:0kB free_cma:231116kB writeback_tmp:0kB pages_scanned:1648
all_unreclaimable? yes
<4>[12345.627014] lowmem_reserve[]: 0 10168 10168
<4>[12345.631565] HighMem free:909036kB min:512kB low:2604kB high:4696kB active_anon:70632kB
inactive_anon:48336kB active_file:166212kB inactive_file:41684kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1301504kB managed:1301504kB mlocked:0kB dirty:20kB writeback:744kB
mapped:101020kB shmem:48588kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
<4>[12345.675614] lowmem_reserve[]: 0 0 0
<4>[12345.679437] Normal: 1165*4kB (MRC) 1122*8kB (RC) 1119*16kB (RC) 1118*32kB (C) 1068*64kB (RC)
748*128kB (C) 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB 0*8192kB = 233460kB
<4>[12345.695797] HighMem: 99*4kB (M) 1148*8kB (UM) 1314*16kB (UM) 880*32kB (UM) 327*64kB (M)
87*128kB (M) 34*256kB (M) 38*512kB (M) 12*1024kB (M) 10*2048kB (M) 3*4096kB (M) 91*8192kB (UMR) = 909516kB
<4>[12345.714293] 66770 total pagecache pages
<4>[12345.718309] 0 pages in swap cache
<4>[12345.724832] Swap cache stats: add 0, delete 0, find 0/0
<4>[12345.730308] Free swap = 0kB
<4>[12345.733412] Total swap = 0kB
<4>[12345.747245] 520192 pages of RAM
<4>[12345.750577] 286253 free pages
<4>[12345.753778] 97924 reserved pages
<4>[12345.757258] 28061 slab pages
<4>[12345.760574] 115601 pages shared
<4>[12345.764283] 0 pages swap cached
2.2.1
<4>[12345.521049] Normal per-cpu:
<4>[12345.523969] CPU 0: hi: 42, btch: 7 usd: 23
<4>[12345.528979] CPU 1: hi: 42, btch: 7 usd: 25
<4>[12345.534004] HighMem per-cpu:
<4>[12345.537013] CPU 0: hi: 186, btch: 31 usd: 27
<4>[12345.542199] CPU 1: hi: 186, btch: 31 usd: 29
每个内存管理区定义了一个“每CPU”页框高速缓存,所有“每CPU”高速缓存包含一些预先分配的页框,它们被用于满足本地CPU 发出的单个页内存请求。
CPU 0: hi: 42, btch: 7 usd: 23
表示 CPU 0,
hi: 42 表示上限值,超过这个数字,则释放batch个页框到buddy系统中
btch: 7 表示向高速缓存添加或者删除页框时,页框块的大小
usd: 23 页框高速缓存中的页框数目
2.2.2
<4>[12345.547247] active_anon:21860 inactive_anon:14790 isolated_anon:0
<4>[12345.547247] active_file:41585 inactive_file:10422 isolated_file:0
<4>[12345.547247] unevictable:0 dirty:9 writeback:205 unstable:0
<4>[12345.547247] free:285748 slab_reclaimable:2100 slab_unreclaimable:26286
<4>[12345.547247] mapped:26079 shmem:14857 pagetables:687 bounce:0
<4>[12345.547247] free_cma:57779
active_anon: 活动的匿名映射,"活动"是指最近被访问过,"匿名"则指页面映射不与任何数据源相关
inactive_anon: 非活动的匿名映射
isolated_anon: DON'T KNOW
active_file: 活动的文件映射,页面映射和磁盘文件相关联
inactive_file: 非活动的文件映射
isolated_file: DON'T KNOW
unevictable:
dirty: 脏页面,表示页面的内容和快设备上的原始内容已经不一至
writeback: 当前页面正处在回写状态
unstable:
free: 空闲页面
slab_relaimable: slab cache中可回收的页面
slab_unreclaimable: slab cache中不可以回收的页面
mapped: BH_MAPPED,表示这个页面被用做快设备的buffer映射,注意这个映射不同于anon和file映射。
shmem: 用于共享内存映射的页面
pagetable: 页表占用的页面,也就是PTE PTD占用的页面数目
bounce:
free_cma: continuous memory allocator的空闲页面。
2.2.3
<4>[12345.581839] Normal free:233460kB min:2488kB low:3108kB high:3732kB active_anon:17312kB inactive_anon:10824kB
active_file:128kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:774144kB
managed:387568kB mlocked:0kB dirty:16kB writeback:76kB mapped:3296kB shmem:10840kB slab_reclaimable:8400kB
slab_unreclaimable:105144kB kernel_stack:1168kB pagetables:2748kB unstable:0kB bounce:0kB free_cma:231116kB
writeback_tmp:0kB pages_scanned:1648 all_unreclaimable? yes
<4>[12345.627014] lowmem_reserve[]: 0 10168 10168
<4>[12345.631565] HighMem free:909036kB min:512kB low:2604kB high:4696kB active_anon:70632kB inactive_anon:48336kB
active_file:166212kB inactive_file:41684kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1301504kB
managed:1301504kB mlocked:0kB dirty:20kB writeback:744kB mapped:101020kB shmem:48588kB slab_reclaimable:0kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
<4>[12345.675614] lowmem_reserve[]: 0 0 0
Normal free: Normal zone的空闲空间
min, low, high是normal zone执行页面置换的几个水印
lowmem_reserve: 表示该分zone为其他zone预留的可分配页面数
present: 表示zone的物理内存大小
managed: 是buddy系统管理的present内存大小,managed = preset - reserved
其他值可参考2.2.2节,除了数值代表Normal zone,其他含义类似。
注意1,有几项是Normal特有的,比如kernel_stack, pagetables, free_cma, slab_reclaimable, slab_unreclaimable,是因为normal zone的页面是直接映射,这些页面是供内核中使用的。
对于highmem,主要用来匿名映射,文件映射,mapped,以及共享内存。
2.2.4
<4>[12345.679437] Normal: 1165*4kB (MRC) 1122*8kB (RC) 1119*16kB (RC) 1118*32kB (C) 1068*64kB (RC)buddy系统信息信息, order范围0~11
748*128kB (C) 0*256kB 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB 0*8192kB = 233460kB
<4>[12345.695797] HighMem: 99*4kB (M) 1148*8kB (UM) 1314*16kB (UM) 880*32kB (UM) 327*64kB (M)
87*128kB (M) 34*256kB (M) 38*512kB (M) 12*1024kB (M) 10*2048kB (M) 3*4096kB (M) 91*8192kB (UMR) = 909516kB
M表示 moveable
R表示 Reserve
C表示 CMA
U表示 unmovable
E表示 reclaimable
1. 仅有 (C),表示这个freelist只能分配给带有ALLOC_CMA标志的分配
2. Highmem没有C标记,这是因为连续内存分配只发生在Normal zone中
3. MRC,表示这个freelist既有CMA内存,Reserve内存还有Movable内存
3 Who triggered OOM
有几个因素影响OOM的发生
1. 分配的order大小,以及系统对待order的方式
2. 分配发生在哪个zone
3. Zone的水印大小
4. 内存碎片化程度
5. 据说不停的分配地址空间,也会导致OOM的发生(还没验证过)
对于上面的OOM信息,我们可以看到系统内有很大的空闲空间:233460KB,但是OOM仍然发生了。
首先分配的order为0,所以和碎片化是无关的,gfp_mask=0x800d0说明分配发生在Normal分区,并且类型为Reclaimable,Reclaimable也就意味着无法从CMA分配内存。
既然不是order过大导致分配失败,那么就是free空间小于内存水印min导致了OOM killer。
static bool __zone_watermark_ok(struct zone *z, unsigned int order,分配类型为Reclaimable,导致free空间必须减去CMA空闲空间233460KB - 223116kB = 2352KB,小于min 水印,系统启动OOM killer
unsigned long mark, int classzone_idx, int alloc_flags,
long free_pages)
{
...
#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
if (!(alloc_flags & ALLOC_CMA))
free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
return false;
...
}