Answers:
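If a process consumes too much memory, the kernel's "out of memory" (OOM) killer automatically kills the offending process. It sounds like this may be what happened to your job. The kernel log should show what the OOM killer did, so use the dmesg command to see what happened, e.g.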
dmesg | less
You will see OOM killer messages like the following:
[ 54.125380] Out of memory: Kill process 8320 (stress-ng-brk) score 324 or sacrifice child
[ 54.125382] Killed process 8320 (stress-ng-brk) total-vm:1309660kB, anon-rss:1287796kB, file-rss:76kB
[ 54.522906] gmain invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[ 54.522908] gmain cpuset=accounts-daemon.service mems_allowed=0
[ 54.522912] CPU: 6 PID: 1032 Comm: gmain Not tainted 4.4.0-0-generic #3-Ubuntu
[ 54.522913] Hardware name: Intel Corporation Skylake Client platform/Skylake DT DDR4 RVP8, BIOS SKLSE2R1.R00.B089.B00.1506160228 06/16/2015
[ 54.522914] 0000000000000000 000000002d879fe9 ffff88016d727a58 ffffffff813d8604
[ 54.522915] ffff88016d727c50 ffff88016d727ac8 ffffffff8120272e 0000000000000015
[ 54.522916] 0000000000000000 ffff880080ab3600 ffff880086725880 ffff88016d727ab8
[ 54.522917] Call Trace:
[ 54.522921] [<ffffffff813d8604>] dump_stack+0x44/0x60
[ 54.522924] [<ffffffff8120272e>] dump_header+0x5a/0x1c5
[ 54.522926] [<ffffffff81376bd8>] ? apparmor_capable+0xb8/0x120
[ 54.522928] [<ffffffff8118b472>] oom_kill_process+0x202/0x3b0
[ 54.522929] [<ffffffff8118b885>] out_of_memory+0x215/0x460
[ 54.522931] [<ffffffff81191740>] __alloc_pages_nodemask+0x9b0/0xb40
[ 54.522933] [<ffffffff811da7cc>] alloc_pages_current+0x8c/0x110
[ 54.522934] [<ffffffff81187d75>] __page_cache_alloc+0xb5/0xc0
[ 54.522935] [<ffffffff81189f4a>] filemap_fault+0x14a/0x3f0
[ 54.522937] [<ffffffff811b6140>] __do_fault+0x50/0xe0
[ 54.522938] [<ffffffff811b9b82>] handle_mm_fault+0xf92/0x1840
[ 54.522939] [<ffffffff812526a7>] ? eventfd_ctx_read+0x67/0x210
[ 54.522941] [<ffffffff81068517>] __do_page_fault+0x197/0x400
[ 54.522942] [<ffffffff810687a2>] do_page_fault+0x22/0x30
[ 54.522944] [<ffffffff8180e2f8>] page_fault+0x28/0x30
[ 54.522945] Mem-Info:
[ 54.522947] active_anon:788399 inactive_anon:33532 isolated_anon:0
active_file:83 inactive_file:37 isolated_file:0
unevictable:1 dirty:10 writeback:0 unstable:0
slab_reclaimable:5166 slab_unreclaimable:13868
mapped:5646 shmem:9752 pagetables:4476 bounce:0
free:7576 free_pcp:0 free_cma:0
[ 54.522948] Node 0 DMA free:15476kB min:28kB low:32kB high:40kB active_anon:144kB inactive_anon:216kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15888kB mlocked:0kB dirty:0kB writeback:0kB mapped:80kB shmem:80kB slab_reclaimable:0kB slab_unreclaimable:48kB kernel_stack:0kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 54.522951] lowmem_reserve[]: 0 2072 3862 3862
[ 54.522952] Node 0 DMA32 free:11220kB min:4204kB low:5252kB high:6304kB active_anon:1711968kB inactive_anon:80964kB active_file:236kB inactive_file:100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2206296kB managed:2125964kB mlocked:0kB dirty:36kB writeback:0kB mapped:17948kB shmem:26240kB slab_reclaimable:8988kB slab_unreclaimable:26036kB kernel_stack:2656kB pagetables:9348kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3776 all_unreclaimable? yes
[ 54.522955] lowmem_reserve[]: 0 0 1790 1790
[ 54.522956] Node 0 Normal free:3608kB min:3628kB low:4532kB high:5440kB active_anon:1441484kB inactive_anon:52948kB active_file:96kB inactive_file:48kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:1900544kB managed:1833172kB mlocked:4kB dirty:4kB writeback:0kB mapped:4556kB shmem:12688kB slab_reclaimable:11676kB slab_unreclaimable:29388kB kernel_stack:2448kB pagetables:8552kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? yes
[ 54.522958] lowmem_reserve[]: 0 0 0 0
[ 54.522959] Node 0 DMA: 7*4kB (UME) 3*8kB (UM) 4*16kB (UME) 4*32kB (UME) 2*64kB (U) 4*128kB (UME) 1*256kB (E) 2*512kB (ME) 3*1024kB (UME) 1*2048kB (E) 2*4096kB (M) = 15476kB
[ 54.522965] Node 0 DMA32: 118*4kB (UME) 36*8kB (UME) 62*16kB (UME) 94*32kB (UME) 34*64kB (UME) 24*128kB (UME) 5*256kB (UE) 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 11800kB
[ 54.522969] Node 0 Normal: 151*4kB (UME) 39*8kB (UME) 77*16kB (UME) 38*32kB (UME) 9*64kB (ME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3940kB
[ 54.522974] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 54.522974] Node 0 hugepages_total=256 hugepages_free=256 hugepages_surp=0 hugepages_size=2048kB
[ 54.522975] 9932 total pagecache pages
[ 54.522976] 0 pages in swap cache
[ 54.522976] Swap cache stats: add 1831590, delete 1831590, find 5929/10969
[ 54.522977] Free swap = 0kB
[ 54.522977] Total swap = 0kB
[ 54.522978] 1030706 pages RAM
[ 54.522978] 0 pages HighMem/MovableOnly
[ 54.522979] 36950 pages reserved
[ 54.522979] 0 pages cma reserved
[ 54.522979] 0 pages hwpoisoned
[ 54.522980] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 54.522986] [ 285] 0 285 10173 1022 23 3 0 0 systemd-journal
[ 54.522988] [ 312] 0 312 11192 266 22 3 0 -1000 systemd-udevd
[ 54.522989] [ 623] 100 623 25590 569 20 4 6 0 systemd-timesyn
[ 54.522990] [ 823] 0 823 5859 1723 14 3 0 0 dhclient
[ 54.522991] [ 917] 0 917 7152 96 18 3 2 0 systemd-logind
[ 54.522992] [ 936] 0 936 6310 223 16 3 0 0 smartd
[ 54.522993] [ 943] 0 943 112847 523 72 3 9 0 NetworkManager
[ 54.522993] [ 952] 0 952 84334 421 68 4 0 0 ModemManager
[ 54.522994] [ 957] 0 957 4797 40 15 4 0 0 atd
[ 54.522995] [ 961] 115 961 93456 912 80 4 0 0 whoopsie
[ 54.522996] [ 963] 0 963 4865 65 13 3 0 0 irqbalance
[ 54.522997] [ 964] 104 964 65667 224 30 4 9 0 rsyslogd
[ 54.522998] [ 966] 0 966 23282 34 13 3 0 0 lxcfs
[ 54.522999] [ 971] 105 971 10926 318 26 3 8 -900 dbus-daemon
[ 54.523000] [ 1008] 0 1008 9570 82 25 3 0 0 cgmanager
[ 54.523001] [ 1016] 0 1016 70808 240 41 3 0 0 accounts-daemon
[ 54.523002] [ 1019] 0 1019 1119 46 8 3 0 0 ondemand
[ 54.523003] [ 1022] 0 1022 7233 68 20 3 0 0 cron
[ 54.523004] [ 1028] 109 1028 11218 97 26 3 3 0 avahi-daemon
[ 54.523005] [ 1030] 0 1030 1807 20 10 3 0 0 sleep
[ 54.523006] [ 1037] 109 1037 11185 82 25 3 0 0 avahi-daemon
[ 54.523007] [ 1047] 0 1047 141966 2188 156 4 3 0 libvirtd
[ 54.523008] [ 1053] 0 1053 13902 163 33 3 0 -1000 sshd
[ 54.523009] [ 1057] 0 1057 69683 586 40 3 12 0 polkitd
[ 54.523010] [ 1072] 0 1072 10963 134 24 3 0 0 wpa_supplicant
[ 54.523011] [ 1081] 0 1081 87582 696 39 3 23 0 lightdm
[ 54.523012] [ 1088] 0 1088 99946 6138 97 3 15 0 Xorg
[ 54.523012] [ 1111] 0 1111 1099 45 8 3 0 0 acpid
[ 54.523013] [ 1125] 0 1125 56533 191 47 4 14 0 lightdm
[ 54.523014] [ 1129] 114 1129 11957 850 27 3 0 0 systemd
[ 54.523015] [ 1130] 114 1130 15825 501 33 3 0 0 (sd-pam)
[ 54.523029] [ 1136] 114 1136 30728 108 26 4 0 0 gnome-keyring-d
[ 54.523030] [ 1138] 114 1138 1119 20 8 3 0 0 lightdm-greeter
[ 54.523031] [ 1143] 114 1143 10743 145 25 3 13 0 dbus-daemon
[ 54.523032] [ 1144] 114 1144 227063 2039 170 4 17 0 unity-greeter
[ 54.523032] [ 1146] 114 1146 84488 626 34 3 0 0 at-spi-bus-laun
[ 54.523033] [ 1151] 114 1151 10680 97 27 4 0 0 dbus-daemon
[ 54.523034] [ 1153] 114 1153 51706 157 37 3 3 0 at-spi2-registr
[ 54.523035] [ 1159] 114 1159 68584 154 37 3 0 0 gvfsd
[ 54.523036] [ 1164] 114 1164 85325 145 32 3 0 0 gvfsd-fuse
[ 54.523037] [ 1174] 114 1174 44626 121 23 3 3 0 dconf-service
[ 54.523038] [ 1197] 0 1197 20665 147 44 3 0 0 lightdm
[ 54.523038] [ 1201] 114 1201 11465 160 27 3 0 0 upstart
[ 54.523039] [ 1204] 114 1204 144936 1323 136 4 4 0 nm-applet
[ 54.523040] [ 1206] 114 1206 88647 256 41 3 26 0 indicator-messa
[ 54.523041] [ 1207] 114 1207 83323 127 31 3 0 0 indicator-bluet
[ 54.523042] [ 1208] 114 1208 122044 98 37 4 12 0 indicator-power
[ 54.523043] [ 1209] 114 1209 132868 439 75 3 0 0 indicator-datet
[ 54.523044] [ 1210] 114 1210 140272 1504 127 4 1 0 indicator-keybo
[ 54.523045] [ 1211] 114 1211 134142 426 68 4 8 0 indicator-sound
[ 54.523045] [ 1212] 114 1212 189042 260 47 4 0 0 indicator-sessi
[ 54.523046] [ 1218] 114 1218 117391 350 89 4 0 0 indicator-appli
[ 54.523047] [ 1232] 0 1232 7973 81 20 3 11 0 bluetoothd
[ 54.523048] [ 1238] 114 1238 152474 1084 129 3 15 0 unity-settings-
[ 54.523049] [ 1261] 114 1261 104039 719 78 4 0 0 pulseaudio
[ 54.523050] [ 1272] 120 1272 45874 77 24 3 1 0 rtkit-daemon
[ 54.523051] [ 1293] 0 1293 68995 324 53 3 12 0 upowerd
[ 54.523052] [ 1296] 114 1296 15493 366 33 3 0 0 gconfd-2
[ 54.523053] [ 1342] 110 1342 75254 1170 49 3 0 0 colord
[ 54.523054] [ 1429] 113 1429 12484 98 27 3 0 0 dnsmasq
[ 54.523054] [ 1430] 0 1430 12477 94 27 3 0 0 dnsmasq
[ 54.523055] [ 1514] 0 1514 22408 226 49 3 0 0 sshd
[ 54.523056] [ 1570] 1000 1570 11958 853 26 3 0 0 systemd
[ 54.523057] [ 1571] 1000 1571 15825 501 33 3 0 0 (sd-pam)
[ 54.523058] [ 1631] 1000 1631 22408 244 46 3 0 0 sshd
[ 54.523058] [ 1632] 1000 1632 5779 619 16 3 0 0 bash
[ 54.523059] [ 1692] 118 1692 11320 77 25 3 14 0 kerneloops
[ 54.523060] [ 1745] 0 1745 3964 41 13 3 0 0 agetty
[ 54.523061] [ 1768] 125 1768 13192 98 27 3 0 0 dnsmasq
[ 54.523062] [ 2276] 126 2276 32160 388 58 3 0 0 exim4
[ 54.523062] [ 8310] 1000 8310 5508 661 14 3 0 0 stress-ng
[ 54.523063] [ 8311] 1000 8311 5508 49 13 3 0 0 stress-ng-brk
[ 54.523064] [ 8312] 1000 8312 5508 46 13 3 0 0 stress-ng-brk
[ 54.523065] [ 8313] 1000 8313 5508 46 13 3 0 0 stress-ng-brk
[ 54.523065] [ 8314] 1000 8314 5508 46 13 3 0 0 stress-ng-brk
[ 54.523066] [ 8321] 1000 8321 365871 360407 717 4 0 0 stress-ng-brk
[ 54.523067] [ 8322] 1000 8322 239424 233959 470 3 0 0 stress-ng-brk
[ 54.523068] [ 8323] 1000 8323 143599 138152 283 3 0 0 stress-ng-brk
[ 54.523069] [ 8324] 1000 8324 54613 49145 109 3 0 0 stress-ng-brk
[ 54.523070] Out of memory: Kill process 8321 (stress-ng-brk) score 363 or sacrifice child
[ 54.523072] Killed process 8321 (stress-ng-brk) total-vm:1463484kB, anon-rss:1441628kB, file-rss:0kB
However, this message may already have been flushed from the kernel log buffer, so you may need to check the kernel log files in /var/log/kern.log*
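To pick the victims out of a long log like the one above, a small filter can help. This is a minimal sketch, assuming the "Killed process PID (name)" wording shown in the log; the function name oom_victims is made up for illustration, and other kernel versions may word the message differently.

```bash
#!/bin/bash
# oom_victims (hypothetical helper): read kernel log lines on stdin and
# print "PID NAME" for every "Killed process PID (NAME)" line found.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Demonstration using one line copied from the log above.
printf '%s\n' \
    '[ 54.125382] Killed process 8320 (stress-ng-brk) total-vm:1309660kB, anon-rss:1287796kB, file-rss:76kB' \
    | oom_victims
```

On a real system the same function could be fed with dmesg | oom_victims, or with the contents of /var/log/kern.log*.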
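By default, Linux overcommits virtual memory. This means the kernel lets processes allocate more memory than is actually available, so that a process can map large regions, on the assumption that most of the pages in an allocation are normally never touched. Occasionally, though, a process does read or write all of its overcommitted pages and the kernel cannot supply enough physical memory plus swap, so the OOM killer tries to find the best candidate among the overcommitting processes and kills it.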
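To see which overcommit policy a machine is actually using, the setting can be read from /proc. The following is a rough sketch, not part of the original answer; the helper name overcommit_mode is hypothetical, and the descriptions paraphrase the kernel's documented meanings of the values 0, 1 and 2.

```bash
#!/bin/bash
# overcommit_mode (hypothetical helper): describe a vm.overcommit_memory value.
overcommit_mode() {
    case "$1" in
        0) echo "heuristic overcommit (default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, no overcommit" ;;
        *) echo "unknown" ;;
    esac
}

# Only query /proc when it is available (i.e. on Linux).
if [ -r /proc/sys/vm/overcommit_memory ]; then
    mode=$(cat /proc/sys/vm/overcommit_memory)
    echo "vm.overcommit_memory=$mode: $(overcommit_mode "$mode")"
    # CommitLimit is only enforced in mode 2; Committed_AS is the amount
    # of memory currently promised to processes.
    grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
fi
```

If Committed_AS is far above the physical memory plus swap, the system is relying heavily on overcommit, and an OOM kill becomes more likely once processes start touching their pages.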
So, if you want to capture the kernel log right after the job is killed, wrap the job in the following bash script:
#!/bin/bash
your_job_here
ret=$?
#
# An exit status above 128 normally means the job was terminated
# by a signal (status = 128 + signal number)
#
if [ "$ret" -gt 128 ]; then
    sig=$((ret - 128))
    echo "Got SIGNAL $sig"
    if [ "$sig" -eq "$(kill -l SIGKILL)" ]; then
        echo "process was killed with SIGKILL"
        dmesg > "$HOME/dmesg-kill.log"
    fi
fi
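Note: "your_job_here" is the name of the program/job you want to run. The script checks the program's return code to see whether it was killed by SIGKILL and, if so, immediately dumps the dmesg output to a file named dmesg-kill.log in your home directory.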
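The return-code check relies on the shell convention that a command terminated by signal N is reported with exit status 128 + N. Here is a small sketch to try that out without a real OOM event; the helper describe_status is hypothetical, and the child shell that kills itself is only a stand-in for a job hit by the OOM killer. A program can also exit with a status above 128 on its own, so the check is a strong hint, not proof.

```bash
#!/bin/bash
# describe_status (hypothetical helper): interpret a shell exit status.
describe_status() {
    local ret=$1
    if [ "$ret" -gt 128 ]; then
        # kill -l NUMBER prints the signal name without the SIG prefix.
        echo "killed by SIG$(kill -l "$((ret - 128))")"
    else
        echo "exited with status $ret"
    fi
}

# Stand-in for an OOM-killed job: a child shell sends SIGKILL to itself.
bash -c 'kill -KILL $$'
describe_status $?
```

SIGKILL is signal 9, so the child's exit status is 137 and the helper reports it as SIGKILL.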
Hope that helps.
Instead of dmesg | less, dmesg | grep -i kill may be more useful. The same goes for the log files: grep -i kill /var/log/kern.log*