What is using 4GB of memory? (Not cache, not a process, not slab, not shm)


10

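Some of our EC2 servers leak memory over a period of days or weeks. Eventually many GB of memory are in use (according to tools such as free and htop) and, if we don't restart the server, our processes start getting killed by the OOM killer.

One such server has 15GB of memory. Here is the output of free -m: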

             total       used       free     shared    buffers     cached
Mem:         15039       3921      11118          0          0          7
-/+ buffers/cache:       3913      11126
Swap:            0          0          0

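The server is idle; I have killed most of the userland processes. No process in htop shows more than 100k VIRT. I recently ran echo 3 > /proc/sys/vm/drop_caches, which did not help (it is, however, why buffers and cached are so small). Additionally: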

  • A cursory look at /proc/slabinfo and slabtop didn't show anything promising
  • There is nothing in /run/shm
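
For a slightly more systematic look than eyeballing slabtop, per-cache sizes can be approximated from /proc/slabinfo. This is only a sketch under stated assumptions: slab_top is a name made up for this example, the column layout is assumed to be the version 2.1 one (name, active_objs, num_objs, objsize, ...), and num_objs × objsize somewhat understates real usage because it ignores per-slab padding.

```shell
#!/bin/sh
# List the five largest slab caches by approximate size in kB.
# Assumes slabinfo 2.1 columns: $1 = name, $3 = num_objs, $4 = objsize (bytes).
slab_top() {
    # $1: slabinfo-style file to read (defaults to /proc/slabinfo)
    awk '
        /^(slabinfo|#)/ { next }                      # skip the header lines
        { printf "%d %s\n", $3 * $4 / 1024, $1 }      # approximate kB, cache name
    ' "${1:-/proc/slabinfo}" | sort -rn | head -n 5
}

# /proc/slabinfo is usually readable by root only, so guard the live call.
if [ -r /proc/slabinfo ]; then
    slab_top
fi
```

If the largest entries add up to only a few tens of MB, as the Slab line in /proc/meminfo suggests here, the slab allocator is unlikely to be where the missing GB went.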

Here is the output of cat /proc/meminfo:

MemTotal:       15400880 kB
MemFree:        11385688 kB
Buffers:             564 kB
Cached:             7792 kB
SwapCached:            0 kB
Active:            27668 kB
Inactive:           2012 kB
Active(anon):      21368 kB
Inactive(anon):      380 kB
Active(file):       6300 kB
Inactive(file):     1632 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         21380 kB
Mapped:             7208 kB
Shmem:               380 kB
Slab:              39260 kB
SReclaimable:      16456 kB
SUnreclaim:        22804 kB
KernelStack:        1352 kB
PageTables:         2872 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7700440 kB
Committed_AS:      39072 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       30336 kB
VmallocChunk:   34359691552 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       36864 kB
DirectMap2M:    15822848 kB

You'll notice that there is a large gap between MemFree and MemTotal that the other meminfo metrics do not account for.
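
The gap can be made concrete by adding up the fields that normally explain where RAM went and subtracting the sum from MemTotal. The following is a rough sketch, not a kernel-defined formula: meminfo_gap is a hypothetical helper name, and the choice of fields (MemFree, Buffers, Cached, AnonPages, Slab, KernelStack, PageTables) is an assumption about which counters do not overlap.

```shell
#!/bin/sh
# Rough accounting of /proc/meminfo: add up the counters that usually explain
# memory use and report what is left over. The field list is an assumption.
meminfo_gap() {
    # $1: meminfo-style file to read (defaults to /proc/meminfo)
    awk '
        { sub(":", "", $1); kb[$1] = $2 }      # kb["MemTotal"] = 15400880, ...
        END {
            n = split("MemFree Buffers Cached AnonPages Slab KernelStack PageTables", f, " ")
            for (i = 1; i <= n; i++) accounted += kb[f[i]]
            printf "accounted_kB=%d unaccounted_kB=%d\n", accounted, kb["MemTotal"] - accounted
        }
    ' "${1:-/proc/meminfo}"
}

# Only run against the live system when the file exists (Linux).
if [ -r /proc/meminfo ]; then
    meminfo_gap
fi
```

Fed the /proc/meminfo values shown above, this reports 11458908 kB accounted for and 3941972 kB unaccounted for (roughly 3850 MiB), which is nearly all of the 3913 MB that free -m lists as used after buffers/cache.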

Do you know where this memory has gone, or how I can debug this further?

More server information (in case it's relevant):

$ lsb_release -d
Description:    Ubuntu 14.04.1 LTS
$ uname -r
3.13.0-36-generic

Update: here are some more commands and their output:

# dmesg | fgrep 'Memory:'
[    0.000000] Memory: 15389980K/15728244K available (7373K kernel code, 1144K rwdata, 3404K rodata, 1336K init, 1440K bss, 338264K reserved)

# awk '{print $2 " " $1}' /proc/modules  | sort -nr | head -5
106678 psmouse
97812 raid6_pq
86484 raid456
69418 floppy
55624 aesni_intel

# cat /proc/mounts | grep tmp
udev /dev devtmpfs rw,relatime,size=7695004k,nr_inodes=1923751,mode=755 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=1540088k,mode=755 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
# df -h /dev /run /sys/fs/cgroup /run/lock /run/shm /run/user
Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G   12K  7.4G   1% /dev
tmpfs           1.5G  368K  1.5G   1% /run
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            7.4G     0  7.4G   0% /run/shm
none            100M     0  100M   0% /run/user
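
The module listing above shows only the five largest entries. To rule out loaded modules as a group, their sizes can be totalled. This is a minimal sketch, assuming the second column of /proc/modules is the module size in bytes (the same column the awk command above sorts on); modules_total_kb is a name invented for this example.

```shell
#!/bin/sh
# Print the combined size of all loaded kernel modules in kB.
# Assumes /proc/modules columns: $1 = module name, $2 = size in bytes.
modules_total_kb() {
    # $1: /proc/modules-style file to read (defaults to /proc/modules)
    awk '{ bytes += $2 } END { printf "%d\n", bytes / 1024 }' "${1:-/proc/modules}"
}

# Only run against the live system when the file exists (Linux).
if [ -r /proc/modules ]; then
    modules_total_kb
fi
```

The five modules listed above add up to 416016 bytes, about 406 kB. A small total only rules out the modules' own code and static data; memory that a driver allocates at run time is not counted in this column.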

Update 2: here is the full output of ps aux:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  33636  2368 ?        Ss    2015   0:03 /sbin/init
root         2  0.0  0.0      0     0 ?        S     2015   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S     2015   0:11 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S     2015   1:31 [rcu_sched]
root         8  0.0  0.0      0     0 ?        S     2015   0:30 [rcuos/0]
root         9  0.0  0.0      0     0 ?        S     2015   0:25 [rcuos/1]
root        10  0.0  0.0      0     0 ?        S     2015   0:33 [rcuos/2]
root        11  0.0  0.0      0     0 ?        S     2015   0:25 [rcuos/3]
root        12  0.0  0.0      0     0 ?        S     2015   0:14 [rcuos/4]
root        13  0.0  0.0      0     0 ?        S     2015   0:14 [rcuos/5]
root        14  0.0  0.0      0     0 ?        S     2015   0:14 [rcuos/6]
root        15  0.0  0.0      0     0 ?        S     2015   0:33 [rcuos/7]
root        16  0.0  0.0      0     0 ?        S     2015   0:00 [rcuos/8]
root        17  0.0  0.0      0     0 ?        S     2015   0:00 [rcuos/9]
root        18  0.0  0.0      0     0 ?        S     2015   0:00 [rcuos/10]
root        19  0.0  0.0      0     0 ?        S     2015   0:00 [rcuos/11]
root        20  0.0  0.0      0     0 ?        S     2015   0:00 [rcuos/12]
root        21  0.0  0.0      0     0 ?        S     2015   0:00 [rcuos/13]
root        22  0.0  0.0      0     0 ?        S     2015   0:00 [rcuos/14]
root        23  0.0  0.0      0     0 ?        S     2015   0:00 [rcu_bh]
root        24  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/0]
root        25  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/1]
root        26  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/2]
root        27  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/3]
root        28  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/4]
root        29  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/5]
root        30  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/6]
root        31  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/7]
root        32  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/8]
root        33  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/9]
root        34  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/10]
root        35  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/11]
root        36  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/12]
root        37  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/13]
root        38  0.0  0.0      0     0 ?        S     2015   0:00 [rcuob/14]
root        39  0.0  0.0      0     0 ?        S     2015   0:01 [migration/0]
root        40  0.0  0.0      0     0 ?        S     2015   0:06 [watchdog/0]
root        41  0.0  0.0      0     0 ?        S     2015   0:05 [watchdog/1]
root        42  0.0  0.0      0     0 ?        S     2015   0:00 [migration/1]
root        43  0.0  0.0      0     0 ?        S     2015   0:08 [ksoftirqd/1]
root        45  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/1:0H]
root        46  0.0  0.0      0     0 ?        S     2015   0:05 [watchdog/2]
root        47  0.0  0.0      0     0 ?        S     2015   0:01 [migration/2]
root        48  0.0  0.0      0     0 ?        S     2015   0:08 [ksoftirqd/2]
root        50  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/2:0H]
root        51  0.0  0.0      0     0 ?        S     2015   0:06 [watchdog/3]
root        52  0.0  0.0      0     0 ?        S     2015   0:01 [migration/3]
root        53  0.0  0.0      0     0 ?        S     2015   0:17 [ksoftirqd/3]
root        55  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/3:0H]
root        56  0.0  0.0      0     0 ?        S     2015   0:07 [watchdog/4]
root        57  0.0  0.0      0     0 ?        S     2015   0:01 [migration/4]
root        58  0.0  0.0      0     0 ?        S     2015   0:02 [ksoftirqd/4]
root        60  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/4:0H]
root        61  0.0  0.0      0     0 ?        S     2015   0:06 [watchdog/5]
root        62  0.0  0.0      0     0 ?        S     2015   0:01 [migration/5]
root        63  0.0  0.0      0     0 ?        S     2015   0:07 [ksoftirqd/5]
root        65  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/5:0H]
root        66  0.0  0.0      0     0 ?        S     2015   0:06 [watchdog/6]
root        67  0.0  0.0      0     0 ?        S     2015   0:01 [migration/6]
root        68  0.0  0.0      0     0 ?        S     2015   0:04 [ksoftirqd/6]
root        70  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/6:0H]
root        71  0.0  0.0      0     0 ?        S     2015   0:06 [watchdog/7]
root        72  0.0  0.0      0     0 ?        S     2015   0:02 [migration/7]
root        73  0.0  0.0      0     0 ?        S     2015   0:17 [ksoftirqd/7]
root        74  0.0  0.0      0     0 ?        S     2015   0:14 [kworker/7:0]
root        75  0.0  0.0      0     0 ?        S<    2015   0:00 [kworker/7:0H]
root        76  0.0  0.0      0     0 ?        S<    2015   0:00 [khelper]
root        77  0.0  0.0      0     0 ?        S     2015   0:00 [kdevtmpfs]
root        78  0.0  0.0      0     0 ?        S<    2015   0:00 [netns]
root        79  0.0  0.0      0     0 ?        S     2015   0:00 [xenwatch]
root        80  0.0  0.0      0     0 ?        S     2015   0:00 [xenbus]
root        81  0.0  0.0      0     0 ?        S     2015   0:39 [kworker/0:1]
root        82  0.0  0.0      0     0 ?        S<    2015   0:00 [writeback]
root        83  0.0  0.0      0     0 ?        S<    2015   0:00 [kintegrityd]
root        84  0.0  0.0      0     0 ?        S<    2015   0:00 [bioset]
root        86  0.0  0.0      0     0 ?        S<    2015   0:00 [kblockd]
root        88  0.0  0.0      0     0 ?        S<    2015   0:00 [ata_sff]
root        89  0.0  0.0      0     0 ?        S     2015   0:00 [khubd]
root        90  0.0  0.0      0     0 ?        S<    2015   0:00 [md]
root        91  0.0  0.0      0     0 ?        S<    2015   0:00 [devfreq_wq]
root        92  0.0  0.0      0     0 ?        S     2015   0:12 [kworker/1:1]
root        95  0.0  0.0      0     0 ?        S     2015   0:10 [kworker/4:1]
root        97  0.0  0.0      0     0 ?        S     2015   0:11 [kworker/6:1]
root        99  0.0  0.0      0     0 ?        S     2015   0:00 [khungtaskd]
root       100  0.0  0.0      0     0 ?        S     2015   7:26 [kswapd0]
root       101  0.0  0.0      0     0 ?        SN    2015   0:00 [ksmd]
root       102  0.0  0.0      0     0 ?        SN    2015   0:29 [khugepaged]
root       103  0.0  0.0      0     0 ?        S     2015   0:00 [fsnotify_mark]
root       104  0.0  0.0      0     0 ?        S     2015   0:00 [ecryptfs-kthrea]
root       105  0.0  0.0      0     0 ?        S<    2015   0:00 [crypto]
root       117  0.0  0.0      0     0 ?        S<    2015   0:00 [kthrotld]
root       119  0.0  0.0      0     0 ?        S     2015   0:00 [scsi_eh_0]
root       120  0.0  0.0      0     0 ?        S     2015   0:00 [scsi_eh_1]
root       141  0.0  0.0      0     0 ?        S<    2015   0:00 [deferwq]
root       142  0.0  0.0      0     0 ?        S<    2015   0:00 [charger_manager]
root       199  0.0  0.0      0     0 ?        S<    2015   0:00 [kpsmoused]
root       223  0.0  0.0      0     0 ?        S<    2015   0:00 [bioset]
root       265  0.0  0.0      0     0 ?        S<    2015   0:00 [raid5wq]
root       291  0.0  0.0      0     0 ?        S     2015   0:22 [jbd2/xvda1-8]
root       292  0.0  0.0      0     0 ?        S<    2015   0:00 [ext4-rsv-conver]
root       445  0.0  0.0      0     0 ?        S     2015   0:16 [jbd2/md0-8]
root       446  0.0  0.0      0     0 ?        S<    2015   0:00 [ext4-rsv-conver]
root       516  0.0  0.0  19604   564 ?        S     2015   0:00 upstart-udev-bridge --daemon
root       522  0.0  0.0  49864  1048 ?        Ss    2015   0:00 /lib/systemd/systemd-udevd --daemon
root       671  0.0  0.0  15256   408 ?        S     2015   0:00 upstart-socket-bridge --daemon
root       800  0.0  0.0  10220  2900 ?        Ss    2015   0:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -l
message+  1048  0.0  0.0  39224  1048 ?        Ss    2015   0:00 dbus-daemon --system --fork
root      1077  0.0  0.0      0     0 ?        S    Jan04   0:00 [kworker/u30:2]
root      1082  0.0  0.0  43448  1196 ?        Ss    2015   0:00 /lib/systemd/systemd-logind
root      1116  0.0  0.0  15272   512 ?        S     2015   0:00 upstart-file-bridge --daemon
root      1339  0.0  0.0  14536   412 tty4     Ss+   2015   0:00 /sbin/getty -8 38400 tty4
root      1344  0.0  0.0  14536   416 tty5     Ss+   2015   0:00 /sbin/getty -8 38400 tty5
root      1360  0.0  0.0  14536   408 tty2     Ss+   2015   0:00 /sbin/getty -8 38400 tty2
root      1361  0.0  0.0  14536   416 tty3     Ss+   2015   0:00 /sbin/getty -8 38400 tty3
root      1363  0.0  0.0  14536   404 tty6     Ss+   2015   0:00 /sbin/getty -8 38400 tty6
root      1418  0.0  0.0  61364  1296 ?        Ss    2015   0:07 /usr/sbin/sshd -D
root      1432  0.0  0.0  23652   552 ?        Ss    2015   0:02 cron
daemon    1433  0.0  0.0  19136   180 ?        Ss    2015   0:00 atd
root      1461  0.0  0.0  19316   644 ?        Ss    2015   1:57 /usr/sbin/irqbalance
root      1518  0.0  0.0   4364   404 ?        Ss    2015   0:00 acpid -c /etc/acpi/events -s /var/run/acpid.
root      1521  0.0  0.0      0     0 ?        S     2015   0:00 [kworker/5:1]
root      1641  0.0  0.0      0     0 ?        S    Jan04   0:00 [kworker/u30:1]
root      1863  0.0  0.0  14536   404 tty1     Ss+   2015   0:00 /sbin/getty -8 38400 tty1
root      1864  0.0  0.0  12784   388 ttyS0    Ss+   2015   0:00 /sbin/getty -8 38400 ttyS0
ntp       2075  0.0  0.0  31448  1252 ?        Ss    2015   1:17 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 10
root      2087  0.0  0.0      0     0 ?        S     2015   0:00 [kauditd]
ubuntu    2393  0.0  0.0 105628  2028 ?        S    Jan04   0:00 sshd: ubuntu@notty
root      2496  0.0  0.0      0     0 ?        S    Jan04   0:00 [kworker/2:2]
root      2713  0.0  0.0      0     0 ?        S     2015   0:00 [kworker/6:2]
root      2722  0.0  0.0      0     0 ?        S     2015   0:12 [kworker/5:2]
root      3678  0.0  0.0      0     0 ?        S    Jan05   0:01 [kworker/0:0]
root      3716  0.0  0.0      0     0 ?        S    Jan05   0:00 [kworker/3:0]
root      3941  0.0  0.0      0     0 ?        S    Jan05   0:00 [kworker/2:0]
root      4732  0.0  0.0      0     0 ?        S    Jan05   0:00 [kworker/1:2]
root      6896  0.0  0.0 105628  4228 ?        Ss   08:00   0:00 sshd: ubuntu [priv]
ubuntu    7008  0.0  0.0 105628  1876 ?        S    08:00   0:00 sshd: ubuntu@pts/0
ubuntu    7014  0.0  0.0  21308  3908 pts/0    Ss   08:00   0:00 -bash
root      7234  0.0  0.0  63668  2096 pts/0    S    08:10   0:00 sudo su
root      7235  0.0  0.0  63248  1776 pts/0    S    08:10   0:00 su
root      7236  1.0  0.0  21088  3456 pts/0    S    08:10   0:00 bash
root      7248  0.0  0.0  17164  1320 pts/0    R+   08:10   0:00 ps aux
root     13299  0.0  0.0      0     0 ?        S     2015   0:19 [kworker/3:2]
root     19933  0.0  0.0      0     0 ?        S     2015   0:00 [kworker/7:1]
root     20305  0.0  0.0      0     0 ?        S     2015   0:00 [kworker/4:2]
root     29814  0.0  0.0      0     0 ?        S<   Jan04   0:00 [kworker/u31:2]
root     30693  0.0  0.0      0     0 ?        S<   Jan04   0:00 [kworker/u31:1]

2
Do you have a tmpfs mounted somewhere? Perhaps it is full, so memory usage grows "for no apparent reason". What does dmesg | fgrep Memory: show?

@siblenx That can be seen with df -h --type=tmpfs
Hauke Laging

@silibnx Thanks for the suggestion. I do have a tmpfs, but it is tiny: df -h | grep tmpfs gives tmpfs 1.5G 364K 1.5G 1% /run. As for dmesg: dmesg | fgrep 'Memory:' gives [ 0.000000] Memory: 15389980K/15728244K available (7373K kernel code, 1144K rwdata, 3404K rodata, 1336K init, 1440K bss, 338264K reserved)
Caleb Spare

1
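I believe something in the kernel is eating your memory, perhaps a module. Check unix.stackexchange.com/questions/97261/… for other memory debugging techniques, and make sure no other tmpfs is mounted and filling up by looking for tmpfs entries in /proc/mounts.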

1
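@Caleb: Have you considered that the 4GB might be the space the kernel deems appropriate for the inode cache, given your total memory? Based on some checking I did yesterday in the kernel source (fs/inode.c), the amount allocated to the inode cache when the kernel boots is proportional to total memory.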
Julie Pelletier

Answers:


2

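Memory leaks can be a real pain, and tracking them down on a large system is very frustrating. I would try replicating the whole server in a test environment and starting one service at a time to find out which one is responsible.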

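If you still cannot find the source of the leak after checking each service (user-mode process) individually, the kernel should be examined next. Working with the kernel takes time and considerable experience, so I suggest consulting a kernel expert.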

Another possibility is malware. Dealing with malware is an entirely different kind of operation.

Sometimes there are no shortcuts :\

Licensed under cc by-sa 3.0 with attribution required.