我有一台运行CentOS 7的Web服务器,经过几周的正常运行时间后,systemd进程在该服务器上使用了近4 GB的RAM。RAM使用量稳定增长,每天约200MB。该进程以及诸如systemd-logind和dbus-daemon之类的相关进程在很多时候也使用大量的CPU。我的其他使用“ init”代替systemd的CentOS 6服务器没有这种资源使用情况。
在下面的顶部示例中,在不运行其他进程的正常Web服务期间,systemd,systemd-logind,systemd-journal和dbus-daemon总共使用了10.7%的四核CPU,而systemd消耗了19%的四核CPU。系统的16GB RAM。这不是正常现象,经过搜索后,我再也没有发现其他人遇到这个问题。是什么导致这种资源浪费?任何建议,将不胜感激。
空闲期间从顶部输出(Web服务除外):
top - 08:51:31 up 16 days, 13:43, 2 users, load average: 1.84, 1.39, 1.07
Tasks: 297 total, 2 running, 295 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.6 us, 3.6 sy, 0.0 ni, 90.6 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 16212992 total, 2466564 free, 4275764 used, 9470664 buff/cache
KiB Swap: 4194300 total, 4070740 free, 123560 used. 10707392 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
743 dbus 20 0 27104 1856 1152 S 3.3 0.0 304:27.19 dbus-daemon
1 root 20 0 3247784 2.920g 1800 S 3.0 18.9 287:41.35 systemd
737 root 20 0 27416 2524 1304 S 2.7 0.0 225:32.66 systemd-logind
736 root 20 0 434760 3756 3076 S 2.0 0.0 172:26.53 NetworkManager
548 root 20 0 82276 34652 34516 S 1.7 0.2 160:20.16 systemd-journal
770 polkitd 20 0 522920 2956 2248 S 1.7 0.0 120:06.11 polkitd
716 root 16 -4 116744 1368 1312 S 1.3 0.0 93:26.54 auditd
3778 nginx 20 0 446488 14688 6564 S 1.3 0.1 2:18.80 php-fpm
3847 nginx 20 0 446316 14588 6548 S 1.3 0.1 2:19.29 php-fpm
7000 nginx 20 0 446132 14400 6544 S 1.3 0.1 1:22.77 php-fpm
14862 nginx 20 0 446304 14600 6580 S 1.3 0.1 1:32.25 php-fpm
30333 nginx 20 0 446292 14468 6528 S 1.3 0.1 1:40.78 php-fpm
740 root 20 0 784980 20112 19696 S 1.0 0.1 76:12.69 rsyslogd
3521 nginx 20 0 446188 14848 6748 S 1.0 0.1 2:20.00 php-fpm
3687 nginx 20 0 446036 14688 6764 S 1.0 0.1 2:20.45 php-fpm
3689 nginx 20 0 446408 14604 6552 S 1.0 0.1 2:19.75 php-fpm
3774 nginx 20 0 446288 14568 6552 S 1.0 0.1 2:19.68 php-fpm
3836 nginx 20 0 447416 15572 6564 S 1.0 0.1 2:21.06 php-fpm
4861 nginx 20 0 446260 14576 6540 S 1.0 0.1 2:18.94 php-fpm
4862 nginx 20 0 446508 15084 6764 S 1.0 0.1 2:20.71 php-fpm
13538 nginx 20 0 447204 15452 6572 S 1.0 0.1 1:32.33 php-fpm
15530 nginx 20 0 446292 14520 6528 S 1.0 0.1 1:32.55 php-fpm
28468 nginx 20 0 446356 14672 6568 S 1.0 0.1 1:42.21 php-fpm
29564 nginx 20 0 446292 14536 6548 S 1.0 0.1 1:41.11 php-fpm
30851 nginx 20 0 445956 14568 6748 S 1.0 0.1 1:49.66 php-fpm
编辑2-14-16
我可能已经在“ sudo journalctl”的输出中找到了一些相关的东西(见下文)。关于来自我的其他生产服务器之一的SSH连接,每小时有很多行每隔几个小时出现一次。这些是rsync进程,用于将文件从远程服务器传输到有问题的服务器。结果证明了systemd,systemd-logind,NetworkManager和systemd-journal的CPU使用率。
但是,这不能解释内存泄漏,这是最大的问题。自几天前最初撰写此帖子以来,systemd的系统内存使用率已从18.9%增加到21.4%。
下面的日志已被修改以替换服务器的真实域名和IP地址。
Feb 14 10:02:13 hostname.domain.com systemd-logind[737]: New session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname.domain.com systemd[1]: Started Session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname.domain.com systemd[1]: Starting Session 6467482 of user tropicg9.
Feb 14 10:02:13 hostname.domain.com sshd[9665]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:13 hostname.domain.com sshd[9667]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:13 hostname.domain.com sshd[9665]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:13 hostname.domain.com systemd-logind[737]: Removed session 6467482.
Feb 14 10:02:14 hostname.domain.com sshd[9728]: Accepted publickey for tropicg9 from 1.2.3.4 port 45289 ssh2: RSA 0b:
Feb 14 10:02:14 hostname.domain.com systemd-logind[737]: New session 6467483 of user tropicg9.
Feb 14 10:02:14 hostname.domain.com systemd[1]: Started Session 6467483 of user tropicg9.
Feb 14 10:02:14 hostname.domain.com systemd[1]: Starting Session 6467483 of user tropicg9.
Feb 14 10:02:14 hostname.domain.com sshd[9728]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:14 hostname.domain.com sshd[9735]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:14 hostname.domain.com sshd[9728]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:14 hostname.domain.com systemd-logind[737]: Removed session 6467483.
Feb 14 10:02:15 hostname.domain.com sshd[9876]: Accepted publickey for tropicg9 from 1.2.3.4 port 45290 ssh2: RSA 0b:
Feb 14 10:02:15 hostname.domain.com systemd-logind[737]: New session 6467484 of user tropicg9.
Feb 14 10:02:15 hostname.domain.com systemd[1]: Started Session 6467484 of user tropicg9.
Feb 14 10:02:15 hostname.domain.com systemd[1]: Starting Session 6467484 of user tropicg9.
Feb 14 10:02:15 hostname.domain.com sshd[9876]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:15 hostname.domain.com sshd[9883]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:15 hostname.domain.com sshd[9876]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:15 hostname.domain.com systemd-logind[737]: Removed session 6467484.
Feb 14 10:02:20 hostname.domain.com sshd[10333]: Accepted publickey for tropicg9 from 1.2.3.4 port 45291 ssh2: RSA 0b
Feb 14 10:02:20 hostname.domain.com systemd-logind[737]: New session 6467485 of user tropicg9.
Feb 14 10:02:20 hostname.domain.com systemd[1]: Started Session 6467485 of user tropicg9.
Feb 14 10:02:20 hostname.domain.com systemd[1]: Starting Session 6467485 of user tropicg9.
Feb 14 10:02:20 hostname.domain.com sshd[10333]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:20 hostname.domain.com sshd[10342]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:20 hostname.domain.com sshd[10333]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:20 hostname.domain.com systemd-logind[737]: Removed session 6467485.
Feb 14 10:02:21 hostname.domain.com sshd[10450]: Accepted publickey for tropicg9 from 1.2.3.4 port 45292 ssh2: RSA 0b
Feb 14 10:02:21 hostname.domain.com systemd-logind[737]: New session 6467486 of user tropicg9.
Feb 14 10:02:21 hostname.domain.com systemd[1]: Started Session 6467486 of user tropicg9.
Feb 14 10:02:21 hostname.domain.com systemd[1]: Starting Session 6467486 of user tropicg9.
Feb 14 10:02:21 hostname.domain.com sshd[10450]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:21 hostname.domain.com sshd[10457]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:21 hostname.domain.com sshd[10450]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:21 hostname.domain.com systemd-logind[737]: Removed session 6467486.
Feb 14 10:02:22 hostname.domain.com sshd[10473]: Accepted publickey for tropicg9 from 1.2.3.4 port 45293 ssh2: RSA 0b
Feb 14 10:02:22 hostname.domain.com systemd-logind[737]: New session 6467487 of user tropicg9.
Feb 14 10:02:22 hostname.domain.com systemd[1]: Started Session 6467487 of user tropicg9.
Feb 14 10:02:22 hostname.domain.com systemd[1]: Starting Session 6467487 of user tropicg9.
Feb 14 10:02:22 hostname.domain.com sshd[10473]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:22 hostname.domain.com sshd[10475]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:22 hostname.domain.com sshd[10473]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:22 hostname.domain.com systemd-logind[737]: Removed session 6467487.
Feb 14 10:02:23 hostname.domain.com sshd[10484]: Accepted publickey for tropicg9 from 1.2.3.4 port 45294 ssh2: RSA 0b
Feb 14 10:02:23 hostname.domain.com systemd-logind[737]: New session 6467488 of user tropicg9.
Feb 14 10:02:23 hostname.domain.com systemd[1]: Started Session 6467488 of user tropicg9.
Feb 14 10:02:23 hostname.domain.com systemd[1]: Starting Session 6467488 of user tropicg9.
Feb 14 10:02:23 hostname.domain.com sshd[10484]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:23 hostname.domain.com sshd[10486]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:23 hostname.domain.com sshd[10484]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:23 hostname.domain.com systemd-logind[737]: Removed session 6467488.
Feb 14 10:02:39 hostname.domain.com sshd[10654]: Accepted publickey for tropicg9 from 1.2.3.4 port 45295 ssh2: RSA 0b
Feb 14 10:02:39 hostname.domain.com systemd[1]: Started Session 6467489 of user tropicg9.
Feb 14 10:02:39 hostname.domain.com systemd-logind[737]: New session 6467489 of user tropicg9.
Feb 14 10:02:39 hostname.domain.com systemd[1]: Starting Session 6467489 of user tropicg9.
Feb 14 10:02:39 hostname.domain.com sshd[10654]: pam_unix(sshd:session): session opened for user tropicg9 by (uid=0)
Feb 14 10:02:39 hostname.domain.com sshd[10656]: Received disconnect from 1.2.3.4: 11: disconnected by user
Feb 14 10:02:39 hostname.domain.com sshd[10654]: pam_unix(sshd:session): session closed for user tropicg9
Feb 14 10:02:39 hostname.domain.com systemd-logind[737]: Removed session 6467489.session 6467489.
更新2-16-16
这是来自systemd-cgtop的输出,显示了活动控制组的资源使用情况(向右滚动)。这显示了“根”路径下的所有重资源使用情况。这似乎并没有缩小范围,但是也许这些信息可能会有所帮助。
/ run / systemd / system /下只有86个作用域文件和相关目录,最多6天。存在一个问题,这些文件在SSH连接期间被孤立,导致成千上万的条目和较高的CPU负载,但这在这里没有发生。
Path Tasks %CPU Memory Input/s Output/s
/ 296 30.5 11.3G 657.8K 893.0K
/system.slice/NetworkManager.service 1 - - - -
/system.slice/auditd.service 1 - - - -
/system.slice/crond.service 1 - - - -
/system.slice/dbus.service 1 - - - -
/system.slice/irqbalance.service 1 - - - -
/system.slice/lvm2-lvmetad.service 1 - - - -
/system.slice/mariadb.service 2 - - - -
/system.slice/nginx.service 10 - - - -
/system.slice/php-fpm.service 101 - - - -
/system.slice/polkit.service 1 - - - -
/system.slice/postfix.service 3 - - - -
/system.slice/rsyslog.service 1 - - - -
/system.slice/smartd.service 1 - - - -
/system.slice/sshd.service 2 - - - -
/system.slice/system-getty.slice/getty@tty1.service 1 - - - -
/system.slice/systemd-journald.service 1 - - - -
/system.slice/systemd-logind.service 1 - - - -
/system.slice/systemd-udevd.service 1 - - - -
/system.slice/tuned.service 1 - - - -
/system.slice/wpa_supplicant.service 1 - - - -
/user.slice/user-1000.slice/session-7170741.scope 4 - - - -
临时清除系统内存
看来运行systemctl daemon-reexec
将释放分配给PID 1进程的所有内存。但是,泄漏仍在继续。解决此问题的一个权宜之计是每天设置cron清除内存,但不能解决泄漏问题。我已经向Redhat 提交了一个错误,因为这是CentOS 7.x的systemd的稳定发行版。希望可以找到并堵塞泄漏。