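It happened again! I have 4 servers that crash periodically, and nothing is printed to the system logs or the serial console.
On top of that, the Linux kdump service does not write a core dump to the default location, /var/crash.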
- Can you help me figure out why?
- Does it matter that my root filesystem is an LVM volume?
Here is what I have tried.

My system is Scientific Linux 6.5 with the latest kernel.

    [root@host1 ~]# uname -r
    2.6.32-431.11.2.el6.x86_64
    [root@host1 ~]# cat /etc/issue
    Scientific Linux release 6.5 (Carbon)
The file /etc/kdump.conf is the original file with the default settings. Most lines are commented out, and there are only two active lines: path and core_collector.

    #net my.server.com:/export/tmp
    #net user@my.server.com
    path /var/crash
    core_collector makedumpfile -c --message-level 1 -d 31
    #core_collector scp
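To double-check which directives kdump will actually read, the commented and blank lines can be filtered out of the config file. This is only a minimal sketch; the function name active_directives is made up for illustration and is not part of kexec-tools.

```shell
# Print only the active directives of a kdump.conf-style file:
# blank lines and lines whose first non-blank character is '#' are dropped.
# $1: path to the config file (defaults to /etc/kdump.conf).
# active_directives is a hypothetical helper name, not a kdump command.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/kdump.conf}"
}

# Example call (run as root on the affected host):
#   active_directives /etc/kdump.conf
```

With the default file described above, this should list just the path and core_collector lines.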
I made sure that the kdump service is running and that kdump does not need to rebuild my initrd.

    [root@host1 ~]# chkconfig --list kdump
    kdump           0:off   1:off   2:off   3:on    4:on    5:on    6:off
    [root@host1 ~]# /etc/init.d/kdump restart
    Stopping kdump:                                            [  OK  ]
    Starting kdump:                                            [  OK  ]
    [root@host1 ~]#
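Beyond the init script reporting OK, it may be worth confirming that a capture kernel is really loaded and that memory was reserved for it at boot. The sketch below reads the kernel's kexec_crash_loaded flag in sysfs and looks for a crashkernel= argument on the kernel command line. The two function names are hypothetical helpers, and the paths are parameters so the checks can be tried against sample files.

```shell
# Succeed if the kexec_crash_loaded flag reads "1", i.e. a capture
# kernel is currently loaded.
# $1: path to the flag file (defaults to the sysfs location).
# crash_kernel_loaded is a hypothetical helper name.
crash_kernel_loaded() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Succeed if the kernel command line contains a crashkernel= argument.
# $1: path to the command-line file (defaults to /proc/cmdline).
# has_crashkernel_arg is a hypothetical helper name.
has_crashkernel_arg() {
    grep -q 'crashkernel=' "${1:-/proc/cmdline}"
}

# Example calls:
#   crash_kernel_loaded && echo "capture kernel loaded"
#   has_crashkernel_arg && echo "crashkernel= is set"
```

Both checks only confirm that kdump is armed; they say nothing about whether the dump target can be mounted after the crash.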
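Then I force a kernel crash using the following commands from the RHEL6 Deployment Guide, Chapter 29, "The kdump Crash Recovery Service", which says:

    Then type the following commands at a shell prompt:

        echo 1 > /proc/sys/kernel/sysrq
        echo c > /proc/sysrq-trigger

    This will force the Linux kernel to crash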
The system crashes, and I can watch the progress on the serial console. I see the message "Saving to the local filesystem UUID=e7abcdeb-1987-4c69-a867-fabdceffghi2", but immediately afterwards I see the odd message "Usage: fsck.ext4", which looks as if something is calling fsck incorrectly rather than doing any real work. I see no mention of out-of-memory errors or anything else.

    host1.example.org login: SysRq : Trigger a crash
    BUG: unable to handle kernel NULL pointer dereference at (null)
    ...
    ... skipping 50 lines of output
    ...
    Creating block device ram8
    Creating block device ram9
    Creating Remain Block Devices
    Making device-mapper control node
    Scanning logical volumes
      Reading all physical volumes. This may take a while...
      No volume groups found
      No volume groups found
    Activating logical volumes
      No volume groups found
      No volume groups found
    Free memory/Total memory (free %): 58272 / 116616 ( 49.9691 )
    Saving to the local filesystem UUID=e7abcdeb-1987-4c69-a867-fabdceffghi2
    Usage: fsck.ext4 [-panyrcdfvtDFV] [-b superblock] [-B blocksize]
                    [-I inode_buffer_blocks] [-P process_inode_size]
                    [-l|-L bad_blocks_file] [-C fd] [-j external_journal]
                    [-E extended-options] device

    Emergency help:
     -p                   Autom
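Since the capture environment reports "No volume groups found" just before it tries to save to a filesystem UUID, one thing that might be worth checking (an assumption on my part, not a diagnosis) is which block device that UUID belongs to on the running system. The sketch below pulls the UUID out of a saved copy of the console output; dump_target_uuid is a hypothetical helper name and console.log is a hypothetical file name.

```shell
# Print the first filesystem UUID that kdump reported as its local
# dump target in a saved console log.
# $1: path to the saved console output.
# dump_target_uuid is a hypothetical helper name.
dump_target_uuid() {
    sed -n 's/.*Saving to the local filesystem UUID=\([0-9A-Za-z-]*\).*/\1/p' "$1" | head -n 1
}

# Example, on the normally booted system (console.log is a hypothetical name):
#   blkid -U "$(dump_target_uuid console.log)"
```

If blkid resolves the UUID to a device-mapper path, the target is a logical volume, which would be relevant to the LVM question above.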
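The system then reboots (which is the default setting).

When the system comes back online, there is nothing in /var/crash, so I conclude that the crash dump was not written.

    [root@host1 ~]# ls -lA /var/crash/
    total 0
    [root@host1 ~]#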
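I know that crash dumps can work in general. If I tell kdump to copy the core dump to another system using the following configuration, kdump successfully writes the core dump to the other host:

    path vmcore
    ssh user@hostb.example.org
    sshkey /root/.ssh/kdump_id_rsa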
If I set "default shell" in /etc/kdump.conf, rebuild the initrd, and then crash the system, I get the same output again, followed by a slightly more meaningful error: "mount: can't find /mnt in /etc/fstab".

    Free memory/Total memory (free %): 58272 / 116616 ( 49.9691 )
    Saving to the local filesystem UUID=e720481b-1987-4c69-a867-f2b4cba3b312
    Usage: fsck.ext4 [-panyrcdfvtDFV] [-b superblock] [-B blocksize]
                    [-I inode_buffer_blocks] [-P process_inode_size]
                    [-l|-L bad_blocks_file] [-C fd] [-j external_journal]
                    [-E extended-options] device

    Emergency help:
     -p                   Automatic repair (no questions)
     -n                   Make no changes to the filesystem
     -y                   Assume "yes" to all questions
     -c                   Check for bad blocks and add them to the badblock list
     -f                   Force checking even if filesystem is marked clean
     -v                   Be verbose
     -b superblock        Use alternative superblock
     -B blocksize         Force blocksize when looking for superblock
     -j external_journal  Set location of the external journal
     -l bad_blocks_file   Add to badblocks list
     -L bad_blocks_file   Set badblocks list
    mount: can't find /mnt in /etc/fstab
    dropping to initramfs shell
    exiting this shell will reboot your system
    /sys/block #
But now I am stuck.