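I recently upgraded my home server from Ubuntu 10.04 to 12.04.1. It runs the linux-image-server kernel on the x86_64 architecture.
I don't think anything especially unusual is running on it: the deluge daemon, apache2, an iptables firewall with IP masquerading, a DHCP server, a bind DNS server with a zone file whose hostnames are updated automatically from the names the DHCP clients identify themselves with, sshd, an NFS server, and other things. This machine is my router; it sits between the internet and my local network.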
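Since the upgrade it has been failing intermittently. It is fine for a while after booting, and then we suddenly lose network connectivity over wifi. If I plug in a network cable, I can't get an IP address from the DHCP server. If I give myself a static IP address, I can keep accessing the internet normally. That makes it look as though the DHCP server has failed (and indeed, running dhclient -v eth0 gets no response to its DHCPDISCOVER), which gets noticed when clients try to renew their IP leases. But with a static IP on a wired connection I still have connectivity to the internet, so iptables is still fine.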
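So I tried to SSH in to the machine, but that seems to hang. If I make ssh verbose, I can see that it does establish a connection with the server and then fails somewhere further down the line; it is hard to tell exactly where.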
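I noticed that if I fetch a web page from its HTTP server, I get the page I asked for, but none of the follow-up requests (for images, stylesheets, JavaScript) are served. If I request those files directly with curl, though, I can get them.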
Does that mean things go pear-shaped whenever something tries to fork?
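I lugged a monitor and keyboard over to the server (it is usually headless) and had a look: I saw stack traces.
I switched to a new virtual terminal and tried to log in. After entering my password, I got a stack trace (general protection fault). Here it is: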
Jan 6 20:19:54 localhost kernel: [ 1475.178245] general protection fault: 0000 [#12] SMP
Jan 6 20:19:54 localhost kernel: [ 1475.178292] CPU 1
Jan 6 20:19:54 localhost kernel: [ 1475.178309] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:19:54 localhost kernel: [ 1475.178911]
Jan 6 20:19:54 localhost kernel: [ 1475.178927] Pid: 1305, comm: login Tainted: G B D 3.2.0-35-generic #55-Ubuntu Gigabyte Technology Co., Ltd. GA-MA785GM-US2H/GA-MA785GM-US2H
Jan 6 20:19:54 localhost kernel: [ 1475.179028] RIP: 0010:[<ffffffff8116589a>] [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:19:54 localhost kernel: [ 1475.179096] RSP: 0018:ffff88006b251d78 EFLAGS: 00010206
Jan 6 20:19:54 localhost kernel: [ 1475.179135] RAX: 0000000000000000 RBX: 00007f062bb91000 RCX: 000000000005b2ed
Jan 6 20:19:54 localhost kernel: [ 1475.179186] RDX: 000000000005b2ec RSI: 0000000000016da0 RDI: ffff88006d408a00
Jan 6 20:19:54 localhost kernel: [ 1475.179236] RBP: ffff88006b251dc8 R08: ffff88006fa96da0 R09: 0000000000000001
Jan 6 20:19:54 localhost kernel: [ 1475.179287] R10: 00000000000000d1 R11: ffff88006b23a8f0 R12: ffff88006d408a00
Jan 6 20:19:54 localhost kernel: [ 1475.179336] R13: 2665c4979a04b7b8 R14: ffffffff811447c5 R15: 00000000000080d0
Jan 6 20:19:54 localhost kernel: [ 1475.179387] FS: 00007f062bb81700(0000) GS:ffff88006fa80000(0000) knlGS:0000000000000000
Jan 6 20:19:54 localhost kernel: [ 1475.179445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 6 20:19:54 localhost kernel: [ 1475.179486] CR2: 00007f9b4d79da00 CR3: 0000000059a34000 CR4: 00000000000006e0
Jan 6 20:19:54 localhost kernel: [ 1475.179536] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 6 20:19:54 localhost kernel: [ 1475.179586] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 6 20:19:54 localhost kernel: [ 1475.179637] Process login (pid: 1305, threadinfo ffff88006b250000, task ffff880036058000)
Jan 6 20:19:54 localhost kernel: [ 1475.179695] Stack:
Jan 6 20:19:54 localhost kernel: [ 1475.179711] ffff880036058000 0000000000000041 0000000000000001 ffffffff81188cec
Jan 6 20:19:54 localhost kernel: [ 1475.179777] 0000000000000282 00007f062bb91000 ffff88006822ce00 0000000000000001
Jan 6 20:19:54 localhost kernel: [ 1475.179841] 0000000000001000 0000000000000000 ffff88006b251e88 ffffffff811447c5
Jan 6 20:19:54 localhost kernel: [ 1475.179905] Call Trace:
Jan 6 20:19:54 localhost kernel: [ 1475.179928] [<ffffffff81188cec>] ? path_openat+0xfc/0x3f0
Jan 6 20:19:54 localhost kernel: [ 1475.179971] [<ffffffff811447c5>] mmap_region+0x2a5/0x4f0
Jan 6 20:19:54 localhost kernel: [ 1475.180012] [<ffffffff81144d58>] do_mmap_pgoff+0x348/0x360
Jan 6 20:19:54 localhost kernel: [ 1475.180054] [<ffffffff81144e36>] sys_mmap_pgoff+0xc6/0x230
Jan 6 20:19:54 localhost kernel: [ 1475.180098] [<ffffffff81018b12>] sys_mmap+0x22/0x30
Jan 6 20:19:54 localhost kernel: [ 1475.180136] [<ffffffff816655c2>] system_call_fastpath+0x16/0x1b
Jan 6 20:19:54 localhost kernel: [ 1475.180180] Code: 00 4d 8b 04 24 65 4c 03 04 25 50 da 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 d8 00 00 00 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0e 0f 94 c0 84 c0 74 c2 4d
Jan 6 20:19:54 localhost kernel: [ 1475.180503] RIP [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:19:54 localhost kernel: [ 1475.180552] RSP <ffff88006b251d78>
Jan 6 20:19:54 localhost kernel: [ 1475.180603] ---[ end trace 766ef1ef52f774b9 ]---
If I watch for long enough, more general protection faults appear. So far I have seen them for login, apache2, deluge-web, head, and powerbtn.sh.
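I have to hard reset the machine to get it back into a working state (powerbtn.sh even hits a general protection fault when I press the power button), but it doesn't take long before it ends up in this state again.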
I haven't yet worked out how to reproduce it on demand; it seems to happen at random.
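In case it helps, I looked through kern.log and found the first such error. There are a ton of them in a row, starting with zsh, then deluged, apache2, cron, head, console-kit-dae, irqbalance, nmbd ... Below is the zsh one, and right after it there is a bad page state error: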
Jan 6 20:13:35 localhost kernel: [ 1096.184250] general protection fault: 0000 [#1] SMP
Jan 6 20:13:35 localhost kernel: [ 1096.186339] CPU 1
Jan 6 20:13:35 localhost kernel: [ 1096.186355] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:13:35 localhost kernel: [ 1096.188008]
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Pid: 2564, comm: zsh Not tainted 3.2.0-35-generic #55-Ubuntu Gigabyte Technology Co., Ltd. GA-MA785GM-US2H/GA-MA785GM-US2H
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RIP: 0010:[<ffffffff8116589a>] [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RSP: 0018:ffff880059877d78 EFLAGS: 00010206
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RAX: 0000000000000000 RBX: 00007f202c59d000 RCX: 000000000005b2ed
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RDX: 000000000005b2ec RSI: 0000000000016da0 RDI: ffff88006d408a00
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RBP: ffff880059877dc8 R08: ffff88006fa96da0 R09: 0000000000000001
Jan 6 20:13:35 localhost kernel: [ 1096.188008] R10: 0000000000100073 R11: ffff880059dbb2c0 R12: ffff88006d408a00
Jan 6 20:13:35 localhost kernel: [ 1096.188008] R13: 2665c4979a04b7b8 R14: ffffffff811447c5 R15: 00000000000080d0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] FS: 00007f202c5ac700(0000) GS:ffff88006fa80000(0000) knlGS:0000000000000000
Jan 6 20:13:35 localhost kernel: [ 1096.188008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 6 20:13:35 localhost kernel: [ 1096.188008] CR2: 00000000025991f0 CR3: 0000000059dbc000 CR4: 00000000000006e0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 6 20:13:35 localhost kernel: [ 1096.188008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Process zsh (pid: 2564, threadinfo ffff880059876000, task ffff88006b6b5c00)
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Stack:
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000000001 0000000000001000 0000000000000001 ffffffff8129e2e0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000000001 00007f202c59d000 ffff88006822f480 0000000000000001
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000001000 0000000000000000 ffff880059877e88 ffffffff811447c5
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Call Trace:
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff8129e2e0>] ? cap_vm_enough_memory+0x50/0x60
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff811447c5>] mmap_region+0x2a5/0x4f0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81144d58>] do_mmap_pgoff+0x348/0x360
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81144eb1>] sys_mmap_pgoff+0x141/0x230
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81018b12>] sys_mmap+0x22/0x30
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff816655c2>] system_call_fastpath+0x16/0x1b
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Code: 00 4d 8b 04 24 65 4c 03 04 25 50 da 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 d8 00 00 00 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0e 0f 94 c0 84 c0 74 c2 4d
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RIP [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RSP <ffff880059877d78>
Jan 6 20:13:35 localhost kernel: [ 1096.274513] ---[ end trace 766ef1ef52f774ae ]---
Jan 6 20:13:37 localhost kernel: [ 1097.836149] BUG: Bad page state in process swapper/0 pfn:59a33
Jan 6 20:13:37 localhost kernel: [ 1097.838885] page:ffffea0001668cc0 count:0 mapcount:-1 mapping: (null) index:0xffff880059a33160
Jan 6 20:13:37 localhost kernel: [ 1097.841673] page flags: 0x100000000000000()
Jan 6 20:13:37 localhost kernel: [ 1097.844440] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:13:37 localhost kernel: [ 1097.856881] Pid: 0, comm: swapper/0 Tainted: G D 3.2.0-35-generic #55-Ubuntu
Jan 6 20:13:37 localhost kernel: [ 1097.860020] Call Trace:
Jan 6 20:13:37 localhost kernel: [ 1097.863063] <IRQ> [<ffffffff8111fe8f>] bad_page.part.61+0x9f/0xf0
Jan 6 20:13:37 localhost kernel: [ 1097.866119] [<ffffffff8111fef8>] bad_page+0x18/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.869158] [<ffffffff8112098e>] free_pages_prepare+0x10e/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.872178] [<ffffffff81120af9>] free_hot_cold_page+0x49/0x1a0
Jan 6 20:13:37 localhost kernel: [ 1097.875183] [<ffffffff81120c7d>] __free_pages+0x2d/0x40
Jan 6 20:13:37 localhost kernel: [ 1097.878163] [<ffffffff8159a8fb>] tcp_v4_destroy_sock+0x25b/0x2c0
Jan 6 20:13:37 localhost kernel: [ 1097.881105] [<ffffffff81582695>] inet_csk_destroy_sock+0x55/0x140
Jan 6 20:13:37 localhost kernel: [ 1097.883970] [<ffffffff815849b0>] tcp_done+0x50/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.886853] [<ffffffff81591d92>] tcp_rcv_state_process+0x422/0x5f0
Jan 6 20:13:37 localhost kernel: [ 1097.889724] [<ffffffff8159a597>] tcp_v4_do_rcv+0xc7/0x1d0
Jan 6 20:13:37 localhost kernel: [ 1097.892513] [<ffffffff8159c1f1>] tcp_v4_rcv+0x581/0x820
Jan 6 20:13:37 localhost kernel: [ 1097.895301] [<ffffffff81577b60>] ? ip_rcv_finish+0x370/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.898110] [<ffffffff81577b60>] ? ip_rcv_finish+0x370/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.900915] [<ffffffff81577c3d>] ip_local_deliver_finish+0xdd/0x280
Jan 6 20:13:37 localhost kernel: [ 1097.903716] [<ffffffff81577fa8>] ip_local_deliver+0x88/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.906502] [<ffffffff815778fd>] ip_rcv_finish+0x10d/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.909279] [<ffffffff815781e5>] ip_rcv+0x235/0x300
Jan 6 20:13:37 localhost kernel: [ 1097.912067] [<ffffffff81613dc7>] ? packet_rcv_spkt+0x47/0x190
Jan 6 20:13:37 localhost kernel: [ 1097.914831] [<ffffffff81543446>] __netif_receive_skb+0x4d6/0x550
Jan 6 20:13:37 localhost kernel: [ 1097.917624] [<ffffffff81544230>] netif_receive_skb+0x80/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.920415] [<ffffffff81536474>] ? __netdev_alloc_skb+0x24/0x50
Jan 6 20:13:37 localhost kernel: [ 1097.923124] [<ffffffffa00d6e90>] rtl8139_rx+0x150/0x2b0 [8139too]
Jan 6 20:13:37 localhost kernel: [ 1097.925754] [<ffffffffa00d704a>] rtl8139_poll+0x5a/0xd0 [8139too]
Jan 6 20:13:37 localhost kernel: [ 1097.928274] [<ffffffff81544bd4>] net_rx_action+0x134/0x290
Jan 6 20:13:37 localhost kernel: [ 1097.930698] [<ffffffff8103df8b>] ? native_safe_halt+0xb/0x10
Jan 6 20:13:37 localhost kernel: [ 1097.933115] [<ffffffff8106f6e8>] __do_softirq+0xa8/0x210
Jan 6 20:13:37 localhost kernel: [ 1097.935495] [<ffffffff810967f5>] ? do_timer+0x25/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.937836] [<ffffffff81035dc2>] ? ack_apic_level+0x72/0x190
Jan 6 20:13:37 localhost kernel: [ 1097.940163] [<ffffffff8166782c>] call_softirq+0x1c/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.942464] [<ffffffff81016305>] do_softirq+0x65/0xa0
Jan 6 20:13:37 localhost kernel: [ 1097.944778] [<ffffffff8106face>] irq_exit+0x8e/0xb0
Jan 6 20:13:37 localhost kernel: [ 1097.947068] [<ffffffff816680e3>] do_IRQ+0x63/0xe0
Jan 6 20:13:37 localhost kernel: [ 1097.949327] [<ffffffff8165d46e>] common_interrupt+0x6e/0x6e
Jan 6 20:13:37 localhost kernel: [ 1097.951597] <EOI> [<ffffffff8103df8b>] ? native_safe_halt+0xb/0x10
Jan 6 20:13:37 localhost kernel: [ 1097.953891] [<ffffffff810900a8>] ? hrtimer_start+0x18/0x20
Jan 6 20:13:37 localhost kernel: [ 1097.956171] [<ffffffff8101c983>] default_idle+0x53/0x1d0
Jan 6 20:13:37 localhost kernel: [ 1097.958426] [<ffffffff8101cb5d>] amd_e400_idle+0x5d/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.960704] [<ffffffff81013236>] cpu_idle+0xd6/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.962970] [<ffffffff816235ee>] rest_init+0x72/0x74
Jan 6 20:13:37 localhost kernel: [ 1097.965195] [<ffffffff81cfbc03>] start_kernel+0x3b0/0x3bd
Jan 6 20:13:37 localhost kernel: [ 1097.967421] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
Jan 6 20:13:37 localhost kernel: [ 1097.969660] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
Jan 6 20:13:37 localhost kernel: [ 1097.971888] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
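A scan of kern.log like the one described above could be sketched roughly as follows. This is only an illustration in Python, not the exact procedure used: the function name summarize_gpfs and both regular expressions are my own assumptions, based purely on the line formats visible in the traces here.

```python
import re
from collections import Counter

# Patterns modelled on the log lines shown in this post (assumed formats).
GPF_RE = re.compile(r"general protection fault: \S+ \[#(\d+)\]")
COMM_RE = re.compile(r"Pid: \d+, comm: (\S+)")


def summarize_gpfs(lines):
    """Return (first GPF header line, Counter of process names that faulted).

    Only the "Pid: ..., comm: ..." line that follows a general protection
    fault header is counted, so other reports (such as "Bad page state")
    are ignored.
    """
    first = None
    counts = Counter()
    awaiting_comm = False
    for line in lines:
        if GPF_RE.search(line):
            awaiting_comm = True
            if first is None:
                first = line.rstrip("\n")
            continue
        if awaiting_comm:
            match = COMM_RE.search(line)
            if match:
                counts[match.group(1)] += 1
                awaiting_comm = False
    return first, counts
```

Passing it an open file, for example summarize_gpfs(open("/var/log/kern.log")) if that is where your system keeps the kernel log, would return the earliest fault header and a per-process tally.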
What is going on here? What can I do?
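memtest. However, since your traces show up very early, I suspect this is a memory problem. What hardware does your server have? Have you done any tuning or overclocking?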
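One detail in the traces fits that suspicion. If I am decoding the Code: bytes correctly, the faulting instruction (49 8b 5c 05 00) loads from the address R13 + RAX, and R13 holds 2665c4979a04b7b8 in both reports while RAX is 0. On x86_64 with 48-bit virtual addresses, a pointer is only valid ("canonical") when bits 63 through 47 are all equal, and dereferencing a non-canonical one raises a general protection fault rather than a page fault. The Python sketch below checks that; is_canonical is a name made up for illustration, and the 48-bit width is an assumption about this CPU.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if a 64-bit address is canonical for the given VA width.

    With va_bits=48 (an assumption, matching 4-level paging), bits 63..47
    must be all zeros (user half) or all ones (kernel half).
    """
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Register values copied from the traces in the question.
for name, value in [
    ("R13", 0x2665C4979A04B7B8),
    ("RDI", 0xFFFF88006D408A00),
    ("RBX", 0x00007F062BB91000),
]:
    print(f"{name} {value:#018x} canonical: {is_canonical(value)}")
```

A garbage pointer like that inside the allocator suggests that something overwrote memory. Bad RAM would explain it, though on its own it does not rule out a kernel or driver bug.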