我的问题是由内存模块故障以及内核二进制文件损坏引起的。
我刚刚使用基本全新的硬件启动了PC。我以前一直在运行Debian 6.0 AMD64,并且在那里没有任何变化(从字面上看;我只是从旧主板上拔下硬盘,然后将它们重新连接到新主板上),但发现有些奇怪:
- 我已经实际安装了4 x 8 GB的RAM
- UEFI / BIOS设置报告16383 MB RAM
- Linux
free -m
报告2985 MB RAM
2985 MB似乎离奇妙的3 GB标记太近了,不能纯粹是巧合,但可以uname -r
打印2.6.32-5-amd64
;显然是64位内核,这是我正在使用的系统驱动器上已安装的所有内核。新主板是华硕M5A97 Pro,它具有四个DDR3插槽,据说可以支持8 GB模块。内存模块本身是相同的,一起购买了四个8 GB的Corsair XMS3 PC12800。
我没有详细查看UEFI设置,但确实浏览了该设置,没有发现似乎需要更改才能启用大量RAM。
编辑:进一步确认我确实在运行64位:
# file `which free`
/usr/bin/free: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
#
这是怎么回事,我该怎么办?
编辑2:根据要求dmesg,dmidecode和meminfo。我目前没有对该系统的物理访问权,因此必须等到今晚才能取出一些模块并查看其作用。(请注意,dmidecode报告3 x 8GB加上一个空的DIMM插槽。另请注意内核中的MTRR不匹配消息,导致丢失13 GB,至少与主板本身的报告相加。)
# dmidecode --type memory
# dmidecode 2.9
SMBIOS 2.7 present.
Handle 0x0026, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 32 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x0028, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM0
Bank Locator: BANK0
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 1333 MHz (0.8 ns)
Manufacturer: Manufacturer0
Serial Number: SerNum0
Asset Tag: AssetTagNum0
Part Number: Array1_PartNumber0
Handle 0x002A, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM1
Bank Locator: BANK1
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 1333 MHz (0.8 ns)
Manufacturer: Manufacturer1
Serial Number: SerNum1
Asset Tag: AssetTagNum1
Part Number: Array1_PartNumber1
Handle 0x002C, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM2
Bank Locator: BANK2
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 1333 MHz (0.8 ns)
Manufacturer: Manufacturer2
Serial Number: SerNum2
Asset Tag: AssetTagNum2
Part Number: Array1_PartNumber2
Handle 0x002E, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: 64 bits
Size: No Module Installed
Form Factor: DIMM
Set: None
Locator: DIMM3
Bank Locator: BANK3
Type: Unknown
Type Detail: Synchronous
Speed: Unknown
Manufacturer: Manufacturer3
Serial Number: SerNum3
Asset Tag: AssetTagNum3
Part Number: Array1_PartNumber3
#
======================================================================
# cat /proc/meminfo
MemTotal: 3056820 kB
MemFree: 1470820 kB
Buffers: 390204 kB
Cached: 194660 kB
SwapCached: 0 kB
Active: 488024 kB
Inactive: 419096 kB
Active(anon): 231112 kB
Inactive(anon): 96660 kB
Active(file): 256912 kB
Inactive(file): 322436 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 322320 kB
Mapped: 33012 kB
Shmem: 5472 kB
Slab: 613952 kB
SReclaimable: 597404 kB
SUnreclaim: 16548 kB
KernelStack: 2384 kB
PageTables: 19472 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1528408 kB
Committed_AS: 621464 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 294484 kB
VmallocChunk: 34359429080 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 9216 kB
DirectMap2M: 2054144 kB
DirectMap1G: 1048576 kB
#
======================================================================
# dmesg | grep -i memory
[ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 13295MB of RAM.
[ 0.000000] WARNING: at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/arch/x86/kernel/cpu/mtrr/cleanup.c:1092 mtrr_trim_uncached_memory+0x2e6/0x311()
[ 0.000000] [<ffffffff814f7f1e>] ? mtrr_trim_uncached_memory+0x2e6/0x311
[ 0.000000] [<ffffffff814f7f1e>] ? mtrr_trim_uncached_memory+0x2e6/0x311
[ 0.000000] [<ffffffff814f7f1e>] ? mtrr_trim_uncached_memory+0x2e6/0x311
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] init_memory_mapping: 0000000000000000-00000000bdf00000
[ 0.000000] PM: Registered nosave memory: 000000000009d000 - 000000000009e000
[ 0.000000] PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
[ 0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
[ 0.000000] PM: Registered nosave memory: 00000000bd94d000 - 00000000bd99c000
[ 0.000000] PM: Registered nosave memory: 00000000bd99c000 - 00000000bd9a6000
[ 0.000000] PM: Registered nosave memory: 00000000bd9a6000 - 00000000bdade000
[ 0.000000] PM: Registered nosave memory: 00000000bdade000 - 00000000bdaef000
[ 0.000000] PM: Registered nosave memory: 00000000bdaef000 - 00000000bdb02000
[ 0.000000] PM: Registered nosave memory: 00000000bdb02000 - 00000000bdb04000
[ 0.000000] PM: Registered nosave memory: 00000000bdb04000 - 00000000bdb0d000
[ 0.000000] PM: Registered nosave memory: 00000000bdb0d000 - 00000000bdb13000
[ 0.000000] PM: Registered nosave memory: 00000000bdb13000 - 00000000bdb75000
[ 0.000000] PM: Registered nosave memory: 00000000bdb75000 - 00000000bdd78000
[ 0.000000] Memory: 3046732k/3111936k available (3075k kernel code, 4728k absent, 60476k reserved, 1879k data, 584k init)
[ 1.636730] Freeing initrd memory: 9501k freed
[ 1.647370] Freeing unused kernel memory: 584k freed
[ 4.876602] [TTM] Zone kernel: Available graphics memory: 1528410 kiB.
[ 4.876615] [drm] radeon: 256M of VRAM memory ready
[ 4.876617] [drm] radeon: 512M of GTT memory ready.
[ 25.571018] VBoxDrv: dbg - g_abExecMemory=ffffffffa051d6c0
#
对e820的摸索显示了一系列范围,并以结束e820 update range: 00000000bdf00000 - 000000043f000000 (usable) ==> (reserved)
。43f000000是16 GiB,bdf00000是3039 MiB。我不认为这是偶然的。
# dmesg | grep -i e820
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
[ 0.000000] BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 00000000bd94d000 (usable)
[ 0.000000] BIOS-e820: 00000000bd94d000 - 00000000bd99c000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000bd99c000 - 00000000bd9a6000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000bd9a6000 - 00000000bdade000 (reserved)
[ 0.000000] BIOS-e820: 00000000bdade000 - 00000000bdaef000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000bdaef000 - 00000000bdb02000 (reserved)
[ 0.000000] BIOS-e820: 00000000bdb02000 - 00000000bdb04000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000bdb04000 - 00000000bdb0d000 (reserved)
[ 0.000000] BIOS-e820: 00000000bdb0d000 - 00000000bdb13000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000bdb13000 - 00000000bdb75000 (reserved)
[ 0.000000] BIOS-e820: 00000000bdb75000 - 00000000bdd78000 (ACPI NVS)
[ 0.000000] BIOS-e820: 00000000bdd78000 - 00000000bdf00000 (usable)
[ 0.000000] BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec10000 - 00000000fec11000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec20000 - 00000000fec21000 (reserved)
[ 0.000000] BIOS-e820: 00000000fed00000 - 00000000fed01000 (reserved)
[ 0.000000] BIOS-e820: 00000000fed61000 - 00000000fed71000 (reserved)
[ 0.000000] BIOS-e820: 00000000fed80000 - 00000000fed90000 (reserved)
[ 0.000000] BIOS-e820: 00000000fef00000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100001000 - 000000043f000000 (usable)
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 update range: 00000000bdf00000 - 000000043f000000 (usable) ==> (reserved)
[ 0.000000] update e820 for mtrr
#
编辑3/4-部分成功:
- 将UEFI BIOS从版本升级
0705 x64 08/23/2011
到1007 02/10/2012
没有帮助:仍然存在完全相同的问题。 - 卸下一个DIMM模块(我很幸运地猜测哪个插槽是#4:距离CPU最远的那个),BIOS可以检测并使用剩余的24 GB,尽管根据建议,不建议使用三个DIMM配置。用户手册中的图表。值得注意的是,将剩余的DIMM之一放在插槽#4中仍然可以使用它,因此该插槽还不错。将“原始” DIMM重新插入该插槽后,我回到了起点。
- 从Debian 6.0.3 AMD64安装CD引导到应急环境并检查其
dmesg
输出不会显示类似的MTRR错误。同样,在那种环境中,安装了3 x 8GB,则24 GB(正负pi或负pi或大约pi;我没有做确切的数学运算)显示为可用free
。 - 升级/重新安装内核(可以进行较小的升级)似乎也解决了MTRR问题。
dmesg
现在报告的总大小为26198016 KB,并且没有MTRR错误,这与安装3 x 8GB时的预期一致。free -m
现在报告的总RAM为24114 MB,坦白地说,这对我来说已经足够了。
这闻起来像是一块倒空的DIMM,外加一个由于某种原因被损坏的内核。后者可能是在断电期间发生的(尽管我必须说这是内核崩溃的一种奇怪方式!)。当我与他们交谈后(希望明天),无法工作的DIMM将立即返回经销商。
(希望如此)最终编辑
我将RMA用作两对DIMM中的一个,经销商认为它损坏了,所以他们给我寄了一对新的DIMM,看来还不错。因此,我现在基本上已经达到了将近一个月前的原计划(尽管其中很大一部分时间并不是真正由于转售商造成的),并且可以使用32 GB的RAM。free -m
报告总内存为32194 MB,内核报告34586624k
初始化时的RAM,两者都与我的期望非常一致。
dmidecode --type memory
的前一百行左右dmesg
(请确保包含任何有关内存的内容)。
WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 13295MB of RAM.
好吧,您缺少13G。