我正在进行全新的Linux安装,在开始之前,我认为这是验证HDD健康状况的好时机,因为如果需要,我可以安全地覆盖HDD上的任何数据。
首先,我尝试使用smartmontools检查...我的Seagate HDD报告了一个当前的挂起扇区和一个不可修复的脱机扇区(大概是同一扇区)。重新分配的扇区数为零。
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
...
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
但是,SMART自检(短,长,离线,运输)没有发现错误。
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 6631 -
# 2 Conveyance offline Completed without error 00% 6630 -
# 3 Extended offline Completed without error 00% 6622 -
# 4 Short offline Completed without error 00% 6600 -
# 5 Extended offline Completed without error 00% 6632 -
我还尝试在驱动器上运行badblocks -wsv(完全读写4模式通过测试),但未发现任何坏块。然后,我遵循了以下指南(在可能的范围内,因为我在运行Badblocks之后删除了文件系统)在以下位置找到:http : //smartmontools.sourceforge.net/badblockhowto.html
那里说如果我用全零覆盖扇区,磁盘应该移动(重新分配)挂起的扇区。Badblocks的最后写入模式全为零,因此应该这样做。但是,什么都没有改变,我仍然有待处理的扇区数1。
然后,我尝试找出哪个扇区是有问题的扇区,在SMART输出中有一个错误日志:
Error 2 occurred at disk power-on lifetime: 5344 hours (222 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 7c 1b 1a 02 ae Error: ABRT at LBA = 0x0e021a1b = 235018779
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
20 20 7f 18 1a 02 ae 00 00:09:05.228 READ SECTOR(S)
20 20 01 17 1a 02 ae 00 00:09:05.228 READ SECTOR(S)
20 20 01 01 00 00 a0 00 00:08:59.830 READ SECTOR(S)
91 20 3f 01 00 00 af 00 00:08:59.826 INITIALIZE DEVICE PARAMETERS [OBS-6]
10 20 01 01 00 00 a8 00 00:08:59.678 RECALIBRATE [OBS-4]
Error 1 occurred at disk power-on lifetime: 5009 hours (208 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 b7 8c 02 e0 Error: UNC at LBA = 0x00028cb7 = 167095
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 20 1e 9e 8c 02 e0 00 00:02:20.691 READ DMA EXT
25 20 1e 80 8c 02 e0 00 00:02:20.691 READ DMA EXT
25 20 1e 62 8c 02 e0 00 00:02:20.690 READ DMA EXT
25 20 1e 44 8c 02 e0 00 00:02:20.690 READ DMA EXT
25 20 1e 26 8c 02 e0 00 00:02:20.690 READ DMA EXT
因此,显然驱动器有两个错误。
84 51 7c 1b 1a 02 ae Error: ABRT at LBA = 0x0e021a1b = 235018779
和
40 51 00 b7 8c 02 e0 Error: UNC at LBA = 0x00028cb7 = 167095
因此,我假设这些是扇区号:167095和235018779。我尝试用dd写入零:
dd if=/dev/zero of=/dev/sda bs=512 count=1 seek=167095
现在,一切正常。但是,当我尝试其他领域时:
dd if=/dev/zero of=/dev/sda bs=512 count=1 seek=235018779
我得到dd:'/ dev / sda':无法寻找:无效的参数。然后我发现我的硬盘只有234441658个扇区。因此,这超出范围。但是,为什么SMART报告该地址错误?!
有人可以帮我解决这个问题,如果我做错了,还可以建议我如何正确执行此操作吗?我怀疑在dd中使用块大小512可能是错误的。这是SMART报告的扇区大小。也许那些LBA地址是字节而不是块,我尝试设置bs = 1并将仅一个字节写入HDD上的那些地址。那确实起作用了(dd写入过程)…但是,此后仍未更改待处理的扇区数。我还调用sync和smartctl -t offline / dev / sda来尝试“强制”驱动器重新分配扇区。没有...
这是我完整的smartctl --all / dev / sda输出:
smartctl 5.43 2012-06-30 r3573 [i686-linux-2.6.32-358.el6.i686] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.9
Device Model: ST3120811AS
Serial Number: 6PT1N4VZ
Firmware Version: 3.AAE
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Nov 18 12:03:00 2013 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 51) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 084 077 006 Pre-fail Always - 185600113
3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 098 098 020 Old_age Always - 2185
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 073 055 030 Pre-fail Always - 25890559714
9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 6632
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 098 098 020 Old_age Always - 2229
187 Reported_Uncorrect 0x0032 099 099 000 Old_age Always - 1
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 071 056 045 Old_age Always - 29 (Min/Max 25/29)
194 Temperature_Celsius 0x0022 029 044 000 Old_age Always - 29 (0 13 0 0 0)
195 Hardware_ECC_Recovered 0x001a 052 046 000 Old_age Always - 194244099
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 066 219 000 Old_age Always - 34
SMART Error Log Version: 1
ATA Error Count: 2
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2 occurred at disk power-on lifetime: 5344 hours (222 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 7c 1b 1a 02 ae Error: ABRT at LBA = 0x0e021a1b = 235018779
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
20 20 7f 18 1a 02 ae 00 00:09:05.228 READ SECTOR(S)
20 20 01 17 1a 02 ae 00 00:09:05.228 READ SECTOR(S)
20 20 01 01 00 00 a0 00 00:08:59.830 READ SECTOR(S)
91 20 3f 01 00 00 af 00 00:08:59.826 INITIALIZE DEVICE PARAMETERS [OBS-6]
10 20 01 01 00 00 a8 00 00:08:59.678 RECALIBRATE [OBS-4]
Error 1 occurred at disk power-on lifetime: 5009 hours (208 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 b7 8c 02 e0 Error: UNC at LBA = 0x00028cb7 = 167095
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 20 1e 9e 8c 02 e0 00 00:02:20.691 READ DMA EXT
25 20 1e 80 8c 02 e0 00 00:02:20.691 READ DMA EXT
25 20 1e 62 8c 02 e0 00 00:02:20.690 READ DMA EXT
25 20 1e 44 8c 02 e0 00 00:02:20.690 READ DMA EXT
25 20 1e 26 8c 02 e0 00 00:02:20.690 READ DMA EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 6631 -
# 2 Conveyance offline Completed without error 00% 6630 -
# 3 Extended offline Completed without error 00% 6622 -
# 4 Short offline Completed without error 00% 6600 -
# 5 Extended offline Completed without error 00% 6632 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
更新:
如rob的回答所建议,我尝试用零覆盖整个HDD。检查SMART值,然后开始读取整个HDD。再次检查SMART值。结果是:在两种情况下,写入后立即读取后,有关挂起/重新分配的扇区计数的SMART值均不会改变。重新分配0。待定1。