SMART data & Self tests, not sure if my SSD is on its last gasp

Sat Jan 2 22:14:08 EST 2021

Think it's a driver issue.  Looked in journalctl and there are some
errors indicated.  One is a video issue, another is some sort of
permissions issue for a user who isn't me.  The permissions issue is
with tracker-miner, which I find highly annoying.  Not quite sure how
to disable it cleanly with low system impact.

Last fsck was 3 months ago.  Next one is due in 3 months.  So it wasn't 
an overdue fsck...  So I'm not so sure it's disk related at all.
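
For reading the attribute table quoted below, here is a minimal Python
sketch of the rule as I understand it from the smartctl man page: an
attribute has failed only when its normalized VALUE is at or below
THRESH.  The TYPE column (Pre-fail or Old_age) just names the category
of the attribute, not its current state.  The sample rows are copied
from that table with the mail line wrapping undone; the helper names
parse_attributes and failed are my own, not part of smartmontools.

```python
import re

# Three rows copied from the smartctl output quoted further down,
# with the e-mail line wrapping undone.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010 Pre-fail Always       -       0
173 Wear_Leveling_Count     0x0033   098   098   010 Pre-fail Always       -       66
202 Perc_Rated_Life_Used    0x0018   098   098   001 Old_age  Offline      -       2
"""

# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(Pre-fail|Old_age)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Turn attribute table rows into dicts; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "type": m.group(7),
                "raw": m.group(10).strip(),
            })
    return rows


def failed(rows):
    """Names of attributes whose normalized VALUE is at or below THRESH."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


if __name__ == "__main__":
    rows = parse_attributes(SAMPLE)
    print("failed attributes:", failed(rows))
```

On those three rows it returns an empty list, which agrees with the
PASSED overall assessment.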

Have contacted System76 and sent them logs.  If I recall correctly, the
issue seems to be closely related to a driver change (issued by
System76).  Of course, they are still on break...

Nonetheless, waiting 8-10 minutes for boot is awful.  I don't even think 
my first IBM PC was that slow, even with a boot from floppy disk.
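
Ronald's suggestion below, comparing log time stamps to find where the
boot stalls, could be roughed out like this in Python.  It assumes the
log lines start with a bracketed seconds-since-boot stamp, as in
`journalctl -b -o short-monotonic` output.  The sample lines are made
up for illustration, and largest_gaps is a hypothetical helper name.

```python
import re

# Leading "[  12.345678]" monotonic stamp, then the rest of the line.
STAMP = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]\s*(.*)$")


def largest_gaps(lines, top=3):
    """Return the `top` biggest pauses as (seconds, line_before, line_after)."""
    prev_t = None
    prev_msg = None
    gaps = []
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a monotonic stamp
        t = float(m.group(1))
        msg = m.group(2)
        if prev_t is not None:
            gaps.append((t - prev_t, prev_msg, msg))
        prev_t, prev_msg = t, msg
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


# Made-up log lines, only to show the expected input shape.
SAMPLE = [
    "[    1.204311] host kernel: example message A",
    "[    2.918440] host kernel: example message B",
    "[  312.550102] host systemd[1]: example message C",
    "[  313.001200] host systemd[1]: example message D",
]

if __name__ == "__main__":
    for gap, before, after in largest_gaps(SAMPLE):
        print(f"{gap:8.1f}s between {before!r} and {after!r}")
```

Whatever is logged just before the largest gap is the first place to
look.  Running the same thing on a log from a normal boot gives the
baseline to compare against.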

On 1/2/21 9:15 PM, r270 at mrt4.com wrote:
> Examine the time stamps on the syslog and compare them to previous nominal boots. That should indicate where the issue is. If all log entries indicate long delays, then it is something systemic like memory, storage, CPU, a thermal issue, etc. (Note: A systemic issue is not necessarily a hardware fault because a HW device can be incorrectly configured when it is initialized.)
>
> If it was a one-time occurrence then it was most likely an overdue fsck, but syslog will indicate that if that's the case.
>
> Ronald Smith
>
> --------------------------
>
> On Wed, 30 Dec 2020 14:04:43 -0500
> Bruce Labitt <bruce.labitt at myfairpoint.net> wrote:
>
>> I think I have an SSD on the way out.  Last reboot took a REALLY long
>> time.  Like 30 minutes.  I ran the SMART data and self test and the SSD
>> passes.  Overall assessment is that the disk is OK.  I really don't know
>> how to interpret the results.
>>
>> I think the disk is in pre-fail based on the smartctl output below
>>
>> /snip
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     Crucial/Micron RealSSD m4/C400/P400
>> Device Model:     M4-CT256M4SSD2
>> Serial Number:    000000001247091DC2FF
>> LU WWN Device Id: 5 00a075 1091dc2ff
>> Firmware Version: 040H
>> User Capacity:    256,060,514,304 bytes [256 GB]
>> Sector Size:      512 bytes logical/physical
>> Rotation Rate:    Solid State Device
>> Form Factor:      2.5 inches
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Wed Dec 30 13:49:17 2020 EST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> /snip
>>
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
>>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>>   9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       7294
>>  12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       2511
>> 170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       0
>> 171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always       -       0
>> 172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always       -       0
>> 173 Wear_Leveling_Count     0x0033   098   098   010    Pre-fail  Always       -       66
>> 174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always       -       87
>> 181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       10250 5047 5203
>> 183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
>> 184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
>> 187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       0
>> 188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
>> 189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       81
>> 194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
>> 195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
>> 196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       0
>> 197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
>> 198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
>> 199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       0
>> 202 Perc_Rated_Life_Used    0x0018   098   098   001    Old_age   Offline      -       2
>> 206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always       -       0
>>
>> Replace the disk pronto?  Is that what this is telling me?  Or?
>>
>> I recently copied over many important files to another disk.  And
>> downloaded a new OS.  I just hate re-configuring things, and starting
>> from scratch, it's such a pain.  Not as painful as a disk crash, but
>> close.  I've got loads of stuff I've compiled from source and just hundreds
>> of things to check or update.  Yes, I'll just have to do it.  It's just
>> the week plus of recovery that I'm rebelling against.
>>
>> Anything else I should do first?  Check something?  Run a test? Any tips
>> to make the "recovery" less painful?
>>
>> _______________________________________________
>> gnhlug-discuss mailing list
>> gnhlug-discuss at mail.gnhlug.org
>> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/