Ask HN: Monitoring Hard-Disk Drives and Arrays Health in Linux in 2022?

toast0 · on July 16, 2022

Quick and easy: run smartctl [1] at least once an hour. Add up the bad sectors (reallocated, pending, offline uncorrectable) and alert if the total grows by 10 or so in one day, or if it hits 100 or so. Also alert if any of the other metrics say failed. If you've got a helium drive, there's a metric for that and you might want a threshold, but I don't have enough experience there.
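A rough sketch of that check in Python, in case it helps. It assumes smartctl 7.0 or newer (for `--json`), an ATA/SATA drive that reports attributes 5, 197 and 198, and root privileges. The function names, the default thresholds, and the idea of passing in yesterday's total are just illustrative choices; where you store the history is up to you.

```python
import json
import subprocess

# SMART attribute IDs for the three counters mentioned above:
# 5 = reallocated, 197 = current pending, 198 = offline uncorrectable.
BAD_SECTOR_IDS = {5, 197, 198}


def read_smart(device):
    """Run smartctl and return its JSON report for one device."""
    # smartctl sets non-zero exit-status bits for warnings, so the
    # return code is deliberately not treated as a hard error here.
    out = subprocess.run(
        ["smartctl", "--json", "-H", "-A", device],
        capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def bad_sector_total(report):
    """Sum the raw values of the bad-sector attributes in an ATA report."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return sum(a["raw"]["value"] for a in table if a["id"] in BAD_SECTOR_IDS)


def should_alert(report, total_a_day_ago, daily_growth=10, hard_limit=100):
    """Apply the rough thresholds: ~10 new bad sectors in a day, ~100 total,
    or anything in the report flagged as failed."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    total = bad_sector_total(report)
    health_failed = report.get("smart_status", {}).get("passed") is False
    # when_failed is an empty string for attributes that never failed.
    attr_failed = any(a.get("when_failed") for a in table)
    grew = total - total_a_day_ago >= daily_growth
    return health_failed or attr_failed or total >= hard_limit or grew
```

Something like `should_alert(read_smart("/dev/sda"), yesterdays_total)` from an hourly cron job would cover the basics; `/dev/sda` and `yesterdays_total` are placeholders.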

If you really want to spend time on it, you could monitor disk transfer speeds and seek times and alert if the speeds drop or the seek times increase. But I'd guess that's unlikely to be worth the time.

[1] Or whatever utility the controller vendor provides, if the controller gets in the way and you have to use that instead.

oaf357 · on July 17, 2022

Thank you both for the question and answer. I’ve been trying to figure out when I should be swapping disks in a RAID10 array. I’ll have to tinker with it to get it to work with my RAID controller (PERC H310) but this is exactly what I’m looking for.

toast0 · on July 17, 2022

A quick look says you'll likely need to install the perccli tool from Dell to get the disk info. I'm not going to read its manual, but I'd be surprised if it won't give you the SMART data if you do the right incantations.