Felger 10 hours ago

Hi, I believe you are quite new to workstation/hardware admin. Lots of things to say here (not a native English speaker, so basic style, sorry for that):

Disk errors logged in the system event log are from the I/O layer, low-level class driver (msahci.sys) / filter drivers. See Windows Storage Driver Architecture : https://learn.microsoft.com/en-us/windows-hardware/drivers/s...

A disk error of this type in the event log must immediately be treated as an actual disk issue. It is a low-level problem, below the filesystem and the applications/services. Here it seems the .mdf/.ldf files of your SQL database were sitting on one or more bad sectors on the disk surface.

Your disk seems to be the only one in the system, so the first thing to do is check its SMART status, for example with CrystalDiskInfo (the most used and user-friendly free portable Windows software).

It would very probably have shown a warning state for the internal disk, with a non-zero count (judging by the quantity of disk error entries in your log) for attribute C5 "Current Pending Sector Count", and probably some in attribute 05 "Reallocated Sectors Count" and/or attribute C4 "Reallocation Event Count".
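To make the reading of those three attributes concrete, here is a minimal Python sketch. The function name, the input dictionary and the "any non-zero raw value is a warning" rule are assumptions made for the example. CrystalDiskInfo does not expose them, and the raw values would have to be copied by hand from whatever SMART tool you use.

```python
# Hypothetical helper: classify a disk from the raw values of three SMART attributes.
# Keys are the attribute IDs in hex, as SMART tools usually display them.
WATCHED = {
    "05": "Reallocated Sectors Count",
    "C4": "Reallocation Event Count",
    "C5": "Current Pending Sector Count",
}


def smart_status(raw_values):
    """Return (status, reasons) for a dict such as {"05": 0, "C4": 0, "C5": 8}."""
    reasons = []
    for attr_id, name in WATCHED.items():
        count = raw_values.get(attr_id, 0)  # a missing attribute is treated as 0
        if count > 0:
            reasons.append(f"{attr_id} {name}: {count}")
    status = "warning" if reasons else "ok"
    return status, reasons


if __name__ == "__main__":
    print(smart_status({"05": 2, "C4": 2, "C5": 8}))
```

Treating every non-zero value as a warning is deliberately strict. It matches the advice above that such errors should be taken as a real disk issue right away.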

The second thing to do is to back up your data as fast as possible. In your case, with an MS SQL database, trying to dump it / back it up first was the right move. Sadly (DR pro experience here), a traditional HDD with a weak surface or a failing Head Stack Assembly, from most vendors, has more difficulty reading a sector correctly than writing it.

If the dump/backup fails, the second choice would have been a sector-by-sector dump of the whole disk, either with online (from within the OS) software capable of reading sectors from the boot disk (I haven't tried whether HDD Raw Copy Tool 2.6 supports it), or with an offline solution like Clonezilla, Acronis True Image, AOMEI Backupper, etc. But an offline solution means an offline computer and service...
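For readers who wonder what a sector-by-sector dump boils down to, here is a minimal Python sketch of the core idea: read the source block by block, and when a block cannot be read, write zeros in its place and note the offset so the image keeps its layout. The function name, the 4096-byte block size and the file-like `src`/`dst` objects are assumptions for the illustration. The tools named above are far more careful, and this is not how any of them is actually implemented.

```python
import io


def copy_skipping_bad_blocks(src, dst, total_size, block_size=4096):
    """Copy total_size bytes from src to dst, block by block.

    A block whose read raises OSError is replaced by zeros so that all
    offsets stay aligned. Returns the list of offsets that could not be read.
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
            if len(data) < length:
                # Short read: pad with zeros so the image keeps its size.
                data += b"\x00" * (length - len(data))
        except OSError:
            data = b"\x00" * length
            bad_offsets.append(offset)
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_offsets


if __name__ == "__main__":
    # Demo on in-memory buffers; a real run would open a disk device and an image file.
    source = io.BytesIO(b"example data" * 1000)
    image = io.BytesIO()
    print(copy_skipping_bad_blocks(source, image, len(source.getvalue())))
```

The returned offsets tell you which parts of the image are holes, which matters later when checking whether the database files landed on them.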

I didn't quite understand whether you had an actual backup of the data or an image of the whole disk. Considering the critical usage of this station, you should have both running: a data backup at least daily, plus an up-to-date disk image ready, whatever the type of disk (HDD/SSD). And a spare, identical computer.

As for repairing HDD "weak sectors" (meaning Current Pending Sectors), it is indeed possible, often with complete data recovery. If recovery fails, the sector will be left as is, or may be remapped if it is overwritten with zeros (it will then move from Current Pending to the Reallocated Sectors Count).

Hard Disk Sentinel Pro has such features (Disk Repair, Quick Fix), and it works quite well. The results vary greatly from one type of failure to another, and from one disk maker to another.

Note that if SMART shows more than a dozen or so sectors, the head (amp/preamp) is probably failing, making magnetically weaker sectors too difficult to read and/or write. In this case, the count of pending and remapped sectors increases with every repair/check pass made by the tools, and the drive is toast and must be replaced ASAP.
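That rule of thumb can be written down as a small Python sketch. The function name, the threshold of 12 (my reading of "a dozen or so") and the choice to treat either condition alone as a replace signal are simplifying assumptions for the example, not a vendor rule.

```python
def drive_verdict(passes, max_sectors=12):
    """Judge a drive from SMART counts taken after each repair/check pass.

    passes: list of (pending, reallocated) raw counts, oldest first.
    Returns "unknown", "ok", "watch" or "replace".
    """
    totals = [pending + reallocated for pending, reallocated in passes]
    if not totals:
        return "unknown"
    # A total that grows from one pass to the next suggests a failing head.
    growing = any(later > earlier for earlier, later in zip(totals, totals[1:]))
    if growing or totals[-1] > max_sectors:
        return "replace"
    if totals[-1] > 0:
        return "watch"
    return "ok"


if __name__ == "__main__":
    print(drive_verdict([(3, 0), (5, 2)]))
```

A pending sector that simply becomes a reallocated one leaves the total unchanged, so a successful remap does not by itself push the verdict to "replace".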

SSDs are a completely different case for repair.

An older standalone tool, SpinRite, was specialized for this usage (accurate recovery of data), but veeeeery slow.

RAID pertinence: fortunately, this is an expected case, as most SATA disks tend to suffer an HSA failure before they stop initializing altogether. A RAID 1 mirror would have protected you, since the same defect is very unlikely to appear on both disks of the mirror.

The RAID controller (a true hardware controller like LSI/Avago or Microsemi, or even fake RAID like Intel RST / VROC) maintains data integrity across the array's disks. The defective disk will raise bad blocks (which get marked in the metadata of the RAID volume), but the other disks are fine and the data can be read safely. If too many errors are reported on a disk (very few, in fact, on most controllers), it will be labelled as failed and dropped from the array.

gruez 9 hours ago | parent | next [-]

>Disk errors logged in the system event log are from the I/O layer, low-level class driver (msahci.sys) / filter drivers. See Windows Storage Driver Architecture : https://learn.microsoft.com/en-us/windows-hardware/drivers/s...

What filters in the event log would you apply to find such errors?

jwrallie 7 hours ago | parent | prev | next [-]

If taking it offline is not a concern, I would try a low level backup with ddrescue while booting from external media as soon as possible.

Continuing to use the system from a disk showing read issues could cause more data loss, and one could always back up the SQL database from the backup image later.

r1chk1t 10 hours ago | parent | prev [-]

Thank you for all that, learned a lot!