PMH4514

help finding root cause of file system corruption

We ship a PC/software bundle, and I have a customer whose system failed to start up (quick bluescreen/reboot/bluescreen/reboot, etc.). The Last Known Good and recovery options didn't work.

We pulled a disk, chkdsk shows thousands of bad files, and a virus scan shows no virus.  (This is Win7 Pro with dual SSD drives using Windows Dynamic Disk mirroring, and both drives are equally affected.)

The customer wants to know what happened, and I'm at a loss. Sure I can say "the OS corrupted the file-system" but that seems incomplete. There was no sudden loss of power that the customer remembers.

Can anybody suggest a way I can actually find root cause?
noxcho

If it bluescreened, it should have generated a minidump file. Go to C:\Windows\Minidump, get the dump files from there, and upload the latest one.
Then tell me: what type of errors did CHKDSK show?
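
Since the disk was later read from a dock rather than booted, the dump search has to run against whatever drive letter or mount point the docked disk gets. A minimal sketch of that search, assuming Python is available on the recovery machine; the function name `find_dump_files` and the `E:\` drive letter in the usage note are hypothetical:

```python
import os
from pathlib import Path


def _mtime_or_zero(path):
    """Modification time, or 0 if the (possibly damaged) file cannot be read."""
    try:
        return path.stat().st_mtime
    except OSError:
        return 0.0


def find_dump_files(root):
    """Return every *.dmp file under root (case-insensitive), newest first."""
    hits = []
    # onerror swallows unreadable directories, which are likely on a damaged volume.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            if name.lower().endswith(".dmp"):
                hits.append(Path(dirpath) / name)
    return sorted(hits, key=_mtime_or_zero, reverse=True)
```

Calling `find_dump_files("E:\\")` would list candidates anywhere on the docked volume, including C:\Windows\Minidump as it appears under the new drive letter. An empty result does not prove that no bluescreen occurred; it only means no dump file survived or was written.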
PMH4514

ASKER

Hi. No BSOD, so no minidump file.

CHKDSK output was mostly lines like this (sorry for German)

Beschädigter Attributeintrag (128, "") wird
vom Datensatzsegment 1253237 gelöscht.

(i.e. Deleting corrupt attribute record (128, "") from file record segment 1253237.)

There were several others related to "index entry deleted".

ending with (again translated to English):

Free space verification is complete.
Correcting errors in the Master File Table (MFT).
Correcting errors in the BITMAP attribute of the Master File Table (MFT).
Correcting errors in the volume bitmap.
Windows has made corrections to the file system.


Following CHKDSK, still no ability to boot.
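
If the full CHKDSK log was saved, one way to get more out of it is to tally the file record segment numbers it mentions and see whether they bunch together or are spread across the whole MFT. The following is only a sketch: the regular expression is an assumption based on the German wording quoted above and its English equivalent, and `summarize_segments` and `bucket_size` are hypothetical names.

```python
import re
from collections import Counter

# Assumed wording: "file record segment N" (English) or "Datensatzsegment N" (German).
SEGMENT_RE = re.compile(
    r"(?:file record segment|Datensatzsegment)\s+(\d+)", re.IGNORECASE
)


def summarize_segments(log_text, bucket_size=100_000):
    """Count the MFT record numbers in a CHKDSK log and group them into ranges."""
    segments = [int(number) for number in SEGMENT_RE.findall(log_text)]
    # Key each bucket by the lowest record number it covers.
    buckets = Counter((s // bucket_size) * bucket_size for s in segments)
    return {
        "count": len(segments),
        "min": min(segments) if segments else None,
        "max": max(segments) if segments else None,
        "buckets": dict(sorted(buckets.items())),
    }
```

Record numbers are indexes into the MFT, not physical locations on the SSD, so a tight cluster is at best a hint that one contiguous part of the MFT was damaged. It is not proof of any particular cause.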
Replace the hard drive. Maybe the computer was knocked over. Either way, it is a failing hard drive.
PMH4514

ASKER

Well, of course: the drive has been replaced and the needed data recovered. There is no issue I need help "fixing" here. Rather, I was asked by the customer to explain what happened. I have no idea, so I am just looking for resources to help me figure it out.
German is not a problem, I speak it as well. The errors you got point to a corrupt MFT, which would leave the whole file system helpless.
You said the drive was an SSD. Did its size in bytes get smaller after this problem? A possible cause is a dead page in the SSD's flash, with the MFT unluckily located in exactly that page. Pages on an SSD are like clusters on an HDD.
As for the "quick bluescreen": its output would give us a hint. Are you sure it did not create any .dmp file?
Have you checked in the given directory?
PMH4514

ASKER

I would have thought the whole file system would be helpless. But when I plugged the drive into a dock to make it an external drive, I was able to recover most of the data from it.

The output of the blue-screen was unfortunately not described by the customer before they sent me the drive and I was not able to view it prior to the first CHKDSK attempt which was done by somebody else.  I find no .dmp files when I search the disk.

But I have to wonder about SSDs in the first place.. As I understand (rumor?) there is a limited number of read/write cycles..  Our product does a very high amount of I/O, potentially millions of small files written and read (but very rarely ever deleted.)  Maybe we hit the limit? I'm going to research some SSD diagnostic tools.
You're lucky; usually when an SSD fails there is nothing at all that can be done. The limit applies to write cycles (there is no limit on read cycles) and is actually a very large number: each cell can only be written a certain number of times before it fails and is swapped out. Most SSDs have pretty decent wear leveling to help keep the drive from wearing out prematurely.
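
To put a rough number on "did we hit the limit?", the usual back-of-the-envelope estimate divides the drive's total write budget by the daily write volume. This is a sketch only; every figure in the usage note is a made-up placeholder, not a spec for the drives in question, and `years_until_rated_wear` is a hypothetical name.

```python
def years_until_rated_wear(capacity_gb, pe_cycles, host_writes_gb_per_day,
                           write_amplification):
    """Rough years until the flash reaches its rated program/erase cycles.

    Assumes ideal wear leveling, so writes are spread evenly over all cells.
    """
    # Total data the flash can absorb before the rated cycle count is reached.
    nand_budget_gb = capacity_gb * pe_cycles
    # Each host write costs more than its size in flash writes (write amplification).
    host_budget_gb = nand_budget_gb / write_amplification
    return host_budget_gb / host_writes_gb_per_day / 365
```

For example, `years_until_rated_wear(128, 3000, 20, 5)` comes out at roughly 10.5 years. Workloads with many small writes tend to push the write amplification factor up, so the real inputs would have to come from the drive's data sheet and from measured write volume on a deployed unit.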
ASKER CERTIFIED SOLUTION
noxcho
This solution is only available to members.
SSDs, like most electronics, tend to fail within the first 30 days of use, as the parts get to operating temperature and burn in; otherwise they go on for years.
SOLUTION
This solution is only available to members.
PMH4514

ASKER

Sorry for the delay.
@noxcho: I think I understand what you're getting at. Yes, we sell a combined package of the PC/drives and software. They are all the same. So far it is only this PC that has shown the problem, but we know this customer has used it more than most. I'm just looking for ways to verify whether it's a fluke or a ticking time bomb for all the others.

@nobus - not Intel, but those are interesting looking tools.
Thanks for the feedback.
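
One way to check the fluke-versus-time-bomb question across deployed units would be to collect each drive's SMART attribute table and look for attributes at or below their threshold. The sketch below assumes the table is captured as text in the column layout printed by smartmontools' `smartctl -A`; that tool is not mentioned in this thread, attribute names differ between SSD vendors, and the function names are hypothetical.

```python
def parse_smart_attributes(text):
    """Parse a 'smartctl -A' style attribute table into a dict keyed by name."""
    attrs = {}
    for line in text.splitlines():
        # Assumed columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header, blank line, or unrelated output
        if not all(field.isdigit() for field in parts[3:6]):
            continue  # some drives print placeholders instead of numbers
        attrs[parts[1]] = {
            "id": int(parts[0]),
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "when_failed": parts[8],
            "raw": parts[9].strip(),
        }
    return attrs


def failing_attributes(attrs):
    """Names of attributes marked failed, or at/below a non-zero threshold."""
    return sorted(
        name for name, a in attrs.items()
        if a["when_failed"] != "-" or (a["thresh"] > 0 and a["value"] <= a["thresh"])
    )
```

Running this over the output from several customer systems and comparing the wear-related attributes against the heavily used unit would show whether the others are heading toward the same state or the failed drive was an outlier.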