Solved

help finding root cause of file system corruption

Posted on 2014-02-24
Last Modified: 2014-03-18
We ship a PC/software bundle, and I have a customer whose system failed to start up (quick bluescreen/reboot/bluescreen/reboot, etc.). The Last Known Good Configuration and recovery options didn't work.

We pulled a disk, chkdsk shows thousands of bad files, and a virus scan shows no virus.  (This is Win7 Pro with dual SSD drives using Windows Dynamic Disk mirroring, and both drives are equally affected.)

The customer wants to know what happened, and I'm at a loss. Sure I can say "the OS corrupted the file-system" but that seems incomplete. There was no sudden loss of power that the customer remembers.

Can anybody suggest a way I can actually find root cause?
Question by:PMH4514
12 Comments
 
LVL 47

Expert Comment

by:noxcho
ID: 39882860
If it had a BSOD, then it should have generated a minidump file. Go to C:\Windows\Minidump and get the dump files from there. Upload the latest one.
Then tell me: what type of errors was CHKDSK showing?
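A minimal sketch of that search, assuming the pulled drive is mounted on another machine under a drive letter or mount point (the `E:\` in the usage comment is hypothetical, as is the function name). It walks the volume and lists every `.dmp` file, which would cover both the `Windows\Minidump` folder and a full `MEMORY.DMP` if one exists.

```python
import os


def find_dump_files(root):
    """Return sorted paths of all *.dmp files found under root.

    os.walk skips directories it cannot read by default, which helps on a
    damaged volume, but files whose index entries are gone will not show up.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Windows names are case-insensitive: match MEMORY.DMP and x.dmp alike.
            if name.lower().endswith(".dmp"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Hypothetical usage, with the recovered disk mounted as E:
# for path in find_dump_files("E:\\"):
#     print(path)
```

An empty result does not prove there was no crash; CHKDSK may already have removed the index entries for the dump files.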
 

Author Comment

by:PMH4514
ID: 39883056
Hi. No BSOD, so no minidump file.

CHKDSK output was mostly lines like this (translated from German):

Deleting corrupt attribute record (128, "") from file record segment 1253237.

Several others were related to "index entry deleted".

It ended with (again translated to English):

Free space verification is complete.
Correcting errors in the Master File Table (MFT).
Correcting errors in the Master File Table's (MFT) BITMAP attribute.
Correcting errors in the volume bitmap.
Windows has made corrections to the file system.


Following CHKDSK, still no ability to boot.
 
LVL 81

Expert Comment

by:David Johnson, CD, MVP
ID: 39883074
Replace the hard drive. Maybe the computer was knocked over. Either way, it is a failing hard drive.
 

Author Comment

by:PMH4514
ID: 39883184
Well, of course: the drive has been replaced and the needed data recovered. There is no issue I need help "fixing" here. Rather, I was asked by the customer to explain what happened. I have no idea, so I am just looking for resources to help me figure it out.
 
LVL 47

Expert Comment

by:noxcho
ID: 39883266
German is not a problem; I speak it as well. The errors you got point to a corrupt MFT, which would leave the whole file system helpless.
You said the drive was an SSD. Did its size in bytes get smaller after this problem? A possible cause is a dead page on the SSD, and unluckily the MFT was located in exactly that page. (Pages on an SSD are like clusters on an HDD.)
As for what caused this: the output of the "quick bluescreen" would give us a hint. Are you sure it did not create any .dmp file?
Have you checked in the given directory?
 

Author Comment

by:PMH4514
ID: 39883609
I would have thought the whole file system would be helpless. But when I plugged the drive into a dock to make it an external drive, I was able to recover most of the data from it.

The output of the blue-screen was unfortunately not described by the customer before they sent me the drive and I was not able to view it prior to the first CHKDSK attempt which was done by somebody else.  I find no .dmp files when I search the disk.

But I have to wonder about SSDs in the first place.. As I understand (rumor?) there is a limited number of read/write cycles..  Our product does a very high amount of I/O, potentially millions of small files written and read (but very rarely ever deleted.)  Maybe we hit the limit? I'm going to research some SSD diagnostic tools.
 
LVL 81

Expert Comment

by:David Johnson, CD, MVP
ID: 39883890
You're lucky; usually when an SSD fails, there is nothing at all that can be done. The limited number of write cycles (there is no limit on read cycles) is actually very large: each cell can only be written a certain number of times before it fails and is swapped out. Most SSDs have pretty decent wear leveling to help keep the drive from wearing out prematurely.
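To put a rough number on "very large", here is a back-of-the-envelope sketch. It assumes ideal wear leveling, and every figure in the example call is illustrative, not taken from any datasheet or from the drives in this thread. The function name and parameters are made up for the illustration.

```python
def estimated_lifetime_years(capacity_gb, pe_cycles, daily_writes_gb,
                             write_amplification=1.0):
    """Rough years until the rated program/erase cycles are used up.

    Assumes ideal wear leveling, i.e. writes are spread evenly over all cells.
    """
    if min(capacity_gb, pe_cycles, daily_writes_gb, write_amplification) <= 0:
        raise ValueError("all inputs must be positive")
    # Total host data the drive can absorb before cells reach their rated limit.
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / daily_writes_gb / 365.0


# Illustrative numbers only:
# a 128 GB drive, 3000 cycles per cell, 50 GB written per day, amplification 2.
years = estimated_lifetime_years(128, 3000, 50, write_amplification=2.0)
print(f"{years:.1f} years")
```

A workload of millions of small files tends to push write amplification up, so the real figure for a given drive may be noticeably lower than the ideal estimate.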
 
LVL 47

Accepted Solution

by:
noxcho earned 250 total points
ID: 39884042
Let's think logically: this is not the only machine with an SSD and your software that you sell or work with, is it? If not, then this problem occurred on this machine only, right?
If only one disk had this problem, then the cause is the disk itself, not worn-out pages. Hardware can fail, even when new. So don't jump to conclusions. Consider that this was a problem with a single drive only.
And don't forget that this problem could also have a software cause, such as a controller driver or an installed program that corrupted the first sector. That's why I was asking for the BSOD exception or minidump.
 
LVL 81

Expert Comment

by:David Johnson, CD, MVP
ID: 39884120
SSDs, like most electronics, tend to fail within the first 30 days of use as the parts get to operating temperature and burn in; otherwise they go on for years.
 
LVL 92

Assisted Solution

by:nobus
nobus earned 250 total points
ID: 39884917
 

Author Comment

by:PMH4514
ID: 39934998
Sorry for the delay.
@noxcho - I think I understand what you're getting at. Yes, we sell a combined package of the PC/drives and software. They are all the same. So far it is only this PC that has shown the problem, but we know this customer has used it more than most. I'm just looking for ways to verify whether it's a fluke or a ticking time bomb for all the others.

@nobus - not Intel, but those are interesting looking tools.
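One way to check the other machines for early warning signs is to read each drive's SMART attributes. The sketch below is only an illustration: it assumes smartmontools is installed and that the report comes from `smartctl --json -A`. The field names (`ata_smart_attributes`, `table`, `value`, `thresh`) follow my understanding of that JSON output and should be checked against the installed version. The function name and the device path in the usage comment are hypothetical.

```python
def attributes_at_or_below_threshold(report):
    """List names of SMART attributes whose normalized value has reached
    the vendor threshold in a parsed `smartctl --json -A` report."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    flagged = []
    for attr in table:
        thresh = attr.get("thresh", 0)
        # A threshold of 0 can never trip, so such attributes are skipped.
        if thresh > 0 and attr.get("value", 255) <= thresh:
            flagged.append(attr.get("name", str(attr.get("id"))))
    return flagged


# Hypothetical usage on one machine (device path is an assumption):
# import json, subprocess
# out = subprocess.run(["smartctl", "--json", "-A", "/dev/sda"],
#                      capture_output=True, text=True).stdout
# print(attributes_at_or_below_threshold(json.loads(out)))
```

An empty list only means no attribute has crossed its vendor threshold; the raw wear and reallocation counters are still worth comparing across machines.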
 
LVL 92

Expert Comment

by:nobus
ID: 39936181
Thanks for the feedback.