
  • Status: Solved
  • Priority: Medium
  • Security: Public

Help finding the root cause of file-system corruption

We ship a PC/software bundle, and I have a customer whose system failed to start up (quick bluescreen, reboot, bluescreen, reboot, etc.), and neither Last Known Good Configuration nor the recovery options worked.

We pulled a disk, chkdsk shows thousands of bad files, and a virus scan shows no virus.  (This is Win7 Pro with dual SSD drives using Windows Dynamic Disk mirroring, and both drives are equally affected.)

The customer wants to know what happened, and I'm at a loss. Sure, I can say "the OS corrupted the file system," but that seems incomplete. There was no sudden loss of power that the customer remembers.

Can anybody suggest a way I can actually find root cause?

Asked by: PMH4514

2 Solutions
 
noxcho commented:
If it had a BSOD, then it generated a minidump file. Go to C:\Windows\Minidump, get the dump files from there, and upload the latest one.
Then tell me what type of errors CHKDSK was showing.
 
PMH4514 (Author) commented:
Hi. No BSOD, so no minidump file.

The CHKDSK output was mostly lines like this (sorry for the German):

Beschädigter Attributeintrag (128, "") wird
vom Datensatzsegment 1253237 gelöscht.

(i.e., Corrupt attribute record (128, "") is being deleted from file record segment 1253237.)

There were several others related to "index entry deleted".

The output ended with (again translated to English):

Free space verification is complete.
Correcting errors in the Master File Table (MFT).
Correcting errors in the MFT's BITMAP attribute.
Correcting errors in the volume bitmap.
Windows has made corrections to the file system.


Following CHKDSK, still no ability to boot.

David Johnson, CD, MVP (Owner) commented:
Replace the hard drive. Maybe the computer was knocked over; either way, it is a failing hard drive.
 
PMH4514 (Author) commented:
Well, of course the drive has been replaced and the needed data has been recovered. There is no issue I need help "fixing" here. Rather, I was asked by the customer to explain what happened. I have no idea, so I am just looking for resources to help me figure it out.
 
noxcho commented:
German is not a problem; I speak it as well. The errors you got point to a corrupt MFT, which would leave the whole file system helpless.
You said the drive was an SSD. Did its size in bytes get smaller after this problem? A possible cause is a dead page in the SSD's page system, and unluckily the MFT was located in exactly that page. A page is like a cluster on an HDD.
What caused this? For the "quick bluescreen", its output would give us a hint. Are you sure it did not create any .dmp file?
Have you checked the directory I mentioned?
 
PMH4514 (Author) commented:
I would have thought the whole file system would be helpless. But when I plugged the drive into a dock to use it as an external drive, I was able to recover most of the data from it.

Unfortunately, the customer did not describe the blue-screen output before sending me the drive, and I was not able to view it before the first CHKDSK attempt, which was done by somebody else. I find no .dmp files when I search the disk.
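Searching the docked disk for dump files can be scripted. This is a minimal sketch, assuming the disk is mounted at a drive letter such as `E:\` (hypothetical). It matches by file name only, so it covers `Windows\Minidump\*.dmp` and `Windows\MEMORY.DMP`, but it would miss a dump that CHKDSK orphaned into a `found.000` folder as a `.chk` file.

```python
import os


def find_crash_dumps(root):
    """Walk a directory tree and return the sorted paths of all *.dmp files.

    The match is case-insensitive, so both Minidump\\*.dmp and MEMORY.DMP are found.
    Unreadable directories are skipped (os.walk ignores errors by default).
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".dmp"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

For example, `for p in find_crash_dumps("E:\\"): print(p)` lists every dump on the docked disk. An empty result is consistent with "no dump was written", but it could also mean the dump's file record was among those CHKDSK deleted.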

But I have to wonder about SSDs in the first place. As I understand it (rumor?), there is a limited number of read/write cycles. Our product does a very high amount of I/O: potentially millions of small files written and read (but very rarely deleted). Maybe we hit the limit? I'm going to research some SSD diagnostic tools.
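The "did we hit the limit?" question can be put into rough numbers. This is a minimal sketch, assuming a simple model in which the flash can absorb about capacity × rated program/erase (P/E) cycles of writes, spread evenly by wear leveling. In this model, each gigabyte the host writes costs a write-amplification factor's worth of flash writes. All inputs in the example are hypothetical placeholders, not figures for the drives in this bundle. Real values would come from the drive's datasheet and SMART counters. Workloads made of many small random writes tend to push write amplification up.

```python
def estimated_lifetime_days(capacity_gb, pe_cycles, host_writes_gb_per_day, write_amplification):
    """Rough SSD endurance estimate under an idealized model.

    Flash write budget ~= capacity * P/E cycles (assumes perfect wear leveling).
    Flash writes per day = host writes per day * write amplification.
    """
    if host_writes_gb_per_day <= 0 or write_amplification <= 0:
        raise ValueError("daily writes and write amplification must be positive")
    nand_budget_gb = capacity_gb * pe_cycles
    nand_writes_per_day = host_writes_gb_per_day * write_amplification
    return nand_budget_gb / nand_writes_per_day


# Hypothetical inputs: 256 GB drive, 3000 P/E cycles, 100 GB/day host writes, WA of 2.
days = estimated_lifetime_days(256, 3000, 100, 2.0)
print(f"{days:.0f} days (~{days / 365:.1f} years)")  # prints: 3840 days (~10.5 years)
```

The model ignores spare area, retention, and controller firmware behavior, so it only indicates whether wear-out is plausible at a given write rate.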
 
David Johnson, CD, MVP (Owner) commented:
You're lucky; usually when an SSD fails, there is nothing at all that can be done. The limited number of write cycles (there is no limit on read cycles) is actually a very large number: each cell can only be written a certain number of times before it fails and is swapped out. Most SSDs have pretty decent wear leveling to help keep the drive from wearing out prematurely.
 
noxcho commented:
Let's think logically: this is not the only machine with an SSD and your software that you sell or work with, is it? If not, then this problem has occurred on this machine only, right?
If only one disk has this problem, then it is the disk itself rather than worn-out pages. Hardware can fail, even when new, so don't jump to conclusions. Consider that this was a problem with a single drive only.
Still, don't forget that this problem could also have a software cause, such as a controller driver or an installed program that corrupts the first sector. That's why I was asking for the BSOD exception or minidump.
 
David Johnson, CD, MVP (Owner) commented:
SSDs, like most electronics, tend to fail within the first 30 days of use as the parts reach operating temperature and burn in; otherwise, they go on for years.
 
nobus commented:
 
PMH4514 (Author) commented:
Sorry for the delay.
@noxcho: I think I understand what you're getting at. Yes, we sell a combined package of the PC, drives, and software, and the systems are all identical. So far only this PC has shown the problem, but we know this customer has used it more than most. I'm just looking for ways to verify whether it's a fluke or a ticking time bomb for all the others.

@nobus: Not Intel, but those are interesting-looking tools.
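One way to check whether the other shipped systems are heading the same way is to read each drive's SMART wear counters with smartmontools' `smartctl` and compare them with the failed units. This is a minimal sketch, assuming smartmontools 7.0 or later (which added `--json` output) is installed and on the PATH. The watched attribute names are an assumption: SSD vendors report wear under different names and IDs, so `WATCH` would need adjusting per drive model.

```python
import json
import subprocess

# Assumed watch list: names differ by vendor (e.g. Samsung vs. Intel); adjust per drive model.
WATCH = {"Reallocated_Sector_Ct", "Wear_Leveling_Count",
         "Media_Wearout_Indicator", "Total_LBAs_Written"}


def wear_attributes(smart_json):
    """Pick the watched ATA SMART attributes out of a dict shaped like
    `smartctl --json -A` output, keeping normalized, threshold, and raw values."""
    table = smart_json.get("ata_smart_attributes", {}).get("table", [])
    out = {}
    for attr in table:
        if attr.get("name") in WATCH:
            out[attr["name"]] = {
                "normalized": attr.get("value"),
                "threshold": attr.get("thresh"),
                "raw": attr.get("raw", {}).get("value"),
            }
    return out


def read_smart(device):
    """Run smartctl on one device and parse its JSON output."""
    proc = subprocess.run(["smartctl", "--json", "-A", device],
                          capture_output=True, text=True)
    # smartctl's exit status is a bitmask, so non-zero does not always mean failure;
    # parse stdout regardless and let the caller inspect the result.
    return json.loads(proc.stdout)
```

For example, `wear_attributes(read_smart("/dev/sda"))` returns a dict keyed by attribute name (smartctl uses `/dev/sdX`-style names on Windows as well). If the customer's drives show much lower normalized wear values than identical units from other customers, that could point toward wear rather than a one-off fault.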

nobus commented:
Thanks for the feedback.
