Solved

Server 2003 Corrupting MFT, Corrupting Files, how should I proceed?

Posted on 2011-02-28
Last Modified: 2012-05-11
I have a server that is corrupting files, especially files related to SQL.

The server is running Server 2003 Enterprise R2 on a Xeon E5405 w/ 4GB RAM.  Intel mobo using the onboard SATA RAID.  4 Seagate 1TB HDDs, 3 in a RAID 5 with a hot spare.

It has had a couple of forced shutdowns by a well-meaning but misinformed maintenance man, and was showing some files that could not be deleted or changed due to corruption; we first noticed the corruption on our backup reports.  I backed up everything imaginable, even dcpromoing their Exchange server in order to have a good backup of AD in case the server needed recovery.  I ran chkdsk, which complained of MFT corruption as well as a plethora of cross-linked files and the like.  After rebooting again and running a second chkdsk, which came back clean, the server booted fine and "sfc /scannow" ran without incident.  I fixed some broken DCOM settings as well as some service permissions, and the server seemed to be running as well as ever, with clean backup reports for two nights.

Then it started corrupting files again.

So on 2/26 and 2/27 we received:

Backup started on 2/26/2011 at 1:48 AM.
Warning: Unable to open "C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\PresentationBuildTa#\20ef773b20f6ce721ae60e5c2c2e8f80\PresentationBuildTasks.ni.dll" - skipped.
Reason: The file or directory is corrupted and unreadable.

Then on 2/28 it became:

Backup started on 2/28/2011 at 1:28 AM.
Warning: Unable to open "C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\PresentationBuildTa#\20ef773b20f6ce721ae60e5c2c2e8f80\PresentationBuildTasks.ni.dll" - skipped.
Reason: The file or directory is corrupted and unreadable.


Error: Could not access portions of directory C:\WINDOWS\inf\01F\.NET Data Provider for SqlServer.
You may not have permission to open the file, or the directory may be missing or damaged.
Please contact the owner or administrator.

Warning: Unable to open "C:\WINDOWS\inf\01F\.NET Data Provider for SqlServer" - skipped.
Reason: The file or directory is corrupted and unreadable.
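
To see whether the corruption is spreading from one night to the next, the skipped paths in two backup reports can be compared. The following is a minimal sketch, assuming the report is available as plain text and that every skipped file appears on a line of the form shown above (Warning: Unable to open "..." - skipped.). The function names are hypothetical and are not part of NTBackup.

```python
import re

# Matches the warning lines shown in the backup reports above.
SKIPPED = re.compile(r'^Warning: Unable to open "(?P<path>.+)" - skipped\.$')


def skipped_paths(report_text):
    """Return the set of paths the backup reported as skipped."""
    paths = set()
    for line in report_text.splitlines():
        match = SKIPPED.match(line.strip())
        if match:
            paths.add(match.group("path"))
    return paths


def newly_corrupted(previous_report, current_report):
    """Return paths skipped in the current report but not in the previous one."""
    return sorted(skipped_paths(current_report) - skipped_paths(previous_report))
```

Fed the 2/26 and 2/28 reports above, newly_corrupted would return only the C:\WINDOWS\inf\01F entry, since the PresentationBuildTasks.ni.dll warning appears in both.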

 Errors in the System Log (only seem to happen while the backup is running):

 Event ID:  9 Source: MegaSR

The device, \Device\Scsi\MegaSR1, did not respond within the timeout period.

Event ID :  55  Source: NTFS

The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume \Device\HarddiskVolume2.


Opinions?


Since this will likely involve having to reload their server, does anyone have any nuggets of wisdom or astounding timesavers for reloading the OS without hosing AD and my Data after the inevitable reformat?






Thanks!
Question by:darnitol500
17 Comments
 
LVL 43

Expert Comment

by:Davis McCarn
ID: 35007341
Unless there are disk errors or the MegaRaid utility reports health problems, the most common cause of disk corruption is flaky RAM.  Get the free ISO from http://www.memtest86.com, boot from it, and run the memory test.  Let it run for at least a few hours.
0
 
LVL 30

Accepted Solution

by:
pgm554 earned 500 total points
ID: 35008490
RAID 5 with terabyte drives is a very bad idea unless the drives are enterprise class, and even then I would go RAID 6.

If you have a drive failure, there is a 30% chance that the array will fail during the rebuild.
Your issues might be related to that.

Make and model of disks please.
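
The thread does not say how the 30% figure was derived. One common back-of-envelope way to reason about rebuild risk is the chance of hitting an unrecoverable read error (URE) while reading every surviving drive in full. The sketch below assumes a URE rate of 1 per 10^14 bits read, a value often quoted for desktop SATA drives but not stated anywhere in this thread; the function and parameter names are illustrative only.

```python
import math


def rebuild_ure_probability(surviving_drives, drive_bytes, ure_per_bit=1e-14):
    """Estimate the chance of at least one URE during a RAID 5 rebuild.

    Assumes every bit on every surviving drive is read once and that
    read errors are independent (both are simplifying assumptions).
    """
    bits_read = surviving_drives * drive_bytes * 8
    # P(at least one error) = 1 - (1 - p)^n, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# 3-drive RAID 5 of 1 TB disks: two surviving drives are read during a rebuild.
risk = rebuild_ure_probability(surviving_drives=2, drive_bytes=1e12)
print(f"Estimated rebuild URE risk: {risk:.1%}")
```

Under these assumptions a 3 x 1TB array comes out at roughly 15%; larger drives or more drives in the array push the estimate higher, which is the general argument for RAID 6 on big SATA disks.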
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35008556
Please tell me more about this 30 percent chance of failure during rebuild. Is it a logical or a mechanical issue?  Do you have any external references that provide additional details?

This is not a RAID rebuild situation, but this sounds like important info nonetheless.
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35008605
Seagate ST31000340AS 7200 RPM SATA HDDs
0
 
LVL 30

Expert Comment

by:pgm554
ID: 35008614
Also, download an eval copy of Backup Exec 2010 System Restore and create an image out to a USB drive.

Use the restore disk to test on another system using the restore anywhere option and see if your corruption goes away.

If so, I would rebuild the RAID using RAID 6 and restore the image to the new config.
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35008758
ST31000340AS

That's a desktop drive and should not be used in a RAID config.

The main difference between desktop and enterprise drives is something called TLER.
Essentially, it limits how long a drive spends on error recovery before reporting back to the controller.

http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

If you want cheap SATA RAID drives, Samsung Spinpoint F1's can be had for under $100.
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35008840
0

 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35008892
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35009176
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35009468
These are excellent resources, pgm554!

Thank you for going above and beyond and "teaching a man to fish."

I am busy shifting operations to an alternate server while I troubleshoot this one. I will likely be replacing the mobo RAID with a 3Ware controller (a recurring theme from other forums is that the Intel mobo RAID is not known for reliability), and I will be reusing the existing drives to construct two RAID 1 volumes. I know those Seagate drives are consumer drives, but the customer cannot afford Enterprise-class drives.  They would have afforded them if they knew how much the cheap ones really cost over the long run!

Before I do that, however, I will be running diagnostics against the RAM, RAID, and MOBO to see if there are any overt failures.

I will keep you posted!

0
 
LVL 1

Author Comment

by:darnitol500
ID: 35009776
The failure is snowballing - I'm glad I'm on top of it - this error began appearing in this morning's Application log:

Source:  ESENT  Event ID:  508

wins (3672) A request to write to the file "C:\WINDOWS\system32\wins\j50tmp.log" at offset 0 (0x0000000000000000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (91 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

I'm 75% certain it's a failing drive, and 25% certain that it is a failing controller.  Both are going to be ordered!
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35010232
I can't find Spinpoint F1's, are Spinpoint F3's acceptable?
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35010680
I have analyzed the hard drive space requirements and have found that 1TB is way too much space; the server OS + its data is under 100GB.

SO:

I have ordered the following:

3ware 9650SE-4LPML + its 4 drive breakout cable
4 x Western Digital RE4 WD2503ABYX 250GB 7200 RPM

I will create two RAID 1 volumes, 1 for the OS, and 1 for the data.

I am planning on using the Backup Exec 2010 System Restore for the move.
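
As a quick sanity check on the sizing, usable capacity for the RAID levels discussed in this thread can be worked out with the standard formulas. This is a minimal sketch with a hypothetical helper name; it ignores controller metadata overhead and formatted-capacity differences.

```python
def usable_gb(level, drives, drive_gb):
    """Return usable capacity in GB for a simple RAID set of identical drives."""
    if level == "RAID1":
        if drives != 2:
            raise ValueError("this sketch treats RAID 1 as a two-drive mirror")
        return drive_gb
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb  # one drive's worth of parity
    if level == "RAID6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_gb  # two drives' worth of parity
    if level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return drives // 2 * drive_gb  # every drive is mirrored
    raise ValueError(f"unsupported level: {level}")


# Two RAID 1 pairs of 250 GB drives: one for the OS, one for the data.
print(usable_gb("RAID1", 2, 250), "GB per volume")
```

With the OS and data together under 100GB, each 250GB mirror leaves plenty of headroom; the same four drives in RAID 6 or RAID 10 would give a single 500GB volume instead.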
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35011849
"I can't find Spinpoint F1's, are Spinpoint F3's acceptable?"
No.


http://www.excaliberpc.com/583847/samsung-spinpoint-f1-raid-class.html

The WD's should be fine.

Remember, RAID 6 or RAID 10 is also fine.

You can use RAID 5 if you use the WD Raptor.
They are basically SAS drives with a SATA interface.
Very fast, very reliable, but priced accordingly.
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35044642
Found the issue: one of the drives in the existing array had bad blocks, but it was never reported to the controller (cheap Intel RAID). I had to figure it out by running a consistency check and then watching the lights on the HDDs; the flaky one stayed almost constantly illuminated as it ruminated on its bad sectors.  The in-OS consistency checks never reported an error, so I had to do the BIOS check instead.

Since I can recover the existing system I'm going to replace the failed drive, get it going soundly, then migrate to the new (more reliable) array.

The Symantec solution is crippled unless I buy it, so I'm going to set aside the repaired array and use ASR + NTBackup for the restore to the new array.
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35073687
There is an excellent article on Tech Republic regarding RAID 6 and why we shouldn't jump on it too quickly.

http://www.techrepublic.com/blog/datacenter/raid-6-do-you-really-want-it/119
0
