Solved

Server 2003 Corrupting MFT, Corrupting Files, how should I proceed?

Posted on 2011-02-28
Last Modified: 2012-05-11
I have a server that is corrupting files, especially SQL-related ones.

The server is running Server 2003 Enterprise R2 on a Xeon E5405 w/ 4GB RAM and an Intel mobo using the onboard SATA RAID.  It has 4 Seagate 1TB HDDs: 3 in a RAID 5, plus a hot spare.

It has had a couple of forced shutdowns by a well-meaning but misinformed maintenance man, and was showing some files that could not be deleted or changed due to corruption; we first noticed the corruption in our backup reports.  I backed up everything imaginable, even dcpromoing their Exchange server in order to have a good backup of AD in case the server needed recovery.  I ran chkdsk, which complained of MFT corruption as well as a plethora of crosslinked files and the like.  After another reboot, a 2nd chkdsk came back clean, the server booted fine, and "sfc /scannow" ran without incident.  I fixed some broken DCOM settings as well as some service permissions, and the server seemed to be running as well as ever, with clean backup reports for two nights.

Then it started corrupting files again.

So on 2/26 and 2/27 we received:

Backup started on 2/26/2011 at 1:48 AM.
Warning: Unable to open "C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\PresentationBuildTa#\20ef773b20f6ce721ae60e5c2c2e8f80\PresentationBuildTasks.ni.dll" - skipped.
Reason: The file or directory is corrupted and unreadable.

Then on 2/28 it became:

Backup started on 2/28/2011 at 1:28 AM.
Warning: Unable to open "C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\PresentationBuildTa#\20ef773b20f6ce721ae60e5c2c2e8f80\PresentationBuildTasks.ni.dll" - skipped.
Reason: The file or directory is corrupted and unreadable.


Error: Could not access portions of directory C:\WINDOWS\inf\01F\.NET Data Provider for SqlServer.
You may not have permission to open the file, or the directory may be missing or damaged.
Please contact the owner or administrator.

Warning: Unable to open "C:\WINDOWS\inf\01F\.NET Data Provider for SqlServer" - skipped.
Reason: The file or directory is corrupted and unreadable.
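
One way to tell whether the damage is spreading is to compare the skipped paths from successive backup reports. The following is a minimal Python sketch, assuming the report text has been saved or pasted in as a string. `skipped_paths` and the two `report_...` variables are hypothetical names, and the pattern only covers the "Unable to open ... - skipped." warning format shown in the reports above.

```python
import re

# Matches the "Unable to open ... - skipped." warning lines from the backup report.
SKIPPED = re.compile(r'^Warning: Unable to open "(.+)" - skipped\.$', re.MULTILINE)

def skipped_paths(report):
    """Return the set of paths the backup reported as skipped."""
    return set(SKIPPED.findall(report))

# Report text copied from the 2/26 and 2/28 backup reports quoted above.
report_0226 = r'''Backup started on 2/26/2011 at 1:48 AM.
Warning: Unable to open "C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\PresentationBuildTa#\20ef773b20f6ce721ae60e5c2c2e8f80\PresentationBuildTasks.ni.dll" - skipped.
Reason: The file or directory is corrupted and unreadable.
'''

report_0228 = r'''Backup started on 2/28/2011 at 1:28 AM.
Warning: Unable to open "C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\PresentationBuildTa#\20ef773b20f6ce721ae60e5c2c2e8f80\PresentationBuildTasks.ni.dll" - skipped.
Reason: The file or directory is corrupted and unreadable.
Error: Could not access portions of directory C:\WINDOWS\inf\01F\.NET Data Provider for SqlServer.
Warning: Unable to open "C:\WINDOWS\inf\01F\.NET Data Provider for SqlServer" - skipped.
Reason: The file or directory is corrupted and unreadable.
'''

# Paths that were readable on 2/26 but skipped on 2/28.
newly_skipped = skipped_paths(report_0228) - skipped_paths(report_0226)
for path in sorted(newly_skipped):
    print(path)
```

A growing set from night to night would suggest ongoing corruption rather than leftovers from the earlier forced shutdowns.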

Errors in the System Log (only seem to happen while the backup is running):

Event ID: 9  Source: MegaSR

The device, \Device\Scsi\MegaSR1, did not respond within the timeout period.

Event ID: 55  Source: NTFS

The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume \Device\HarddiskVolume2.


Opinions?


Since this will likely involve having to reload their server, does anyone have any nuggets of wisdom or astounding timesavers for reloading the OS without hosing AD and my data after the inevitable reformat?






Thanks!
Question by:darnitol500
 
LVL 42

Expert Comment

by:Davis McCarn
ID: 35007341
Unless there are disk errors or the MegaRaid utility reports health problems, the most common cause of disk corruption is flakey ram.  Get the free ISO from http://www.memtest86.com and boot it to run the memory test.  Let it run for at least a few hours.
0
 
LVL 30

Accepted Solution

by:
pgm554 earned 500 total points
ID: 35008490
Running RAID 5 on terabyte drives is a very bad idea unless they are enterprise class, and even then I would go with RAID 6.

You have a 30% chance that, if you have a drive failure, the array will fail during the rebuild.
Your issues might be related to that.
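
For context, estimates like this usually come from the odds of hitting an unrecoverable read error (URE) while the rebuild reads every surviving drive end to end. Below is a back-of-envelope Python sketch. The URE rate of 1 per 10^14 bits is an assumed consumer-drive spec value, not something measured on this server, and `p_read_error` is a hypothetical helper name.

```python
import math

def p_read_error(bytes_read, ure_per_bit):
    """Probability of at least one unrecoverable read error over bytes_read bytes."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 1e12                 # drive vendors quote decimal terabytes
surviving_drives = 2      # a 3-drive RAID 5 rebuild reads both remaining drives in full
assumed_ure_rate = 1e-14  # ASSUMPTION: 1 error per 10^14 bits read

p = p_read_error(surviving_drives * 1 * TB, assumed_ure_rate)
print(f"Chance of a URE during rebuild: {p:.1%}")
```

With these assumed inputs the result is roughly 15%; the figure climbs quickly with more or larger drives, so any single percentage depends heavily on the array size and error rate assumed.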

Make and model of disks please.
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35008556
Tell me more about this 30 percent chance of failure during rebuild, please, is it a logical or mechanical issue?  Do you have any external references that provide additional details?

This is not a RAID rebuild situation, but this sounds like important info nonetheless.
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35008605
Seagate st310034as 7200 rpm sata hdd's
0
 
LVL 30

Expert Comment

by:pgm554
ID: 35008614
Also, download an eval copy of Backup Exec 2010 System Restore and create an image out to a USB drive.

Use the restore disk to test on another system using the restore anywhere option and see if your corruption goes away.

If so, I would rebuild the RAID using RAID 6 and restore the image to the new config.
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35008758
ST31000340AS

That's a desktop drive and should not be used in a RAID config.

The main difference between desktop and enterprise drives is something called TLER.
Essentially, it caps the time a drive spends on its own error recovery, so the RAID controller does not time out and drop the drive from the array.

http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

If you want cheap SATA RAID drives, Samsung Spinpoint F1's can be had for under $100.
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35008840
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35008892
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35009176
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35009468
These are excellent resources, pgm554!

Thank you for going above and beyond and "teaching a man to fish."

I am busy shifting operations to an alternate server while I troubleshoot this one.  I will likely be replacing the mobo RAID with a 3Ware controller (a recurring theme from other forums is that the Intel mobo RAID is not known for reliability), and I will be reusing the existing drives to construct two RAID 1 volumes.  I know those Seagate drives are consumer drives, but the customer cannot afford Enterprise-class drives.  They would have afforded them if they knew how much the cheap ones really cost over the long run!

Before I do that, however, I will be running diagnostics against the RAM, RAID, and MOBO to see if there are any overt failures.

I will keep you posted!

0
 
LVL 1

Author Comment

by:darnitol500
ID: 35009776
The failure is snowballing - I'm glad I'm on top of it - this error began appearing in this morning's Application log:

Source:  ESENT  Event ID:  508

wins (3672) A request to write to the file "C:\WINDOWS\system32\wins\j50tmp.log" at offset 0 (0x0000000000000000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (91 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

I'm 75% certain it's a failing drive, and 25% certain that it is a failing controller.  Both are going to be ordered!
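
If more of these turn up, the affected file and the latency can be pulled out of the exported log text to see which files and times are hit hardest. This is a rough Python sketch that assumes the message wording matches the ESENT 508 text quoted above; `parse_slow_write` is a hypothetical helper, and only the "write to" form seen here is handled.

```python
import re

# Pattern for the ESENT "slow write" message text quoted above.
SLOW_WRITE = re.compile(
    r'A request to write to the file "(?P<path>[^"]+)" at offset (?P<offset>\d+) '
    r'\(0x[0-9A-Fa-f]+\) for (?P<size>\d+) \(0x[0-9A-Fa-f]+\) bytes succeeded, '
    r'but took an abnormally long time \((?P<seconds>\d+) seconds\)'
)

def parse_slow_write(message):
    """Return (path, size_bytes, seconds), or None if the message does not match."""
    m = SLOW_WRITE.search(message)
    if m is None:
        return None
    return m.group("path"), int(m.group("size")), int(m.group("seconds"))

# The message from this morning's Application log, copied from above.
sample = (
    r'wins (3672) A request to write to the file "C:\WINDOWS\system32\wins\j50tmp.log" '
    r'at offset 0 (0x0000000000000000) for 4096 (0x00001000) bytes succeeded, '
    r'but took an abnormally long time (91 seconds) to be serviced by the OS.'
)

print(parse_slow_write(sample))
```

A 4096-byte write taking 91 seconds points at the storage path (drive or controller) rather than at the WINS service itself.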
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35010232
I can't find Spinpoint F1's, are Spinpoint F3's acceptable?
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35010680
I have analyzed the hard drive space requirements and have found that 1TB is way too much space; the server OS + its data is under 100GB.

SO:

I have ordered the following:

3ware 9650SE-4LPML + its 4-drive breakout cable
4 x Western Digital RE4 WD2503ABYX 250GB 7200 RPM

I will create two RAID 1 volumes, 1 for the OS, and 1 for the data.

I am planning on using the Backup Exec 2010 System Restore for the move.
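
As a sanity check on the sizing, the usable space of the old and new layouts can be worked out with the usual rules of thumb. This is a small Python sketch; `usable_gb` is a hypothetical helper, it ignores controller and filesystem overhead, and it treats RAID 1 as a plain two-drive mirror.

```python
def usable_gb(level, drives, size_gb):
    """Approximate usable capacity in GB for a few common RAID levels."""
    if level == "RAID1":
        if drives != 2:
            raise ValueError("this sketch assumes a two-drive mirror")
        return size_gb
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_gb   # one drive's worth of parity
    if level == "RAID6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * size_gb   # two drives' worth of parity
    if level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return drives // 2 * size_gb    # half the drives hold mirror copies
    raise ValueError("unsupported level: " + level)

old_array = usable_gb("RAID5", 3, 1000)   # existing 3 x 1TB RAID 5 (hot spare excluded)
new_os = usable_gb("RAID1", 2, 250)       # planned OS mirror
new_data = usable_gb("RAID1", 2, 250)     # planned data mirror

print(old_array, new_os, new_data)
```

Each planned 250GB mirror leaves comfortable headroom over the stated footprint of under 100GB for the OS and data combined.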
0
 
LVL 30

Assisted Solution

by:pgm554
pgm554 earned 500 total points
ID: 35011849
"I can't find Spinpoint F1's, are Spinpoint F3's acceptable?"

No.


http://www.excaliberpc.com/583847/samsung-spinpoint-f1-raid-class.html

The WD's should be fine.

Remember, RAID 6 or RAID 10 is also fine.

You can use RAID 5 if you use the WD Raptor.
They are basically SAS drives with a SATA interface.
Very fast, very reliable, but priced accordingly.
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35044642
Found the issue: one of the drives in the existing array had bad blocks, but this was never reported to the controller (cheap Intel RAID).  I had to figure it out by running a consistency check and then watching the lights on the HDDs; the flaky one stayed almost constantly illuminated as it ruminated on its bad sectors.  The in-OS consistency checks never reported an error, so I had to run the BIOS check instead.

Since I can recover the existing system, I'm going to replace the failed drive, get it going soundly, then migrate to the new (more reliable) array.

The Symantec solution is crippled unless I buy it, so I'm going to set aside the repaired array and use ASR + NTBackup for the restore to the new array.
0
 
LVL 1

Author Comment

by:darnitol500
ID: 35073687
There is an excellent article on Tech Republic regarding RAID 6 and why we shouldn't jump on it too quickly.

http://www.techrepublic.com/blog/datacenter/raid-6-do-you-really-want-it/119
0
