Solved

Data Corruption on Raid Volume in Windows 2003 and 2008 Servers

Posted on 2009-05-17
Last Modified: 2013-11-14
Hi,

I have a corruption issue that I can't seem to shake. It has been ongoing for some months now.

It relates to a Dell PowerEdge 1600SC server that was running Windows Server 2003 R2.
I had 2 IDE drives configured as a RAID volume and I began to get corruption warnings from the operating system.

Oddly it was not usually on current files - just the archive stuff.

As all files (that could be read) were copied to Western Digital 500GB external drives (network attached), I wasn't overly concerned.

Especially when a check-disk and a reboot repaired the issue - most of the time.

Some time ago I installed a Silicon Image 3114 RAID card and two Maxtor SATA drives, replacing the IDE drives, thinking they must have had an issue.

For a few weeks this did resolve the issue, but recently it has reappeared.

I found a note on the Microsoft site about Windows 2003 R2 having a corruption issue (relating to drives formatted with cluster sizes less than 4096) and applied the hotfix, but it did not correct things. http://support.microsoft.com/kb/932578

Last night I reinstalled the server, reformatted the drives (a full format, not a quick one) and installed Windows Server 2008, thinking that would resolve the issue.

The Dell has the latest BIOS, and all hardware devices are detected and operational in Windows Server 2008.

I began copying the files back from the external drives, and about an hour later received the first corruption warning.

I did a check disk and a reboot and all seemed ok... for now...

The Windows 2008 warnings don't list the file or directory - just the drive - which is a bit annoying as it's hard to check if the files have been recovered... so now I am a bit more concerned.

Not sure what to do here - perhaps reformat the raid array with a different (larger) cluster size?
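Because the Windows Server 2008 warnings name only the volume, one way to find out which files were actually damaged is to compare checksums of the backup copy against the copy on the server. The following is a minimal sketch, not a tested recovery tool: the function names are made up for illustration, and it assumes the backup on the external drive is still readable and is treated as the known-good source.

```python
import hashlib
import os


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest.

    Files that cannot be read are recorded as None so they still show up
    in the comparison instead of aborting the scan.
    """
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, root)
            try:
                manifest[rel] = hash_file(full)
            except OSError:
                manifest[rel] = None
    return manifest


def compare_trees(source_root, copy_root):
    """Return (missing, changed) relative paths, using source_root as reference."""
    source = build_manifest(source_root)
    copy = build_manifest(copy_root)
    missing = sorted(path for path in source if path not in copy)
    changed = sorted(
        path
        for path in source
        if path in copy and (copy[path] is None or source[path] != copy[path])
    )
    return missing, changed
```

Calling `compare_trees` with the backup folder and the restored folder (for example a hypothetical `X:\backup` and `F:\data`) returns the files that are missing from the copy and the files whose contents differ or can no longer be read. Running it again after a check-disk would show whether the repair really recovered the files.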

Question by:thetimp
LVL 6

Expert Comment

by:tatw
ID: 24409654
Did you check the SMART status of your harddrives? Are they healthy?
 

Author Comment

by:thetimp
ID: 24409749
Hi Tatw,

I have just pulled the lid off the machine and found I have 2 SATA II drives connected to the RAID card (Silicon Image), but they are not configured as a stripe as we had assumed.

There is a 160GB Seagate and a 500GB Maxtor drive.

The Seagate is set up as a single partition of 149GB.
The Maxtor is split into 2 partitions: 100GB / 377GB.
Anecdotally the 377GB partition contained the most common corruptions, but they were not confined to this drive.
(The corruptions occurred prior to the drive update, and somehow managed to travel from the old drives to the new.)

There are no errors displayed during the boot sequence - where I would expect to see SMART information.

I have just installed Everest to do some disk benchmarking/testing, and it shows the SMART status of the Maxtor drive as OK, but doesn't list the Seagate...

Should I be concerned? How else can I check the SMART status?

 

Author Comment

by:thetimp
ID: 24409765
NOTE: The corruption is not occurring on a RAID volume; the files were on a RAID volume and have now been copied to a partition on a single disk.
I will edit the Question text as soon as I work out how to do that!
 
LVL 6

Expert Comment

by:tatw
ID: 24414231
Please also get more messages from the Event Viewer about the corruption.
 

Author Comment

by:thetimp
ID: 24414634
Hi tatw,

The error message is in the code snippet box - this occurred prior to my actions below.
(I am currently copying data to the drives again now, to see if the error reappears.)

Since my last message I have:
Run Everest disk benchmarks over the drives - no errors, but a bit slow - see attached files.

Installed (the rather buggy) SeaTools from Seagate and ran the tests, which were either unavailable or passed.

As the drives are on a RAID card, they can't be seen from the SeaTools DOS/CD and can't be formatted via SeaTools inside Windows...

So I have given the drives a low-level format, to be sure. No errors were reported.

Updated the RAID drivers from Silicon Image, installed the management utility and checked the drives' SMART statuses - they are OK.

Turned off the write cache.

As the Silicon Image controller can only handle 1.5Gbps, should I jumper the SATA drives to limit the transfer rate rather than relying on autonegotiation? The RAID utility shows the drives are running at 1.5Gbps fine...

Perhaps the errors are related to Native Command Queuing or Tagged Command Queuing?
Especially when you read this: http://www.hardwareanalysis.com/content/topic/12946/?o=220
But I can't see the setting they discuss in Windows Server 2008.

But I am tempted to replace the drives with 4 new ones tomorrow (in RAID 10 - striped mirrors) as I have lost confidence in them. Perhaps these: Seagate SATA II NCQ 750GB 7200RPM 32MB cache (ST3750330AS)?


Log Name:      System

Source:        Ntfs

Date:          17/05/2009 6:11:21 PM

Event ID:      55

Task Category: (2)

Level:         Error

Keywords:      Classic

User:          N/A

Computer:      XXXXXXXX

Description:

The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume SATA_Filestore.

Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

  <System>

    <Provider Name="Ntfs" />

    <EventID Qualifiers="49156">55</EventID>

    <Level>2</Level>

    <Task>2</Task>

    <Keywords>0x80000000000000</Keywords>

    <TimeCreated SystemTime="2009-05-17T08:11:21.843Z" />

    <EventRecordID>8021</EventRecordID>

    <Channel>System</Channel>

    <Computer>XXXXXXXX</Computer>

    <Security />

  </System>

  <EventData>

    <Data>

    </Data>

    <Data>SATA_Filestore</Data>

    <Binary>00000C000200380002000000370004C000000000020100C0000000000000000000000000000000004E0C14001B00000000000100</Binary>

  </EventData>

</Event>
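The Event XML above can be summarized programmatically, which helps when several of these events need to be compared. The sketch below is an illustration only: it embeds a trimmed copy of the event (the Binary payload is cut to its first 24 bytes), the function name is made up, and the UTC+10 offset is an assumption inferred from the gap between the logged local time (6:11:21 PM) and the SystemTime value (08:11:21Z). Reading the payload as little-endian 32-bit words is also an assumption about its layout; the sixth word comes out as 0xC0000102, which as far as I know is the NTSTATUS code STATUS_FILE_CORRUPT_ERROR.

```python
import struct
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event above; Binary is cut to its first 24 bytes.
SAMPLE_EVENT = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Ntfs" />
    <EventID Qualifiers="49156">55</EventID>
    <TimeCreated SystemTime="2009-05-17T08:11:21.843Z" />
    <Computer>XXXXXXXX</Computer>
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>SATA_Filestore</Data>
    <Binary>00000C000200380002000000370004C000000000020100C0</Binary>
  </EventData>
</Event>"""


def summarize_event(xml_text, utc_offset_hours=10):
    """Extract provider, event ID, volume label, local time and status words."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    stamp = system.find("e:TimeCreated", NS).get("SystemTime")
    utc = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Keep only the non-empty Data elements; the first one is blank in this event.
    data = [
        item.text.strip()
        for item in root.findall("e:EventData/e:Data", NS)
        if item.text and item.text.strip()
    ]
    raw = bytes.fromhex(root.find("e:EventData/e:Binary", NS).text.strip())
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "volume": data[0] if data else None,
        "local_time": utc.astimezone(timezone(timedelta(hours=utc_offset_hours))),
        # First six little-endian 32-bit words of the payload.
        "words": [hex(word) for word in struct.unpack_from("<6I", raw)],
    }


if __name__ == "__main__":
    for key, value in summarize_event(SAMPLE_EVENT).items():
        print(key, "=", value)
```

The fourth word, 0xC0040037, combines the Qualifiers value 49156 (0xC004) with event ID 55 (0x37), which suggests the word-by-word reading is at least consistent with the rest of the record.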


ReadTestsMaxtor.png
LinearWrite.png
AvgWriteAccess.png
ReadTests-Seagate.png

Author Comment

by:thetimp
ID: 24416263
Hi tatw,

Partway through copying to the F drive this error appeared about an issue on the E drive.

I had previously copied a few GB onto the E drive and had been "in and out of it" while files were being copied to the F drive.

Then all of a sudden this message appeared - taking out the entire drive.

The E drive is the Seagate - anecdotally the better of the two.

The error message is in the code snippet box.

Log Name:      System

Source:        Ntfs

Date:          19/05/2009 6:19:17 AM

Event ID:      55

Task Category: (2)

Level:         Error

Keywords:      Classic

User:          N/A

Computer:      XXXXXXXX

Description:

The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume Sata_Data.

Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

  <System>

    <Provider Name="Ntfs" />

    <EventID Qualifiers="49156">55</EventID>

    <Level>2</Level>

    <Task>2</Task>

    <Keywords>0x80000000000000</Keywords>

    <TimeCreated SystemTime="2009-05-18T20:19:17.656Z" />

    <EventRecordID>14170</EventRecordID>

    <Channel>System</Channel>

    <Computer>XXXXXXXX</Computer>

    <Security />

  </System>

  <EventData>

    <Data>

    </Data>

    <Data>Sata_Data</Data>

    <Binary>0C000C000200380002000000370004C000000000020100C0000000000000000000000000000000004E0C14000500000000000500</Binary>

  </EventData>

</Event>

 
LVL 6

Expert Comment

by:tatw
ID: 24422349
Actually, what cluster size are you using for your NTFS drives? The default is 4096. Since you have already replaced the hardware, I am setting hardware aside for the moment. I remember working with Microsoft on a similar issue; in the end the problem was caused by too many files in the volume. You may try setting disable8dot3 on NTFS to fix it. You have to reboot for it to take effect.
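For reference, the disable8dot3 setting mentioned above is normally changed with the `fsutil behavior set disable8dot3 1` command from an elevated prompt. The sketch below only wraps that command; the helper names are made up for illustration, it does nothing on non-Windows machines, and whether the setting helps with this particular corruption is untested.

```python
import platform
import subprocess


def build_8dot3_command(disable=True):
    """Build the fsutil command that turns 8.3 short-name creation off or on."""
    return ["fsutil", "behavior", "set", "disable8dot3", "1" if disable else "0"]


def build_8dot3_query():
    """Build the fsutil command that reports the current setting."""
    return ["fsutil", "behavior", "query", "disable8dot3"]


def run_if_windows(command):
    """Run the command on Windows only; return None elsewhere.

    Needs an administrator prompt, and a reboot afterwards as noted above.
    """
    if platform.system() != "Windows":
        return None
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Query only; changing the setting is left as a deliberate manual step.
    result = run_if_windows(build_8dot3_query())
    print(result.stdout if result else "Not running on Windows; nothing done.")
```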
 

Author Comment

by:thetimp
ID: 24427191
Hi Tatw,
The cluster size is the default - I am tempted to change it, but all the documentation I find says not to.

Surely it can't be a too-many-files issue?

Quick Summary:
I had the issue under Windows 2003 and couldn't resolve it, so I copied all the files to the external drives without problems and rebuilt the server as Windows 2008. I copied some of the files back and got some corruption, so I copied the directories that were not corrupt to my machine (Vista 32), reformatted and benchmarked the server, then copied them back. Then the last massive corruption occurred.

The last massive "whole drive" corruption happened when the drive (logical 100GB on the 500GB physical) contained one folder containing 1967 files in 187 subfolders (2.61GB).

The full data set is currently contained in a folder on my Vista machine: 48.8 GB - 125,898 files in 8076 subfolders.

There was (and still is) no corruption on the Vista machine.

What sort of file numbers per volume are you looking at with Microsoft?
 
LVL 6

Expert Comment

by:tatw
ID: 24428377
Last time, I was working with more than 128000 files in a single directory.
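To check whether any single directory comes close to that figure, the file counts can be gathered per folder rather than per volume. This is a minimal sketch with made-up function names; it counts regular directory entries as reported by `os.walk` and makes no claim about where NTFS actually starts to struggle.

```python
import os


def files_per_directory(root):
    """Map every directory under root (including root) to its direct file count."""
    counts = {}
    for folder, _dirs, files in os.walk(root):
        counts[folder] = len(files)
    return counts


def summarize_tree(root):
    """Return (total files, subfolder count, busiest directory, its file count)."""
    counts = files_per_directory(root)
    total_files = sum(counts.values())
    subfolders = len(counts) - 1  # do not count root itself
    busiest = max(counts, key=counts.get)
    return total_files, subfolders, busiest, counts[busiest]
```

Running `summarize_tree` over the data folder gives totals comparable to the "files in subfolders" figures quoted earlier in the thread, plus the directory holding the most files, which is the number that matters for the single-directory case described here.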
 

Author Comment

by:thetimp
ID: 24428397
Hi Tatw,

OK, that's a lot of files in the one directory - I don't have that number of files, thankfully.

Interestingly the boot disk is a 36GB SCSI drive, which has never had any corruption issues...
 

Accepted Solution

by:
thetimp earned 0 total points
ID: 24447083
Update

I have copied the files from the backup source to a new PC, and have received no file corruption in 24 hours.
(The files are in use by multiple users.)

As the corruption occurred with the IDE drives, and then with the SATA drives on the RAID card (but not in a RAID configuration), can I assume the motherboard has an issue?

Especially as there are now no corruption issues on another PC with the same files...

There is a SCSI drive attached to the motherboard via an on-board SCSI interface that does not have any issues. Is it worth investigating using that interface with some other drives, or am I just asking for trouble?
