thetimp

asked on

Data Corruption on Raid Volume in Windows 2003 and 2008 Servers

Hi,

I have a corruption issue that I can't seem to shake. It has been ongoing for some months now.

It relates to a Dell PowerEdge 1600SC server that was running Windows Server 2003 R2.
I had two IDE drives configured as a RAID volume and began to get corruption warnings from the operating system.

Oddly it was not usually on current files - just the archive stuff.

As all files (that could be read) were copied to Western Digital 500GB external drives (network attached), I wasn't overly concerned.

Especially as a chkdsk and a reboot repaired the issue - most of the time.

Some time ago I installed a Silicon Image 3114 RAID card and two Maxtor SATA drives, replacing the IDE drives, thinking they must have had an issue.

For a few weeks this did resolve the issue, but recently it has reappeared.

I found a note on the Microsoft site about Windows 2003 R2 having a corruption issue (relating to drives formatted with cluster sizes less than 4096) and applied the hotfix, but it did not correct things: http://support.microsoft.com/kb/932578

Last night I reinstalled the server, reformatted the drives (a full format, not a quick one) and installed Windows Server 2008, thinking that would resolve the issue.

The Dell has the latest BIOS, and all hardware devices are detected and operational in Windows Server 2008.

I began copying the files back from the external drives, and about an hour later received the first corruption warning.

I ran chkdsk and rebooted, and all seemed OK... for now...

The Windows 2008 warnings don't list the file or directory - just the drive - which is a bit annoying, as it's hard to check whether the files have been recovered... so now I am a bit more concerned.
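
One way to check whether files came through a chkdsk repair intact is to compare checksums of the copies on the external drives against the copies on the server. The following is only a minimal sketch in Python, assuming both trees are reachable as ordinary paths; the function names `file_digest` and `compare_trees` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source_root, copy_root):
    """Compare each file under source_root with its counterpart under copy_root.

    Returns three lists of relative paths: (missing, unreadable, different).
    """
    source_root, copy_root = Path(source_root), Path(copy_root)
    missing, unreadable, different = [], [], []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = copy_root / rel
        if not dst.is_file():
            missing.append(str(rel))
            continue
        try:
            if file_digest(src) != file_digest(dst):
                different.append(str(rel))
        except OSError:
            # A read error on either side is reported separately,
            # since it may itself point at a damaged file.
            unreadable.append(str(rel))
    return missing, unreadable, different
```

Calling `compare_trees` with the backup folder as the first argument and the server folder as the second (both paths would be specific to the setup) gives a list of files to re-copy or inspect after each corruption warning.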

Not sure what to do here - perhaps reformat the RAID array with a different (larger) cluster size?

Eric Wong

Did you check the SMART status of your hard drives? Are they healthy?
thetimp

ASKER

Hi Tatw,

I have just pulled the lid off the machine and found I have two SATA II drives connected to the RAID card (Silicon Image), but they are not configured as a stripe as we had assumed.

There is a 160GB Seagate and a 500GB Maxtor drive.

The Seagate is set up as a single partition of 149GB.
The Maxtor is split into two partitions: 100GB and 377GB.
Anecdotally, the 377GB partition contained the most common corruptions, but they were not confined to this drive.
(The corruptions occurred prior to the drive update, and somehow managed to travel from the old drives to the new.)

There are no errors displayed during the boot sequence, where I would expect to see SMART information.

I have just installed Everest to do some disk benchmarking and testing. It shows the SMART status of the Maxtor drive as OK, but doesn't list the Seagate...

Should I be concerned? How else can I check the SMART status?


ASKER

NOTE: The corruption is not occurring on a RAID volume. The files were on a RAID volume and have now been copied to a partition on a single disk.
I will edit the question text as soon as I work out how to do that!
Please also get more messages from the Event Viewer about the corruption.

ASKER

Hi tatw,

The error message is in the code snippet box - this occurred prior to my actions below.
(I am currently copying data to the drives again now, to see if the error reappears.)

Since my last message I have:
Run Everest disk benchmarks over the drives - no errors, but a bit slow - see attached files.

Installed (the rather buggy) SeaTools from Seagate and run the tests, which either were unavailable or passed.

As the drives are on a RAID card, they can't be seen from the SeaTools DOS/CD version and can't be formatted via SeaTools inside Windows...

So I have given the drives a low-level format, to be sure. No errors were reported.

Updated the RAID drivers from Silicon Image, installed the management utility and checked the drives' SMART status - they are OK.

Turned off the write cache.

As the Silicon Image controller can only handle 1.5Gbps, should I jumper the SATA drives to limit the transfer rate rather than relying on autonegotiation? The RAID utility shows the drives are running at 1.5Gbps fine...

Perhaps the errors are related to Native Command Queuing or Tagged Command Queuing?
Especially when you read this: http://www.hardwareanalysis.com/content/topic/12946/?o=220
But I can't see the setting they discuss in Windows Server 2008.

I am tempted to replace the drives with four new ones tomorrow (in RAID 10 - striped mirrors) as I have lost confidence in them. Perhaps these: Seagate SATA II NCQ 750GB 7200RPM 32MB cache (ST3750330AS)?


Log Name:      System
Source:        Ntfs
Date:          17/05/2009 6:11:21 PM
Event ID:      55
Task Category: (2)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      XXXXXXXX
Description:
The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume SATA_Filestore.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Ntfs" />
    <EventID Qualifiers="49156">55</EventID>
    <Level>2</Level>
    <Task>2</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-05-17T08:11:21.843Z" />
    <EventRecordID>8021</EventRecordID>
    <Channel>System</Channel>
    <Computer>XXXXXXXX</Computer>
    <Security />
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>SATA_Filestore</Data>
    <Binary>00000C000200380002000000370004C000000000020100C0000000000000000000000000000000004E0C14001B00000000000100</Binary>
  </EventData>
</Event>
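
The `Binary` field in the event above can be unpacked to see which status code NTFS recorded. This is a hedged sketch in Python, not an official decoder: it assumes the blob follows the Windows driver `IO_ERROR_LOG_PACKET` layout, with `ErrorCode` at byte offset 12 and `FinalStatus` at byte offset 20 (both little-endian 32-bit values). That layout is an assumption on my part and is not stated anywhere in this thread. The function name `decode_event` is hypothetical, and the XML is a trimmed copy of the record above.

```python
import struct
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the Event 55 record posted above.
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID Qualifiers="49156">55</EventID>
    <TimeCreated SystemTime="2009-05-17T08:11:21.843Z" />
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>SATA_Filestore</Data>
    <Binary>00000C000200380002000000370004C000000000020100C0000000000000000000000000000000004E0C14001B00000000000100</Binary>
  </EventData>
</Event>"""


def decode_event(xml_text):
    """Extract the UTC time, volume label and status codes from an event record."""
    root = ET.fromstring(xml_text)

    stamp = root.find("e:System/e:TimeCreated", NS).get("SystemTime")
    time_utc = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
        tzinfo=timezone.utc
    )

    # The first <Data> element is empty; the volume label is the non-empty one.
    labels = [
        d.text.strip()
        for d in root.findall("e:EventData/e:Data", NS)
        if d.text and d.text.strip()
    ]

    raw = bytes.fromhex(root.find("e:EventData/e:Binary", NS).text.strip())
    # Offsets 12 and 20 are ASSUMED from the IO_ERROR_LOG_PACKET layout.
    (error_code,) = struct.unpack_from("<I", raw, 12)
    (final_status,) = struct.unpack_from("<I", raw, 20)

    return {
        "time_utc": time_utc,
        "volume": labels[0] if labels else None,
        "error_code": error_code,
        "final_status": final_status,
    }


info = decode_event(EVENT_XML)
print(info["volume"], hex(info["error_code"]), hex(info["final_status"]))
```

Under that assumption, the low 16 bits of the error code are the event ID (55) and the high 16 bits match the `Qualifiers` value (49156), which is some evidence the offsets are right. The final status decodes to 0xC0000102, which is the NTSTATUS value `STATUS_FILE_CORRUPT_ERROR`. In other words the blob confirms that NTFS found a corrupt structure, but it does not name the file involved.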


ReadTestsMaxtor.png
LinearWrite.png
AvgWriteAccess.png
ReadTests-Seagate.png

ASKER

Hi tatw,

Partway through copying to the F drive, this error appeared about an issue on the E drive.

I had previously copied a few GB onto the E drive and had been "in and out of it" while files were being copied to the F drive.

Then all of a sudden this message appeared - taking out the entire drive.

The E drive is the Seagate - anecdotally the better of the two.

The error message is in the code snippet box.

Log Name:      System
Source:        Ntfs
Date:          19/05/2009 6:19:17 AM
Event ID:      55
Task Category: (2)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      XXXXXXXX
Description:
The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume Sata_Data.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Ntfs" />
    <EventID Qualifiers="49156">55</EventID>
    <Level>2</Level>
    <Task>2</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2009-05-18T20:19:17.656Z" />
    <EventRecordID>14170</EventRecordID>
    <Channel>System</Channel>
    <Computer>XXXXXXXX</Computer>
    <Security />
  </System>
  <EventData>
    <Data>
    </Data>
    <Data>Sata_Data</Data>
    <Binary>0C000C000200380002000000370004C000000000020100C0000000000000000000000000000000004E0C14000500000000000500</Binary>
  </EventData>
</Event>


Actually, what cluster size are you using for your NTFS drive? The default is 4096. Since you have already replaced the hardware, I wouldn't suspect the hardware at the moment. I remember working with Microsoft on a similar issue; the problem turned out to be too many files in the volume. You may try disabling 8.3 short name creation (disable8dot3) on NTFS to fix it. You have to reboot for the change to take effect.

ASKER

Hi Tatw,
The cluster size is the default - I am tempted to change it, but all the documentation I find says not to.

Surely it can't be a too-many-files issue?

Quick Summary:
I had the issue under Windows 2003 and couldn't resolve it, so I copied all the files to the external drives without any problems and rebuilt the server as Windows 2008. I copied some of the files back and got some corruption, so I copied the directories that were not corrupt to my machine (Vista 32), reformatted and benchmarked the server, then copied them back. That is when the last massive corruption occurred.

The last massive "whole drive" corruption happened when the drive (a logical 100GB on the 500GB physical) contained one folder holding 1967 files in 187 subfolders (2.61GB).

This is currently contained in a folder on my Vista machine: 48.8 GB - 125,898 files in 8076 subfolders.

There was (and still is) no corruption on the Vista machine.

What sort of file numbers per volume are you looking at with Microsoft?
Last time, I was working with more than 128,000 files in a single directory.

ASKER

Hi Tatw,

OK, that's a lot of files in one directory - I don't have that number of files, thankfully.

Interestingly, the boot disk is a 36GB SCSI drive, which has never had any corruption issues...
ASKER CERTIFIED SOLUTION
thetimp
