First the specs:
Role: Main file server/data dump. The server holds a mix of small and large files: Outlook PSTs ranging from 200 MB to 5 GB, large creative images (around 1 GB), and small documents from 1 KB to 25 MB.
Dell Poweredge 2800
2x Intel Xeon 3ghz hyperthreading off
2gig DDR2 ECC
AHA-39160 Ultra 160 PCI used by Powervault 124T LTO3 Autoloader
PERC 4/SC PCI
PERC 4e/Di 256meg (Embedded)
8x 300gig Ultra 320 RAID 5 on PERC 4e/Di (Data)
2x 36gig Ultra 320 RAID 1 on PERC 4/SC (Windows/Boot files)
Adaptive Read Ahead/Write Back/Direct IO Raid Policy
Intel Pro 1000/MT running at 100BaseT Full Duplex
Windows Server 2003 R2 SP1 w/ up-to-date patches
120 users, approx. 50 connected at any given time
• Symantec Antivirus Client 10.0.1
• Symantec Backup Exec 10d
• Diskeeper 2007 Enterprise Edition
Symptoms: During peak hours, disk read time on the data drive randomly jumps to 100% for a few minutes. The server slows down to the point where users get disconnected and Windows becomes unresponsive. According to perfmon, write time on the data drive and CPU/RAM/network utilization are all minimal during these spikes.
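To pin down exactly when the spikes occur, the perfmon counter log can be scanned for samples where read time is saturated while the CPU stays idle. This is only a rough sketch, assuming the log was saved in perfmon's CSV format with a "% Disk Read Time" and a "% Processor Time" counter; the machine name, disk instance, thresholds, and sample values below are made up for illustration.

```python
import csv
import io

# Made-up sample in perfmon's CSV layout: timestamp first, then one column per counter.
# The machine name FILESRV, the disk instance "1 D:" and every value are hypothetical.
SAMPLE_LOG = "\n".join([
    r'"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\FILESRV\PhysicalDisk(1 D:)\% Disk Read Time","\\FILESRV\Processor(_Total)\% Processor Time"',
    r'"03/12/2007 10:15:00.000","12.5","4.1"',
    r'"03/12/2007 10:15:15.000","100.0","3.8"',
    r'"03/12/2007 10:15:30.000"," ","5.0"',
    r'"03/12/2007 10:15:45.000","99.2","6.3"',
    r'"03/12/2007 10:16:00.000","97.0","85.0"',
])


def column_index(header, needle):
    """Find the first column whose name contains the counter name."""
    for i, name in enumerate(header):
        if needle in name:
            return i
    raise ValueError(f"no column containing {needle!r}")


def find_read_spikes(csv_text, disk_col="% Disk Read Time",
                     cpu_col="% Processor Time",
                     disk_threshold=95.0, cpu_threshold=20.0):
    """Return timestamps where disk read time is saturated but CPU is low."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Match by substring because each counter path is prefixed with \\MACHINE.
    disk_idx = column_index(header, disk_col)
    cpu_idx = column_index(header, cpu_col)
    spikes = []
    for row in reader:
        try:
            disk = float(row[disk_idx])
            cpu = float(row[cpu_idx])
        except (ValueError, IndexError):
            continue  # skip samples with blank or missing values
        if disk >= disk_threshold and cpu <= cpu_threshold:
            spikes.append(row[0])
    return spikes


print(find_read_spikes(SAMPLE_LOG))
```

Lining these timestamps up against backup, defrag, and antivirus scan schedules would show whether the spikes coincide with a scheduled job or with ordinary user activity.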
Solutions that didn’t work:
• Originally Windows was installed on a separate partition of the same RAID 5 array as the data drive (8x 300gig Ultra 320 RAID 5 on PERC 4e/Di), so Windows became unresponsive whenever the IO jumped. I recently added a separate PERC 4/SC and moved the Windows/boot files to their own RAID 1 array. This cured Windows of its unresponsiveness, but the problem still persists on the data drive.
• Installed http://support.microsoft.com/kb/915691
• Turned off antivirus
• Defragged Drive
• Installed Diskeeper 2007 Enterprise Edition
• Switched the NIC from 1000BaseT to 100BaseTX
• Changed to No Read Ahead/Write Through/Cached IO RAID policy
• Ran various Dell Diagnostics tools… server passed with no errors. Event Viewer shows no errors.
• Ran RAID consistency checks
• Tried various other solutions found on experts-exchange.com
Temporary Solutions that work:
• Reboot the server
• Disable the network card thereby disconnecting all users.
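Since disconnecting the users clears the problem, it may be worth checking who has what open at the moment a spike hits. The sketch below tallies open shared files per user from a CSV listing such as the one `openfiles /query /fo csv` produces. The "Accessed By" column name is my assumption about that output, and the user names and paths are invented sample data.

```python
import csv
import io
from collections import Counter

# Invented sample of an open-files listing in CSV form.
# User names and paths are hypothetical; the header is assumed, not verified.
SAMPLE_OPENFILES = "\n".join([
    r'"ID","Accessed By","Type","Open File (Path\executable)"',
    r'"12","jsmith","Windows","D:\Data\Mail\jsmith.pst"',
    r'"13","jsmith","Windows","D:\Data\Art\banner.psd"',
    r'"14","mlee","Windows","D:\Data\Docs\budget.xls"',
    r'"15","jsmith","Windows","D:\Data\Docs\notes.doc"',
])


def open_files_per_user(csv_text, user_col="Accessed By"):
    """Count open files per user, busiest user first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[user_col] for row in reader if row.get(user_col))
    return counts.most_common()


for user, count in open_files_per_user(SAMPLE_OPENFILES):
    print(user, count)
```

A user or workstation that consistently tops the list during spikes (for example, one holding several large PSTs open over the network) would be a candidate to investigate before replacing hardware.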
Solutions that I’m considering:
• Replace the PERC 4e/Di (faulty?), recreate the RAID 5 array, and restore the data from backup
• Install Windows Server 2003 SP2
Thank you, and I hope someone can provide some insight.