Solved

Dell PowerEdge R620 continuous OS corruption

Posted on 2014-01-03
Last Modified: 2016-11-23
We have a brand new Dell PowerEdge R620 that was put into production in September 2013. Since then, every 4 weeks the system files become corrupted and we have to reload the OS from scratch or recover from a Backup Exec DR disk. We notice the issue when we try to launch an application on the server, such as Backup Exec, and receive a missing DLL file message and the program won't launch. If the system is rebooted, it goes into recovery mode and can't load the OS. We have opened numerous cases with Dell and Microsoft, run diagnostics on the server and tape library, and updated all the firmware, drivers, etc., but found no solution. The corruption always happens on a Monday, which is why I think some process running on the Dell is causing it. Yesterday, I discovered the Patrol Read process, which appears to run on a Saturday once a month on the embedded H310 Mini controller. I set it to "manual" in case it is causing the corruption on the local RAID 1 array.
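
One way to narrow down when the files actually go bad (for example, right after the Saturday Patrol Read versus only on Monday) would be a scheduled checksum comparison of the system binaries. The following is only a rough sketch in Python, not something Dell or Microsoft provided: the function names, the file extensions checked, and the baseline file are all assumptions made for illustration, and Windows updates will legitimately change some files, so any reported difference needs to be checked against patch history.

```python
import hashlib
import json
import os
import sys
from datetime import datetime


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root, extensions=(".dll", ".exe", ".sys")):
    """Map each matching file under root to its SHA-256 digest."""
    result = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            try:
                result[path] = hash_file(path)
            except OSError:
                # Locked or unreadable files are recorded rather than dropped.
                result[path] = "UNREADABLE"
    return result


def compare(baseline, current):
    """Return (missing, changed) path lists relative to the baseline."""
    missing = sorted(p for p in baseline if p not in current)
    changed = sorted(
        p for p in baseline if p in current and baseline[p] != current[p]
    )
    return missing, changed


def run_check(root, baseline_file):
    """Create the baseline on the first run; afterwards report differences."""
    current = snapshot(root)
    if not os.path.exists(baseline_file):
        with open(baseline_file, "w") as out:
            json.dump(current, out)
        return [], []
    with open(baseline_file) as src:
        baseline = json.load(src)
    return compare(baseline, current)


if __name__ == "__main__":
    # Usage (hypothetical): python check.py <folder to scan> <baseline file>
    if len(sys.argv) == 3:
        missing, changed = run_check(sys.argv[1], sys.argv[2])
        print(datetime.now().isoformat(),
              "missing:", len(missing), "changed:", len(changed))
```

Run daily from Task Scheduler against the Windows system folder, with the baseline file kept somewhere other than the local RAID 1 array, the first day that reports missing or changed files would show whether the damage lines up with the Patrol Read schedule.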

The server has a PERC H310 Mini embedded controller for 2 local drives in a RAID 1 array.
There is a 6Gbps SAS controller connected to an external tape library (brand new Dell PV 124T LTO 6 library). There is a PERC H810 controller card attached to an external DAS (brand new Dell MD1200). The server was originally loaded with Windows 2008 R2 x64, and after the first crash we installed Windows 2008 Standard x64. The only software running on the server is Backup Exec 2012 SP3 and EMC Application Xtender for our document management system. Application Xtender just stores some configuration settings for the document storage repository on the MD1200 and isn't running many processes. The same software is installed on numerous Windows 7 PCs in our environment and has never caused any issues.

I'm guessing that some Dell process is causing the corruption, or perhaps it's a bad sector on one of the drives or an issue with the PERC controller. However, none of the Dell diags have shown any H/W errors.

We have many other Dell PE servers (R610s and 2950s) and have never experienced this type of issue. This "12th" generation server takes forever to boot and has been a complete nightmare!

Any suggestions are appreciated.
Question by:City_of_Del_Mar_IT
 
LVL 19

Expert Comment

by:strivoli
ID: 39755761
Does the server run any AV (AntiVirus) that performs real-time or scheduled scans of server's files?
 
LVL 18

Accepted Solution

by:
Netflo earned 500 total points
ID: 39755885
I've seen a similar situation on HP hardware; it was a RAID card problem. When working with the vendor, I ran their diagnostics on a 100-loop test and they still said the RAID was okay. After replacing the motherboard we were back in action.

If the RAID is embedded into your motherboard, then depending on your warranty agreement with Dell (if you have ProSupport), get in touch with your TAM and get them to replace the whole motherboard. Explain you've waited and suffered long enough and this is harming your business.
 
LVL 34

Expert Comment

by:Seth Simmons
ID: 39756260
agree with netflo about escalation
this is a bizarre issue and could be the system board and/or perc 310 but definitely a hardware issue.

another thing i thought of is the raid array itself.  has that been destroyed and created again during the times windows was reinstalled or restored?  has the controller log been examined?  only raid1 so not that complex (compared to 5 or 10) but curious if the controller log yielded anything useful about the controller or drives

at my last place i had dozens of dell servers going back 8th/9th gen and had various hardware problems but never had an issue like this either.  i found the 12th gen servers more reliable than previous models and performed well, though agree the POST time is annoying
 

Author Comment

by:City_of_Del_Mar_IT
ID: 39759574
Thanks all for the comments. We did not install any AV on this server, to eliminate that as a potential cause of our issue. Glad to hear that someone else has seen the same issue. I agree that it's got to be a hardware issue and I will definitely pursue getting the motherboard replaced since this is an embedded controller. I posted the same comment on the Dell forums and one of the Dell reps said that I should upgrade the controller. It seems that the current config should work without error, without having to spend extra $$ on an upgrade that is overkill for the function of this server.