Dell PowerEdge T300 intermittent boot issues

Posted on 2011-04-21
Last Modified: 2012-05-11
I have a Dell PowerEdge T300 running SBS 2003.  It is set up with RAID 0 on a SAS6/IR controller.  It has 2 swappable 500 GB drives, partitioned into a 25 GB system drive and a 900+ GB "data" partition.

The server had an issue with the Exchange database maxing out. However, instead of calling me, someone decided to "reboot" the server.  I am pretty sure they pulled the plug out of the back when the server hung on shutdown.  When they tried to bring it back up, the server would not boot.  It displayed the "F1 to retry boot, F2 to enter setup" prompt, and then they called me.

I rebooted the server a couple of times. Each time the controller detected Volume (00:00) as failed and I ended up at the F1/F2 screen.  I hit F1 and it would not boot.  So I pulled HD0 out of the cage and shook it, put it back in, booted up, and was able to get into Windows.  To make sure the fix held, I shut down the server, and it started having the same issues.  I decided to call Dell.

The Dell tech said I needed to reset the BIOS and then, once back in Windows, update the BIOS and controller firmware, and that this would resolve my issue.  Dell had me reset the BIOS, which took it back to the defaults and re-enabled SATA-A, and the server threw an error about that.  So Dell had me disable SATA-A, and then the server booted into Windows.  I told the Dell rep that I wanted to shut it down and see if it would come up again without issue.  It didn't.  It threw the Volume (00:00) failed error after the controller initialized, and then I got the F1/F2 error.

So Dell had me create a diagnostic CD and boot from that.  While the hard drive test was running, I explained to him that there are two 500 GB drives in the server and only one (HD1) was lit up with green lights.  HD0 had no lights on at all, while HD1 had one solid and one blinking whenever there was HD activity.  About 20 minutes into the HD test, the completion percentage was stuck at 4% even though the byte/sector count kept increasing.  He had me stop the HD test because, he said, HD0 wasn't even being tested: its lights were off, so it was dead, and he would send me a new hard drive.  I did explain to him that when the system does boot OK, both hard drives have 2 green lights.

So while I was giving him my info I decided to reboot the server.  After 3 or 4 unsuccessful boot attempts I was able to get it to boot up. Each time I rebooted I reseated HD0.

I'm confused because at first the Dell tech said the issue was the BIOS and controller needing updates, and then all of a sudden he had me stop the test and declared that a failing HD0 was the issue.

So basically my questions are:
How do I know or figure out if it's the HD that is the problem and not the controller or backplane?
If the HD is bad, how come I can get into Windows, and Windows seems to operate just fine when running?
What's the deal with the green light on HD0 not working, and why wouldn't the lights be amber or red if the drive really was dead?

It just seems to me like the controller or the MB is bad, and I'm sure Dell would much rather it be an $80 500 GB HD than an expensive MB or RAID controller.

Thanks Experts!
Question by:xactdesign
    LVL 34

    Assisted Solution

    First off, RAID 0 is a very bad idea if this is an important server, as it has no redundancy. If one of the drives completely fails (something reseating it won't temporarily remedy), the system is down and any data that wasn't backed up will be very expensive to recover. Since reseating the drive temporarily remedies the issue, I'd say either the drive or the backplane has a bad connection. The only way to know for sure is to use a known-good HDD. Since you have RAID 0, when you replace the possibly faulty drive you will have to recreate the RAID, reinstall (reimage), and restore data from backup.
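
    To put the redundancy point in rough numbers, here is a minimal sketch. It assumes drive failures are independent and uses a made-up per-drive failure probability; the function names and the 0.05 figure are illustrative assumptions, not measurements from this server.

```python
def raid0_failure_probability(p_drive: float, n_drives: int = 2) -> float:
    """Chance a RAID 0 stripe set is lost: any one drive failing is enough.

    Assumes independent failures, which is a simplification.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    return 1.0 - (1.0 - p_drive) ** n_drives


def raid1_failure_probability(p_drive: float, n_drives: int = 2) -> float:
    """Chance a RAID 1 mirror is lost: every mirrored drive must fail."""
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    return p_drive ** n_drives


if __name__ == "__main__":
    p = 0.05  # hypothetical per-drive failure probability over some period
    print(f"RAID 0, 2 drives: {raid0_failure_probability(p):.4f}")
    print(f"RAID 1, 2 drives: {raid1_failure_probability(p):.4f}")
```

    Under those assumptions a two-drive stripe is roughly twice as likely to be lost as a single drive, while a two-drive mirror is far less likely to be lost than either drive alone.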
    LVL 2

    Author Comment

    Thanks jamietoner, I am aware of this WRT the RAID 0.  Unfortunately I didn't set up the server, so this is what I am stuck with until we upgrade/replace the server.

    The server is up now and I am running backups.  I threw the question out there because I don't want to reinstall the OS and do a recovery if it's a backplane issue.  Not only would the server throw the same error with a new drive, it would also waste about 8 hours of my time.
    LVL 32

    Accepted Solution

    To answer your questions:

    How do I know or figure out if it's the HD that is the problem and not the controller or backplane?
    It is sometimes difficult to determine which is the point of failure without other parts to swap out and try with, but the rate of failure of a drive is MUCH higher than the rate of failure of a backplane or controller.  Sure those components can fail, but it is much more likely a bad drive.  So, without a way to determine for sure, Dell has a 99% chance of fixing the issue with a replacement drive.

    If the HD is bad, how come I can get into Windows, and Windows seems to operate just fine when running?
    Intermittent problems can cause a drive to work fine one moment and not the next, so if it is an intermittent problem, you may get lucky and get past the point of failure.

    What's the deal with the green light on HD0 not working, and why wouldn't the lights be amber or red if the drive really was dead?
    The light on the hard drive is green when Online, or participating in an array, and amber when Offline, or NOT participating in an array.  It is not an indication of the drive's health (unless blinking alternately green/amber).  A drive can pass diagnostics but be offline and the light will be amber.  Likewise, a drive can fail diagnostics and be online and the light will be green.  It is merely an online/offline status indicator.
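
    The indicator rules above can be written down as a small lookup. This is only a sketch of what this answer describes, not Dell's documented indicator table; the `Led` names and the `interpret_led` helper are made up for illustration.

```python
from enum import Enum


class Led(Enum):
    """Indicator states mentioned in this thread (names are illustrative)."""
    GREEN = "green"
    AMBER = "amber"
    ALTERNATING_GREEN_AMBER = "alternating green/amber"
    OFF = "off"


def interpret_led(state: Led) -> str:
    """Map an indicator state to the meaning given in the answer above."""
    meanings = {
        Led.GREEN: "online: participating in an array (not a health indicator)",
        Led.AMBER: "offline: not participating in an array (not a health indicator)",
        Led.ALTERNATING_GREEN_AMBER: "health-related indication",
        # The answer does not say what an unlit indicator means.
        Led.OFF: "not covered by the answer above",
    }
    return meanings[state]


if __name__ == "__main__":
    for led in Led:
        print(f"{led.value}: {interpret_led(led)}")
```

    The point of the lookup is that green and amber only report array membership, so a green light does not rule out a failing drive.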

    Good luck.
    LVL 27

    Assisted Solution

    "So I pulled HD0 out of the cage and shook it."

    Can I just check that we understood this? Did you really take the HDD out of your server and shake it like a box of sweets?

    No offence mate, but I think you should do what Dell say and stop using your own diagnostic techniques.

    Get a new HDD and get this on RAID 1 ASAP. RAID 0 on a server is ridiculous and is highly likely to fail again.
    I'd also recommend updating the firmware on the RAID/disk controller, as these are known to cause issues on Dells.

    LVL 2

    Author Comment

    That is good Qlemo
    LVL 67

    Expert Comment

    This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
