Solved

PE2900 hangs in POST at DRAC

Posted on 2014-01-14
Last Modified: 2014-01-20
I have a customer who is having a boot problem during POST.  The server advances through POST until it attempts to make a connection through the DRAC (see attached file).  Can someone tell me what steps I could have her take to troubleshoot the issue?
kali---boot-issues.bmp
Question by:PhillyGee
16 Comments
 
LVL 32

Accepted Solution

by:
PowerEdgeTech earned 400 total points
ID: 39780489
Actually, the 5-second CTRL-E prompt signals the end of the POST sequence.  After 5 seconds, the system begins loading the boot devices, so the fact that the system stops here indicates the OS is not loading properly.  Assuming you have no amber LCD error message and no failed drives or RAID arrays or other storage problems, I would attempt a repair of your OS at this point.  Which OS are you running?
 
LVL 2

Assisted Solution

by:Christopher Reed
Christopher Reed earned 100 total points
ID: 39780520
My suggestion, in line with PowerEdgeTech, would be to do a simple boot to CD to make sure that the system can in fact boot at all.

I'm new to PowerEdge servers, but could it be a possibility that the net address for the DRAC is causing some type of hang issue?

Just my two cents, best of luck.
-Chris
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 39780537
I'm not saying it's impossible, but I've never seen it hang at this spot where it was anything but a damaged OS (there are messages that are normally seen when the DRAC is malfunctioning).  I think it's much more likely to be the OS than the server.
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39780563
I appreciate the feedback.  Given your name, I think you might know a little more about PE servers than myself.  :-)

For my own knowledge, shouldn't the server throw back some type of error saying that it can't find an OS or something similar?  Is this "hang" somewhat of a normal thing?
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 39780586
"shouldn't the server throw back some type of error saying that it can't find an OS or something similar"

Well, the server doesn't have control at this point.  It has told the "hard drive" to boot, and the drive is trying - spinning its wheels trying to find/read/load boot files, unable to get far enough to have even splashed an image on the screen.  The OS is hung doing its thing, and as far as the server is concerned, it is up and running on it - it could be chugging along authenticating users, serving web pages, etc., as far as server management (BMC) knows.

If the boot device didn't exist (failed RAID array, controller is disabled, etc.), then the server would have said "no boot device found" after attempting all boot devices unsuccessfully.

Again, I can't promise that is what is going on in this particular situation, but I've seen it happen MANY times and believe it is the most likely scenario at this point.
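To make the handoff described above concrete, here is a minimal Python sketch.  It is only a toy model of the reasoning in this comment, not anything the server actually runs: the `BootDevice` fields and both function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BootDevice:
    name: str
    present: bool   # assumption: controller enabled and array visible to the BIOS
    os_loads: bool  # assumption: boot code on the device gets as far as the OS


def firmware_view(devices: List[BootDevice]) -> str:
    """What the firmware can report in this simplified model."""
    for dev in devices:
        if dev.present:
            # Control is handed to the device's boot code.  The firmware is
            # finished here and cannot report what happens afterwards.
            return f"handed off to {dev.name}"
    return "no boot device found"


def console_view(devices: List[BootDevice]) -> str:
    """What the person at the console sees in this simplified model."""
    for dev in devices:
        if dev.present:
            return "OS running" if dev.os_loads else "hang after POST, no error shown"
    return "no boot device found"
```

With a device that exists but cannot load its OS, `firmware_view` still reports a handoff while `console_view` shows a silent hang.  Only when no device is present at all does the "no boot device found" message appear.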
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39780614
PhillyGee,

Sorry to hijack your question, I hope that you get your "hang" issue fixed.  Please let us know the solution.

PowerEdgeTech,

Thank you for the quick PE server lesson.

-Chris
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 39780620
Hopefully it helps pg too ;)
 

Author Comment

by:PhillyGee
ID: 39780803
Thanks for all the feedback, guys.  She's now sending me more information a little at a time.
- a failed 400GB SAS hard drive in (what looks like) a four disk RAID5 array
- a failed cache battery.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 39780824
- a failed 400GB SAS hard drive in (what looks like) a four disk RAID5 array

OS corruption could have occurred from the failed disk (errors on remaining disks, etc.).

- a failed cache battery.

OS corruption could have occurred from the write cache data being lost before being committed to disk when important files were in a critical state.

Barring data corruption, RAID 5 should still be operational with a single failed disk.
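For anyone wondering why a single failed disk is survivable, here is a minimal Python sketch of the XOR parity idea behind RAID 5.  It covers one stripe only, with made-up block contents; a real controller also rotates parity across the disks and works on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a four-disk RAID 5: three data blocks plus one parity block.
# The block contents are arbitrary illustration values.
data = [b"boot", b"grub", b"conf"]
parity = xor_blocks(data)

# The disk holding data[1] fails: XOR the surviving blocks with the parity
# block to get the missing block back.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
```

`recovered` equals the lost block as long as every surviving block reads back correctly, which is why the array keeps running in a degraded state.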
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39780826
If there IS data corruption, is all lost so that even RAID rebuild can't fix it?
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 39780861
Depends on the extent of the damage ... it may "only" be OS corruption, in which case, it may be repairable (Recovery Console - chkdsk /r, fixmbr, fixboot; and other utilities such as SFC).  If it is array corruption, there is less that can be done ... a rebuild might be successful and may work normally, or the rebuild may fail.  With only a single disk remaining, if there are any errors in the data on the remaining disk then it will probably just fail the rebuild trying to read it.
 

Author Comment

by:PhillyGee
ID: 39789267
Thank you all for the help.  This issue is still outstanding.  The customer is running RH Linux (something I know nothing about). After trying a number of things, the customer sought help from RH, who booted to rescue mode and re-installed grub to the MBR. After rebooting, the server was still not getting to stage 1 of grub.  They didn't think it was an OS problem.
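For reference, a rough way to sanity-check an MBR from rescue mode is to read the first 512-byte sector and look at it.  The Python sketch below is a heuristic only: the function names are my own, the device path you pass in is whatever the boot disk is called on that system, and finding the string "GRUB" in the boot code area is a hint rather than proof of a working install.

```python
MBR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of a valid MBR sector


def read_first_sector(path: str) -> bytes:
    """Read the first sector of a disk or disk image (usually needs root)."""
    with open(path, "rb") as disk:
        return disk.read(MBR_SIZE)


def inspect_mbr(sector: bytes) -> dict:
    """Report two quick facts about an MBR sector."""
    if len(sector) < MBR_SIZE:
        raise ValueError("need at least 512 bytes")
    return {
        "has_boot_signature": sector[510:512] == BOOT_SIGNATURE,
        # Heuristic: GRUB's boot code carries the ASCII string "GRUB"
        # somewhere in the first 446 bytes (the bootstrap code area).
        "mentions_grub": b"GRUB" in sector[:446],
    }
```

A clean result only says the sector on that particular disk looks sane.  It says nothing about whether the BIOS is actually handing control to that disk.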
 

Author Comment

by:PhillyGee
ID: 39789291
Now I am told that the four drives are not in a RAID 5 but two mirrors.
 

Author Comment

by:PhillyGee
ID: 39794649
PowerEdgeTech, you're going to love this one. A brilliant colleague of mine came up with the solution.
The problem was that a controller setting had been changed in the BIOS, and on top of that an alert setting was set to Disable.
There are two PERCs in this box - a PERC6i (boot controller) and a PERC5e (to a PV MD1000 disk array).
After a 400GB hard drive and the cache battery failed on the PERC6i controller, the PERC5e BIOS setting somehow set itself to "Enable", causing a conflict because the boot PERC BIOS was also (properly) set to "Enable". On top of that, the “Enable BIOS Stop On Error” setting was disabled on both array controllers, so no error was ever reported.
The BIOS settings were corrected, both the hard drive and cache battery have been replaced, and everything is now working smoothly.
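If it helps anyone hitting the same thing, here is the check we ended up doing by eye, written out as a small Python sketch.  This is not a Dell tool and reads nothing from the hardware: the `Controller` fields and the `audit` function are made up, and the values have to be typed in from what each controller's configuration screen shows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controller:
    name: str
    is_boot_controller: bool
    bios_enabled: bool    # the controller BIOS Enable/Disable setting
    stop_on_error: bool   # the "Enable BIOS Stop On Error" setting


def audit(controllers: List[Controller]) -> List[str]:
    """List the settings that matched the problem state in this thread."""
    findings = []
    for c in controllers:
        if c.bios_enabled and not c.is_boot_controller:
            findings.append(f"{c.name}: controller BIOS enabled on a non-boot controller")
        if c.is_boot_controller and not c.bios_enabled:
            findings.append(f"{c.name}: controller BIOS disabled on the boot controller")
        if not c.stop_on_error:
            findings.append(f"{c.name}: BIOS Stop On Error is disabled")
    return findings


# The state the server was found in, as described above.
before_fix = [
    Controller("PERC6i", is_boot_controller=True, bios_enabled=True, stop_on_error=False),
    Controller("PERC5e", is_boot_controller=False, bios_enabled=True, stop_on_error=False),
]
```

Calling `audit(before_fix)` flags the PERC5e BIOS setting plus the disabled Stop On Error on both controllers.  After the corrections the same call returns an empty list.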

I don't know what it is with PowerEdge BIOS. I've seen it where the BIOS would spontaneously switch the embedded RAID controller setting in "Integrated Devices" from RAID Enabled to SCSI Enabled but this one is a new one for me.

Thank you both for your input.
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39794960
I didn't help much, but thank you for the assist points.  I'm just glad you were able to get it figured out.

Hats off to your colleague for figuring it out.
 
LVL 32

Expert Comment

by:PowerEdgeTech
ID: 39795108
This is a new one for me too; however, I do not often use external storage devices that would connect via an (E)xternal PERC.  But as for:

I don't know what it is with PowerEdge BIOS. I've seen it where the BIOS would spontaneously switch the embedded RAID controller setting in "Integrated Devices" from RAID Enabled to SCSI Enabled

I've never seen this happen "spontaneously" ... there is usually something to precipitate it - CMOS battery failure, power event, BIOS update, hardware failure, etc.  The BIOS default for Embedded RAID is OFF or SCSI Enabled (also keep in mind, this is only for older systems, like the 26x0/28x0), so anytime the BIOS configuration stored in NVRAM is cleared or corrupted, the setting will return to its default.

In any case, I'm glad you figured it out.