• Status: Solved

PE2900 hangs in POST at DRAC

I have a customer who is having a boot problem during POST.  The server advances through POST until it attempts to make a connection through the DRAC (see attached file).  Can someone tell me what steps I could have her take to troubleshoot the issue?
kali---boot-issues.bmp
Asked:
PhillyGee
 
PowerEdgeTechIT ConsultantCommented:
Actually, the 5-second CTRL-E prompt signals the end of the POST sequence.  After 5 seconds, the system begins loading the boot devices, so the fact that the system stops here indicates the OS is not loading properly.  Assuming you have no amber LCD error message and no failed drives or RAID arrays or other storage problems, I would attempt a repair of your OS at this point.  Which OS are you running?
 
Christopher ReedLevel 2 Software Support EngineerCommented:
My suggestion, in line with PowerEdgeTech, would be to do a simple boot to CD to make sure that the system can in fact boot at all.

I'm new to PowerEdge servers, but is it possible that the network address for the DRAC is causing some type of hang?

Just my two cents, best of luck.
-Chris
 
PowerEdgeTechIT ConsultantCommented:
I'm not saying it's impossible, but I've never seen it hang at this spot where it was anything but a damaged OS (there are messages that are normally seen when the DRAC is malfunctioning).  I think it's much more likely to be the OS than the server.
 
Christopher ReedLevel 2 Software Support EngineerCommented:
I appreciate the feedback.  Given your name, I think you might know a little more about PE servers than I do.  :-)

For my own knowledge, shouldn't the server throw back some type of error saying that it can't find an OS or something similar?  Is this "hang" somewhat of a normal thing?
 
PowerEdgeTechIT ConsultantCommented:
"shouldn't the server throw back some type of error saying that it can't find an OS or something similar"

Well, the server doesn't have control at this point.  It has told the "hard drive" to boot, and the drive is trying - spinning its wheels trying to find/read/load boot files, unable to get far enough to have even splashed an image on the screen.  The OS is hung doing its thing, and as far as the server is concerned, it is up and running on it - it could be chugging along authenticating users, serving web pages, etc., as far as server management (BMC) knows.

If the boot device didn't exist (failed RAID array, controller is disabled, etc.), then the server would have said "no boot device found" after attempting all boot devices unsuccessfully.

Again, I can't promise that is what is going on in this particular situation, but I've seen it happen MANY times and believe it is the most likely scenario at this point.
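
To make the distinction above concrete, here is a minimal sketch of the hand-off logic as described in this comment. It is a toy model, not actual Dell firmware behavior; the `BootDevice` class, its fields, and `firmware_boot` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BootDevice:
    name: str
    present: bool   # firmware can see the device (array online, controller enabled)
    os_loads: bool  # the OS on it gets far enough to show anything on screen


def firmware_boot(devices: List[BootDevice]) -> str:
    """Return what an operator would observe at the console (toy model)."""
    for dev in devices:
        if not dev.present:
            continue  # firmware skips devices it cannot see
        # Hand-off: from this point the firmware is no longer in control,
        # so it cannot report that the OS failed to come up.
        if dev.os_loads:
            return f"OS running from {dev.name}"
        return f"silent hang after hand-off to {dev.name}"
    # Only reached when every boot device was tried unsuccessfully.
    return "no boot device found"
```

In this model a visible but unbootable disk produces a silent hang, while a missing disk produces the explicit "no boot device found" message.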
 
Christopher ReedLevel 2 Software Support EngineerCommented:
PhillyGee,

Sorry to hijack your question, I hope that you get your "hang" issue fixed.  Please let us know the solution.

PowerEdgeTech,

Thank you for the quick PE server lesson.

-Chris
 
PowerEdgeTechIT ConsultantCommented:
Hopefully it helps pg too ;)
 
PhillyGeeAuthor Commented:
Thanks for all the feedback, guys.  She's now sending me more information a little at a time.
- a failed 400GB SAS hard drive in (what looks like) a four disk RAID5 array
- a failed cache battery.
 
PowerEdgeTechIT ConsultantCommented:
"- a failed 400GB SAS hard drive in (what looks like) a four disk RAID5 array"

OS corruption could have occurred from the failed disk (errors on remaining disks, etc.).

"- a failed cache battery."

OS corruption could have occurred from the write cache data being lost before being committed to disk when important files were in a critical state.

Barring data corruption, RAID 5 should still be operational with a single failed disk.
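
For anyone wondering why a single failed disk is survivable, here is a rough sketch of the XOR parity idea behind RAID 5. It models one stripe on a four-disk set (three data blocks plus one parity block) and is not how a PERC implements it internally; `xor_blocks` and `reconstruct` are illustrative names.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the one missing block (marked None) from the survivors."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("RAID 5 cannot recover more than one missing block per stripe")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    # XOR of all surviving blocks (data and parity) yields the lost block.
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# One stripe: three data blocks plus their parity block.
data = [b"boot", b"file", b"data"]
stripe = data + [xor_blocks(data)]

# Simulate the failed disk holding the second block.
degraded = [stripe[0], None, stripe[2], stripe[3]]
assert reconstruct(degraded)[1] == b"file"
```

The same arithmetic shows the limits: if a surviving block is unreadable or corrupt, the XOR either cannot be computed or silently produces wrong data.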
 
Christopher ReedLevel 2 Software Support EngineerCommented:
If there IS data corruption, is all lost so that even RAID rebuild can't fix it?
 
PowerEdgeTechIT ConsultantCommented:
Depends on the extent of the damage ... it may "only" be OS corruption, in which case, it may be repairable (Recovery Console - chkdsk /r, fixmbr, fixboot; and other utilities such as SFC).  If it is array corruption, there is less that can be done ... a rebuild might be successful and may work normally, or the rebuild may fail.  With only a single disk remaining, if there are any errors in the data on the remaining disk then it will probably just fail the rebuild trying to read it.
 
PhillyGeeAuthor Commented:
Thank you all for the help.  This issue is still outstanding.  The customer is running RH Linux (something I know nothing about). After trying a number of things, the customer sought help from RH, who booted to rescue mode and re-installed grub to the MBR. After rebooting, the server was still not getting to stage 1 of grub.  They didn't think it was an OS problem.
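
As a quick sanity check from rescue mode, the first sector of the boot disk can be inspected for the standard MBR boot signature. This is a minimal sketch, assuming a classic 512-byte MBR layout; the function names and the `/dev/sda` path in the usage note are assumptions, not details from this case. Passing these checks does not prove the bootloader works, only that a boot sector was written.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR
BOOT_CODE_LEN = 446           # bootstrap code area that precedes the partition table


def inspect_mbr(sector: bytes) -> dict:
    """Report basic facts about a raw first sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least one full 512-byte sector")
    return {
        "has_boot_signature": sector[510:512] == BOOT_SIGNATURE,
        # An all-zero code area means no bootloader was written at all.
        "boot_code_present": any(sector[:BOOT_CODE_LEN]),
    }


def read_first_sector(path: str) -> bytes:
    """Read the first sector of a disk or image file (needs read permission)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

Usage would be something like `inspect_mbr(read_first_sector("/dev/sda"))` run as root, with the device path adjusted to the actual boot disk.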
 
PhillyGeeAuthor Commented:
Now I am told that the four drives are not in a RAID 5 but two mirrors.
 
PhillyGeeAuthor Commented:
PowerEdgeTech, you're going to love this one. A brilliant colleague of mine came up with the solution.
The problem was that a controller setting had been changed in the BIOS, plus an alert setting was set to Disable.
There are two PERCs in this box - a PERC6i (boot controller) and a PERC5e (to a PV MD1000 disk array).
After a 400GB hard drive and cache battery failed on the PERC6i controller, somehow the PERC5e BIOS setting set itself to "Enable", causing a conflict as the boot PERC BIOS was also (properly) set to "Enable". On top of that, the “Enable BIOS Stop On Error” setting was disabled on both array controllers, so no error was ever reported.
The BIOS settings were corrected, both the hard drive and cache battery have been replaced and everything is now working smoothly.
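
For anyone who wants to check for the same condition, the rule we ended up applying can be sketched as a small audit. This is only an illustration of the logic from this case: the `ControllerSettings` class and `audit` function are hypothetical, and the values have to be read off the controller configuration screens by hand, not through any Dell API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControllerSettings:
    name: str
    is_boot_controller: bool
    bios_enabled: bool      # the controller's own BIOS (boot ROM) setting
    stop_on_error: bool     # the "Enable BIOS Stop On Error" setting


def audit(controllers: List[ControllerSettings]) -> List[str]:
    """Return findings that match the misconfiguration seen in this case."""
    findings = []
    for c in controllers:
        if c.bios_enabled and not c.is_boot_controller:
            findings.append(f"{c.name}: controller BIOS enabled on a non-boot controller")
        if c.is_boot_controller and not c.bios_enabled:
            findings.append(f"{c.name}: controller BIOS disabled on the boot controller")
        if not c.stop_on_error:
            findings.append(f"{c.name}: BIOS Stop On Error disabled, errors may go unreported")
    return findings


# The faulty state described above.
faulty = [
    ControllerSettings("PERC6i", is_boot_controller=True, bios_enabled=True, stop_on_error=False),
    ControllerSettings("PERC5e", is_boot_controller=False, bios_enabled=True, stop_on_error=False),
]
for finding in audit(faulty):
    print(finding)
```

With the corrected settings (PERC5e BIOS disabled, Stop On Error enabled on both), the audit returns an empty list.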

I don't know what it is with PowerEdge BIOS. I've seen it where the BIOS would spontaneously switch the embedded RAID controller setting in "Integrated Devices" from RAID Enabled to SCSI Enabled but this one is a new one for me.

Thank you both for your input.
 
Christopher ReedLevel 2 Software Support EngineerCommented:
I didn't help much, but thank you for the assist points.  I'm just glad you were able to get it figured out.

Hats off to your colleague for figuring it out.
 
PowerEdgeTechIT ConsultantCommented:
This is a new one for me too; however, I do not often use external storage devices that would connect via an (E)xternal PERC.  But as for:

"I don't know what it is with PowerEdge BIOS. I've seen it where the BIOS would spontaneously switch the embedded RAID controller setting in "Integrated Devices" from RAID Enabled to SCSI Enabled"

I've never seen this happen "spontaneously" ... there is usually something to precipitate it - CMOS battery failure, power event, BIOS update, hardware failure, etc.  The BIOS default for Embedded RAID is OFF or SCSI Enabled (also keep in mind, this is only for older systems, like the 26x0/28x0), so anytime the BIOS configuration stored in NVRAM is cleared or corrupted, the setting will return to its default.

In any case, I'm glad you figured it out.