Solved

PE2900 hangs in POST at DRAC

Posted on 2014-01-14
Last Modified: 2014-01-20
I have a customer who is having a boot problem during POST.  The server advances through POST until it attempts to make a connection through the DRAC (see attached file).  Can someone tell me what steps I could have her take to troubleshoot the issue?
kali---boot-issues.bmp
Question by:PhillyGee
16 Comments
 
LVL 33

Accepted Solution

by:
PowerEdgeTech earned 1200 total points
ID: 39780489
Actually, the 5-second CTRL-E prompt signals the end of the POST sequence.  After 5 seconds, the system begins loading the boot devices, so the fact that the system stops here indicates the OS is not loading properly.  Assuming you have no amber LCD error message and no failed drives or RAID arrays or other storage problems, I would attempt a repair of your OS at this point.  Which OS are you running?
 
LVL 2

Assisted Solution

by:Christopher Reed
Christopher Reed earned 300 total points
ID: 39780520
My suggestion, in line with PowerEdgeTech, would be to do a simple boot to CD to make sure that the system can in fact boot at all.

I'm new to PowerEdge servers, but could it be a possibility that the net address for the DRAC is causing some type of hang issue?

Just my two cents, best of luck.
-Chris
 
LVL 33

Expert Comment

by:PowerEdgeTech
ID: 39780537
I'm not saying it's impossible, but I've never seen it hang at this spot where it was anything but a damaged OS (there are messages that are normally seen when the DRAC is malfunctioning).  I think it's much more likely to be the OS than the server.
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39780563
I appreciate the feedback.  Given your name, I think you might know a little more about PE servers than myself.  :-)

For my own knowledge, shouldn't the server throw back some type of error saying that it can't find an OS or something similar?  Is this "hang" somewhat of a normal thing?
 
LVL 33

Expert Comment

by:PowerEdgeTech
ID: 39780586
"shouldn't the server throw back some type of error saying that it can't find an OS or something similar"

Well, the server doesn't have control at this point.  It has told the "hard drive" to boot, and the drive is trying - spinning its wheels trying to find/read/load boot files, unable to get far enough to have even splashed an image on the screen.  The OS is hung doing its thing, and as far as the server is concerned, it is up and running on it - it could be chugging along authenticating users, serving web pages, etc., as far as server management (BMC) knows.

If the boot device didn't exist (failed RAID array, controller is disabled, etc.), then the server would have said "no boot device found" after attempting all boot devices unsuccessfully.

Again, I can't promise that is what is going on in this particular situation, but I've seen it happen MANY times and believe it is the most likely scenario at this point.
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39780614
PhillyGee,

Sorry to hijack your question, I hope that you get your "hang" issue fixed.  Please let us know the solution.

PowerEdgeTech,

Thank you for the quick PE server lesson.

-Chris
 
LVL 33

Expert Comment

by:PowerEdgeTech
ID: 39780620
Hopefully it helps pg too ;)
 

Author Comment

by:PhillyGee
ID: 39780803
Thanks for all the feedback, guys.  She's now sending me more information a little at a time.
- a failed 400GB SAS hard drive in (what looks like) a four disk RAID5 array
- a failed cache battery.
 
LVL 33

Expert Comment

by:PowerEdgeTech
ID: 39780824
"- a failed 400GB SAS hard drive in (what looks like) a four disk RAID5 array"

OS corruption could have occurred from the failed disk (errors on remaining disks, etc.).

"- a failed cache battery."

OS corruption could have occurred from the write cache data being lost before being committed to disk when important files were in a critical state.

Barring data corruption, RAID 5 should still be operational with a single failed disk.
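
For anyone wondering why a single failed disk is survivable, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It is only an illustration: the block contents and the helper name xor_blocks are made up, and a real controller works on much larger stripes and rotates the parity block across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical four-disk RAID 5:
# three data blocks plus one parity block (contents are invented).
data = [b"BOOT", b"FILE", b"DATA"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]; rebuild that block
# from the two surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same arithmetic shows the catch: the rebuild needs every surviving block in the stripe to be readable, so errors on the remaining disks can leave a block unrecoverable.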
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39780826
If there IS data corruption, is all lost so that even RAID rebuild can't fix it?
 
LVL 33

Expert Comment

by:PowerEdgeTech
ID: 39780861
Depends on the extent of the damage ... it may "only" be OS corruption, in which case, it may be repairable (Recovery Console - chkdsk /r, fixmbr, fixboot; and other utilities such as SFC).  If it is array corruption, there is less that can be done ... a rebuild might be successful and may work normally, or the rebuild may fail.  With only a single disk remaining, if there are any errors in the data on the remaining disk then it will probably just fail the rebuild trying to read it.
 

Author Comment

by:PhillyGee
ID: 39789267
Thank you all for the help.  This issue is still outstanding.  The customer is running RH Linux (something I know nothing about). After trying a number of things, the customer sought help from RH, who booted to rescue mode and re-installed grub to the MBR. After rebooting, the server was still not getting to stage 1 of grub.  They didn't think it was an OS problem.
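
In case it helps someone checking the same thing from a rescue environment, below is a rough Python sketch that looks at a 512-byte MBR sector and reports whether it carries the 0x55 0xAA boot signature and whether the boot code area is empty. The function name inspect_mbr and the fake sector are my own for illustration; on a real system you would read the first 512 bytes of the boot disk (as root), and the device name depends on the machine.

```python
MBR_SIZE = 512
BOOT_CODE_SIZE = 440          # bootstrap code area at the start of the sector
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def inspect_mbr(sector: bytes) -> dict:
    """Report basic facts about a raw MBR sector (hypothetical helper)."""
    if len(sector) < MBR_SIZE:
        raise ValueError("need at least 512 bytes")
    return {
        "has_boot_signature": sector[510:512] == BOOT_SIGNATURE,
        "boot_code_is_blank": not any(sector[:BOOT_CODE_SIZE]),
    }


# Fake sector built in memory: two non-zero bytes standing in for boot code,
# zero padding, then the boot signature.
fake_sector = b"\xeb\x63" + bytes(508) + BOOT_SIGNATURE
print(inspect_mbr(fake_sector))
```

A sector that passes this check only looks bootable; it says nothing about whether the BIOS is actually handing control to that disk.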
 

Author Comment

by:PhillyGee
ID: 39789291
Now I am told that the four drives are not in a RAID 5 but two mirrors.
 

Author Comment

by:PhillyGee
ID: 39794649
PowerEdgeTech, you're going to love this one. A brilliant colleague of mine came up with the solution.
The problem was that a controller setting had been changed in the BIOS, plus an alert setting was set to Disable.
There are two PERCs in this box - a PERC6i (boot controller) and a PERC5e (to a PV MD1000 disk array).
After a 400GB hard drive and cache battery failed on the PERC6i controller, somehow the PERC5e BIOS setting set itself to "Enable", causing a conflict, as the boot PERC BIOS was also (properly) set to "Enable". On top of that, the “Enable BIOS Stop On Error” setting was disabled on both array controllers, so no error was ever reported.
The BIOS settings were corrected, both the hard drive and cache battery have been replaced and everything is now working smoothly.
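
To spell out the check we ended up doing by hand, here is a small Python sketch. It does not talk to any Dell tool or API; the Controller class, its field names and the audit function are hypothetical, and the values are typed in from what the controller configuration screens showed in our case.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    is_boot_controller: bool
    bios_enabled: bool
    stop_on_error: bool


def audit(controllers):
    """Return a list of findings for the settings described in this post."""
    findings = []
    for c in controllers:
        if c.bios_enabled and not c.is_boot_controller:
            findings.append(f"{c.name}: BIOS enabled on a non-boot controller")
        if c.is_boot_controller and not c.bios_enabled:
            findings.append(f"{c.name}: BIOS disabled on the boot controller")
        if not c.stop_on_error:
            findings.append(f"{c.name}: BIOS Stop On Error is disabled")
    return findings


# Settings as they were found on this server (entered by hand).
controllers = [
    Controller("PERC6i", is_boot_controller=True, bios_enabled=True, stop_on_error=False),
    Controller("PERC5e", is_boot_controller=False, bios_enabled=True, stop_on_error=False),
]

findings = audit(controllers)
for line in findings:
    print(line)
```

With the values above it flags the PERC5e BIOS setting and the disabled Stop On Error on both controllers, which matches what was corrected.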

I don't know what it is with PowerEdge BIOS. I've seen it where the BIOS would spontaneously switch the embedded RAID controller setting in "Integrated Devices" from RAID Enabled to SCSI Enabled but this one is a new one for me.

Thank you both for your input.
 
LVL 2

Expert Comment

by:Christopher Reed
ID: 39794960
I didn't help much, but thank you for the assist points.  I'm just glad you were able to get it figured out.

Hats off to your colleague for figuring it out.
 
LVL 33

Expert Comment

by:PowerEdgeTech
ID: 39795108
This is a new one for me too; however, I do not often use external storage devices that would connect via an (E)xternal PERC.  But as for:

"I don't know what it is with PowerEdge BIOS. I've seen it where the BIOS would spontaneously switch the embedded RAID controller setting in "Integrated Devices" from RAID Enabled to SCSI Enabled"

I've never seen this happen "spontaneously" ... there is usually something to precipitate it - CMOS battery failure, power event, BIOS update, hardware failure, etc.  The BIOS default for Embedded RAID is OFF or SCSI Enabled (also keep in mind, this is only for older systems, like the 26x0/28x0), so anytime the BIOS configuration stored in NVRAM is cleared or corrupted, the setting will return to its default.

In any case, I'm glad you figured it out.