  • Status: Solved
  • Priority: Medium
  • Security: Public

RS/6000 stuck in reboot loop. Won't boot into AIX. Last error displayed is: E051

Hello all,

One of my RS/6000 (44P Model 170) machines is stuck in a reboot loop. Nothing was changed. I arrived in the morning to start up the machine, and it gets as far as the IBM boot screen, where the last thing displayed is:

Starting software...
Boot device /pci@fef00000/scsi@c/sd@4,0
Closing stdin and stdout....

Then it reboots. The last thing shown on the error display is: E051

Does anyone know what's wrong?

Thanks,

Rob
Asked by: robbo007
 
gheist Commented:
The service guide at:
http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/hardware_docs/pdf/380560.pdf

says on page 70:
E051 Reading Processor VPD - replace processor card.

My guess is that dust has settled in the case, so read the service guide for how to remove the cards, then put them back in the same slots. A digital camera is handy here (or call an IBM service technician to help you out). As a first effort, remove only the CPU card, clean off any dust, and put it back.

If you do it yourself, be very careful to avoid static discharge, and use a top-quality screwdriver.

VPD - Vital Product Data
FRU - Field Replaceable Unit

Too bad it is too late for backups. Avoid altering the SCSI wiring, to prevent boot problems after the fix.
 
robbo007 (Author) Commented:
Nice one. I'll check it out and let you know how it goes. Thanks,

Rob
 
gheist Commented:
To keep your hardware in top shape, run diag -a at the root prompt right after the system boots and clear all errors, then keep track of any new errors.
Then upgrade the system microcode ( lscfg -vp | grep lterable # for a list).
Then patch AIX to the latest maintenance level.

After that, your hardware and operating system need attention only about twice a year, so you can concentrate on application and user-education problems and schedule planned interruptions twice a year (which is reasonably high availability in 99.9% of cases).
 
gheist Commented:
You have quite an old machine, so run diag and schedule all the regular diag procedures (at different hours, so you know what fails if anything crashes), so that you are the first to know of hardware faults.

You can search for AIX in the UNIX TA, and look around this TA for more information on maintaining your system.
 
gheist Commented:
One more thing: always run the ECC scrubber, for the same reason (smitty chgsys).
 
robbo007 (Author) Commented:
OK.

I have removed the CPU card and cleaned it. It had quite a lot of dust on it. I put it back, but the error still shows and then the machine reboots automatically. I have tried going in with F1 but can't see any errors in the log, nor anything else that can help me.

How do I know for certain that the CPU card is stuffed? I gather these things cost a lot, and I don't want to raise the purchase order only to find it's not the root of the problem.

Any more ideas? I have a digital camera.

Rob
 
robbo007 (Author) Commented:
The problem is I can't get into the OS. I only get to the screen that displays the IBM logo and the icons for the hardware. Then it reboots automatically.

I guess I can only run diag once AIX is loaded and I have a command prompt?

Rob
 
gheist Commented:
The next codes are for reading the mainboard data, so the board might be fried (or the battery is empty [tm]).
 
gheist Commented:
These are service processor errors, not AIX errors.
It shows that it has to load from that device, and then starts to take stock of all the hardware.

Switch off.
Press Space on the keyboard.
Use verbose boot to get these errors on screen.
 
robbo007 (Author) Commented:
Sorry for the stupid question. How do I get it to boot with verbose messages?

Thanks,

Rob
 
gheist Commented:
You do this when the computer has power but is switched off.
The screen is blank, but the service processor is running.
You can enter the service processor menus, then 4 - Power igues, 2 - verbose boot, or so

...
 
robbo007 (Author) Commented:
Right, now I am completely lost. Power is off but the power cable is plugged into the IBM. The green light on the front display is flashing and the display says "OK".

I tried pressing the Space bar and keying a few numbers and the F keys. Nothing.

I am going to consult the manual, but if you could explain it a little more that would be most appreciated. I suspect the CPU card is stuffed but I would like to double check.

Cheers,

Rob
 
robbo007 (Author) Commented:
Right, after reading up on what the hell the Service Processor Menu is, I now understand. Sorry for my ignorance before. :-P

But it won't let me into the menu. I have tried "Enter" twice, "Space" twice etc. and nothing. Could the hardware responsible for this be damaged too?

Rob
 
gheist Commented:
Do you have a graphical or a text console?
 
robbo007 (Author) Commented:
The console that is displayed at boot time is graphical: white background, with the icons of the keyboard, sound card etc. and the IBM logo in the top left-hand corner.

Rob
 
gheist Commented:
The description of the service processor menus starts at page 115:
"Accessing service processor menus locally"
(you must use InfoWindow....)

Download the referenced documents from publib.boulder.ibm.com too, for even more documentation.

I do not have a computer like yours, so I cannot help much.

The main idea is to use the service processor to find out what is wrong (it will show all the 50xx numbers on the console, one after another, in the same sequence as on the operator panel), and to fix things up to the stage where OS diagnostics can be used to narrow the problem down further.

I suggested [space] because it is "redraw" for the service processor menu.

To get the system powered off with the service processor active, disconnect it from the mains, wait a minute, connect it again, wait another and start hitting the keyboard.
 
gheist Commented:
-  wait another and start hitting

+ wait for OK to appear...
 
gheist Commented:
http://www.faqs.org/faqs/aix-faq/

Look there for the serial console cable to a normal PC, since you do not have a serial terminal device...
 
robbo007 (Author) Commented:
Hello,

Finally I have managed to get into the Service Processor Menus. I am using a serial cable connected to another PC running HyperTerminal.

In the "System Information Menu", under "Read Progress Indicators from Last System Boot", I have the following:


E075
E076
E075
E076
E075
E076
E075
E043
E042
E070
E031
E050
E021
E020
E051
E011
E012
E032
E030
E040
E010

Then when I rebooted I got this:

E075
E043
E042
E070
E031
E050
E021
E020
E051
E011
E012
E032
E030
E040
E010

I looked under "Read Post Errors" and there are no problems. I looked under "Read Processor Error Logs" and there are no errors either.

I gather that if the CPU card were stuffed it would show up there? Apart from calling an IBM engineer, is there anything else I can do?
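
The two progress-indicator lists above can be compared mechanically rather than by eye. A minimal Python sketch, using only the codes pasted in this comment (the variable names are made up for illustration), checks whether the second list is simply the tail of the first:

```python
# Progress indicators exactly as pasted above, in the order the menu showed them.
first_read = """E075 E076 E075 E076 E075 E076 E075 E043 E042 E070 E031 E050
E021 E020 E051 E011 E012 E032 E030 E040 E010""".split()
second_read = """E075 E043 E042 E070 E031 E050 E021 E020
E051 E011 E012 E032 E030 E040 E010""".split()

# How many more entries the first read has than the second.
extra = len(first_read) - len(second_read)

# True if the second list matches the end of the first list entry for entry.
is_tail = first_read[extra:] == second_read

# Codes that appear in the first read but never in the second.
only_in_first = sorted(set(first_read) - set(second_read))

print("second list is the tail of the first:", is_tail)
print("extra leading entries in first list:", first_read[:extra])
print("codes seen only in the first list:", only_in_first)
```

For these two lists the check holds: both share the same 15 codes from E075 down to E010, and the first read only adds alternating E075/E076 entries in front.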




 
gheist Commented:
E010 - Starting SP self-tests / replace the system board

... clean dust or change CMOS battery ...
 
robbo007 (Author) Commented:
OK.

I gather that the last code is where it's failing? Why did the LCD panel display E051?

I'll take a closer look at the CMOS battery.
 
gheist Commented:
E070 looks unnecessary; disable call-out in the SP menus (could it be a sign of E031?)
 
robbo007 (Author) Commented:
Doh!

I just changed the battery. Now it gets stuck at: BOOT TP s1,s2,s3 etc..

I am looking in the system information menu.
 
robbo007 (Author) Commented:
How can I change it so it boots from the SCSI disks? ERROR: E175, which is set to BOOTP. We don't boot via the network card. It boots via the SCSI disks.

Rob
 
gheist Commented:
Press 8 on the graphic console or F8 on the text console,
then type
boot> boot /pci@fef00000/scsi@c/sd@4,0

:-)

just like a PowerMac ....
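
For anyone unfamiliar with that boot path: Open Firmware device paths are a chain of name@unit-address components. A minimal Python sketch that splits the path from this thread into its parts (the helper and class names are made up for illustration; reading "4,0" as SCSI target 4, LUN 0 is my interpretation of the usual convention, not something stated in this thread):

```python
from typing import List, NamedTuple


class PathNode(NamedTuple):
    name: str          # node name, e.g. "scsi"
    unit_address: str  # text after "@", e.g. "c"


def parse_device_path(path: str) -> List[PathNode]:
    """Split an Open Firmware style path into (name, unit address) pairs."""
    nodes = []
    for component in path.strip("/").split("/"):
        name, _, unit = component.partition("@")
        nodes.append(PathNode(name, unit))
    return nodes


nodes = parse_device_path("/pci@fef00000/scsi@c/sd@4,0")
for node in nodes:
    print(f"{node.name:<5} unit address {node.unit_address}")

# Assumption: for the disk node the unit address is "target,lun".
target, lun = nodes[-1].unit_address.split(",")
print(f"disk: target {target}, LUN {lun}")
```

Seen this way, the path names the PCI bus, the SCSI adapter on it, and the disk on that adapter, which is useful when checking that the boot device is still the SCSI disk and not the network card.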
 
gheist Commented:
Great, it now boots past the hw diags (tm)
 
robbo007 (Author) Commented:
Hehe. Great, now it boots like before: it reboots on the E051 error on the panel.

I can't see what else there is to do apart from changing hardware?
 
gheist Commented:
Can you get an Open Firmware prompt via the serial console? (Sorry: F8 on graphic, 8 on serial.)
 
gheist Commented:
You need to look at the service processor steps as they display on the serial console, not on the operator panel.
 
robbo007 (Author) Commented:
OK. I gather they are in the order of the boot process?
 
gheist Commented:
E010 is the first message.
E075 is entering the menus.
I guess this list is from entering the menus, not from booting the system, so I suggest using the serial console to the service processor to boot the system up and watch the messages.
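
Going by the statement above that E010 is the first message, the list read from the menu would be newest-first. A small Python sketch that reverses it into boot order and labels only the codes whose meaning was given in this thread (the dictionary and variable names are illustrative, not an IBM table; every other code is left for the service guide):

```python
# Second list from "Read Progress Indicators from Last System Boot", as posted earlier.
menu_order = ["E075", "E043", "E042", "E070", "E031", "E050", "E021", "E020",
              "E051", "E011", "E012", "E032", "E030", "E040", "E010"]

# Meanings quoted in this thread only; not a complete or official table.
known = {
    "E010": "Starting SP self-tests",
    "E051": "Reading Processor VPD",
    "E075": "Entering SP menus (as stated in this comment)",
}

# Assumption: the menu lists the newest code first, so reversing gives boot order.
boot_order = menu_order[::-1]

for step, code in enumerate(boot_order, start=1):
    print(f"{step:2d}  {code}  {known.get(code, 'see service guide')}")
```

If the assumption about ordering is right, E051 is the seventh step, with the E075 menu entry last, which fits the guess that this log came from entering the menus rather than from a normal boot.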
 
robbo007 (Author) Commented:
Hello,

I have configured the console so it's using a PC via a serial cable to boot up. Basically it does not show any more information. I have searched through the Service Processor Menus and found nothing that will allow me to boot in a verbose mode or view the messages at bootup.

Basically all the logs show no problems and everything is normal. The thing that I don't understand is why does it reboot automatically?

If it was an error, shouldn't it stop or freeze up with an error code?

Rob
 
gheist Commented:
No, it does not stop.
It shows these messages, and the one displayed last, or one within +/-1 of it, is the guilty one....
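
As a rough illustration of that rule of thumb, the Python sketch below takes the progress codes in boot order (here the list posted earlier, reversed on the assumption that E010 came first) and returns the last displayed code together with its immediate neighbours. The function name is hypothetical.

```python
def suspects(codes_in_boot_order, last_displayed):
    """Return last_displayed plus the codes directly before and after it.

    Uses the final occurrence of last_displayed in the list and raises
    ValueError if the code does not appear at all.
    """
    reversed_codes = codes_in_boot_order[::-1]
    idx = len(codes_in_boot_order) - 1 - reversed_codes.index(last_displayed)
    return codes_in_boot_order[max(idx - 1, 0): idx + 2]


# Assumed boot order: the posted list reversed, so that E010 comes first.
boot_order = ["E010", "E040", "E030", "E032", "E012", "E011", "E051",
              "E020", "E021", "E050", "E031", "E070", "E042", "E043", "E075"]

print(suspects(boot_order, "E051"))
```

With E051 as the code left on the operator panel, the candidates to look up in the service guide would be E011, E051 and E020.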
 
robbo007 (Author) Commented:
Closing.

The problem was a faulty CPU card. It is more expensive to replace the card than to buy a new Sun Blade workstation. We are changing to Sun Solaris.

Rob
