zkaiserm


Operating system crash

What are the most important things to troubleshoot when a server goes down or a system crashes? What do we need to do to bring it back up? Consider Red Hat Linux or HP-UX.
> a server goes down
It does not
Panjandrum

Log files to check after a panic:

/etc/shutdownlog
/var/adm/syslog/OLDsyslog.log - look for errors (this is the syslog from before the reboot)
/var/adm/crash/crash.X/INDEX - the panic string is on the line beginning with "panic"
/var/tombstones/ts99 - copy of the chassis codes
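The checks above can be scripted. This is a minimal POSIX shell sketch that assumes the HP-UX paths listed above; the function name `review_panic_logs` and its optional root-directory argument are hypothetical, added only so the same function can be pointed at a copy of the files. It prints the last lines of the shutdown log, every line containing "error" in OLDsyslog.log, and the panic line from each crash.*/INDEX.

```shell
#!/bin/sh
# Sketch only: review_panic_logs and its ROOT argument are hypothetical.
# ROOT defaults to empty, so the real /etc, /var/adm and /var/tombstones are read.
review_panic_logs() {
    root=${1:-}

    # Most recent shutdown/reboot records
    if [ -f "$root/etc/shutdownlog" ]; then
        echo "== $root/etc/shutdownlog (last 5 lines)"
        tail -n 5 "$root/etc/shutdownlog"
    fi

    # Errors logged before the reboot
    if [ -f "$root/var/adm/syslog/OLDsyslog.log" ]; then
        echo "== errors in $root/var/adm/syslog/OLDsyslog.log"
        grep -i error "$root/var/adm/syslog/OLDsyslog.log"
    fi

    # Panic string from every saved crash directory (crash.0, crash.1, ...)
    for index in "$root"/var/adm/crash/crash.*/INDEX; do
        [ -f "$index" ] || continue   # glob did not match anything
        echo "== $index"
        grep '^panic' "$index"
    done

    # Only report whether the chassis-code copy exists; it is analyzed separately
    if [ -f "$root/var/tombstones/ts99" ]; then
        echo "== $root/var/tombstones/ts99 present"
    fi
}
```

Source the file and run `review_panic_logs` as root on the affected server, or `review_panic_logs /path/to/copy` against files copied off it.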

Reboot after panic: , isr.ior (note the spaces around the ",")
This is usually a hardware problem, but check ts99 to make sure it contains valid data:
grep -i timestamp /var/tombstones/ts99
If the timestamp is current, hardware support needs to analyze the ts99 file.
If no valid timestamp comes back, software support needs to analyze the crash dump.
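The ts99 check can be wrapped the same way. In this hedged sketch, `check_ts99` is a hypothetical helper and its return codes (0, 1, 2) are an arbitrary choice. Because the format of the timestamp inside ts99 is not described here, it does not try to decide whether the timestamp is current; it only prints any matching lines so you can judge that yourself, then points to hardware or software analysis as described above.

```shell
#!/bin/sh
# Sketch only: check_ts99 is a hypothetical helper.
# Returns 0 if a timestamp line was found, 1 if none, 2 if the file is unreadable.
check_ts99() {
    ts99=${1:-/var/tombstones/ts99}

    if [ ! -r "$ts99" ]; then
        echo "cannot read $ts99"
        return 2
    fi

    # Same test as: grep -i timestamp /var/tombstones/ts99
    if grep -i timestamp "$ts99"; then
        # A person still has to decide whether the printed timestamp is current
        echo "timestamp found: if it is current, have hardware support analyze $ts99"
        return 0
    else
        echo "no valid timestamp: have software support analyze the crash dump"
        return 1
    fi
}
```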

Reboot after panic: HPMC
HPMC (High Priority Machine Check): this indicates a hardware cause.
One thing to remember about HP-UX systems and hardware:

If the system powers on and passes its self-test, the problem is always software-related.
If the system does not pass its self-test and you cannot determine the exact cause of the problem, reduce it to a minimal configuration.
For example, on a 4-CPU system, remove all CPUs but one. Do the same with memory, I/O cards, power supplies, etc.

Hardware problems on an HP-UX server are logged in the GSP/MP. Press Ctrl+B to access it, then check the entries with show logs.
If you cannot determine the cause yourself, HP will definitely ask for this information when troubleshooting.

This is more of a general question with no right or wrong answer, but that doesn't bother me at all.

It's just a way of working, and if Kaiser wants to know the most effective way to troubleshoot a crashed system or bring it back online, then he is entitled to an answer, especially when he pays for it, homework or not.

That's my opinion.
zkaiserm (ASKER)

Panjandrum,
   Thanks for your support. I hope others understand that these are not my homework questions. These are real, "real-time" questions. I hope this won't happen to anyone else who comes to this site in search of knowledge.


Thanks,
Kaiser