zkaiserm
asked on
Operating system crash
What are the most important things to troubleshoot when a server goes down or a system crashes? What do we need to do to bring it back up? Consider Red Hat Linux or HP-UX.
Log files to check after panic
/etc/shutdownlog
/var/adm/syslog/OLDsyslog.log - look for errors
/var/adm/crash/crash.X/INDEX - the panic string is on the line beginning with "panic"
/var/tombstones/ts99 - copy of the chassis codes
Reboot after panic: , isr.ior (notice the space around the ",")
This is usually a hardware problem, but check ts99 to make sure valid data is present:
grep -i timestamp /var/tombstones/ts99
If the timestamp is current, the hardware side needs to analyze the ts99 file.
If no valid timestamp comes back, the software side needs to analyze the dump.
Reboot after panic: HPMC
HPMC (High Priority Machine Check) - Hardware cause
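For what it's worth, the checks above could be scripted roughly like this. It is only a sketch: the file paths come from the list above, but the function names are made up for illustration, and it assumes the shutdown log entry contains the text "Reboot after panic" as quoted. Whether a ts99 timestamp counts as "current" still has to be judged by eye.

```shell
#!/bin/sh
# Sketch only: first-pass triage after an HP-UX panic reboot.
# Paths are the ones listed in the post; function names are hypothetical.

# Print the most recent "Reboot after panic" entry from the shutdown log.
last_panic_entry() {
    grep -i 'reboot after panic' "${1:-/etc/shutdownlog}" | tail -n 1
}

# Map a shutdown log entry to the rule of thumb described above.
classify_panic_entry() {
    case "$1" in
        *HPMC*)    echo "hardware" ;;
        *isr.ior*) echo "likely-hardware-check-ts99" ;;
        *)         echo "unclassified" ;;
    esac
}

# Print the panic string line from a crash directory's INDEX file,
# e.g. panic_string /var/adm/crash/crash.X/INDEX
panic_string() {
    grep -i '^panic' "$1"
}

# Show the timestamp lines from ts99 and say who should look next.
# A missing or unreadable file is treated the same as "no timestamp".
ts99_verdict() {
    if grep -i timestamp "${1:-/var/tombstones/ts99}" 2>/dev/null; then
        echo "=> timestamp found: if it is current, hardware should analyze ts99"
    else
        echo "=> no valid timestamp: software should analyze the dump"
    fi
}

# Typical use on the crashed box (run as root):
#   classify_panic_entry "$(last_panic_entry)"
#   ts99_verdict
```

Each function takes the file as an optional argument, so it can also be pointed at copies of the files pulled off the server.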
One thing to remember about HP-UX systems and hardware:
If the system powers on and passes its self-test, the problem is almost always software-related.
If the system doesn't pass its self-test and you cannot determine the exact cause, strip it down to a minimal configuration.
So if you have a 4-CPU system, remove all but one CPU. Do the same with memory, I/O cards, power supplies, etc.
Hardware problems on an HP-UX server are logged in the GSP/MP. Press Ctrl+B to access it, then check the entries with show logs.
If you cannot determine the cause yourself, HP will definitely ask for this information to troubleshoot.
The question is more of a general one with no right or wrong answer, but that doesn't bother me at all.
It's just a way of working, and if Kaiser wants to know the most effective way to troubleshoot a crashed system or bring it back online, then he is entitled to an answer, especially when he pays for it, homework or not.
That's my opinion.
ASKER
Panjandrum,
Thanks for your support. I hope others understand that these are not my homework questions. These are real, "real-time" questions. I hope this won't happen to anyone else who comes to this site in the quest for knowledge.
Thanks,
Kaiser
It does not