Solved

ESXi PSoD Exception 14

Posted on 2014-10-25
Medium Priority
653 Views
Last Modified: 2014-10-30
Hi All,

Can anyone please help me troubleshoot a random PSoD affecting one of my HP blade servers running ESXi 5.1, as shown in the screenshot below:

PSoD
I'm not sure where to begin troubleshooting this problem.

Thanks
9 Comments
 
LVL 123

Accepted Solution

by:
Andrew Hancock (VMware vExpert / EE MVE^2) earned 2000 total points
ID: 40403925
Most PSODs are caused by a hardware issue.

This could be incompatible hardware that is not on the HCL.

Check the VMware Hardware Compatibility List (HCL) here:

The VMware Hardware Compatibility List is the detailed list of actual vendor devices that are either physically tested or are similar to the devices tested by VMware or VMware partners. Items on the list are tested with VMware products and are known to operate correctly. Devices which are not on the list may function, but will not be supported by VMware.

http://www.vmware.com/go/hcl

So first checks...

1. Which HP blade server model is it? "HP Blade" is rather generic. Is your hardware on the HCL?

2. Is your hardware firmware up to date (BIOS, storage and network controllers)?

3. Are you using the HP OEM version of ESXi 5.1?

4. Have you checked that the memory is seated correctly?

5. Have you checked the fans and CPU heatsinks?

6. Have you tested the memory using memtest86+?

7. If you have a support contract with HP, log a support request.

8. If you have a support contract with VMware, log a support request.

9. Random faults are difficult to track down. How many VMs were running at the time of the crash?

10. Look back at your change database: what changes have been made to the server and environment?

11. Do you have a syslog server, or persistent storage of logs, so you can go back and check /var/log/vmkernel.log for any errors before the PSOD?

12. Is the build version of ESXi the latest?

13. Track down the World ID and the VM it belongs to.

14. Is that VM's OS supported on ESXi 5.1?

15. Is the network card in the VM VMXNET3 or E1000? Some builds of ESXi have had bugs with VM NICs that cause PSODs.

16. Is the CPU microcode supported, and are both CPUs the same?

17. Is the memory installed in the correct banks?

18. Is certified memory installed?

These are the troubleshooting steps you need to start performing.
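
For the log check in step 11, a minimal Python sketch of the idea follows. It assumes you have a copy of vmkernel.log (from the host or from your syslog server). The function names and the keyword list are illustrative assumptions only, not an official VMware list of PSOD indicators, so adjust them to whatever you are hunting for.

```python
import re
from typing import Iterable, List, Sequence

# Illustrative keywords only (an assumption, not an official VMware list).
SUSPECT_KEYWORDS = ("warning", "error", "machine check")


def find_suspect_lines(lines: Iterable[str],
                       keywords: Sequence[str] = SUSPECT_KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path: str, keywords: Sequence[str] = SUSPECT_KEYWORDS) -> List[str]:
    """Open a saved log file and return its suspicious lines."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, "r", errors="replace") as handle:
        return find_suspect_lines(handle, keywords)
```

Calling `scan_log()` with the path to your copy of the log returns the matching lines in order, so the entries closest to the crash are at the end of the list.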

There is not really a simple answer along the lines of "Ah, the PSOD is caused by that!"

We've had servers that were stable for years. When we started to load them with more VMs, they used more memory, and it turned out we had a memory fault at the top of RAM, at around 496GB. When the server was heavily loaded with VMs and used that memory module, it would PSOD!
 
LVL 8

Author Comment

by:Senior IT System Engineer
ID: 40403927
OK, so in this case which logs should I gather and analyze for the root cause analysis?
 
LVL 123
ID: 40404043
I've listed the log in my post!

It may not reveal anything, but it is worth a look. I'm also waiting for answers to the questions in my post.
 
LVL 8

Author Comment

by:Senior IT System Engineer
ID: 40405803
When I logged the case with HP, they recommended that I update the iLO v4 firmware from the existing v1.4.0 to v2.02 (http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?sp4ts.oid=5228286&spf_p.tpst=swdMain&spf_p.prp_swdMain=wsrp-navigationalState%3Didx%253D2%257CswItem%253DMTX_8372c55483b9432abd53d91951%257CswEnvOID%253D4115%257CitemLocale%253D%257CswLang%253D%257Cmode%253D4%257Caction%253DdriverDocument&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken)

According to them, this is a well-known issue. That seems rather strange, though: why does the PSoD show one particular VM name rather than the ESXi host?
 
LVL 123

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE^2) earned 2000 total points
ID: 40405812
This was point 2 in my post: check and update the firmware!
 
LVL 8

Author Closing Comment

by:Senior IT System Engineer
ID: 40414816
Thanks !
 
LVL 8

Author Comment

by:Senior IT System Engineer
ID: 40414818
So in this case, why does the PSOD show the VM name and not the actual host name?

Did something happen that was caused by that particular VM?
 
LVL 123
ID: 40414824
It's possible. We've seen VMs running an unsupported OS or certain network interfaces, or using defective memory, cause PSODs.

Is it always this VM?
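
One way to answer that over time is to note the VM name shown on each PSOD and tally them. The Python sketch below is only an illustration: the function name, the crash labels and the VM names are hypothetical placeholders, not values from this thread.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_vms(psod_records: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count how often each VM name appears across recorded PSODs.

    Each record is a (crash_label, vm_name) pair; the result is sorted
    with the most frequently seen VM first.
    """
    counts = Counter(vm_name for _label, vm_name in psod_records)
    return counts.most_common()


# Hypothetical records, for illustration only.
records = [
    ("crash-1", "vm-a"),
    ("crash-2", "vm-b"),
    ("crash-3", "vm-a"),
]
summary = tally_vms(records)
```

If one VM dominates the tally, look at that VM first (OS support, NIC type). If the names are spread evenly, the VM on screen is more likely just whatever happened to be running at the time.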
 
LVL 8

Author Comment

by:Senior IT System Engineer
ID: 40415050
No, it is not always the same VM. I'm just curious as to why that VM name is displayed on the PSoD.

The next time the crash happens I'll get some more information and post it here.

My manager doesn't like the idea of upgrading the firmware for all of the blade components for the time being, unless it is required in order to upgrade from ESXi 5.1u1 to ESXi 5.5 or above.