ESX host purple screens

We are seeing more purple screens in our environment these days. We run VMware ESXi 5.5. Most of the purple screens are due to driver issues and hardware failures on HP ProLiant Gen8 servers. Is anyone aware of any common or known issues with this combination?
What do you think we can do to ensure that the VMware environment is stable?
GoodAD Asked:
Seth Simmons (Sr. Systems Administrator) Commented:
Are you using the HP ISO for ESXi?
Have all firmware updates been applied?
Andrew Hancock (VMware vExpert / EE MVE^2, VMware and Virtualization Consultant) Commented:
Purple or Pink screens of death are usually caused by hardware issues.

Has it done this more than once?

Can you reproduce the issue?

Are you using the HP OEM version of ESXi 5.5?

I would ensure that you have updated ALL the firmware on the server to the latest versions released by HP.

Are you using the VMXNET3 adapter? If not, replace the E1000 interface with it in the VM.

I would swap the network interface to VMXNET3, and make sure VMware Tools is installed to support it.
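
To find which VMs still need that swap, one option is to scan the VM configuration (.vmx) text for adapter types. The following is only a minimal sketch: it assumes each adapter is declared with a line like ethernet0.virtualDev = "e1000", the function name and the sample text are made up for illustration, and adapters with no explicit virtualDev line are not flagged.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "e1000"
ADAPTER_RE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"',
    re.IGNORECASE | re.MULTILINE,
)


def non_vmxnet3_adapters(vmx_text):
    """Return {adapter_name: adapter_type} for every adapter that is not VMXNET3."""
    found = {}
    for name, dev in ADAPTER_RE.findall(vmx_text):
        if dev.lower() != "vmxnet3":
            found[name.lower()] = dev.lower()
    return found


# Hypothetical .vmx excerpt, for illustration only.
SAMPLE_VMX = '''
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet1.present = "TRUE"
ethernet1.virtualDev = "vmxnet3"
'''

if __name__ == "__main__":
    print(non_vmxnet3_adapters(SAMPLE_VMX))
```

In practice the text would be read from each VM's .vmx file, and any adapter reported here is a candidate for replacement.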

Then check again, and set up syslog logging and persistent logs.

Part 11: HOW TO: Suppress Configuration Issues System logs on host are stored on non-persistent storage

That way, in the event of a crash in the future, you can look at the logs from before the crash.

Check that the fans, heatsinks, CPUs, and memory are seated correctly.

Run Memtest86+ to check the memory.
GoodAD (Author) Commented:
We took the history of all 2014 failures and the respective HP and VMware support tickets, then analysed the failures to understand the trends and the most common issues to focus on. During this process, we separated the issues into 4 categories and created an action plan / best practices for each category:
1. Hardware issues
2. Driver issues
3. VMware issues
4. Configuration issues
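
A trend analysis like this can be as simple as counting incidents per category. The sketch below is an assumption about how that might look, not the script we actually used; the incident records, ticket IDs, and field names are placeholders.

```python
from collections import Counter

# The four categories listed above.
CATEGORIES = ("hardware", "driver", "vmware", "configuration")


def tally_by_category(incidents):
    """Count incidents per category, most frequent first."""
    counts = Counter({category: 0 for category in CATEGORIES})
    for incident in incidents:
        category = incident["category"]
        if category not in CATEGORIES:
            raise ValueError("unknown category: %r" % category)
        counts[category] += 1
    return counts.most_common()


# Placeholder records; real ones would come from the support ticket history.
SAMPLE_INCIDENTS = [
    {"ticket": "T-1", "category": "hardware"},
    {"ticket": "T-2", "category": "driver"},
    {"ticket": "T-3", "category": "hardware"},
    {"ticket": "T-4", "category": "configuration"},
]

if __name__ == "__main__":
    for category, count in tally_by_category(SAMPLE_INCIDENTS):
        print(category, count)
```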

We analysed the hardware issues, escalated them to the vendor, and obtained the best practices to follow.

Driver issues:
      Created a script that checks the repository for changes and notifies us about the latest firmware and drivers.
      We then review them.
      HP SIM will be implemented to track the complete inventory of hardware and firmware versions. This will give us an overview of the environment and a complete gap analysis between the environment and the HP baselines. We will decide which updates to deploy based on the gap analysis.
      4 deployment cycles will be scheduled for 2015, and we will decide which updates go into each deployment cycle.
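
The gap analysis itself boils down to comparing each installed version against the baseline version. The following is a rough sketch of that comparison only; it is not HP SIM output or our actual script. The component names and version numbers are placeholders, and it assumes versions are plain dotted numbers, which real firmware versions may not be.

```python
def version_key(version):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def gap_analysis(installed, baseline):
    """List (component, installed_version, baseline_version) for every
    component that is missing from the inventory or older than the baseline."""
    gaps = []
    for component, wanted in sorted(baseline.items()):
        current = installed.get(component)
        if current is None:
            gaps.append((component, None, wanted))
        elif version_key(current) < version_key(wanted):
            gaps.append((component, current, wanted))
    return gaps


# Placeholder inventory and baseline, for illustration only.
INSTALLED = {"component-a": "1.40", "component-b": "2.0.3"}
BASELINE = {"component-a": "1.51", "component-b": "2.0.3", "component-c": "3.2"}

if __name__ == "__main__":
    for component, current, wanted in gap_analysis(INSTALLED, BASELINE):
        print(component, current, "->", wanted)
```

The resulting list is what would feed the decision about which updates go into a deployment cycle.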

VMware/Configuration issues:
Listed all the known issues that we had in our environment and created best practices for them.
GoodAD (Author) Commented:
The other comments are tactical solutions, whereas my comments describe more of a process and technical approach.
Question has a verified solution.