PSOD when using HP ESXi or Offline bundle

I have spent the last couple of weeks in and out of PSODs testing different configurations and versions of ESXi and Offline Bundles on an HP ProLiant DL180 G6.  HP ESXi (September 2010) yielded a PSOD.  Straight VMware build 260247 was stable until I installed Offline Bundle 1.0 or 1.1.  All PSODs occurred quite consistently about 2 minutes after boot.  I finally found a December 2010 version of HP ESXi which seemed to be stable.  I installed VMs and configured one of the servers with vCenter.  I logged into vCenter and added the host, the system added the vCenter agents, and after a few minutes I got the PSOD again.  Now it is consistent again, but about 10 minutes after boot.

I checked the ESXi updates using the vSphere CLI vihostupdate.pl, and it seems that the "HP" version of ESXi is nothing more than Build 260247 with the Offline Bundle and HP NMI Sourcing Driver updates added.  Both are version 1.0.  I removed these two updates and the server is once again stable.  I need the monitoring though, as this is going to be a remote server and I want to configure alerts for storage.

After googling extensively and finding all sorts of red herrings I "think" I am closer to a solution but I would like feedback.  One explanation states:  "think the reason the HP version of ESXi PSOD on these servers is that there is a watchdog timer linked to the iLO2/3 ASIC"  Sounds reasonable.  (http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1298414940467+28353475&threadId=1375244)

At the end of that article is a link (http://forum.lettronics.com/forums/thread/1273.aspx) to a possible solution that describes editing a file in the ESXi server.  I haven't tried it yet but, after this long-winded description of my problem and troubleshooting, I am asking for help to understand the file they ask to edit a little better.  They recommend removing all entries that don't refer to the Smart Array controller specifically.  I would like to be able to leave as much as possible and remove only the culprit.  I have provided the links to the articles I found and attached a copy of the file in the ESXi server.  Can anybody provide feedback?

File removed by Netminder (intellectual property of VMWare) 25 Feb 2011
eldtech Asked:
 
bgoering Commented:
Have you tried a full version of ESX rather than ESXi? This box appears to be one where ESX is supported but ESXi is not. Generally there is a reason for that...

Also, if you are using 4.0 and not 4.1, there may be a patch bundle that is required.

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=server&productId=1&advancedORbasic=advanced&maxDisplayRows=1000000&key=dl180&release%5B%5D=-1&datePosted=-1&partnerId%5B%5D=-1&formFactorId%5B%5D=-1&filterByEVC=0&filterByFT=0&min_sockets=&min_cores=&min_memory=&rorre=0

Good Luck
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
The server you are using is on the HCL.

Have you checked to see if you have Intel-VT enabled in the BIOS?

Secondly, it could be a hardware fault with the memory. Have you tried HP Diagnostics or Memtest86+?

http://www.memtest.org/

I've seen the very same thing on an HP ProLiant server with ESXi 4.1 installed without Intel-VT: it booted fine, and a few minutes later gave me a PSOD.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
I'm not sure if we can discuss editing source files for VMware on EE.

Have you raised these concerns with HP and/or VMware? As the server is on the HCL, the HP-downloaded version should work out of the box with no issue.
 
eldtech (Author) Commented:
Intel-VT is enabled in the BIOS and I've swapped out memory as well as tried different memory configurations.  I even tried with a different array controller.  The reason I have gotten away from thinking it's hardware is because this only happens when HP CIM agents are installed.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Have you escalated the issue to HP/VMware?
 
eldtech (Author) Commented:
Yes, I have cases open with both VMware and HP.  VMware tells me that it's HP's problem, and HP has had me going around in circles checking firmware versions and pointing to VMware.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
As a temporary workaround, could you leave out HP CIM for now until VMware/HP resolve it?

Is this for business production, research and development, or a lab?
 
eldtech (Author) Commented:
No, I haven't, and I'd rather stick with ESXi as this is the way VMware says they're going.  VMware's HCL does state that it supports this server with ESXi.
This is directly from their HCL:
HP ProLiant DL180 G6, Intel Xeon 56xx Series: ESX 4.1 U1, ESX 4.1, ESX 4.0 U2, ESX 4.0 U1
 
eldtech (Author) Commented:
And I read it once again back to myself.  You ARE correct bgoering - ESXi is NOT listed.  Wow, I've read that multiple times and saw what I wanted.  Let me look into that.  Thanks for pointing it out.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
That article references ESXi 4.0, not 4.1. Also, the ML110 is not on the HCL.

What they are referring to is "modifying" ESX to work with an unsupported configuration.

I don't think I can discuss that on EE.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
@bgoering: long day, small screen, missed the i!
 
eldtech (Author) Commented:
The first line of the article does state "Note, extra notes for ESXi 4.1 are posted below these instructions" and if you go to the next section it outlines the specifics for 4.1.  And I understand it is a different server but it seems that everywhere I've looked there are problems all over the board directly related to HP CIM agents so I thought it was worth a shot.  I wasn't aware that unsupported configurations were not discussed in EE.  I think for me right now it's going to boil down to the fact that I was going all this time thinking ESXi was supported and was just made to realize that it is not.  Back to the drawing board and thanks for all the comments and feedback.
 
bgoering Commented:
We discuss unsupported configurations all the time and often refer folks to the "whitebox" HCL for help, all the while cautioning them not to use it for a production environment. We have even pointed out things to look for in various config files.

I wouldn't be afraid to try the fix in the article you posted. It looks like all it does is disable the CIM monitoring for just about everything except the RAID controller, and if that is acceptable to you then go for it.

I probably would try the full ESX and see if that is stable though - then make a decision and go from there.

Good Luck
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
It's not about unsupported configurations. It's about the distribution and modifying of running code that is not open source, which I believe is not tolerated on EE.

Also, the inclusion of a file from the ESXi software may be considered wrong on EE.

These are not my rules, but we are bound by them.
 
eldtech (Author) Commented:
Just to add a note on the outcome of my experimentation.  I did finally edit the configuration file mentioned in the article and removed all references except the ones for power, SAS, and smartarray.  This worked and it is stable.  But, as has been pointed out, it is unsupported for the DL180, apparently because the server does not have all the devices on which monitoring is attempted.  We are using this server internally for development purposes only, as we won't be putting an unsupported server into production.  Thanks again for all the feedback.
Question has a verified solution.
