Solved

PSOD when using HP ESXi or Offline Bundle

Posted on 2011-02-22
Last Modified: 2012-05-11
I have spent the last couple of weeks in and out of PSODs, testing different configurations and versions of ESXi and Offline Bundles on an HP ProLiant DL180 G6. HP ESXi (September 2010) yielded a PSOD. Straight VMware build 260247 was stable until I installed Offline Bundle 1.0 or 1.1. All PSODs occurred quite consistently about 2 minutes after boot. I finally found a December 2010 version of HP ESXi which seemed to be stable. I installed VMs and configured one of the servers with vCenter. I logged into vCenter and added the host; the system installed the vCenter agents, and after a few minutes I got the PSOD again. Now it is consistent again, but about 10 minutes after boot.

I checked the ESXi updates using vSphere CLI vihostupdate.pl, and it seems that the "HP" version of ESXi is nothing more than build 260247 with the Offline Bundle and HP NMI Sourcing Driver updates added. Both are version 1.0. I removed these two updates and the server is once again stable. I need the monitoring, though, as this is going to be a remote server and I want to configure alerts for storage.
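For anyone repeating the check described above: vihostupdate.pl's --query option lists the bulletins installed on a host. The following is a minimal, hypothetical Python sketch (not part of the original thread) that runs that query and keeps lines that look HP-related. It assumes vihostupdate.pl from vSphere CLI 4.x is on PATH, that password handling is left to the CLI or your own setup, and that the query output lists one bulletin per line. The helper names `query_bulletins` and `hp_bulletin_lines` and the keyword filter are illustrative only, and the filter is a crude heuristic that can miss or over-match bulletins.

```python
import subprocess
from typing import Iterable, List


def query_bulletins(server: str, username: str) -> str:
    """Return the raw output of `vihostupdate.pl --query` for one host.

    Assumes vihostupdate.pl (vSphere CLI 4.x) is on PATH. Password handling
    is left to the CLI (add --password or a config file as your setup needs).
    """
    result = subprocess.run(
        ["vihostupdate.pl", "--server", server, "--username", username, "--query"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return result.stdout


def hp_bulletin_lines(query_output: str,
                      keywords: Iterable[str] = ("hp", "hewlett")) -> List[str]:
    """Keep bulletin lines mentioning an HP-related keyword (case-insensitive).

    Heuristic only: assumes one bulletin per line, and skips blank lines
    and dashed column-header lines.
    """
    lowered = [k.lower() for k in keywords]
    matches = []
    for line in query_output.splitlines():
        stripped = line.strip()
        # Skip empty lines and header rows such as "---Bulletin ID---".
        if not stripped or stripped.startswith("-"):
            continue
        if any(k in stripped.lower() for k in lowered):
            matches.append(stripped)
    return matches
```

Usage might look like `hp_bulletin_lines(query_bulletins("esxi01.example.local", "root"))`, where the host name and user are placeholders. Splitting the CLI call from the filtering keeps the text handling testable without a live host.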

After googling extensively and finding all sorts of red herrings, I "think" I am closer to a solution, but I would like feedback. One explanation states: "think the reason the HP version of ESXi PSOD on these servers is that there is a watchdog timer linked to the iLO2/3 ASIC". Sounds reasonable. (http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1298414940467+28353475&threadId=1375244)

At the end of that article is a link (http://forum.lettronics.com/forums/thread/1273.aspx) to a possible solution that describes editing a file on the ESXi server. I haven't tried it yet, but after this long-winded description of my problem and troubleshooting, I am asking for help to understand the file they ask you to edit a little better. They recommend removing all entries that don't refer specifically to the Smart Array controller. I would like to leave as much as possible and remove only the culprit. I have provided the links to the articles I found and attached a copy of the file from the ESXi server. Can anybody provide feedback?

File removed by Netminder (intellectual property of VMware) 25 Feb 2011
Question by:eldtech
15 Comments
 
LVL 118
ID: 34956812
The server you are using is on the HCL.

Have you checked whether Intel VT is enabled in the BIOS?

Secondly, it could be a hardware fault with the memory. Have you tried HP Diagnostics or Memtest86+?

http://www.memtest.org/

I've seen the very same thing on an HP ProLiant server with ESXi 4.1 installed without Intel VT enabled: it booted fine, and a few minutes later gave me a PSOD.
 
LVL 118
ID: 34956832
I'm not sure if we can discuss editing source files for VMware on EE.

Have you raised these concerns with HP and/or VMware? As the server is on the HCL, the HP downloaded version should work out of the box with no issues.
 

Author Comment

by:eldtech
ID: 34956836
Intel VT is enabled in the BIOS, and I've swapped out memory as well as tried different memory configurations. I even tried a different array controller. The reason I have moved away from thinking it's hardware is that this only happens when the HP CIM agents are installed.
0
 
LVL 118
ID: 34956874
Have you escalated the issue to HP/VMware?
 

Author Comment

by:eldtech
ID: 34956902
Yes, I have cases open with both VMware and HP. VMware tells me that it's HP's problem, and HP has had me going around in circles checking firmware versions and pointing to VMware.
 
LVL 118
ID: 34956938
You could leave out HP CIM for now until VMware/HP resolve it (as a temporary workaround).

Is this for business production, research and development, or a lab?
 
LVL 28

Accepted Solution

by:bgoering (earned 500 total points)
ID: 34956939
Have you tried a full version of ESX rather than ESXi? This box appears to be one where ESX is supported but ESXi is not. Generally there is a reason for that...

Also if you are using 4.0 and not 4.1 there may be a patch bundle that is required.

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=server&productId=1&advancedORbasic=advanced&maxDisplayRows=1000000&key=dl180&release%5B%5D=-1&datePosted=-1&partnerId%5B%5D=-1&formFactorId%5B%5D=-1&filterByEVC=0&filterByFT=0&min_sockets=&min_cores=&min_memory=&rorre=0

Good Luck

Author Comment

by:eldtech
ID: 34956958
No, I haven't, and I'd rather stick with ESXi, as this is the direction VMware says they're going. VMware's HCL does state that it supports this server with ESXi.
This is directly from their HCL:
HP ProLiant DL180 G6 Intel Xeon 56xx Series ESX 4.1 U1, ESX 4.1, ESX 4.0 U2, ESX 4.0 U1
 

Author Comment

by:eldtech
ID: 34956964
And I read it once again back to myself. You ARE correct, bgoering: ESXi is NOT listed. Wow, I've read that multiple times and seen what I wanted. Let me look into that. Thanks for pointing it out.
 
LVL 118
ID: 34956973
That article references ESXi 4.0, not 4.1. Also, the ML110 is not on the HCL.

And what they are referring to is "modifying" ESX to work with an unsupported configuration.

And I don't think I can discuss that on EE.
 
LVL 118
ID: 34956978
@bgoering: long day, small screen, missed the i!
 

Author Comment

by:eldtech
ID: 34957023
The first line of the article does state "Note, extra notes for ESXi 4.1 are posted below these instructions", and if you go to the next section it outlines the specifics for 4.1. And I understand it is a different server, but it seems that everywhere I've looked there are problems all over the board directly related to HP CIM agents, so I thought it was worth a shot. I wasn't aware that unsupported configurations were not discussed on EE. I think for me right now it's going to boil down to the fact that I went all this time thinking ESXi was supported and was just made to realize that it is not. Back to the drawing board, and thanks for all the comments and feedback.
 
LVL 28

Expert Comment

by:bgoering
ID: 34957047
We discuss unsupported configurations all the time and often refer folks to the "whitebox" HCL for help, all the while cautioning them not to use it for a production environment. We have even pointed out things to look for in various config files.

I wouldn't be afraid to try the fix in the article you posted. It looks like all it does is disable CIM monitoring for just about everything except the RAID controller, and if that is acceptable to you, then go for it.

I would probably try full ESX and see if that is stable, though; then make a decision and go from there.

Good Luck
 
LVL 118
ID: 34957086
It's not about unsupported configurations; it's about distributing and modifying running code that is not open source, which I believe is not tolerated on EE.

Also, including a file from the ESXi software may be considered a violation on EE.

These are not my rules, but we are bound by them.
 

Author Comment

by:eldtech
ID: 35032571
Just to add a note on the outcome of my experimentation: I did finally edit the configuration file mentioned in the article and removed all references except the ones for power, SAS, and smartarray. This worked, and it is stable. But, as has been pointed out, this configuration is unsupported for the DL180. Apparently, the problem is that the server does not have all the devices on which monitoring is attempted. We are using this server internally for development purposes only, as we won't be putting an unsupported server into production. Thanks again for all the feedback.