Solved

PSOD when using HP ESXi or Offline bundle

Posted on 2011-02-22
Last Modified: 2012-05-11
I have spent the last couple of weeks in and out of PSODs, testing different configurations and versions of ESXi and Offline Bundles on an HP ProLiant DL180 G6.  HP ESXi (September 2010) yielded a PSOD.  Straight VMware build 260247 was stable until I installed Offline Bundle 1.0 or 1.1.  All PSODs occurred quite consistently about 2 minutes after boot.  I finally found a December 2010 version of HP ESXi which seemed to be stable.  I installed VMs and configured one of the servers with vCenter.  I logged into vCenter and added the host, the system added the vCenter agents, and after a few minutes I got the PSOD again.  Now it is consistent again, but about 10 minutes after boot.

I checked the ESXi updates using the vSphere CLI vihostupdate.pl, and it seems that the "HP" version of ESXi is nothing more than Build 260247 with the Offline Bundle and HP NMI Sourcing Driver updates added.  Both are version 1.0.  I removed these two updates and the server is once again stable.  I need the monitoring, though, as this is going to be a remote server and I want to configure alerts for storage.

After googling extensively and finding all sorts of red herrings I "think" I am closer to a solution but I would like feedback.  One explanation states:  "think the reason the HP version of ESXi PSOD on these servers is that there is a watchdog timer linked to the iLO2/3 ASIC"  Sounds reasonable.  (http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1298414940467+28353475&threadId=1375244)

At the end of that article is a link (http://forum.lettronics.com/forums/thread/1273.aspx) to a possible solution that describes editing a file in the ESXi server.  I haven't tried it yet but ... after this long-winded description of my problem and troubleshooting ... I am asking for help to understand the file they ask to edit a little better.  They recommend removing all entries that don't refer to the Smart Array controller specifically.  I would like to be able to leave as much as possible and remove only the culprit.  I have provided the links to the articles I found and attached a copy of the file in the ESXi server.  Can anybody provide feedback?

File removed by Netminder (intellectual property of VMWare) 25 Feb 2011
Question by:eldtech
15 Comments
 
LVL 122
ID: 34956812
The server you are using is on the HCL.

Have you checked to see if you have Intel-VT enabled in the BIOS?

Secondly, it could be a hardware fault with the memory. Have you tried HP Diagnostics or Memtest86+?

http://www.memtest.org/

I've seen the very same thing on an HP ProLiant server with ESXi 4.1 installed without Intel-VT: it booted fine, and a few minutes later gave me a PSOD.
 
LVL 122
ID: 34956832
I'm not sure if we can discuss editing source files for VMware on EE.

Have you raised these concerns with HP and/or VMware? As the server is on the HCL, the HP-downloaded version should work out of the box with no issues.
 

Author Comment

by:eldtech
ID: 34956836
Intel-VT is enabled in the BIOS and I've swapped out memory as well as tried different memory configurations.  I even tried with a different array controller.  The reason I have gotten away from thinking it's hardware is because this only happens when HP CIM agents are installed.
 
LVL 122
ID: 34956874
Have you escalated the issue to HP/VMware?
 

Author Comment

by:eldtech
ID: 34956902
Yes, I have cases open with both VMware and HP.  VMware tells me that it's HP's problem, and HP has had me going around in circles checking firmware versions and pointing to VMware.
 
LVL 122
ID: 34956938
As a temporary workaround, could you leave out HP CIM for now until VMware/HP resolve it?

Is this for business production, research and development, or a lab?
 
LVL 28

Accepted Solution

by:
bgoering earned 2000 total points
ID: 34956939
Have you tried a full version of ESX rather than ESXi? This box appears to be one where ESX is supported but ESXi is not. Generally there is a reason for that...

Also if you are using 4.0 and not 4.1 there may be a patch bundle that is required.

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=server&productId=1&advancedORbasic=advanced&maxDisplayRows=1000000&key=dl180&release%5B%5D=-1&datePosted=-1&partnerId%5B%5D=-1&formFactorId%5B%5D=-1&filterByEVC=0&filterByFT=0&min_sockets=&min_cores=&min_memory=&rorre=0

Good Luck
 

Author Comment

by:eldtech
ID: 34956958
No, I haven't, and I'd rather stick with ESXi as this is the way VMware says they're going.  VMware's HCL does state that it supports this server with ESXi.
This is directly from their HCL:
HP ProLiant DL180 G6 Intel Xeon 56xx Series ESX 4.1 U1, ESX 4.1, ESX 4.0 U21 , ESX 4.0 U11  
 

Author Comment

by:eldtech
ID: 34956964
And I read it once again back to myself.  You ARE correct bgoering - ESXi is NOT listed.  Wow, I've read that multiple times and saw what I wanted.  Let me look into that.  Thanks for pointing it out.
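
The mix-up is easy to make by eye because "ESX" and "ESXi" differ by a single character. A minimal sketch of checking the product name as an exact token rather than by skimming, assuming the HCL entry is available as a comma-separated string (the function name `supported_products` is hypothetical, and the release string is copied verbatim from the post above):

```python
def supported_products(hcl_releases):
    """Return the set of product names (the first word of each release)
    found in a comma-separated HCL release list."""
    products = set()
    for release in hcl_releases.split(","):
        tokens = release.split()
        if tokens:  # skip empty fragments
            products.add(tokens[0])
    return products


# Release list exactly as quoted from the HCL in this thread.
hcl_entry = "ESX 4.1 U1, ESX 4.1, ESX 4.0 U21 , ESX 4.0 U11"
products = supported_products(hcl_entry)

print("ESX" in products)   # True
print("ESXi" in products)  # False
```

Comparing whole tokens avoids the trap of a substring match, where a search for "ESX" would also hit any "ESXi" entry.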
 
LVL 122
ID: 34956973
That article references ESXi 4.0, not 4.1. Also, the ML110 is not on the HCL.

What they are referring to is "modifying" ESX to work with an unsupported configuration.

I don't think I can discuss that on EE.
 
LVL 122
ID: 34956978
@bgoering: long day, small screen, missed the i!
 

Author Comment

by:eldtech
ID: 34957023
The first line of the article does state "Note, extra notes for ESXi 4.1 are posted below these instructions" and if you go to the next section it outlines the specifics for 4.1.  And I understand it is a different server but it seems that everywhere I've looked there are problems all over the board directly related to HP CIM agents so I thought it was worth a shot.  I wasn't aware that unsupported configurations were not discussed in EE.  I think for me right now it's going to boil down to the fact that I was going all this time thinking ESXi was supported and was just made to realize that it is not.  Back to the drawing board and thanks for all the comments and feedback.
 
LVL 28

Expert Comment

by:bgoering
ID: 34957047
We discuss unsupported configurations all the time and often refer folks to the "whitebox" hcl for help while all the while cautioning them not to use it for a production environment. We have even pointed out things to look for in various config files.

I wouldn't be afraid to try the fix in the article you posted. It looks like all it does is disable the CIM monitoring for just about everything except the RAID controller, and if that is acceptable to you then go for it.

I probably would try the full ESX and see if that is stable though - then make a decision and go from there.

Good Luck
 
LVL 122
ID: 34957086
It's not about unsupported configurations; it's about the distribution and modifying of running code that is not open source, which I believe is not tolerated on EE.

Also, the inclusion of a file from the ESXi software may also be considered wrong on EE.

These are not my rules, but we are bound by them.
 

Author Comment

by:eldtech
ID: 35032571
Just to add a note on the outcome of my experimentation.  I did finally edit the configuration file mentioned in the article and removed all references except the ones for power, SAS, and smartarray.  This worked and it is stable.  But, as has been pointed out, it is unsupported for the DL180, apparently because the server does not have all the devices on which monitoring is attempted.  We are using this server internally for development purposes only, as we won't be putting an unsupported server into production.  Thanks again for all the feedback.
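
For anyone wanting to see the shape of that edit, here is a minimal sketch of the filtering idea only. The real file's name and format are not shown in this thread (the attachment was removed), so the one-entry-per-line layout, the sample entries, and the `filter_entries` helper are all hypothetical placeholders:

```python
# Keywords for the entries that were kept: power, SAS, and smartarray.
KEEP_KEYWORDS = ("power", "sas", "smartarray")


def filter_entries(lines, keywords=KEEP_KEYWORDS):
    """Keep only the lines that mention one of the keywords (case-insensitive)."""
    kept = []
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            kept.append(line)
    return kept


# Hypothetical entries; the actual file contents will differ.
entries = [
    "provider: power",
    "provider: fan",
    "provider: SAS",
    "provider: temperature",
    "provider: smartarray",
]

print(filter_entries(entries))
# ['provider: power', 'provider: SAS', 'provider: smartarray']
```

Keeping a copy of the original file before making any change like this makes it easy to revert.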