I have spent the last couple of weeks in and out of PSODs, testing different configurations and versions of ESXi and Offline Bundles on an HP ProLiant DL180 G6. The HP ESXi image (September 2010) produced a PSOD. Straight VMware build 260247 was stable until I installed Offline Bundle 1.0 or 1.1. All PSODs occurred quite consistently about 2 minutes after boot. I finally found a December 2010 version of HP ESXi which seemed to be stable. I installed VMs and configured one of the servers with vCenter. I logged into vCenter and added the host, the system added the vCenter agents, and after a few minutes I got the PSOD again. Now it is consistent again, but about 10 minutes after boot.
I checked the ESXi updates using the vSphere CLI vihostupdate.pl, and it seems that the "HP" version of ESXi is nothing more than Build 260247 with the Offline Bundle and HP NMI Sourcing Driver updates added. Both are version 1.0. I removed these two updates and the server is once again stable. I need the monitoring though, as this is going to be a remote server and I want to configure alerts for storage.
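For anyone who wants to repeat the check and removal, here is a rough sketch of the kind of vihostupdate.pl calls involved. The host name, user name, and bulletin ID below are placeholders I made up for illustration, not the real values; take the actual bulletin IDs from the --query output. The script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: ESXI_HOST, ESXI_USER and the bulletin ID are hypothetical placeholders.
ESXI_HOST="${ESXI_HOST:-esxi01.example.com}"
ESXI_USER="${ESXI_USER:-root}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = actually run it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the bulletins currently installed on the host.
query_bulletins() {
    run vihostupdate.pl --server "$ESXI_HOST" --username "$ESXI_USER" --query
}

# Remove one bulletin by the ID shown in the --query output.
remove_bulletin() {
    run vihostupdate.pl --server "$ESXI_HOST" --username "$ESXI_USER" \
        --remove --bulletin "$1"
}

query_bulletins
remove_bulletin "BULLETIN_ID_FROM_QUERY"
```

A reboot may be needed after a removal, and I would put the host in maintenance mode first to be safe.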
After googling extensively and finding all sorts of red herrings, I "think" I am closer to a solution, but I would like feedback. One explanation states: "think the reason the HP version of ESXi PSOD on these servers is that there is a watchdog timer linked to the iLO2/3 ASIC" (http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1298414940467+28353475&threadId=1375244). That sounds reasonable.
At the end of that article is a link (http://forum.lettronics.com/forums/thread/1273.aspx) to a possible solution that describes editing a file on the ESXi server. I haven't tried it yet, but after this long-winded description of my problem and troubleshooting, I am asking for help to understand the file they say to edit a little better. They recommend removing all entries that don't refer specifically to the Smart Array controller. I would like to leave as much as possible in place and remove only the culprit. I have provided the links to the articles I found and attached a copy of the file from the ESXi server. Can anybody provide feedback?
File removed by Netminder (intellectual property of VMWare) 25 Feb 2011