ESXi PSOD

We set up an ESXi box just over a month ago, and it hosts 4 VMs. It ran fine for weeks, but all of a sudden, in the past 3 days, it has pink-screened twice.

The first time, it was an out-of-memory error.

The second time, it was an unknown error (I have attached a screenshot).

Unfortunately, the RAID card in our server was not fully supported when we installed ESXi, so ESXi is currently running off a USB key.

Server Specs:

Mobo: Supermicro X7DWU
CPU: 2x Intel Xeon E5405 @ 2.00GHz
RAID: 3ware 9650SE
HDD: 6 TB raw (RAID 10), 3 TB usable


The pink screen itself doesn't bother me all that much, mostly because I'm a Windows guy =P
That said, it would be really nice to get rid of the pink screen altogether.

But the part that bugs me most is that the pink screen doesn't auto-reboot.
Is there a way to force the host to reboot automatically after a pink screen?

Please post any suggestions.

Thanks,

Steve
error.bmp
Asked by svelluto
 
za_mkh commented:
It looks like there is an issue with one of your physical CPUs. Maybe try disabling one, or check that the processors are still seated securely? The same goes for the memory.
Another thread I found after googling one line from your PSOD screenshot suggests making sure the BIOS is up to date, including microcode updates.
Hope this helps.
http://communities.vmware.com/message/1059724
 
 
vmwarun - Arun commented:
A PSOD (Purple Screen of Death) can have multiple causes.

If it occurs during installation, the most probable cause is installing ESX/ESXi on unsupported hardware or on hardware not listed in the VMware HCL.

Sometimes the installation succeeds, but the ESX Server throws a PSOD when it loads because it cannot load unsupported drivers.
 
markzz commented:
My guess is that you didn't create a diagnostics partition immediately after installing the ESX OS.
So when the PSOD occurs, it has nowhere to dump the log.
Maybe you could move ESXi to a much larger memory stick, create a diagnostics partition, and see if this helps.
The diagnostics partition will be about 100 MB.
You create it via disk management under Configuration.
 
svelluto (author) commented:
For the three weeks prior to these two crashes, ESX was working just fine, and then, out of the blue, two PSODs in a couple of days.

ESX is currently running on a 1 GB USB key, so it should have sufficient space remaining.

 
svelluto (author) commented:
Thanks, I'll take a look at the BIOS update.

As for checking that the CPUs and memory are seated securely: this server sits in a rack at the datacenter, so that may take some time. The BIOS I can check the next time I reboot it.

 
markzz commented:
Do you have a diagnostics partition?
 
svelluto (author) commented:
A diagnostics partition?
I don't think so. How do we set that up?
 
vmwarun - Arun commented:
The default partition layout of a standard ESX 3.5 server is:

/ (root) - 5 GB
/boot - 100 MB
/var/log - 2.5 GB
swap - 544 MB
vmkcore (Diagnostic Partition) - 100 MB
vmfs - Remaining Space.

vmkcore is used to store the diagnostic info when the ESX/ESXi host PSODs.
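To make the arithmetic concrete: the fixed partitions listed above add up to roughly 8.2 GB, and VMFS gets whatever is left. Here is a minimal Python sketch, assuming 1 GB = 1024 MB and exactly the sizes listed (a real installer may round to partition boundaries, so actual figures can differ slightly). The 20 GB disk in the example is hypothetical, and this is the classic ESX layout, not ESXi's.

```python
# Default ESX 3.5 layout as listed above, in MB (assumes 1 GB = 1024 MB).
# vmfs is deliberately absent: it takes whatever space remains.
DEFAULT_LAYOUT_MB = {
    "/ (root)": 5 * 1024,
    "/boot": 100,
    "/var/log": int(2.5 * 1024),
    "swap": 544,
    "vmkcore (diagnostic)": 100,
}


def fixed_total_mb() -> int:
    """Total MB used by the fixed-size partitions."""
    return sum(DEFAULT_LAYOUT_MB.values())


def remaining_for_vmfs(disk_mb: int) -> int:
    """Return the MB left for the VMFS partition on a disk of disk_mb MB."""
    fixed = fixed_total_mb()
    if disk_mb < fixed:
        raise ValueError(f"disk too small: need at least {fixed} MB")
    return disk_mb - fixed


if __name__ == "__main__":
    print(fixed_total_mb())                  # prints 8424
    print(remaining_for_vmfs(20 * 1024))     # prints 12056 (hypothetical 20 GB disk)
```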



 
markzz commented:
Via the VI Client, go to Configuration -> Storage -> Add Storage.
It will be the 3rd option.
 
jiriki commented:
Your PSOD shouldn't auto-reboot: you should be left with a prompt to 'Press Escape to enter local debugger', which I assume is the faux COS console. It looks as if the dump didn't complete for some reason.

Not that I'm a guru or anything, but my understanding is that with ESXi (and I'm emphasizing the 'i', which everyone seems to overlook and which makes VMware support cringe even though their sales reps are pushing it) you do not have the option to alter the partition layout. The installer does this automatically and should create the vmkcore partition (per http://www.boche.net/blog/?p=120 ). You can enable the unsupported faux COS console and create one manually, but as the links above and below point out, doing so can trash the default install.

Getting that dump file isn't so easy peasy, though... http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004128 describes how to do this from ESXi (emphasizing the 'i'). This article specifically notes that the 'USB key for ESXi Embedded contains a vmkcore partition'. It seems you have to get into DEBUG mode, which the PSOD 'should' allow you to do... however, on the one PSOD I've suffered, I could not get into that mode.

I can see that I have the vmkcore partition in the VI Client: go to the Configuration tab, then Storage, right-click the ESXi system disk and choose Properties, and the Extent Device pane shows a VMware Diagnostic partition of 109 MB. You can also add one this way, or possibly as markzz indicates above (I don't have any spare disk space on my system to verify that I can create a VMFS diagnostic (type fc) partition).
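The "fc" above is the MBR partition type ID VMware uses for the vmkcore (diagnostic) partition. If the USB key is attached to another machine, one rough way to confirm a vmkcore partition exists is to read the partition table directly. The Python sketch below assumes the key uses classic MBR partitioning with 512-byte sectors, and it also walks the extended-partition chain, because a partition number such as the 7 in vmhba32:0:0:7 would normally be a logical partition under MBR numbering. The device path in the usage note is hypothetical; this is a diagnostic aid, not a VMware tool.

```python
import struct
from typing import BinaryIO, List, Tuple

SECTOR = 512
VMKCORE_TYPE = 0xFC                  # "fc": VMware diagnostic (vmkcore) type ID
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # common MBR extended-partition type IDs


def _entries(sector: bytes) -> List[Tuple[int, int, int]]:
    """Return (type, start_lba, num_sectors) for the 4 slots of an MBR/EBR."""
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    out = []
    for i in range(4):
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        start, count = struct.unpack_from("<II", entry, 8)
        out.append((entry[4], start, count))
    return out


def list_partitions(disk: BinaryIO) -> List[Tuple[int, int, int, int]]:
    """Return (number, type, start_lba, num_sectors) for every partition,
    numbered like Linux: primaries 1-4, logical partitions from 5 upward."""
    disk.seek(0)
    parts = []
    ext_start = None
    for num, (ptype, start, count) in enumerate(_entries(disk.read(SECTOR)), 1):
        if ptype == 0:
            continue  # empty slot
        if ptype in EXTENDED_TYPES:
            ext_start = start
        parts.append((num, ptype, start, count))

    # Walk the chain of extended boot records (EBRs), guarding against loops.
    ebr, num, seen = ext_start, 5, set()
    while ebr is not None and ebr not in seen:
        seen.add(ebr)
        disk.seek(ebr * SECTOR)
        slots = _entries(disk.read(SECTOR))
        ptype, rel, count = slots[0]      # logical partition, relative to this EBR
        if ptype != 0:
            parts.append((num, ptype, ebr + rel, count))
            num += 1
        nxt_type, nxt_rel, _ = slots[1]   # next EBR, relative to extended start
        if nxt_type in EXTENDED_TYPES and nxt_rel:
            ebr = ext_start + nxt_rel
        else:
            ebr = None
    return parts


def find_vmkcore(path: str) -> List[Tuple[int, int, int, int]]:
    """List partitions of type 0xFC on a raw device or disk image."""
    with open(path, "rb") as disk:
        return [p for p in list_partitions(disk) if p[1] == VMKCORE_TYPE]
```

Usage would be something like `find_vmkcore("/dev/sdb")` run as root on a Linux box, where `/dev/sdb` is a hypothetical name for the attached USB key; an empty list would suggest no vmkcore partition is present.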

Since it's a USB install, you could mount the USB /dev/disks/vmhba32:0:0:7 volume on another *NIX box and retrieve the file. I'm not *NIX-savvy enough to know whether you could access it from a Windows box (although the article notes you can use Disk Management/FDisk to see that the vmkcore partition exists).
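The vmkcore partition may not be mountable as an ordinary filesystem, so a more conservative option on the *NIX box is to copy the raw partition bytes to an image file and work from that copy. This Python sketch assumes 512-byte sectors and that you already know the partition's starting sector and length from its partition table; the script name, device path, and output file name are hypothetical. It only reads the source, which matters given the warning below about not running repair tools against the key.

```python
import sys
from typing import BinaryIO

SECTOR = 512
CHUNK = 1024 * 1024  # copy 1 MiB at a time to keep memory use small


def copy_partition(src: BinaryIO, dst: BinaryIO, start_lba: int, num_sectors: int) -> int:
    """Copy num_sectors sectors starting at start_lba from src to dst.

    Returns the number of bytes actually copied (fewer than requested
    if the device or image ends early)."""
    src.seek(start_lba * SECTOR)
    remaining = num_sectors * SECTOR
    copied = 0
    while remaining > 0:
        chunk = src.read(min(CHUNK, remaining))
        if not chunk:  # reached end of device/image
            break
        dst.write(chunk)
        copied += len(chunk)
        remaining -= len(chunk)
    return copied


if __name__ == "__main__":
    # Hypothetical usage: python copy_part.py DEVICE START_LBA NUM_SECTORS OUTFILE
    dev_path, start, count, out_path = sys.argv[1:5]
    with open(dev_path, "rb") as dev, open(out_path, "wb") as out:
        n = copy_partition(dev, out, int(start), int(count))
    print(f"copied {n} bytes")
```

Reading in fixed-size chunks rather than all at once keeps this usable on larger partitions; the resulting image still has to be interpreted with VMware's own tooling or by VMware support.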

You cannot browse the vmkcore partition in the Datastore Browser, nor access it with the freeware Veeam Backup & FastSCP (at least not v3; I haven't tried the just-released v4). Nor can I verify that any of this is valid for vSphere 4i.

Sorry I'm not giving you a direct answer, but it's the best I've got. I'd say that if you do have a vmkcore partition and a check of the USB key's integrity pans out (I don't know offhand how to do that from a *NIX box... DO NOT try to run chkdsk from a Windows mount), then it's likely the RAM, CPU or motherboard. The dump is supposed to give you more info, but I suspect that normal hardware diagnostics and/or a process of elimination (removing or disabling various components) will end up being what you have to do.

Hope this helps at least with places to look for info...