Solved

ESXi PSOD

Posted on 2009-05-06
Last Modified: 2012-05-06
We set up an ESXi box just over a month ago, with 4 VMs running on it. It ran fine for weeks, but in the past 3 days it has pink-screened twice.

The first time, it was an Out of Memory error.

The second time, it was an unknown error (I have attached a screenshot).

Unfortunately, the RAID card in our server was not fully supported at the time we installed ESXi, so ESXi is currently running off a USB key.

Server Specs:

Mobo: Supermicro x7DWU
CPU: 2x Intel Xeon E5405 @ 2.00GHz
Raid: 3Ware 9650SE
HDD: 6TB (Raid 10) - 3TB usable.


The pink screen itself doesn't bother me all that much, mostly because I'm a Windows guy =P
That said, it would be really nice to get rid of the pink screen altogether.

But what bugs me most is that the pink screen doesn't auto-reboot.
Is there a way to force the host to reboot automatically after a pink screen?

Please post any suggestions.

Thanks,

Steve
error.bmp
Question by:svelluto
10 Comments
 
LVL 19

Expert Comment

by:vmwarun - Arun
ID: 24316472
A PSOD (Purple Screen of Death) can occur for multiple reasons.

If it occurs during installation, the most probable cause is an attempt to install ESX/ESXi on unsupported hardware, i.e. hardware not listed in the VMware HCL.

Sometimes the installation succeeds, but when the ESX Server loads it throws a PSOD because it is unable to load unsupported drivers.
 
LVL 8

Expert Comment

by:markzz
ID: 24317182
I suspect you didn't create a diagnostics partition immediately after you installed ESX.
If so, when the PSOD occurs it has nowhere to dump the log.
Maybe you could move ESXi to a much larger memory stick, create a diagnostics partition, and see if this helps.
The diagnostics partition will be about 100MB.
You create it via disk management under Configuration.
 
LVL 7

Author Comment

by:svelluto
ID: 24317569
For the three weeks prior to these two stalls, ESX was working just fine, and then, out of the blue, we had two PSODs in a couple of days.

ESX is currently running on a 1GB USB key, so it should have sufficient space remaining.


 
LVL 21

Accepted Solution

by:
za_mkh earned 1000 total points
ID: 24319538
It looks like it has an issue with one of your physical CPUs. Maybe try disabling one, or check that the processors are still securely plugged in? Same goes for memory.
Another thread I found after googling one line from your PSOD screenshot seems to point to ensuring the BIOS is up to date, including microcode updates.
Hope this helps
http://communities.vmware.com/message/1059724
 
 
LVL 7

Author Comment

by:svelluto
ID: 24321389
Thanks, I'll take a look at the BIOS update.

As for checking that the CPU and memory are securely plugged in, this is a server sitting in a rack at the datacenter, so it may take some time to check that. The BIOS I can check the next time I reboot it.

 
LVL 8

Expert Comment

by:markzz
ID: 24326614
Do you have a diagnostics partition?
 
LVL 7

Author Comment

by:svelluto
ID: 24327736
A diagnostics partition?
I don't think so. How do we set that up?
 
LVL 19

Expert Comment

by:vmwarun - Arun
ID: 24338002
The default partitioning layout of a normal ESX 3.5 Server would be:

/ (root) - 5 GB
/boot - 100 MB
/var/log - 2.5 GB
swap - 544 MB
vmkcore (Diagnostic Partition) - 100 MB
vmfs - Remaining Space.

vmkcore is used to store the diagnostic info when the ESX/ESXi host PSODs.
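
As a rough illustration of how that layout adds up, here is a small Python sketch that totals the fixed partitions listed above and works out what would be left for VMFS. The partition sizes come straight from the list; the 100 GB disk size and the function names are just assumptions for the example, and it treats 1 GB as 1024 MB.

```python
# Sizes in MB, taken from the layout listed above (1 GB = 1024 MB).
FIXED_PARTITIONS_MB = {
    "/": 5 * 1024,
    "/boot": 100,
    "/var/log": int(2.5 * 1024),
    "swap": 544,
    "vmkcore": 100,
}


def fixed_total_mb():
    """Total space taken by everything except the VMFS partition."""
    return sum(FIXED_PARTITIONS_MB.values())


def vmfs_remaining_mb(disk_mb):
    """Space left over for VMFS on a disk of the given size (in MB)."""
    remaining = disk_mb - fixed_total_mb()
    if remaining < 0:
        raise ValueError("disk is smaller than the fixed partitions")
    return remaining


print(fixed_total_mb())               # 8424
print(vmfs_remaining_mb(100 * 1024))  # 93976 (hypothetical 100 GB disk)
```

Note this is the classic ESX layout; the 100 MB vmkcore slice is the only part of it that matters for capturing a PSOD dump.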



 
LVL 8

Expert Comment

by:markzz
ID: 24338335
Via the VI Client, go to Configuration -> Storage and choose Add Storage.
The diagnostic partition will be the 3rd option.
 

Expert Comment

by:jiriki
ID: 24805579
Your PSOD shouldn't auto-reboot, as you should be left with a prompt to 'Press Escape to enter local debugger', which I assume is the faux COS console. It looks as though the dump didn't complete for some reason.

Not that I'm a guru or anything, but my understanding is that with ESXi (and I'm emphasizing the 'i', which everyone seems to overlook and which makes VMware support cringe even though their sales reps are pushing it) you do not have the option to alter the partition layout. The installer does this automatically and should create the vmkcore partition (per http://www.boche.net/blog/?p=120 ). However, you can use the unsupported method of enabling a faux COS console and manually create one; as per the links given above and below, it is possible to trash the default install this way.

Now, getting that dump file isn't so easy peasy... http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004128 describes how to do this from ESXi (emphasizing the 'i'). This article specifically notes that the "USB key for ESXi Embedded contains a VMKcore partition". It seems you have to get into debug mode, which the PSOD 'should' allow you to do. However, with the one PSOD I've suffered, I could not get into that mode.

I can see that I have the vmkcore partition within the VI Client by going to the Configuration tab, Storage, right-clicking the ESXi system disk and choosing Properties; the Extent Device pane shows VMware Diagnostic of 109MB. You can also add it this way, or possibly as markzz indicates above (I don't have any spare disk space on my system to verify that I can create a VMFS diagnostic (fc-type) partition).

Since this is a USB install, you could mount the USB /dev/disks/vmhba32:0:0:7 volume on another *NIX box and retrieve the file. I'm not *NIX-savvy enough to know whether you could access this from a Windows box (although the article notes you can use Disk Management/fdisk to see that the vmkcore partition exists).
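
If you do plug the key into a *NIX box, one way to confirm the diagnostic partition is there is to look for a partition with type Id fc in the `fdisk -l` listing. Below is a minimal Python sketch of that check. It assumes the listing has a header row with "Device", "Boot" and "Id" columns; the sample text, the /dev/sdb device names and the function name are all made up for illustration and are not output from a real ESXi key.

```python
VMKCORE_TYPE_ID = "fc"  # partition type Id used for the VMware diagnostic (vmkcore) partition


def find_vmkcore_partitions(fdisk_output):
    """Return device names whose type Id is 'fc' in `fdisk -l` style text."""
    found = []
    id_col = None
    for line in fdisk_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Header row: remember which column holds the partition type Id.
        if fields[0] == "Device" and "Id" in fields:
            id_col = fields.index("Id")
            continue
        # Only partition rows (those starting with a device path) are of interest.
        if id_col is None or not fields[0].startswith("/dev/"):
            continue
        # The Boot column is empty unless the partition is flagged with '*',
        # so unflagged rows have one field fewer than the header.
        has_boot_flag = len(fields) > 1 and fields[1] == "*"
        idx = id_col if has_boot_flag else id_col - 1
        if idx < len(fields) and fields[idx].lower() == VMKCORE_TYPE_ID:
            found.append(fields[0])
    return found


# Hypothetical listing, for illustration only.
SAMPLE = """\
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               5         900      917504    5  Extended
/dev/sdb4   *           1           4        4080    4  FAT16 <32M
/dev/sdb5               5          52       49136    6  FAT16
/dev/sdb7             101         210      112624   fc  VMware VMKCORE
"""

print(find_vmkcore_partitions(SAMPLE))  # ['/dev/sdb7']
```

In practice you would feed it the text you get from running `fdisk -l` against the USB device; an empty list would suggest there is no diagnostic partition for the PSOD to dump to.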

You cannot browse the vmkcore partition in the Datastore Browser, nor access it with the freeware Veeam Backup & FastSCP (not on v3; I've not tried the just-released v4). Nor can I verify that any of this is valid for vSphere4i.

Sorry I'm not giving you a direct answer, but it's the best I've got. I'd say that if you do have a vmkcore partition and a check of the USB key's integrity pans out (I don't know how to do that off the top of my head from a *NIX box... DO NOT try to run a chkdsk from a Windows mount), then it's likely RAM, CPU or mobo. The dump is supposed to give you more info, but I suspect that normal hardware diagnostics and/or a process of elimination by removing/disabling various components will end up being what you have to do.

Hope this helps at least with places to look for info...

Question has a verified solution.
