Solved

ESXi PSOD

Posted on 2009-05-06
Last Modified: 2012-05-06
We set up an ESXi box just over a month ago. There are 4 VMs running on it. It ran fine for weeks, but in the past 3 days it has pink-screened twice.

The first time, it was an Out of Memory error.

The second time, it was an unknown error (I have attached a screenshot).

Unfortunately, the RAID card in our server was not fully supported at the time we installed ESXi, so ESXi is currently running off a USB key.

Server Specs:

Mobo: Supermicro X7DWU
CPU: 2x Intel Xeon E5405 @ 2.00GHz
RAID: 3ware 9650SE
HDD: 6TB (RAID 10) - 3TB usable.


The pink screen doesn't bother me all that much, mostly because I'm a Windows guy =P
That said, it would be really nice to get rid of the pink screen altogether.

But the part that bugs me most: why doesn't the pink screen auto-reboot?
Is there a way to force the pink screen to auto-reboot?

Please post any suggestions.

Thanks,

Steve
error.bmp
Question by:svelluto
10 Comments
 
LVL 19

Expert Comment

by:vmwarun - Arun
A PSOD (Purple Screen of Death) can occur for several reasons.

If it occurs during installation, the most probable reason is an attempt to install ESX/ESXi on unsupported hardware, or on hardware not listed in the VMware HCL.

Sometimes the installation succeeds, but when the ESX Server loads it may throw a PSOD because it cannot load drivers for the unsupported hardware.
 
LVL 8

Expert Comment

by:markzz
I suspect that you didn't create a diagnostics partition immediately after you installed the ESX OS.
If so, when the PSOD occurs it doesn't have anywhere to dump the log.
Maybe you could move ESXi to a much larger memory stick, create a diagnostics partition, and see if this helps.
The diagnostics partition will be about 100MB.
You create it via disk management under Configuration.
 
LVL 7

Author Comment

by:svelluto
For the three weeks prior to these two stalls, ESX was working just fine, and then, out of the blue, there were two PSODs in a couple of days.

ESX is currently running on a 1GB USB key, so it should have sufficient space remaining.

 
LVL 21

Accepted Solution

by:
za_mkh earned 250 total points
It looks like it has an issue with one of your physical CPUs. Maybe try disabling one, or check that the processors are still securely seated? Same goes for memory.
Another thread I found after googling one line from your PSOD screenshot seems to point to making sure the BIOS is up to date, including microcode updates.
Hope this helps
http://communities.vmware.com/message/1059724
 
 
LVL 7

Author Comment

by:svelluto
Thanks, I'll take a look at the BIOS update.

As for checking that the CPU and memory are securely plugged in, this is a server sitting in a rack at the datacenter, so it may take some time to check that. The BIOS I can check next time I reboot it.


 
LVL 8

Expert Comment

by:markzz
Do you have a diagnostics partition?
 
LVL 7

Author Comment

by:svelluto
A diagnostics partition?
I don't think so. How do we set that up?
 
LVL 19

Expert Comment

by:vmwarun - Arun
The default partitioning layout of a normal ESX 3.5 Server would be

/ (root) - 5 GB
/boot - 100 MB
/var/log - 2.5 GB
swap - 544 MB
vmkcore (Diagnostic Partition) - 100 MB
vmfs - Remaining Space.

vmkcore is used to store the diagnostic info when the ESX/ESXi host PSODs.
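
As a rough check on those numbers, here is a minimal Python sketch that adds up the fixed partitions listed above. It assumes 1 GB = 1024 MB, and the names `LAYOUT_MB`, `fixed_overhead_mb` and `vmfs_remaining_mb` are made up for illustration. It is only arithmetic on the list above, not anything the installer runs.

```python
# Fixed partition sizes from the list above, in MB.
# Assumption: 1 GB is taken as 1024 MB.
LAYOUT_MB = {
    "/ (root)": 5 * 1024,
    "/boot": 100,
    "/var/log": int(2.5 * 1024),
    "swap": 544,
    "vmkcore": 100,
}


def fixed_overhead_mb(layout=LAYOUT_MB):
    """Total space taken by everything except the VMFS partition."""
    return sum(layout.values())


def vmfs_remaining_mb(disk_mb, layout=LAYOUT_MB):
    """Space left for VMFS on a disk of disk_mb; negative means it does not fit."""
    return disk_mb - fixed_overhead_mb(layout)


if __name__ == "__main__":
    print(fixed_overhead_mb(), "MB of fixed partitions")
    print(vmfs_remaining_mb(1024), "MB left on a 1GB device")
```

The fixed partitions come to a little over 8 GB, so this layout describes an ESX 3.5 install on a regular disk and would not fit on a 1GB USB key.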



 
LVL 8

Expert Comment

by:markzz
Via the VI Client, go to Configuration -> Storage and choose Add Storage.
It will be the 3rd option.
 

Expert Comment

by:jiriki
Your PSOD shouldn't auto-reboot, since you should be left with a prompt to 'Press Escape to enter local debugger', which I assume is the faux COS console.  It looks as though the dump didn't complete for some reason.

Not that I'm a guru or anything, but my understanding is that with ESXi (and I'm emphasizing the 'i', which everyone seems to overlook and which makes VMware support cringe even though their sales reps are pushing it) you do not have the option to alter the partition build.  The installer does this automatically and should create the vmkcore partition (per http://www.boche.net/blog/?p=120 ).  However, you can use the unsupported method of enabling a faux COS console and manually create one, but, per the links given above and below, it is possible to trash the default install.

Now getting that dump file isn't so easy peasy... http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004128 describes how to do this from ESXi (emphasizing the 'i').  This article specifically notes that the "USB key for ESXi Embedded contains a VMKcore partition".  It seems you have to get into DEBUG mode, which the PSOD 'should' allow you to do. However, with the one PSOD I've suffered, I could not get into that mode.

I can see that I have the vmkcore partition within the VI Client by going to the Configuration tab, Storage, right-clicking the ESXi system disk and choosing Properties; the Extent Device pane shows a VMware Diagnostic partition of 109MB.  You can also add it this way, or possibly as markzz indicates above (I don't have any spare disk space on my system to verify that I can create a VMFS diagnostic -fc- type partition).

Since this is a USB version, you could mount the USB /dev/disks/vmhba32:0:0:7 volume on another *NIX box and retrieve the file.  I'm not *NIX savvy enough to know if you could access this from a Windows box (although the article notes you can use Disk Management/FDisk to see that the vmkcore partition exists).
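
For what it's worth, here is a minimal, read-only Python sketch of how one might check from another box that the key really carries a vmkcore partition. It assumes the key uses a classic MBR partition table with 512-byte sectors and that the diagnostic partition is marked with type 0xFC (the "fc" type mentioned above). The function names and any device path you pass in are hypothetical, and I have not run this against a real ESXi key.

```python
import struct

SECTOR = 512                         # assumed sector size
VMKCORE_TYPE = 0xFC                  # MBR type byte for a VMware diagnostic partition
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # containers that hold logical partitions


def _entries(sector):
    """Return the four (type, start_lba, num_sectors) slots of one boot sector."""
    if len(sector) < SECTOR or sector[510:512] != b"\x55\xaa":
        raise ValueError("no 0x55AA boot signature; not an MBR/EBR sector")
    slots = []
    for i in range(4):
        base = 446 + 16 * i
        start, count = struct.unpack_from("<II", sector, base + 8)
        slots.append((sector[base + 4], start, count))
    return slots


def find_vmkcore(read_sector):
    """Return [(partition_number, start_lba, num_sectors)] for type-0xFC partitions.

    read_sector(lba) must return the 512 bytes stored at that sector.
    """
    found = []
    logical_no = 5  # logical partitions are numbered from 5
    for slot, (ptype, start, count) in enumerate(_entries(read_sector(0)), start=1):
        if ptype == VMKCORE_TYPE:
            found.append((slot, start, count))
        elif ptype in EXTENDED_TYPES:
            ext_base, ebr, seen = start, start, set()
            while ebr not in seen:  # guard against a looping chain
                seen.add(ebr)
                first, link = _entries(read_sector(ebr))[:2]
                if first[0] != 0:
                    if first[0] == VMKCORE_TYPE:
                        # a logical partition starts relative to its own EBR
                        found.append((logical_no, ebr + first[1], first[2]))
                    logical_no += 1
                if link[0] not in EXTENDED_TYPES:
                    break
                ebr = ext_base + link[1]  # next EBR is relative to the extended start
    return found


def sector_reader(path):
    """Build a read_sector callable for a device or image file (opened read-only)."""
    def read_sector(lba):
        with open(path, "rb") as f:
            f.seek(lba * SECTOR)
            return f.read(SECTOR)
    return read_sector
```

On a Linux box you would call something like `find_vmkcore(sector_reader("/dev/sdX"))`, where `/dev/sdX` stands in for whatever device name the key gets, probably as root. An empty list would suggest there is no diagnostic partition on the key, at least under the assumptions above.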

You cannot browse the vmkcore partition in the Datastore Browser, nor access it with the freeware Veeam Backup & FastSCP (not on v3; I've not tried the v4 just released).  Nor can I verify that any of this is valid for vSphere4i.

Sorry I'm not giving you a direct answer, but it's the best I've got.  I'd say that if you do have a vmkcore partition and a diagnostic of the USB key's integrity pans out (I don't know how to do that off the top of my head from a *NIX box... DO NOT try to run a chkdsk from a Windows mount), then it's likely a RAM, CPU or mobo problem.  The dump is supposed to give you more info, but I suspect that normal hardware diagnostics and/or a process of elimination by removing/disabling various components will end up being what you'll have to do.

Hope this helps at least with places to look for info...