Solved

ESXi PSOD

Posted on 2009-05-06
Last Modified: 2012-05-06
We set up an ESXi box just over a month ago, and it runs 4 VMs. It ran fine for weeks, but in the past 3 days it has pink-screened twice.

The first time, it was an out-of-memory error.

The second time, it was an unknown error (I have attached a screenshot).

Unfortunately, the RAID card in our server was not fully supported when we installed ESXi, so ESXi is currently running off a USB key.

Server Specs:

Mobo: Supermicro X7DWU
CPU: 2x Intel Xeon E5405 @ 2.00GHz
RAID: 3ware 9650SE
HDD: 6TB (RAID 10) - 3TB usable.


The pink screen doesn't bother me all that much, mostly because I'm a Windows guy =P
That said, it would be really nice to get rid of the pink screen altogether.

But what bugs me most: why doesn't the pink screen auto-reboot?
Is there a way to force the pink screen to auto-reboot?

Please post any suggestions.

Thanks,

Steve
error.bmp
Question by:svelluto
10 Comments
 
LVL 19

Expert Comment

by:vmwarun - Arun
ID: 24316472
A PSOD, or Purple Screen of Death, can occur for multiple reasons.

If it occurs during installation, the most probable reason is an attempt to install ESX/ESXi on unsupported hardware, or on hardware not listed in the VMware HCL.

Sometimes the installation is successful, but when the ESX Server loads, it may throw a PSOD because it cannot load drivers for unsupported hardware.
 
LVL 8

Expert Comment

by:markzz
ID: 24317182
I suspect that you didn't create a diags partition immediately after you installed the ESX OS.
Therefore, when the PSOD occurs, it doesn't have anywhere to dump the log.
Maybe you could move ESXi to a much larger memory stick, create a diags partition, and see if this helps.
The diags partition will be about 100MB.
You create it via disk management under Configuration.
 
LVL 7

Author Comment

by:svelluto
ID: 24317569
For the three weeks prior to these two crashes, ESX was working just fine, and then, out of the blue, we had two PSODs in a couple of days.

ESX is currently running on a 1GB USB key, so it should have sufficient space remaining.


 
LVL 21

Accepted Solution

by:
za_mkh earned 250 total points
ID: 24319538
It looks like it has an issue with one of your physical CPUs. Maybe try disabling one, or check that the processors are still securely seated? Same goes for memory.
Another thread I found after googling one line from your PSOD screenshot seems to point to making sure the BIOS is up to date, including microcode updates.
Hope this helps
http://communities.vmware.com/message/1059724
 
 
LVL 7

Author Comment

by:svelluto
ID: 24321389
Thanks, I'll take a look at the BIOS update.

As for checking that the CPU and memory are securely seated, this is a server sitting in a rack at the datacenter, so it may take some time to check that. The BIOS I can check next time I reboot it.

 
LVL 8

Expert Comment

by:markzz
ID: 24326614
Do you have a diagnostics partition?
 
LVL 7

Author Comment

by:svelluto
ID: 24327736
A diagnostics partition?
I don't think so. How do we set that up?
 
LVL 19

Expert Comment

by:vmwarun - Arun
ID: 24338002
The default partitioning layout of a normal ESX 3.5 Server is:

/ (root) - 5 GB
/boot - 100 MB
/var/log - 2.5 GB
swap - 544 MB
vmkcore (Diagnostic Partition) - 100 MB
vmfs - Remaining Space.

vmkcore is used to store the diagnostic info when the ESX/ESXi host PSODs.
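
As a rough check of the layout above, the fixed-size partitions can be added up to see how much is left for VMFS. This is only a sketch: it assumes 1 GB = 1024 MB, the function name `vmfs_remaining_mb` and the 72 GB example disk are made up for illustration, and the installer's actual rounding may differ.

```python
# Fixed-size partitions from the default ESX 3.5 layout listed above, in MB.
# Assumption: 1 GB = 1024 MB; the real installer may round differently.
FIXED_PARTITIONS_MB = {
    "/ (root)": 5 * 1024,
    "/boot": 100,
    "/var/log": int(2.5 * 1024),
    "swap": 544,
    "vmkcore": 100,
}


def vmfs_remaining_mb(disk_mb):
    """Return the space left for VMFS once the fixed partitions are placed."""
    fixed = sum(FIXED_PARTITIONS_MB.values())
    if disk_mb < fixed:
        raise ValueError("disk is smaller than the fixed partitions")
    return disk_mb - fixed


fixed_total = sum(FIXED_PARTITIONS_MB.values())
print(f"fixed partitions: {fixed_total} MB")
# Hypothetical 72 GB disk, purely as an example input.
print(f"left for VMFS: {vmfs_remaining_mb(72 * 1024)} MB")
```

Under these assumptions the fixed partitions alone need more than 8 GB, so this layout describes a full ESX install on a hard disk rather than ESXi running from a small USB key.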



0
 
LVL 8

Expert Comment

by:markzz
ID: 24338335
Via the VI Client, go to Configuration -> Storage and choose Add Storage.
The diagnostic partition will be the 3rd option.
 

Expert Comment

by:jiriki
ID: 24805579
Your PSOD shouldn't auto-reboot, as you should be left with a prompt to 'Press Escape to enter local debugger', which I assume is the faux COS console.  It looks as though the dump didn't complete for some reason.

Not that I'm a guru or anything, but my understanding is that with ESXi (and I'm emphasizing the 'i', which everyone seems to overlook and which makes VMware support cringe even though their sales reps are pushing it) you do not have the option to alter the partition layout.  The installer does this automatically and should create the vmkcore partition (per http://www.boche.net/blog/?p=120 ).  However, you can use the unsupported faux COS console to create one manually; as the links above and below note, it is possible to trash the default install this way.

Now getting that dump file isn't so easy peasy... http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004128 describes how to do this from ESXi (emphasizing the 'i').  This article specifically notes that the "USB key for ESXi Embedded contains a VMKcore partition".  It seems you have to get into debug mode, which the PSOD 'should' allow you to do. However, with the one PSOD I've suffered, I could not get into that mode.

I can see that I have the vmkcore partition within the VI Client by going to the Configuration tab, Storage, right-clicking the ESXi system disk and choosing Properties; the Extent Device pane shows VMware Diagnostic of 109MB.  You can also add it this way, or possibly as markzz indicates above (I don't have any spare disk space on my system to verify that I can create a VMFS diagnostic -fc- type partition).

Since this is the USB version, you could mount the USB /dev/disks/vmhba32:0:0:7 volume on another *NIX box and retrieve the file.  I'm not *NIX-savvy enough to know if you could access this from a Windows box (although the article notes you can use Disk Management/FDisk to see that the vmkcore partition exists).
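
For anyone who wants to check from another machine whether the key really carries a vmkcore partition, something along these lines might work. It is a sketch under several assumptions, not a tested procedure: it assumes the USB key has first been copied to a raw image file (the name `esxi_usb.img` below is hypothetical), that the key uses a classic MBR partition table with 512-byte sectors, and that the diagnostic partition carries MBR type id 0xFC (the "-fc-" type mentioned above, which fdisk lists as VMware VMKCORE). It only lists partitions; it does not read or interpret the dump itself.

```python
import struct

SECTOR_SIZE = 512                    # assumed logical sector size
VMKCORE_TYPE = 0xFC                  # MBR type id fdisk shows as VMware VMKCORE
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # containers that hold logical partitions


def read_table(disk, lba):
    """Return the four (type, start_lba, sector_count) entries in sector lba."""
    disk.seek(lba * SECTOR_SIZE)
    sector = disk.read(SECTOR_SIZE)
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError(f"no partition table signature at sector {lba}")
    entries = []
    for i in range(4):
        offset = 446 + 16 * i
        ptype = sector[offset + 4]
        start, count = struct.unpack_from("<II", sector, offset + 8)
        entries.append((ptype, start, count))
    return entries


def list_partitions(disk, max_logical=64):
    """Return (number, type, absolute_start_lba, sector_count) tuples."""
    found = []
    primaries = read_table(disk, 0)
    for number, (ptype, start, count) in enumerate(primaries, start=1):
        if ptype != 0:
            found.append((number, ptype, start, count))
    number = 5  # logical partitions are numbered from 5 (e.g. partition 7)
    for ptype, ext_start, _count in primaries:
        if ptype not in EXTENDED_TYPES:
            continue
        ebr = ext_start
        for _step in range(max_logical):  # bounded in case the chain loops
            logical, link = read_table(disk, ebr)[:2]
            if logical[0] != 0:
                # Logical start is relative to the table that describes it.
                found.append((number, logical[0], ebr + logical[1], logical[2]))
                number += 1
            if link[0] not in EXTENDED_TYPES or link[1] == 0:
                break
            # The link to the next table is relative to the extended partition.
            ebr = ext_start + link[1]
    return found


def find_vmkcore(disk):
    """Return only the partitions whose type id is the vmkcore one."""
    return [part for part in list_partitions(disk) if part[1] == VMKCORE_TYPE]
```

To try it, open the image in binary mode, for example `find_vmkcore(open("esxi_usb.img", "rb"))`. An empty list would suggest there is no diagnostic partition on the key, or that one of the assumptions above does not hold for your install.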

You cannot browse the vmkcore partition in the Datastore Browser, nor access it with the freeware Veeam Backup & FastSCP (not on v3; I've not tried the just-released v4).  Nor can I verify that any of this is valid for vSphere4i.

Sorry I'm not giving you a direct answer, but it's the best I've got.  I'd say that if you do have a vmkcore partition and an integrity check of the USB key pans out (I don't know how to do that off the top of my head from a *NIX box... DO NOT try to run a chkdsk from a Windows mount), then it's likely RAM, CPU or mobo.  The dump is supposed to give you more info, but I suspect that normal hardware diagnostics and/or a process of elimination by removing/disabling various components will end up being what you have to do.

Hope this helps at least with places to look for info...