  • Status: Solved
  • Priority: Medium
  • Security: Public

ESXi 4.1 host crashes when the guest OS is close to completing a large file transfer to a physical machine

When I transfer completed torrents from a Windows 7 machine (one of the guest VMs on the ESX box) to another Windows 7 box (physical), ESX "crashes".

There is nothing to indicate an error on the ESX screen: no PSOD, no kernel panic... it just... dies. I cannot plug in a keyboard to troubleshoot, as it doesn't detect it; I cannot ping any of the VMs hosted on the box, or the box itself; and the console just shows what it always does on the main screen.

It usually happens just after, or as, the file transfer is close to completion (typically with large files).

When ESX dies, vSphere disconnects (I can actually watch the file transfer via the console view right up until the crash). Because I cannot ping any of the VMs (by IP or hostname) and vSphere will not reconnect, I cannot RDP to any of the guest OSes either (one of which hosts DHCP, DNS, AD, etc.).

If I bounce ESX, everything is back to normal, until it comes time to transfer files again... :(

Any suggestions? Are there any logs I can look into, without getting support from VMware?
Asked by: datacomsmt
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
You can certainly inspect the logs:

VMware KB: Location of ESXi log files

http://kb.vmware.com/kb/1021801 

What server do you have? Is it on the Hardware Compatibility List?

http://www.vmware.com/go/hcl

It suggests there are issues with the datastore or the storage controller.

Again, is the storage controller on the HCL, or on the following list?

http://www.vm-help.com/esx40i/esx40_whitebox_HCL.php#Storage
 
Neil Russell, Technical Development Lead, commented:
I have seen this on some servers when the VMs are not using the VMXNET3 NIC adapter type in the VM settings. Try changing all your NICs to VMXNET3.
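The adapter type is what the vSphere Client shows as "Adapter type", and it is stored in the VM's `.vmx` file as an `ethernetN.virtualDev` entry (for example `ethernet0.virtualDev = "e1000"`). A minimal Python sketch, assuming you are working on a copy of the `.vmx` file of a powered-off VM; the function names are hypothetical helpers, not a VMware API, and `set_adapter_type` only rewrites an entry that already exists. Removing the NIC and adding a new VMXNET3 one in the vSphere Client is the safer route.

```python
import re

# Matches .vmx lines such as:  ethernet0.virtualDev = "e1000"
_NIC_DEV = re.compile(r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"\s*$', re.IGNORECASE)


def nic_adapter_types(vmx_text: str) -> dict:
    """Return {nic_name: adapter_type} for every ethernetN.virtualDev entry."""
    found = {}
    for line in vmx_text.splitlines():
        m = _NIC_DEV.match(line)
        if m:
            found[m.group(1).lower()] = m.group(2).lower()
    return found


def set_adapter_type(vmx_text: str, nic: str, new_type: str = "vmxnet3") -> str:
    """Rewrite an existing ethernetN.virtualDev entry; all other lines are kept as-is."""
    out = []
    for line in vmx_text.splitlines():
        m = _NIC_DEV.match(line)
        if m and m.group(1).lower() == nic.lower():
            line = f'{m.group(1)}.virtualDev = "{new_type}"'
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = 'ethernet0.present = "TRUE"\nethernet0.virtualDev = "e1000"\n'
    print(nic_adapter_types(sample))  # {'ethernet0': 'e1000'}
    print(nic_adapter_types(set_adapter_type(sample, "ethernet0")))  # {'ethernet0': 'vmxnet3'}
```

Whichever way the change is made, Windows sees a VMXNET3 card as a new network adapter, so static IP settings have to be re-entered, and the VMXNET3 driver itself comes from VMware Tools in the guest.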
 
datacomsmt (Author) commented:
@hanccocka

"NOTE: this list includes a number of SATA controllers that provide RAID functionality via a software component in the drivers supplied with the controller. Examples would be the Intel ICH series and the nVidia MCP series. ESX 4.x and ESXi 4.x do not support that software RAID functionality thus you will only be able to access the individual drives connected to controllers such as these."

Mobo: Asus P5N32-E SLI Plus

Northbridge: C55 a.k.a. nForce 650i SLI
Southbridge: MCP55P a.k.a. nForce 570 SLI

CPU: some Core 2 Duo

It's just a box I built to run ESXi, not a rack mount or anything specifically built to run VMs, unfortunately :(

I'm not running any form of RAID on the box, hardware or software, just a single disk.


@Neilsr

I... am pretty much a maximum noob at ESX, sorry. I'm not sure where to check this, but I see the NIC (on that specific VM) is labelled "E1000" under "Adapter type", whatever that means...?

It is installed in the VM as "Intel PRO/1000 MT" PCI\VEN_8086&DEV_100F.....
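That hardware ID already answers the question: vendor `8086` is Intel and device `100F` is the Intel 82545EM that the E1000 virtual adapter emulates, whereas a VMXNET3 adapter shows up under VMware's PCI vendor ID `15AD`. A small Python sketch that decodes the ID from Device Manager; the lookup table is my own summary of the adapter types named in this thread, so treat it as an assumption to check against your own VMs.

```python
import re

# (vendor, device) -> virtual adapter type. 0x15AD is VMware's PCI vendor ID.
KNOWN_VIRTUAL_NICS = {
    ("8086", "100F"): "E1000 (emulated Intel 82545EM)",
    ("15AD", "07B0"): "VMXNET3",
    ("15AD", "0720"): "VMXNET / VMXNET2 (Enhanced)",
    ("1022", "2000"): "Flexible / vlance (emulated AMD PCnet)",
}

_HWID = re.compile(r"VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})", re.IGNORECASE)


def decode_hardware_id(hwid: str) -> str:
    """Map a Windows hardware ID such as PCI\\VEN_8086&DEV_100F to an adapter type."""
    m = _HWID.search(hwid)
    if not m:
        return "not a PCI hardware ID"
    key = (m.group(1).upper(), m.group(2).upper())
    return KNOWN_VIRTUAL_NICS.get(key, f"unknown device {key[0]}:{key[1]}")


if __name__ == "__main__":
    print(decode_hardware_id(r"PCI\VEN_8086&DEV_100F"))  # E1000 (emulated Intel 82545EM)
```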


I've just completely rebuilt the machine I'm having trouble with (that is to say the VM, not the ESX box), so I can say for sure it's definitely not going to be a corrupt VMDK or something.



Thanks for both your help so far :)
datacomsmt (Author) commented:
Scratch that last bit, I just found what you mean about VMXNET3 (just googled it).

I will install that now and give it a whirl.
 
datacomsmt (Author) commented:
It doesn't like the VMXNET3 driver :( "This device cannot start. (Code 10)"

Is there somewhere I can get an updated version of the device driver? I searched VMware's website to no avail.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
This is similar to many questions we get on EE from users who have built what we call "white boxes": they may experience issues because of incompatible hardware.

They may work or they may not. "Your mileage may vary."

Personally, I've not seen any issue on certified hardware that causes ESX to crash when using virtual NICs: E1000, VMXNET2, VMXNET3 or the older AMD Lance.

But I have seen many instances of unstable ESXi platforms, built on non-certified hardware, with problems that cannot be explained or fixed.
 
Danny McDaniel, Clinical Systems Analyst, commented:
Are you using thin-provisioned virtual disks, and is the datastore filling up?
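The reasoning behind that question is simple arithmetic: a thin disk only takes up what the guest has written so far, but it may grow to its full provisioned size, so a datastore is at risk whenever used space plus the remaining growth of all thin disks exceeds its capacity. A Python sketch with hypothetical disk sizes (not taken from this host); it ignores VM swap (`.vswp`) files and snapshots, which also consume datastore space.

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    name: str
    provisioned_gb: float  # size the guest sees; a thin disk can grow to this
    used_gb: float         # space currently taken on the datastore
    thin: bool


def datastore_risk(capacity_gb: float, disks: list) -> dict:
    """Compare current and worst-case datastore usage for a set of virtual disks."""
    used = sum(d.used_gb for d in disks)
    # Thick disks are fully allocated up front; only thin disks can still grow.
    growth = sum(d.provisioned_gb - d.used_gb for d in disks if d.thin)
    worst_case = used + growth
    return {
        "used_gb": used,
        "free_gb": capacity_gb - used,
        "worst_case_gb": worst_case,
        "overcommitted": worst_case > capacity_gb,
    }


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    disks = [VirtualDisk("dc01", 100, 100, thin=False),
             VirtualDisk("media", 320, 60, thin=True)]
    print(datastore_risk(400, disks))
```

With those made-up numbers, 160 GB is in use today, but the worst case is 420 GB on a 400 GB datastore, so the thin disk alone could fill it.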
 
datacomsmt (Author) commented:
@danm66

Nope, the disks on each VM are static (not thin provisioned).

@hanccocka
Yeah, it's a bugger they don't have as big a list of supported hardware as Windows... but I suppose Windows has been around forever, and vendors build drivers specifically for it.

I appreciate the help all the same, and I realise I may not get it sorted, and that's cool, but I want to try.

I just realised I'm an idiot: I actually made the NIC "E1000" when I created the VM, so there was no point trying to put the VMXNET3 driver on E1000 virtual hardware, which is why the driver didn't work. I recreated the NIC as VMXNET3, installed the appropriate driver... and copied a big file.

... and... it... hasn't... crashed... yet... :D (but there have been occasions when it doesn't, so I'll just have to wait and see)

 *crosses fingers*
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
It's an enterprise-class datacentre server operating system; the same is true of Windows Datacenter Server.

It's not designed to run on any old PC with an Intel chip!

It's not a desktop operating system (designed for the masses)!
 
datacomsmt (Author) commented:
Haha, I know! I'm not fighting or arguing with you about that, or anything else for that matter! :) I was just saying.

I think Windows has given me unrealistic hardware support expectations :P You're quite right, it isn't designed to run on just anything with a CPU :P I've just had mates tell me "oh, I run it on a laptop at home and it's fine", so I'm like "well... I'll give it a go then", since ESXi is free. Anyway, I appreciate the help; I'll let you all know if it dies again.
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
It's easier to get ESXi 4.1 running on a laptop or desktop (with Intel VT) by installing some version of Windows plus VMware Workstation 7.1, because the "virtual hardware" is compatible! This method works, but it is slower than installing on bare metal!

I think people have also forgotten that there is a Windows Hardware Compatibility List too!

 
datacomsmt (Author) commented:
No luck :( It died again.

.. IF.. this is fixable... what exactly (or roughly) should I be looking for in the epic 417 KB messages.txt file to help me troubleshoot?
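One rough way to cut a large `messages` file down: keep only `vmkernel` lines containing words such as `failed`, `ALERT`, `error`, `timeout`, `abort` or `reset`, count them per subsystem, and read the last few before the log stops. A Python sketch, assuming lines in the ESXi 4.x format `Apr 25 18:03:51 vmkernel: 0:01:19:06.362 cpu0:4392)NMP: ...`; the keyword list is my own guess, not an official VMware list.

```python
import re
import sys
from collections import Counter

# Keywords worth a first look; adjust to taste (this list is an assumption).
KEYWORDS = re.compile(r"failed|ALERT|error|timeout|abort|reset", re.IGNORECASE)
# The subsystem tag vmkernel prints after the CPU/world field, e.g. "cpu0:4392)NMP:".
SUBSYSTEM = re.compile(r"cpu\d+:\d+\)([A-Za-z]+)")


def interesting_lines(lines, tail=20):
    """Return (last `tail` matching vmkernel lines, count of matches per subsystem)."""
    hits = [ln.rstrip("\n") for ln in lines if "vmkernel" in ln and KEYWORDS.search(ln)]
    per_subsystem = Counter()
    for ln in hits:
        m = SUBSYSTEM.search(ln)
        per_subsystem[m.group(1) if m else "?"] += 1
    return hits[-tail:], per_subsystem


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        last, counts = interesting_lines(fh)
    for name, n in counts.most_common():
        print(f"{n:6d}  {name}")
    print("--- last matching lines ---")
    print("\n".join(last))
```

Saved under any name, it runs as `python filter_messages.py messages.txt`.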
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Is this for production or for lab/home/test learning? If the latter, you may have better luck using VMware Workstation 7.1 (although this would have to be purchased).

This IS to be expected with unsupported hardware, or with hardware that has not been confirmed to work reliably by trial and error.

The areas that most affect ESXi support are storage controllers and network interface cards. If you get a storage controller and network interface cards that are on the VMware HCL, you may have better luck.

http://www.vmware.com/go/hcl

The white box HCL may be of some assistance, although it is outdated today because it only covers ESX 4.0.

http://www.vm-help.com/esx40i/esx40_whitebox_HCL.php

This is the issue with trying to use unsupported "built" hardware: some people take great pride in stating "oh, I built X out of a pile of bits, and it now runs ESXi".

It's easier to find working components by searching the forums, which almost gives you some "guarantee" that the system will work if you get the correct components (but you'll spend a lot of time and money on trial and error trying to get it to work).

Personally, I'm in favour of purchasing a very low-cost refurbished/old server from eBay that works reliably with ESXi 4.1, e.g. an HP DL385/DL585. Although they have not been supported since ESX 3.5 U5 and are not on the HCL, they do work with ESXi 4.0/4.1 (so I wouldn't want to use them in a mission-critical environment).
 
datacomsmt (Author) commented:
It's mostly for home, with a bit of learning. I'll see what I can find as far as a replacement goes, I guess. Thanks again for the help :)
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
If it's for home, for learning:

Replace the host OS with a 64-bit version of Windows and enable Intel VT (I'm sure this is already enabled).

Install the trial of VMware Workstation 7.1, create an ESXi 4.1 virtual machine, and run ESXi 4.1 inside it (the only disadvantage is that you will not be able to run any 64-bit VMs).

See here:

http://www.vladan.fr/how-to-install-esxi-4-1-inside-of-vmware-workstation-7-1/
 
datacomsmt (Author) commented:
I've found a cheapish DL585.

P.S. In case it's of any relevance to anyone in future, I managed to get a syslog dump from ESXi right before it craps itself, and the last meaningful entries are (newest first):
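Forwarding syslog to another machine is what made this capture possible: a remote receiver's copy survives whatever happens to the host. If no syslog server is at hand, a minimal Python sketch of one (UDP only, assuming the host's remote syslog setting points at this machine; `receive_syslog` and `decode_pri` are hypothetical names) looks like this. It also shows where labels like `Local6.Notice` come from: the `<PRI>` number at the start of each message encodes facility * 8 + severity.

```python
import re
import socket
import time

FACILITIES = {16 + i: f"Local{i}" for i in range(8)}  # syslog facilities local0..local7
SEVERITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Info", "Debug"]


def decode_pri(message: str):
    """Split a leading <PRI> tag into 'Facility.Severity' (PRI = facility * 8 + severity)."""
    m = re.match(r"<(\d{1,3})>", message)
    if not m:
        return "Unknown", message
    pri = int(m.group(1))
    facility = FACILITIES.get(pri // 8, f"Facility{pri // 8}")
    return f"{facility}.{SEVERITIES[pri % 8]}", message[m.end():]


def receive_syslog(log_path: str, host: str = "0.0.0.0", port: int = 514,
                   max_messages=None) -> int:
    """Append every UDP syslog datagram to log_path; returns how many were written.

    Port 514 is the standard syslog port and usually needs admin/root rights.
    """
    written = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "a", encoding="utf-8") as out:
        sock.bind((host, port))
        while max_messages is None or written < max_messages:
            data, (sender, _port) = sock.recvfrom(65535)
            level, text = decode_pri(data.decode("utf-8", errors="replace").rstrip())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp}\t{level}\t{sender}\t{text}\n")
            # Flush per line so the last messages are on disk even if the sender dies.
            out.flush()
            written += 1
    return written
```

Calling `receive_syslog("esxi-remote.log")` blocks and logs until interrupted; because each line is flushed as it arrives, the file holds everything up to the moment the host goes quiet.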

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.362 cpu0:4392)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41027f39ec40) to NMP device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed on physical path "vmhba1:C0:T0:L0" H:0x4

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.338 cpu0:4392)ScsiDeviceIO: 1672: Command 0x28 to device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed H:0x4 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.330 cpu0:4392)ScsiDeviceIO: 1672: Command 0x28 to device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed H:0x4 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

04-26-2011      04:03:56      Local7.Debug      192.168.1.200      D:0x0 P:0x0 Possible sense da

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.315 cpu0:4392)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41027f39ec40) to NMP device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed on physical path "vmhba1:C0:T0:L0" H:0x4

04-26-2011      04:03:56      Local7.Debug      192.168.1.200      D:0x0 P:0x0 Possible sense da

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.295 cpu0:4392)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41027f39ec40) to NMP device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed on physical path "vmhba1:C0:T0:L0" H:0x4

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.272 cpu0:4392)ScsiDeviceIO: 1672: Command 0x28 to device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed H:0x4 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

04-26-2011      04:03:56      Local7.Debug      192.168.1.200      D:0x0 P:0x0 Possible sense da

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.260 cpu0:4392)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41027f39ec40) to NMP device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed on physical path "vmhba1:C0:T0:L0" H:0x4

04-26-2011      04:03:56      Local7.Debug      192.168.1.200      D:0x0 P:0x0 Possible sense da

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.244 cpu0:4392)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41027f39ec40) to NMP device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed on physical path "vmhba1:C0:T0:L0" H:0x4

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.228 cpu0:4392)ScsiDeviceIO: 1672: Command 0x28 to device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed H:0x4 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

04-26-2011      04:03:56      Local7.Debug      192.168.1.200      D:0x0 P:0x0 Possible sense da

04-26-2011      04:03:56      Local6.Notice      192.168.1.200      Apr 25 18:03:51 vmkernel: 0:01:19:06.216 cpu0:4392)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x41027f39ec40) to NMP device "t10.ATA_____ST3400620AS_________________________________________5QH0D78A" failed on physical path "vmhba1:C0:T0:L0" H:0x4

04-26-2011      04:03:56      Local7.Debug      192.168.1.200      D:0x0 P:0x0 Possible sense da

04-26-2011      04:01:48      Local6.Error      192.168.1.200      Apr 25 18:01:48 vmkernel: 0:01:17:03.325 cpu0:10083)ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40
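Reading those lines: opcode `0x28` is the SCSI READ(10) command, so reads from the single SATA disk on `vmhba1:C0:T0:L0` were failing; `H:` is the host (controller/driver) status, `D:` the device status and `P:` the multipathing plugin status. A Python sketch that decodes the `failed` lines; the host-status names follow the Linux SCSI `DID_*` constants, which is an assumption here, so check VMware's own documentation on SCSI sense codes before relying on them.

```python
import re

SCSI_OPCODES = {0x00: "TEST UNIT READY", 0x12: "INQUIRY", 0x28: "READ(10)",
                0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}

# Host (H:) status values, named after the Linux DID_* constants (assumed to match).
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIME_OUT",
               0x4: "BAD_TARGET", 0x5: "ABORT", 0x6: "PARITY", 0x7: "ERROR",
               0x8: "RESET"}

# Device (D:) status values from the SCSI standard.
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}

# D: and P: are optional because the NMP lines above are cut off after H:.
_FAILED = re.compile(
    r"Command (0x[0-9a-fA-F]+).*?failed.*?H:(0x[0-9a-fA-F]+)"
    r"(?: D:(0x[0-9a-fA-F]+))?(?: P:(0x[0-9a-fA-F]+))?"
)


def _lookup(table, value):
    if value is None:
        return "not logged"
    code = int(value, 16)
    return table.get(code, hex(code))


def decode_failure(line: str):
    """Decode a vmkernel 'Command ... failed H:.. D:.. P:..' line, or return None."""
    m = _FAILED.search(line)
    if not m:
        return None
    op, h, d, p = m.groups()
    return {
        "command": _lookup(SCSI_OPCODES, op),
        "host": _lookup(HOST_STATUS, h),
        "device": _lookup(DEVICE_STATUS, d),
        "plugin": p or "not logged",
    }
```

Decoded this way, the ScsiDeviceIO lines read as READ(10) failing with host status BAD_TARGET while the disk itself reported GOOD, which suggests the problem sat in the controller/driver path rather than in the drive, in line with the storage-controller suspicion raised at the start of the thread.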
 
datacomsmt (Author) commented:
Much credit/thanks to hanccocka for his continued and especially FAST help :)
 
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
You'll not go far wrong with a DL585 G1/G2: fantastic servers, and they still work with ESXi 4.0/ESX 4.1 U1. We still run them in "production" in our offices!

We use quad-processor, dual-core, fully loaded configs. We don't use local disks, because we have SANs for VMs, and we use SSDs for VMware View VDI work.