deanavey


VMware ESXi 4.1 Update 1 stopped booting

I have a backup VMware server running on a Dell PE R710 that has been working fine for a year. It's just a backup/test unit that doesn't get used much. It has VMware ESXi 4.1 Update 1 installed and was working fine until I tried to turn it on yesterday and the boot process just hung. NOTHING has changed on it. It was running fine, was powered off, then was powered back on after a few weeks of not being used. No hardware was changed and no updates were applied; it was just powered off and back on.
Pressing Esc during boot shows that the last line displayed is:
Booting: MBI=0x000100c0, Entry=0x00400256

All the searching I have done so far turns up problems with a first install of VMware, which is not what I'm doing. This was a fully functional VMware host a couple of weeks ago that ran virtual machines just fine, and today it will no longer boot.

So far I've tried the following, with no luck:
1. Ran Dell Diagnostics; no errors reported.
2. Reseated the RAM, hard drives and the bootable USB drive.
3. Left it running overnight, but it's still stuck at the Booting: message above.

Does anyone have any insight on why a previously booting VMware host would suddenly stop with this error?
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Get another USB flash drive and reinstall ESXi 4.1.

It will not overwrite the VMFS datastores or the VMs.
deanavey

ASKER

OK, sorry, I left out a fourth thing I tried: putting in a new USB drive, booting from the VMware ESXi 4.1 Update 1 install CD and trying to reinstall. The CD boots but stops at the same message:
Module: install.vgz
Loading install.vgz............
Booting: MBI=0x000100c0, entry=0x00400256

So in that respect I'm experiencing the same problem I see reported by many others installing for the first time, except that I had this installed and working fine and only recently got this issue.
Are you using a standard version or a Dell version?
Is this the Dell OEM version?
Any hardware changes?
Any BIOS changes? Reset the BIOS to defaults.
Update the hardware firmware using the Dell CD-ROM.

Are you using shared storage?
Do you have an SD module on your server?

Some Dell servers only allow this type of installation if you buy a module with ESXi preinstalled or if you install on an SD card in the SD module.

Check with Dell support.
The R710 supports both an internal USB key with ESXi preinstalled and an SD card if you have the SD module.
Update the DRAC firmware and reset it to defaults.

Have you made any changes here?
This is just the standard version that I installed on it about a year ago and has been running fine all this time. All hardware was detected properly and functioned perfectly without a hiccup.
No shared storage on this server, just local disks and the USB drive that I use to boot it.
I've got a Dell VMware ISO for the R420 that I thought about trying but I saw a lot of forum posts where people trying to install a fresh iso said the Dell version didn't work either.
No changes were made of any kind. It was powered down after testing and powered back on a few weeks later for another test and it won't boot. It sat untouched for those few weeks.
Might the battery that maintains the BIOS settings already be faulty and have scrambled them? I did check that VT is enabled in the BIOS, but if the settings were scrambled by a faulty battery, maybe I can't trust what I see there? Does anyone know of a procedure to set the BIOS back to factory defaults, perhaps after leaving it powered on for a couple of days, and then try booting? I've seen people report this error when they try to install VMware on older 32-bit hardware. Could a scrambled BIOS make the machine appear to be 32-bit after sitting powered down for several weeks?
I'm sure the problem isn't that I boot the R710 from a USB drive; I've got about 8 other R710s all booting the exact same way without ever having any problems like this. The only difference is that they are always powered on, while this one can sit unplugged for weeks or months before it's needed.
USB and SD cards are both supported.

You would see an error message at boot if the BIOS settings were lost or the battery was low and needed replacing.

Intel VT, if disabled, could be a factor. Normally, though, I've seen these errors when trying to install on 32-bit CPUs, which the R710 does not have, unless you have a faulty processor, memory, or motherboard.
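One way to double-check the 32-bit vs. 64-bit and Intel VT question outside of ESXi is to boot the host from a Linux live USB/CD and read the CPU flags. The sketch below assumes such a live image is available (it is not something anyone in this thread tried); the function names are hypothetical. It parses `/proc/cpuinfo` text and reports the `lm` (long mode, i.e. 64-bit capable) and `vmx` (Intel VT-x) flags. Treat `vmx` as a hint only: depending on the kernel version, it may still be listed even when VT is disabled in the BIOS.

```python
"""Minimal sketch: check CPU flags as seen from a Linux live image (assumption).

"lm"  = long mode, meaning the CPU is 64-bit capable.
"vmx" = the CPU advertises Intel VT-x; some kernels list it even when the
        BIOS has VT disabled, so it is a hint rather than proof.
"""
import os


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the union of the flags listed on every 'flags' line."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def summarize(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the 64-bit and VT-x flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"64-bit (lm)": "lm" in flags, "Intel VT-x (vmx)": "vmx" in flags}


if __name__ == "__main__":
    # Only meaningful on a Linux system that exposes /proc/cpuinfo.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            for name, present in summarize(f.read()).items():
                print(f"{name}: {'yes' if present else 'NO'}")
```

On an R710's Xeon CPUs you would expect `lm` to be present; a missing `lm` flag would support the faulty-hardware theory rather than a BIOS setting.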

This seems to point to faulty hardware, since there have been no changes in disk selection or storage controllers.

It would be interesting to know whether ESXi 5.1 works. I would always recommend using the Dell versions; the OEM versions have functions to communicate with Dell hardware, plus additional drivers.
I've got Dell VMware install CDs for 4.1 and 5.1, so I'll give them both a try. I wasn't the first one to power it on when this happened, so I am not 100% sure it didn't report a BIOS/battery error on the first power-up. Just for testing, does anyone know the proper procedure to reset a Dell R710 BIOS to factory defaults? Is there such a thing? If so, I may give that a try as well when I try out the Dell VMware discs.
Is the BIOS reset Alt-F, while in the BIOS settings screen?
Alt-F did restore the BIOS to defaults but didn't change anything.
I booted to the Dell 4.1 installer and it halts at the same message.
I tried installing 5.1: it got to the line about loading tools, the top progress bar made it about 80-90% of the way across, then the screen went blank and it just sat there and did nothing else.
I also swapped out the RAM (12 GB total, 3 x 2 GB on each CPU), 2 GB per CPU at a time, and confirmed it halts no matter which pair of modules I use.
I also ejected all hard disks and booted; it still fails at the same message.
Dell quick diagnostics passed, and I'm waiting for the extended diagnostics to complete now.
Extended diagnostics also passed. VMware maintenance has expired for this server, so I can't easily open a VMware support ticket. I'm going to have to throw this one back to the client to see what they want to approve as a next step.
I may have to try it, but I've never had good luck convincing Dell to replace a part when diagnostics don't show a failure. Just last week I was called out to a desktop computer that keeps having blue screens and wanting to run chkdsk all the time. I asked them about replacing the drive, but it passes diagnostics, so they say it's a software problem. A shame, since the Dell warranty hasn't expired; of course, only the VMware support has.
I'm having the unit shipped to my office where I can do more extensive work over time:
1. BIOS Update
2. Swap out the PERC 6/i RAID card

And whatever else I can think of while I wait for it to get here.
We've got R710s here, if you want to compare notes, versions, etc.

Working with all versions of ESXi.
The unit arrived at my office last Friday, but due to the Memorial Day holiday, I didn't get to test it until today. First, before plugging anything in, I set it on my workbench and held in the power button for 30 seconds, which I've been told is a good way to fully discharge everything. I would think sitting unplugged for several weeks, as it usually does, would accomplish the same thing, but maybe not.
Lo and behold, I powered it on and it booted right up within about 5 minutes!
I'm somewhat dumbfounded now as to what happened. Only the server itself was shipped to me, so I'm using a different keyboard (and no mouse). At the client's office there was a mouse and keyboard, which I moved to other USB ports, but I didn't actually try booting without them.
Ever seen a mouse or keyboard prevent VMware from booting?
After testing Dell-, Logitech- and Microsoft-branded keyboards and mice with no issues at all, I returned the server to the client, hooked it up exactly as before, and found it halted at the very same spot again. The keyboard was HP; the mouse was generic. I unhooked the mouse, rebooted, and it booted right up!
So the important lesson to take away here, don't use generic mice. On my own servers I don't use any mice at all except for those devices that I put in a rack and attach to KVM switches. Never had a single problem with the KVM devices either.
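The isolation step that found the mouse (remove one attached device, power-cycle, see whether ESXi boots) can be written down as a small procedure. This is only a sketch: `boots` is a hypothetical stand-in for the manual test of attaching a set of devices and rebooting the host, and the device names are illustrative. It also assumes a single device is at fault; if two devices only cause the hang together, this simple pass may not name either one.

```python
"""Minimal sketch of one-at-a-time peripheral elimination.

Hypothetical: `boots(devices)` represents a manual test (attach exactly these
devices, power-cycle the host, report whether ESXi finished booting).
"""
from typing import Callable, Iterable


def find_culprits(
    devices: Iterable[str], boots: Callable[[frozenset[str]], bool]
) -> list[str]:
    """Return the devices whose removal alone lets the host boot.

    Assumes a single faulty device; interacting devices may yield [].
    """
    full = frozenset(devices)
    if boots(full):
        return []  # the full set already boots, so there is nothing to isolate
    # Remove each device in turn and keep the ones whose absence fixes booting.
    return sorted(d for d in full if boots(full - {d}))
```

For example, with a simulated test where only the generic mouse causes the hang, `find_culprits(["HP keyboard", "generic mouse"], boots)` returns `["generic mouse"]`. Testing one removal at a time keeps each power cycle easy to interpret, at the cost of one reboot per attached device.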
Thanks for helping me work through the various possibilities and assisting in the troubleshooting process!
It was a generic usb mouse that ended up being the cause.