VMware ESXi 5.0.0 Host Crashing

For the third time in a month, one of my hosts has crashed. It happens when restoring an Oracle database. The tricky part is that the host crashes, not the guest operating system, and after a while none of the VMs are available. Has anyone here experienced something similar and, if so, how was the issue solved?

Thanks for the help.

Regards,

Francisco
felguera Asked:

Mazdajai Commented:
5.0 is outdated. You should consider updating to 5.1 or 5.5, as there may be known bugs related to host crashes in 5.0.
0
felguera (Author) Commented:
That was my second question: can I update the host while it's running, or do I need to bring it down?
0
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Check your host hardware: CPU, fans, memory. Run a memory test for 48 hours.

Do you get a PSOD?

How does it crash? Does it hang?

What do you do to restart?

Updating may not be the answer if you have a hardware issue. Updating also brings its own challenges: is your server supported?

What version of 5.0 is it? It's completely acceptable to be running 5.0.

You will have to shut down the host and VMs to update.
0
Mazdajai Commented:
The upgrade requires a reboot.

If you have vCenter, you can migrate the VMs to another ESXi host. If this is a standalone host, you will need to shut down all the VMs.
0
felguera (Author) Commented:
No PSOD, no hardware issues. It is strange: it only happens with a VM running RHEL 6 that we use to run Oracle DB 11g R2. When restoring the database from a backup, the host loses its network connection. I have no idea why, and it is making my head spin.

To restart I have to power cycle the box. There are no memory issues. The error I get is: "Lost access to volume xxxxxx (machine name) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly"

Then: "Host is not responding"

Then all the VMs show as disconnected, and then everything goes down...

We do have another ESXi host running 5.1, but even though the machines are from the same family, they are a couple of years apart in age, so we cannot do vMotion...
0
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Live vMotion is probably failing due to mismatched processors!

Are you using an iSCSI datastore?

What ESXi 5.0 build are you using?

What is the server?

Is it on the HCL?
0
gheist Commented:
How exactly does "everything go down"?
Is the Red Hat VM responding? Can you access shared storage from other hosts?
0
felguera (Author) Commented:
That is correct, live vMotion fails due to mismatched processors, but I cannot change them, so I am stuck there.

The server is an IBM 3850 M2 with 120 GB RAM.

The datastore is connected by Fibre Channel through a Cisco Nexus; the pipe is 8 Gb.

The SAN is an IBM V3700 and has plenty of space.

Yes, it is on the HCL; we even upgraded the firmware to its latest version.
0
felguera (Author) Commented:
I forgot the build: it is ESXi 5.0.0, build 623860.
0
gheist Commented:
RHEL 6U5 requires ESXi 5.0 U2, while you have U1, so upgrade VMware (with all the reboots involved).
0
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
The latest version of ESXi 5.0 is build 1918656, released on 14 July 2014, so you are a little behind!

Update and then re-test.
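
To make the comparison concrete, here is a minimal sketch that splits a "version,build" string like the one reported above and checks it against the latest 5.0 build quoted in this comment. The function names are hypothetical and the only build numbers used are the two mentioned in this thread; it assumes both builds belong to the same 5.0 release line.

```python
# Build numbers taken from this thread only (not an authoritative list).
REPORTED = "5.0.0,623860"      # what the asker reported
LATEST_5_0_BUILD = 1918656     # latest 5.0 build quoted above


def parse_build(version_string):
    """Split a string like '5.0.0,623860' into (version, build number)."""
    version, _, build = version_string.partition(",")
    return version.strip(), int(build.strip())


def is_behind(version_string, latest_build=LATEST_5_0_BUILD):
    """True if the reported build is older than the given latest build."""
    _, build = parse_build(version_string)
    return build < latest_build


version, build = parse_build(REPORTED)
print(version, build, "behind latest 5.0 build:", is_behind(REPORTED))
```

The same check could be fed from whatever the host itself reports as its version and build, rather than a hard-coded string.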
0
gheist Commented:
U2 should be sufficient to run RHEL 6, though a later version will rule out more bugs.
0
felguera (Author) Commented:
So I should update. I cannot do it today, though, because this is a running production environment. I will post my results.
0
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
Yes, I would update ASAP!
0
gheist Commented:
At least get to 5.0.0 Update 3.
If you have vCenter, it would be a good time to consider getting VUM to work.
0
felguera (Author) Commented:
More issues are coming up. EE thinks I had abandoned the issue; not at all. We purchased a server and set up ESXi 5.5 on it, moved the machine to that server, and now we are getting an "NVRAM write failed" error. Again, I will update when more information is available. Maddening, this is just insane.
0
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, Commented:
I think now would be a good time to open a new question on this recent issue, as it has nothing to do with the original question asked.

Did you upgrade the host in this question?
0
felguera (Author) Commented:
Yes I did: new host, same virtual machine. You are right, I will close this question and move on to a new one.
Thanks!

FE
0
gheist Commented:
I think we wanted to know IF you
1) tried to validate the old hardware with memtest
2) upgraded to at least a formally supported ESX(i) version for RHEL 6U5
0
felguera (Author) Commented:
Gheist,

Memory tests were successful.
There is no point in upgrading the old machine, because the processors are not compatible, even though the two machines are from the same family. We purchased a machine that is exactly the same as the one we are going to keep in our environment (thank the internet and eBay!), so we will be able to do vMotion and move on. I will open a new question for the new issue I am facing.
Thanks all for your comments.

FE
0
felguera (Author) Commented:
I don't know how to close this question, so I am going to post my solution right here:

I replaced the machine with one that is compatible with the newest member of the environment, installed ESXi 5.5.0, configured the storage and fabric connections, and then moved the VM that was crashing the host to the new box.
Since the issue continued, I made a couple of changes to the VM: I changed the SCSI controller to paravirtual and disabled Fibre Channel NPIV (in Options). After that, I tried to restore the database again, and this time the process was successful.
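
For anyone wanting to confirm the controller change afterwards, the following is a rough sketch that reads the text of a VM's .vmx file and reports which virtual SCSI controllers are not paravirtual. The sample text and function names are hypothetical illustrations; the sketch assumes the controller type is stored in `scsiN.virtualDev` lines with `pvscsi` as the paravirtual value. It does not look at the NPIV setting, and it only reads the configuration, it does not change anything.

```python
import re

# Hypothetical sample of the relevant .vmx lines, for illustration only.
SAMPLE_VMX = '''
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"
scsi1.present = "TRUE"
scsi1.virtualDev = "pvscsi"
'''

# Matches lines such as: scsi0.virtualDev = "pvscsi"
_VIRTUAL_DEV = re.compile(
    r'^\s*(scsi\d+)\.virtualDev\s*=\s*"([^"]*)"',
    re.MULTILINE | re.IGNORECASE,
)


def scsi_controller_types(vmx_text):
    """Return {controller name: device type} for each scsiN.virtualDev line."""
    return {
        m.group(1).lower(): m.group(2).lower()
        for m in _VIRTUAL_DEV.finditer(vmx_text)
    }


def non_paravirtual(vmx_text):
    """List the controllers whose type is anything other than 'pvscsi'."""
    types = scsi_controller_types(vmx_text)
    return sorted(name for name, dev in types.items() if dev != "pvscsi")


print(scsi_controller_types(SAMPLE_VMX))
print(non_paravirtual(SAMPLE_VMX))
```

In practice the text would come from the VM's own .vmx file on the datastore. Note that the guest also needs a paravirtual SCSI driver before the controller type is switched, so this check is only one part of verifying the change.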

Thanks for the help.

FE
0

felguera (Author) Commented:
I've requested that this question be closed as follows:

Accepted answer: 0 points for felguera's comment #a40330601

for the following reason:

It took me a lot of time to figure this one out. The obvious was stated by other users. Updating to 5.5.0 made it possible for the hardware to use paravirtual SCSI, which was not possible with the earlier version; that, together with disabling Fibre Channel NPIV, resolved the issue.
0
gheist Commented:
You really ought to accept Hancock's very first comment and give some credit to the others who asked you to upgrade ESXi.
0

Question has a verified solution.
