VMware vSphere lost connectivity, all guests powered off

I have 2 hosts with HA and FT; each has a fiber link to an iSCSI SAN.  Each also has a separate vswitch for guests (vswitch1) and console/vmkernel (vswitch0).

I lost physical connectivity to the switch (it was power cycled).  I got an alert about vswitch0 losing connectivity, but no alert about vswitch1.

Here's the weird bit - when the switch came back online, ALL of the guests on both hosts were powered off.

As far as I can tell the fiber switch did not lose power - so the connection to the SAN *should* have been good the entire time.

My theory is that both hosts tried to vMotion their guests to the other host, which was also down, and the net result is that everything powered off.

Question 1: would loss of L1 connection for console/kernel cause all guests to end up powered off?

Question 2: if I lost connection from hosts to SAN, shouldn't I see an alert related to the storage adapters as well?
snowdog_2112 Asked:

coolsport00 Commented:
Q1: Check your HA settings...do they say to power off the VMs, or to VMotion them?
Q2: Not necessarily; there may not even be anything in the logs (a similar situation happened to me a few months back...I lost the connection to a host, don't know why, and nothing was in the logs).

~coolsport00

Paul Solovyovsky (Senior IT Advisor) Commented:
Check the isolation response on your cluster.  If the isolation response is set to power off, then HA behaved correctly.  Basically, if an ESX host can't communicate with the default gateway, it powers off its VMs so that two ESX hosts don't bring up the same VM at the same time.

This will explain:

http://www.yellow-bricks.com/2009/05/24/vsphere-ha-isolation-response/
rvivek_2002 Commented:
This is due to the "isolation response" setting in the HA cluster settings. When a host cannot communicate with the other hosts in the cluster, it tries to ping the default gateway; if that also fails, the host concludes that it is isolated from the network. The host then acts according to your "isolation response" setting. I believe you have it set to "power off VMs".
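
As a rough illustration of that decision, here is a simplified Python model. It is not VMware code or any VMware API; the names `IsolationResponse`, `host_is_isolated`, and `action_for_vms` are made up for this sketch, and it ignores timing such as the failure detection interval.

```python
from enum import Enum


class IsolationResponse(Enum):
    # Hypothetical labels for the cluster-level isolation response choices.
    LEAVE_POWERED_ON = "leave powered on"
    POWER_OFF = "power off"
    SHUT_DOWN = "shut down"


def host_is_isolated(heartbeats_received: bool, gateway_reachable: bool) -> bool:
    """The host considers itself isolated only if it hears no other host
    AND it cannot ping its default gateway."""
    return not heartbeats_received and not gateway_reachable


def action_for_vms(heartbeats_received: bool,
                   gateway_reachable: bool,
                   response: IsolationResponse) -> str:
    """Return what the host does with its running VMs in this model."""
    if host_is_isolated(heartbeats_received, gateway_reachable):
        return response.value
    return "no action"


if __name__ == "__main__":
    # Both hosts lose the console network at once: no heartbeats, no gateway.
    print(action_for_vms(False, False, IsolationResponse.POWER_OFF))
```

In this model, both hosts sit behind the same failed switch, so each one independently decides it is isolated and applies the configured response, which would leave every guest powered off.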

snowdog_2112 (Author) Commented:
You are correct, the isolation was set to "power off", and the failuredetectiontime was the default - 15s.  The switch was offline for about 45 minutes.

This is good, however...the VM guys (keep in mind, these are the knuckleheads who set this up - I'm just cleaning up the mess) said "the switch rebooted, so start a case with Cisco to look at the switch logs".

Whoo...that's funny stuff...

Um...yeah, I know the switch freaked out, but why did all the VMs POWER OFF?  Wow.

Thanks for the links!  Very useful stuff!
snowdog_2112 (Author) Commented:
I split points because coolsport was first, paulsolov led me to the info, and rvivek provided some good background/foundation info.  THANKS A TON!!!
Paul Solovyovsky (Senior IT Advisor) Commented:
Just a quick question.  You said that each host has a fiber link to an iSCSI SAN.  Is the link to the SAN fiber or copper?  iSCSI usually uses either a hardware or a software initiator, but most of the time even a hardware initiator (HBA) still connects over copper (Cat5/RJ45).