VMware iSCSI reset on each hour

We have a strange problem: at the top of each hour (7:00, 8:00, 9:00, etc.) we get an iSCSI reset. I am working with Dell, EMC and VMware on this but still don't have an answer. The key issue seems to be that a VMware host drops connections when latency builds up and then does a power-on reset, which affects all hosts. I'm not sure why this happens at the top of every hour as opposed to a random time. Within a minute of the reset, everything is back to normal until the next hour. The EMC VNXe3200 does not report any errors; only the VMware hosts report errors, and this happens on all 3 hosts at the same time.

iSCSI configuration: We have 3 Dell R630s running VMware ESXi 6.0 (2809209), which connect to an EMC VNXe3200 (SP A/SP B, Active/Active) through two separate Dell 6224 (1 Gb) switches. Each host has 2 connections to the SAN on two different subnets. MTU is 1500. This is a separate, iSCSI-only network.

When the reset occurs, the symptom in the applications is one of network loss (Outlook can't connect, then connects OK).
We had only 1 LUN configured, though I'm in the process of adding more LUNs and distributing the VMs across them to help with I/O. It seems that some process must kick off at the top of each hour, but I can't find it, and all 3 vendors say they have no idea what it could be.

Any help would be much appreciated. - Thanks...
jools (Senior Systems Administrator) commented:
What network driver is being used on the guest VMs?

Have you tried increasing the MTU? In some cases this can make things worse as we had issues in one of our environments.

Alas, we don't use Dell kit and don't have that model of NAS :-(

Can you monitor the systems to find out what each is doing prior to the fault showing? Is it heavy on the Network IO?
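
One rough way to do that kind of monitoring from a machine on the iSCSI network is to time TCP connects to the array's iSCSI portal and keep the timestamps, so samples just before the hour can be compared with the rest. This is only a sketch: the function names are made up for illustration, port 3260 is the standard registered iSCSI port, and a TCP connect time shows reachability and network delay, not storage latency itself.

```python
import socket
import time
from datetime import datetime


def connect_ms(host, port=3260, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if the attempt fails."""
    start = time.perf_counter()
    try:
        # Open and immediately close a TCP connection to the portal.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None
    return (time.perf_counter() - start) * 1000.0


def sample(host, port=3260, count=5, interval=1.0):
    """Take `count` timestamped connect-time samples, `interval` seconds apart."""
    results = []
    for i in range(count):
        stamp = datetime.now().isoformat(timespec="seconds")
        results.append((stamp, connect_ms(host, port)))
        if i < count - 1:
            time.sleep(interval)
    return results
```

Calling `sample("<portal address>", count=120, interval=1.0)` from a few minutes before the hour and printing the rows would show whether connects slow down or fail (a `None` entry) right at :00. The portal address is left as a placeholder because it is not given in this thread.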
DigitalInfuzion (Author) commented:
We upgraded the Dell NIC firmware and the VMware NIC driver (Broadcom) to the latest versions. I can't change the MTU without bringing down the whole environment, which is hard to do at this time.

VMware Errors:
Lost access to volume ... due to connectivity issues.  Recovery attempt is in progress and outcome will be reported shortly.
Sometimes I get: Path redundancy to storage device ... degraded.
Successfully restored access to volume ... following connectivity issues.
jools (Senior Systems Administrator) commented:
So does the VM host have 2 physical NICs assigned, or is it all in a vSwitch with the guests using the vmnic as opposed to, say, e1000?

Have you checked the physical switch connectivity/error logs? Changed ports/vlans?

It could still be a physical cable!

You need to go through all the logs on the guest, the ESX server, the switch and anything else in between, and get as much info as possible. I would expect that the support personnel from Dell et al. have gone through all of this, but I'm surprised they came up with nothing!

Feel free to attach logs here. The more info you can give, the easier it is for people to make informed comments, which in the long run also helps others.
DigitalInfuzion (Author) commented:
I have attached a network diagram and Host 1 logs for events that start at the top of an hour.  These logs are very similar across each host (1/2/3) and each LUN (1/2/3).   They occur every hour on the hour.

The one thing we can't answer is why this happens every hour on the hour. What is the trigger? It seems like some process must kick off then; if it were only a network, driver, hardware or iSCSI problem, it would happen at more random times.
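
To put a number on the "every hour on the hour" pattern, the events can be counted by the minute of the hour in which they were logged. The sketch below assumes each log line begins with an ISO 8601 timestamp; that format, the sample lines and the function name are assumptions for illustration, not output from these hosts. The message fragments it matches are the ones quoted earlier in this thread.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: each log line starts with an ISO 8601 timestamp.
# Adjust the pattern and the strptime format to match the real logs.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")

# Message fragments quoted earlier in this thread.
MARKERS = ("Lost access to volume", "Path redundancy to storage device")


def minute_histogram(lines, markers=MARKERS):
    """Count matching events by the minute of the hour in which they occurred."""
    counts = Counter()
    for line in lines:
        if not any(marker in line for marker in markers):
            continue
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # no parseable timestamp, skip the line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        counts[stamp.minute] += 1
    return counts


# Made-up sample lines, for illustration only (not real log output).
SAMPLE = [
    "2015-10-01T07:00:04Z Lost access to volume ... due to connectivity issues.",
    "2015-10-01T07:00:41Z Successfully restored access to volume ...",
    "2015-10-01T08:00:06Z Lost access to volume ... due to connectivity issues.",
    "2015-10-01T08:00:09Z Path redundancy to storage device ... degraded.",
    "2015-10-01T09:00:03Z Lost access to volume ... due to connectivity issues.",
]

for minute, count in sorted(minute_histogram(SAMPLE).items()):
    print(f"minute {minute:02d}: {count} event(s)")
```

If nearly every event falls in minute 00 across all three hosts, that points to something scheduled or periodic on the network rather than a marginal cable or driver, which would be expected to fail at scattered times.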
jools (Senior Systems Administrator) commented:
What's the guest OS on the hosts?
DigitalInfuzion (Author) commented:
We have multiple VM guests, mainly Windows 2008 R2 and 2012 R2, with a few Windows 7, 8 and 10 machines and a couple of Ubuntu servers. We can move these around to various hosts within the cluster and we still see the same pattern (every hour on the hour we get a power-on reset from VMware).

We have now upgraded to ESXi 6.0.0 3029758 (Update 1) and still have the issue. Dell has provided the proper drivers and firmware, which are all up to date. Still no errors on the EMC side of things. I've sent many logs to VMware and they are still reviewing them.
jools (Senior Systems Administrator) commented:
What about errors on the switch?

DigitalInfuzion (Author) commented:
The switch was the key. It turned out that the switch management port and the iSCSI network were connected to the same LAN. This allowed corporate network traffic to interfere with the iSCSI network even though they are on different subnets. Disconnecting the management port resolved the issue. Why this only happened on the hour, who knows? Thank you for your assistance with this.
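
For anyone auditing a similar setup, the underlying condition (two different IP subnets sharing one layer-2 segment) can be checked from a port inventory. The sketch below is illustrative only: the inventory, port labels, segment names and addresses are all hypothetical, since the real addressing is not given in this thread.

```python
import ipaddress
from collections import defaultdict

# Hypothetical inventory: (port label, layer-2 segment, interface address).
# None of these names or addresses come from the thread.
INVENTORY = [
    ("host1-vmk1", "iscsi-a", "10.10.1.11/24"),
    ("san-spa-eth2", "iscsi-a", "10.10.1.50/24"),
    ("switch1-mgmt", "iscsi-a", "192.168.5.20/24"),  # stray management port
    ("host1-vmk2", "iscsi-b", "10.10.2.11/24"),
    ("san-spb-eth2", "iscsi-b", "10.10.2.50/24"),
]


def mixed_segments(inventory):
    """Return {segment: sorted subnets} for segments carrying more than one subnet."""
    subnets = defaultdict(set)
    for _label, segment, address in inventory:
        # ip_interface keeps the host bits; .network gives the enclosing subnet.
        subnets[segment].add(ipaddress.ip_interface(address).network)
    return {
        segment: sorted(nets)
        for segment, nets in subnets.items()
        if len(nets) > 1
    }


for segment, nets in mixed_segments(INVENTORY).items():
    print(segment, "carries:", ", ".join(str(net) for net in nets))
```

Separate subnets do not isolate traffic by themselves: broadcasts and flooded frames reach every port in the same layer-2 segment, which is consistent with corporate traffic disturbing the iSCSI network here.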