Cannot reconnect VMware datastore after configuring jumbo MTU

I have an environment of VMware 6.5 hosts running on HP BL460c servers in a c7000 enclosure. All 16 servers have two vmnics dedicated to iSCSI and attached to a distributed switch (DS) that is dedicated to storage traffic. These vmnics are bound to vmk adapters, which are in turn bound to the iSCSI software adapter on the host. The server NICs are connected to Cisco switches via the pass-through interconnects on the enclosure.

For my storage device, I have a NetGear ReadyNAS 4312 with four 1Gb interfaces. I have configured these into two bonds; both bonds are set to Layer 2 LACP mode, with matching channel-groups configured on the Cisco switches. The bonds were assigned IPs, the vmks were assigned IPs, and the software adapter was configured to discover the iSCSI LUN on the appropriate IP addresses.

During the original configuration, all MTUs in the path were at 1500. All 16 servers saw the storage, and it was presented as a datastore to the cluster for which it was intended. All seemed well, but once the VMs were all running on the datastore, we began to have latency issues and significant lag when interacting with the VMs, whether by SSH or the console. As a test, I moved a couple of the trouble VMs to different storage and the problems were eliminated. I knew that jumbo frames were supposed to be best practice for iSCSI, but I didn't want to shut down the entire environment in order to configure and bounce all the switching. After these problems, and after reading several papers on the subject, I decided to reconfigure the switches and all devices in the path.

My current network configuration is the same, except that the vmk's, the DS, the physical switches and the bonded interfaces now all are set with an mtu of 9000. (The switches are configured with system jumbo mtu 9000). As far as I can tell, the pass-through interconnect modules on the c7000 have no configuration as they simply connect the mezz card in the server blade to the physical switch with no actual switching going on.
So it looks like this (e.g., server 1):
physical external connection on c7000 interconnect bay 5 port 1 -> Cisco switch port 1 (jumbo MTU 9000)
physical internal port map from interconnect bay 5 port 1 -> bl460c mezz slot 2 port 1 = vmnic2
vmnic2 attached to iSCSI Distributed Switch (MTU 9000 on DS) and bound to vmk2 (mtu 9000)
vmk2 attached to iSCSI-A Portgroup on iSCSI DS with ip address of 192.168.26.171
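
One way to reason about the path above is that the usable MTU is the smallest MTU of any hop, and a hop whose limit has not been confirmed cannot be ruled out. The following is a minimal Python sketch of that idea. The hop list and the `path_mtu` helper are illustrative only; marking the vmnic and the pass-through as unverified is an assumption made for the example, not a finding.

```python
# Hop MTUs as configured in the post. None marks a hop whose real limit
# has not been verified (an assumption made for this illustration).
PATH = [
    ("vmk2", 9000),
    ("iSCSI distributed switch", 9000),
    ("vmnic2 (BCM5715S, tg3 driver)", None),
    ("c7000 pass-through, bay 5 port 1", None),
    ("Cisco switch port 1", 9000),
    ("ReadyNAS bonded interface", 9000),
]


def path_mtu(hops):
    """Return (smallest known MTU or None, names of hops with unknown MTU)."""
    known = [mtu for _, mtu in hops if mtu is not None]
    unknown = [name for name, mtu in hops if mtu is None]
    return (min(known) if known else None), unknown


mtu, unverified = path_mtu(PATH)
print(f"smallest configured MTU: {mtu}")
print("not verified:", ", ".join(unverified))
```

If a large ping still fails while every configured value reads 9000, the unverified hops are the first places to look.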


The problem now is that only one server will mount the storage as a datastore. All the other servers see the storage device but report it as Not Consumed. If I remove the configuration from the server that recognizes the datastore and rescan from another server, that server attaches the datastore with the proper name, and I am able to move VMs on that server to it.

I am able to vmkping the storage unit on both IPs as long as I don't use a size larger than 1492. Anything larger and the ping dies somewhere.

Other considerations:
- I have another DS which I use for all other traffic, including vMotion. Its MTU is still 1500.
- My core switch is a 4507, and none of the blades in that switch can be set to use jumbo frames. This switch is where my gateway is for the default TCP/IP stack of the hosts. I don't know how it could be an issue for the storage net, since the traffic is not leaving the switch (AFAIK) to get from one place to the other.
- If I execute a traceroute on any of the servers, using vmk2, it's a single hop. The packet doesn't seem to bother with the gateway (why would it when the destination is on the same switch?)
- The vmkping fails for large size packets even on the server which connects the datastore.
- The physical adapter is a Broadcom NetXtreme BCM5715S which, according to Broadcom's UG Table one, supports jumbo frames.
- The driver being used is the tg3 driver 3.131d.v60.4-2vmw.650.0.0.4564106

I'm at a loss and would really rather not suck it up and try to make things work with 1500 MTU. Any guidance/help would be appreciated.
Dave Lewis, Unix Systems Administrator, Asked:

noci, Software Engineer, Commented:
Did you enable Jumbo Frames on all intermediate switches?

On the c7000 platform I know that HP used some Broadcom chipsets that only support 7500-byte frames; setting the MTU to anything larger than the smallest frame size the path supports will cause breakage. I know this for sure on the BL860c i2 (Itanium systems).
Dave Lewis, Unix Systems Administrator, Author Commented:
Noci -

All switches are set to 9000. I have researched the NIC that I use and I cannot find anything specific about the maximum MTU on the Broadcoms that I use.

I have tried doing a vmkping at several different sizes, but anything greater than 1492 fails.
noci, Software Engineer, Commented:
Assuming several systems are in the same (V)LAN,
can you ping any of the systems in the same (V)LAN? (Same enclosure, different enclosure.)
That may help narrow down where it fails.

(ping also has options to set a packet size and set packets for don't fragment).
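
Rather than trying sizes by hand, the search for the largest size that passes can be bisected. This is a minimal Python sketch, assuming sizes pass up to some threshold and fail above it. `largest_passing_payload` and the `probe` callable are hypothetical names for illustration; a real probe would have to wrap a don't-fragment ping of the given size, and the stand-in below only simulates one.

```python
from typing import Callable


def largest_passing_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 8972) -> int:
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes every size up to some threshold passes and every larger size fails.
    Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Stand-in probe that simulates a path limited to a 1500-byte MTU.
def simulated_probe(size: int) -> bool:
    return size <= 1472


print(largest_passing_payload(simulated_probe))
```

Running the same search from hosts in the same enclosure and in a different one would show whether the limit is the same in each direction.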

andyalder, Saggar maker's framemaker, Commented:
I can confirm that the pass-thrus have no configuration and the Quickspecs on some NICs say they support both jumbo frames and the pass-thru module which implies the pass-thru also works with jumbo frames.

Can you isolate the iSCSI switches so they don't connect to your core switches? You don't really need a default gateway on iSCSI ports since, as you say, the traffic only goes through one switch.

Dave Lewis, Unix Systems Administrator, Author Commented:
Noci - Yes, the systems are on the same VLAN and they are able to ping anywhere in the VLAN with the exception of utilizing jumbo frames. I use the following to ping: vmkping -I vmk2 -s 8972 -d x.x.x.x and it dies. If I change the size to 1472, it works fine. Anything larger and death to the ping.
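
For reference, the sizes in that command follow from header overhead: the ping size is the ICMP payload, and a 20-byte IPv4 header (assuming no IP options) plus an 8-byte ICMP header are added on top of it. A short Python sketch of the arithmetic, with `max_icmp_payload` as an illustrative helper name:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    print(f"MTU {mtu}: max payload {max_icmp_payload(mtu)}")
# MTU 1500: max payload 1472
# MTU 9000: max payload 8972
```

A limit of exactly 1472 with the don't-fragment flag set is what one would expect if some hop in the path is still passing only 1500-byte packets.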

Andyalder: I have considered doing that very thing. I was just trying to avoid it. I simply don't see why the traffic would hit the gateway if the destination is on the same switch. <sigh> I would love to eliminate the gateway statement, but ESXi seems to accept only the default setting or the alternate in the settings.

I am going to configure a switch just to do this and will get back. Thanks.
andyalder, Saggar maker's framemaker, Commented:
I come from a storage/fibre channel background so I avoid connecting the front end LAN to the back end SAN (except for management ports) since it's not valid on a traditional network. To me connecting them together is added complication even though convergent technology says different. The Ethernet root bridge may have to support big packets even if they don't go through it since it tells the other switches what to do, presumably that is your core but the voting is horrible and by default the oldest switch wins.
Dave Lewis, Unix Systems Administrator, Author Commented:
Thanks very much for the assist and the push to do the separate switching. It turns out that once I pulled the systems off of the main switches and onto their own, the vmks came up and happily transmitted the jumbo frames.

Appreciate the help.