I have an environment of VMware ESXi 6.5 hosts running on HP BL460c servers in a c7000 enclosure. All 16 servers have two vmnics dedicated to iSCSI, attached to a vSphere Distributed Switch (DS) that is dedicated to storage traffic. These vmnics are bound to VMkernel (vmk) adapters, which are in turn bound to the software iSCSI adapter on each host. The server NICs are connected to Cisco switches via the pass-through interconnect modules on the enclosure.
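For reference, the binding on each host looks roughly like this from the CLI (a sketch; the adapter name vmhba33 and the second vmk number are illustrative, not necessarily what's on my hosts):

    # list the vmk adapters currently bound to the software iSCSI adapter
    esxcli iscsi networkportal list
    # bind each dedicated vmk to the software adapter (vmhba33 illustrative)
    esxcli iscsi networkportal add -A vmhba33 -n vmk2
    esxcli iscsi networkportal add -A vmhba33 -n vmk3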
For my storage device, I have a NetGear ReadyNAS 4312 with four 1Gb interfaces. I have configured these as two bonded pairs; both bonds are set to Layer 2 LACP mode, with matching channel-groups configured on the Cisco switches. The bonds were assigned IPs, the vmks were assigned IPs, and the software iSCSI adapter was configured to discover the iSCSI LUN on the appropriate IP addresses.
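The switch side of each bond looks roughly like this (a sketch; the port-channel number, interface names, and VLAN are illustrative):

    interface Port-channel10
     description ReadyNAS bond0
     switchport mode access
     switchport access vlan 26
    !
    interface GigabitEthernet1/0/1
     channel-group 10 mode active
    !
    interface GigabitEthernet1/0/2
     channel-group 10 mode active

"mode active" is what makes the bundle LACP, and "show etherchannel summary" confirms it comes up.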
During the original configuration, every MTU in the path was 1500. All 16 servers saw the storage, and it was presented as a datastore to the cluster for which it was intended. All seemed well, but once the VMs were running on the datastore, we began to have latency issues and significant lag when interacting with the VMs, whether by SSH or the console. As a test, I moved a couple of the troubled VMs to a different datastore and the problems were eliminated. I knew that jumbo frames are considered best practice for iSCSI, but I hadn't wanted to shut down the entire environment to reconfigure and bounce all the switching. After these problems, though, and after reading several papers on the subject, I decided to reconfigure the switches and every device in the path.
My current network configuration is the same, except that the vmks, the DS, the physical switches, and the bonded interfaces are now all set to an MTU of 9000. (The switches are configured with "system mtu jumbo 9000".) As far as I can tell, the pass-through interconnect modules on the c7000 have no configuration of their own; they simply connect the mezzanine card in each server blade to the physical switch, with no actual switching going on.
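For what it's worth, these are the host-side commands I use to set and verify the MTUs (the DS MTU itself is set in the vSphere Web Client; esxcli only reports it):

    # set the vmk MTU
    esxcli network ip interface set -i vmk2 -m 9000
    # verify the vmk MTUs
    esxcli network ip interface list
    # verify the distributed switch MTU as the host sees it
    esxcli network vswitch dvs vmware list
    # verify the MTU actually applied to the physical uplinks
    esxcli network nic list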
So the path looks like this (example: server 1):
physical external connection on c7000 interconnect bay 5 port 1 -> Cisco switch port 1 (jumbo MTU 9000)
physical internal port map from interconnect bay 5 port 1 -> BL460c mezz slot 2 port 1 = vmnic2
vmnic2 attached to the iSCSI Distributed Switch (MTU 9000 on the DS) and bound to vmk2 (MTU 9000)
vmk2 attached to the iSCSI-A port group on the iSCSI DS with IP address 192.168.26.171
The problem now is that only one server actually mounts the datastore. All the other servers see the storage device but report it as "Not Consumed". If I remove the configuration from the server that currently acknowledges the store and rescan from another server, that server will attach the datastore with the proper name, and I can move VMs on that server to it.
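The rescan/check sequence I've been running on each host (in addition to the equivalent rescans from the Web Client) is roughly:

    # rescan all HBAs, including the software iSCSI adapter
    esxcli storage core adapter rescan --all
    # list the filesystems the host has actually mounted
    esxcli storage filesystem list
    # show which device backs each VMFS volume
    esxcli storage vmfs extent list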
I am able to vmkping the storage unit on both of its IPs, as long as the packet size is no larger than 1492. Anything larger and the ping dies somewhere in the path.
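Concretely, the test looks like this (the NAS IP here is illustrative; -d sets the don't-fragment bit so the packet can't be silently fragmented, and 8972 is 9000 minus 28 bytes of IP/ICMP headers):

    # standard-size payload: replies fine
    vmkping -I vmk2 -d -s 1472 192.168.26.50
    # jumbo payload: dies somewhere in the path
    vmkping -I vmk2 -d -s 8972 192.168.26.50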
- I have another DS that I use for all other traffic, including vMotion. Its MTU is still 1500.
- My core switch is a 4507, and none of the line cards in that switch is capable of being set to jumbo frames. This switch is where the gateway for the hosts' default TCP/IP stack lives. I don't see how it could be an issue for the storage network, since that traffic (AFAIK) never leaves the storage switch to get from one place to the other. (The switch- and NIC-side checks I've run are shown after this list.)
- If I execute a traceroute from any of the servers using vmk2, it's a single hop. The packet doesn't seem to bother with the gateway (why would it, when the destination is on the same subnet?).
- The vmkping fails for large packets even on the one server that does mount the datastore.
- The physical adapter is a Broadcom NetXtreme BCM5715S, which, according to Table 1 of Broadcom's user guide, supports jumbo frames.
- The driver in use is the tg3 driver, version 3.131d.v60.4-2vmw.6126.96.36.19964106.
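For completeness, these are the switch- and NIC-side checks mentioned above (interface name illustrative):

    ! on the access switches (Catalyst IOS)
    show system mtu
    show interfaces GigabitEthernet1/0/1 | include MTU

and on each host, to confirm the driver and firmware actually in use on the uplink:

    esxcli network nic get -n vmnic2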
I'm at a loss and would really rather not suck it up and try to make things work with 1500 MTU. Any guidance/help would be appreciated.