
TCP Sliding Window and iSCSI

When you are running iSCSI, is the TCP sliding window an important consideration? The setup is a Cisco UCS fabric interconnect connected to a Nexus 5k switch. The switch frequently drops packets inbound from the UCS. This appears to be because the iSCSI frames from the UCS are 1514 bytes, while the interface on the Nexus is set to 1500 and jumbo frames are not enabled. I don't know why the vast majority of frames make it through while a significant number (in the millions) are dropped.
The port channel spikes up to about 10Gbps, and most of that will be iSCSI. So the initiator-to-target connection works for the most part. I plan to enable jumbo frames as recommended by Cisco so that the 1514-byte iSCSI frames are handled properly and not dropped.

But my question is this: with iSCSI, are TCP conversations lengthy or very brief? To what degree would some dropped frames (0.003%) in the path cause an issue for an iSCSI TCP conversation? Or is this percentage just noise that TCP, being connection-oriented, should simply deal with?

Principal Software Engineer
BRONZE EXPERT
Commented:
It is likely that the packet drop problem is due to the MTU mismatch between the UCS and the Nexus. When a full-size frame of 1514 bytes is received, the Nexus will not be able to handle it. This results in a retransmit, and the packet must be cut in two. All of this takes time and wastes bandwidth.

If the MTU mismatch is corrected, the problem should be significantly reduced. I suggest correcting that issue first.
noci, Software Engineer
BRONZE EXPERT
Distinguished Expert 2019
Commented:

IP MTU = 1500 means the underlying Ethernet frame is 1500 + 6 + 6 + 2 = 1514 bytes, and optionally 4 bytes longer when VLAN tagging is in play.

The IP MTU is the payload size for Ethernet (i.e. the maximum IP packet size).

For an Ethernet frame, 6 bytes source, 6 bytes destination and 2 bytes protocol ID are added. These are not relevant to IP packet sizes.

VLAN tag = 2 bytes protocol ID indicating a VLAN tag + 2 bytes with the VLAN ID and priority.
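
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are made up for this example, and the sizes exclude the 4-byte frame check sequence, matching the 1514/1518 figures used in this thread.

```python
ETH_HEADER = 6 + 6 + 2   # source MAC + destination MAC + protocol ID (EtherType)
VLAN_TAG = 2 + 2         # protocol ID marking the tag + VLAN ID and priority


def frame_size(ip_mtu: int, vlan_tagged: bool = False) -> int:
    """Ethernet frame size (without FCS) for a full-size IP packet."""
    return ip_mtu + ETH_HEADER + (VLAN_TAG if vlan_tagged else 0)


print(frame_size(1500))                    # 1514
print(frame_size(1500, vlan_tagged=True))  # 1518
print(frame_size(9000, vlan_tagged=True))  # 9018
```

So a 1514-byte frame on the wire is exactly what a host with a standard IP MTU of 1500 sends; it is not by itself a sign of jumbo frames.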


So be careful about which packet sizes you compare. The IP payload can be even smaller: depending on PPP (e.g. PPPoE) or another tunneling scheme, the encapsulation eats into the IP packet size (about 20 bytes when an extra IP header is added).


The sliding window is quite a different concept: it is the amount of unacknowledged data that can be in flight. On a LAN the window is often >100KB, which is several packets that live on the wire or in network adapter buffers.
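
As a rough illustration of what a 100KB window means, the sketch below counts how many full-size segments fit in it and what rate a single connection could reach if the window were the only limit. The MSS of 1460 bytes and the 0.5 ms round-trip time are assumptions picked for the example, not values measured in this setup, and the function names are hypothetical.

```python
def full_segments_in_window(window_bytes: int, mss: int = 1460) -> int:
    """Number of full-size TCP segments that fit in the window."""
    return window_bytes // mss


def window_limited_rate_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds


window = 100 * 1024  # 100KB window
print(full_segments_in_window(window))  # 70

# Assumed LAN round-trip time of 0.5 ms
rate = window_limited_rate_bps(window, rtt_seconds=0.0005)
print(f"{rate / 1e9:.2f} Gbit/s")  # 1.64 Gbit/s
```

A dropped segment means the sender has to retransmit it before the window can move past it, which is where drops and the sliding window meet.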


Any switch should be able to handle 1514 bytes per packet for bare Ethernet, and 1518 bytes per packet for VLAN processing.

For jumbo frames, check your hardware: 4000, 5700, 6000, 8000, 8192 and 9000 are optional packet sizes (the 5700 is a glitch in some Broadcom chipsets).

 

BRONZE EXPERT
Distinguished Expert 2018
Commented:
On the Nexus 5k, MTU is configured per switch (via QoS). My suggestion would be not to go with MTU = 1514 bytes but to configure MTU = 9216 bytes instead.

policy-map type network-qos jumbo
  class type network-qos class-default
  mtu 9216
!
system qos
  network-qos jumbo

Verification:
show queuing interface ethernet 1/1 | in MTU

Jumbo MTU should be configured along the full traffic path.
If FC is involved, the FC frame size is 2148 bytes, but with the MTU set to 9216 the switch is ready for whatever is needed now or in the future.

MTU on the Nexus 5k can be changed without downtime (no dropped packets, no reboot needed), so it can be implemented at any time.
amigan_99, Network Engineer

Author

Commented:

These were great thoughtful answers. Thank you!


I have the change planned as you describe, Pedrag. Thank you for confirming it's non-impacting; Cisco said as much as well. Still, with iSCSI traffic flowing through it, I'll be holding my breath as I apply the policy! I wish they'd put that on another switch, but that's what I've inherited: a shared switch, and shared trunks in fact.

noci, Software Engineer
BRONZE EXPERT
Distinguished Expert 2019

Commented:

Changing the MTU on the switch will not hurt communication in any way (buffers will get larger, and fewer buffers may be available).

Changing the MTU on the IP stack is another matter. The IP MTU of all systems in the same broadcast domain (aka LAN) needs to be the same, or things will grind to a halt.

So if iSCSI is on a separate (V)LAN, then it can be changed.

BRONZE EXPERT
Distinguished Expert 2018

Commented:
noci wrote: "Another thing is changing MTU on the IP stack. The IP MTU of all systems in the same broadcast domain (aka LAN) needs to be the same, or things will grind to a halt."
Not really, at least for TCP traffic. (I don't know whether there is a control mechanism for UDP traffic, so I will consider only TCP traffic here. I am also deliberately ignoring the fact that the MTU configured on a host is not the same thing as the MTU configured on Cisco switches.)

With all that said, the only requirement for switching equipment (in the same VLAN) is that the MTU configured on the switches along the path needs to be equal to or bigger than the MTU on the hosts. Hosts negotiate the MSS during the 3-way handshake, and since MSS + 40 = MTU, the end hosts will be aware of each other's MTU. Therefore the MTU on the hosts can be different.
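
A minimal sketch of the MSS + 40 = MTU relation, assuming IPv4 and TCP headers without options (20 bytes each). The function names are made up for this illustration, and the calculation reflects only the two end hosts, not any device in between.

```python
IP_HEADER = 20   # IPv4 header without options (assumption)
TCP_HEADER = 20  # TCP header without options (assumption)


def advertised_mss(host_mtu: int) -> int:
    """MSS a host would announce in its SYN for a given interface MTU."""
    return host_mtu - IP_HEADER - TCP_HEADER


def effective_mss(mtu_a: int, mtu_b: int) -> int:
    """Segment size both hosts can use: the smaller of the two MSS values."""
    return min(advertised_mss(mtu_a), advertised_mss(mtu_b))


print(advertised_mss(1500))       # 1460
print(advertised_mss(9000))       # 8960
print(effective_mss(9000, 1500))  # 1460
```

With one host at MTU 9000 and the other at 1500, the connection ends up using 1460-byte segments, which is why mixed host MTUs can work for TCP.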
noci, Software Engineer
BRONZE EXPERT
Distinguished Expert 2019

Commented:

MTU != MSS.

Unix interface MTU = 1500 (Ethernet) is the net payload, so the packet size on the switch is 1514 (or 1518 with VLAN tags).

For any device: if a packet is larger than the MTU, it is ignored (and only counted in the "packets too large" statistics counter).

TCP may have MSS negotiation; UDP does not. So all UDP frames that are 1501+ bytes will be lost.

You will experience blocked sessions, because at some point one side decides to send something larger and then waits for a response that never comes.

On switches the MTU needs to be large enough to capture any packet.

Also, if both endpoints communicate with MTU = 9000 and some intermediate device (switch, router, ...) has a smaller MTU, the MSS will still be in the 8960 range. No communication will happen, though, unless the packets actually are below 1500 bytes.

What you mean is that you need to determine the PMTU (the largest MTU that still works along the path), and that should be determined BEFORE starting MSS negotiation in TCP.

The same issue happens with devices tunnelling traffic when some traffic has the don't-fragment bit set. I have seen it: some 1500-byte packets were not delivered across a PPPoE connection, causing random failures of phone calls. Some people could call, some could not (no INVITEs arriving), due to too many options and some path info in the SIP packets.

BRONZE EXPERT
Distinguished Expert 2018

Commented:
If MSS + 40 = MTU (I did not want to go into detail about what the 40 is in this case), then obviously MTU != MSS.

Regarding:
"Unix interface MTU = 1500 (Ethernet) is the net payload, so the packet size on the switch is 1514 (or 1518 with VLAN tags)."
"For any device: if a packet is larger than the MTU, it is ignored."
I wrote:
"I am deliberately ignoring the fact that the MTU configured on a host is not the same thing as the MTU configured on Cisco switches."
Meaning: MTU = 1500 on a host L3 interface is not equal to MTU = 1500 on a switch for L2 transport, but MTU = 1500 on an SVI is the same as MTU = 1500 on a host. I did not want to go into details and am still not planning to. :)

Regarding:
"TCP may have MSS negotiation; UDP does not. So all UDP frames that are 1501+ bytes will be lost."
My explanation leaves out the situation where hosts send L2 frames bigger than what the switch can accept:
"With all that said, the only requirement for switching equipment (in the same VLAN) is that the MTU configured on the switches along the path needs to be equal to or bigger than the MTU on the hosts."
Translation:
If the L2 frame size configured on the switch is equal to or bigger than what the end hosts send, the traffic will not be dropped by the switch.
If the MTU configured on the switch is 9216 bytes, then 5000- or 9000-byte frames sent by a host will not be dropped (even when all overhead is added).
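
The rule can be written as a small check. This is a sketch under a simplifying assumption: the switch MTU is compared against the full L2 frame (IP MTU plus a 14-byte Ethernet header plus an optional 4-byte VLAN tag). How a given platform counts L2 overhead against its configured MTU may differ, and the function names are hypothetical.

```python
def largest_host_frame(host_mtus: list[int], vlan_tagged: bool = True) -> int:
    """Biggest L2 frame any host can send, including Ethernet/VLAN overhead."""
    overhead = 14 + (4 if vlan_tagged else 0)
    return max(host_mtus) + overhead


def path_carries_all(host_mtus: list[int], switch_mtus: list[int],
                     vlan_tagged: bool = True) -> bool:
    """True if the smallest switch MTU on the path fits the largest host frame."""
    return largest_host_frame(host_mtus, vlan_tagged) <= min(switch_mtus)


# Hosts with mixed MTUs behind switches that all allow 9216 bytes
print(path_carries_all([1500, 9000], [9216, 9216]))  # True

# One switch on the path limited to 4000 bytes
print(path_carries_all([1500, 9000], [9216, 4000]))  # False
```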

So I guess we both agree on that.

Regarding:
"UDP does not negotiate MSS"
This only means that UDP has no negotiation built into the protocol itself. An application can still negotiate MSS if needed or wanted, just as UDP has no reliability but reliability can be implemented in the application.

Regarding:
"What you mean is that you need to determine the PMTU (the largest MTU that still works along the path), and that should be determined BEFORE starting MSS negotiation in TCP."
"The same issue happens with devices tunnelling traffic when some traffic has the don't-fragment bit set. I have seen it: some 1500-byte packets were not delivered across a PPPoE connection, causing random failures of phone calls. Some people could call, some could not (no INVITEs arriving), due to too many options and some path info in the SIP packets."
No. PMTUD is related to L3; there is no PMTUD in strict L2 (a VLAN), so it is irrelevant to the topic. In my previous post I was discussing strictly Layer 2, in response to this specific sentence:
"The IP MTU of all systems in the same broadcast domain (aka LAN) needs to be the same, or things will grind to a halt."
I am stating that the MTU on different hosts/systems in the same VLAN doesn't have to be the same, as long as the switch can support bigger L2 frames than the L2 frames being sent by the end hosts. I explained that using TCP as the example, since the question is about TCP behavior.
1. Switches along the path can be configured with MTU 4000, 5000 or 9000, as long as the end hosts' L2 frames plus any additional headers are smaller than 4000 (the smallest MTU configured on the switched path).
2. If different hosts in the same VLAN are configured with different MTU values, there will be no issues for TCP as long as statement 1 is satisfied. UDP could be more complicated on the host side, but I am a network guy, and that's why I stated:
"I will consider only TCP traffic here"

Best practice is to configure all switches in the domain to support the same MTU value, but TCP can still work with different MTU sizes if a few simple rules are satisfied...