Cisco switch troubleshooting

Here’s the design in bullet form:

•      The existing core switching is 1-Gbps (a couple of stacked 48-port Cisco Catalyst 3650 switches, almost at full density), and the new 10-Gbps switches are Cisco Catalyst 3850 SFP+.
•      The 3850s have UCS servers (as well as Veeam backup, etc.) connected at 10-Gbps, but are connected to the core switching at 1-Gbps.
•      The 3850 SFP+ interfaces connecting to the core 3650s are configured for 1-Gbps operation.
•      The core 3650s are connected to ASA firewalls at 1-Gbps, which provide a DMZ for externally-facing applications.

It turns out that a majority of server-to-server traffic is between internal SQL instances and public resources (web & application tiers) in the DMZ, so the traffic goes at 10-Gbps from ESXi to the 3850s, then has to be sent over 1-Gbps to the core 3650s towards the DMZ. When we first tried to cut over to this deployment, all server access pretty much stopped. Troubleshooting revealed that the outgoing interfaces on the 3850s were exhibiting an extremely high number of interface drops/discards. Since then, the customer has only been extending very limited backup traffic (a couple of small applications) over these connections, and the interface discards are still outrageously high. (Not sure if it's related, but the 3850 switches are also running at an unexpectedly high CPU utilization of 70%, and again, they aren't handling most of the server traffic yet.) As you can see in the design diagram below, the ESXi environment still has multiple 1-Gbps connections bypassing the 3850s, so the servers are still up and running on the old design until we solve these problems and can start sending all ESXi server traffic through the 3850s.

The images below are a VERY simplified design of the components (in reality, all of the connections and devices are redundant), a “show interface” snapshot from one of the 3850 interfaces leading to the 3650 core (again, configured as 1-Gbps), and a SolarWinds screenshot showing interface discards.

I’m looking for input on how to attack this problem. Can we easily solve it with a shaping policy on the 3850 interfaces?

Thank you

Attachments: Current design, Interface discards, Show-interface.txt

Traffic starts to be dropped once the buffer is full. The interfaces are congested (as is typical when going from higher-throughput interfaces to a lower-throughput interface). It is simple: you need more throughput.  :)

You can try to change the buffer allocation with a service policy, but it may not have any effect on the drops (it is still worth trying).

To cover all the possibilities, more details would be needed (but my guess is that either a redesign or more throughput is required).
Generally, for the topology drawing above, you could try to manipulate STP so that traffic from ESXi uses the EtherChannel (4x1Gb) rather than the 10Gb uplink, since traffic may be dropped on the link between the 3850 and the 3650 due to its lower throughput (if traffic is forwarded from ESXi to the 3850 and then to the 3650). But since the topology drawing is simplified, it may be misleading; many details are not known, including the VLANs present on the trunks and the traffic patterns.
cfan73 (Author) Commented:
@Predrag Jovic - Thank you for your response, and of course I have some follow-up comments/questions.

While I understand the "more throughput" argument, I'm hopeful that the answer to this particular situation wouldn't land on needing to provision 10-Gbps connectivity all along the data path between the ESXi cluster and the servers in the DMZ. Oversubscription is a network reality (even in the data center, unless you're looking at Clos fabric designs), so there are always going to be situations where traffic arriving on one interface will be switched (or routed) to a lower-speed interface.

In this particular case, we're trying to allow traffic between server instances running on the ESXi cluster to get the advantage of 10-Gbps throughput, while traffic between ESXi instances and the apps in the DMZ (which will never require more than 1-Gbps throughput) can still use the same 3850 fiber switching. My hope was that we could deploy traffic shaping or some form of QoS on the 3850 outbound ports to handle the situation, but maybe that's not a workable approach. (I don't believe we'd have the ability to shape outbound on the ESXi side for particular VLANs, either, but that's not really my territory.)

Maybe the buffer allocation service policy recommendation might provide some value, but you're indicating it may have no effect on the drops?

Ultimately, I'm just surprised that this seems such a tough issue to address, where it would seem that this type of interface speed mismatch would be more common vs. a rarity.

Thanks again
You don't have a 10G link on Te1/0/12, just a 10Gb-capable interface that is dropping traffic; you have a 1Gb GLC-T there.
TenGigabitEthernet1/0/12 is up, line protocol is up (connected)

  Hardware is Ten Gigabit Ethernet, address is 50f7.2240.dd8c (bia 50f7.2240.dd8c)
  Full-duplex, 1000Mb/s, link type is auto, media type is 10/100/1000BaseTX SFP
Shaping is useless for preventing drops in a situation where the buffer is overwhelmed with traffic. Shaping works by holding traffic in the switch buffer until the link is free so that the traffic can be sent. In your case the buffer is full (that is the reason for the drops), so shaping will not help. By adding QoS you can only prioritize one type of traffic over another (so it can be used, for example, to protect critical application packets so that fewer of them are dropped), but you can't prevent drops if there is no buffer where traffic can be stored until it can be forwarded. Interface Te1/0/12 is simply being given more traffic than it can forward; that is the reason why traffic is dropped.
cfan73 (Author) Commented:
@Predrag Jovic - Thanks again for your input. I'm going to ask for a bit more help, and hopefully I can settle on an appropriate move-forward solution...  

I understand the drops issue, and thanks for your clarification on shaping - makes sense. It would SEEM at this point the only options would be to somehow slow traffic down at the source, or build out 10-Gbps connectivity from end-to-end. (I'm still bothered by the latter option, since it would seem as if "faster to slower link" paths exist in pretty much all network designs, so why wouldn't this be a more common issue?  For example, let's say we have 10-Gbps links between 1-Gbps access switches and the campus core. Wouldn't all traffic returning to the workstations be suffering this issue and having problems once it hits the 1-G access ports?  OR, is the issue somehow related to the fact that we're using a 1-Gig optic in a 10-Gig SFP+ slot (vs. a true 1-Gbps interface)?)

Regardless, I'm battling a bit to figure out what a re-design would look like. These DMZ servers don't connect at 10-Gbps (nor do they need to handle traffic at a rate anywhere near 1-Gbps), and thus going through the expense of enabling 10-Gbps end-to-end would not fly.

Thanks, and I appreciate your patience.
"faster to slower link" paths exist in pretty much all network designs, so why wouldn't this be a more common issue?
It is a common issue; it is called oversubscription. :)
There are two principles for dealing with traffic congestion:
- congestion avoidance (dropping some TCP packets when the buffer starts to fill up; in Cisco's case, RED/WRED)
- congestion management (configuring QoS to prefer one type of traffic over another, so that traffic with better treatment is less likely to be dropped)
The influence of dropped packets on throughput varies. In the case of UDP, it may render the traffic useless. In the case of TCP, the sender will cut its transmission rate roughly in half when a packet is dropped, and the dropped packets also have to be retransmitted (which can even generate more traffic). If a packet is fragmented into 6 fragments and one of the fragments is dropped, the whole packet has to be transferred again (all 6 fragments). An interesting behavior is that if TCP and UDP are mixed together and packets start to be dropped, UDP will start to "take over" the bandwidth for itself.
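To put a rough number on the fragmentation point: if each fragment is dropped independently with the same probability (an assumption made only for illustration; real drops during congestion tend to come in bursts), the chance of losing the whole packet grows quickly with the number of fragments. The function name below is hypothetical.

```python
def datagram_loss_probability(fragment_loss, fragments):
    """Chance that at least one fragment is lost, so the whole packet must be resent.

    Assumes every fragment is dropped independently with probability `fragment_loss`.
    """
    if not 0.0 <= fragment_loss <= 1.0:
        raise ValueError("fragment_loss must be between 0 and 1")
    if fragments < 1:
        raise ValueError("fragments must be at least 1")
    return 1.0 - (1.0 - fragment_loss) ** fragments


if __name__ == "__main__":
    # Six fragments per packet, as in the example above.
    for p in (0.001, 0.01, 0.05):
        whole = datagram_loss_probability(p, 6)
        print(f"per-fragment loss {p:.1%}: whole-packet loss {whole:.2%}")
```

With 6 fragments, the whole-packet loss rate is close to six times the per-fragment rate while that rate is small, which is one reason fragmented traffic suffers disproportionately on a congested link.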

The issue you are running into can be compared to a faucet, a kitchen sink, and a drain:
The faucet is data coming into the buffer.
The kitchen sink is the buffer.
The drain is how fast data can be forwarded out of the buffer.

If the faucet has higher throughput than the drain, water will not be dropped (spilled all over the kitchen) until the buffer is full.

That is the situation when going from 10Gb to 1Gb.
Water is filling the kitchen sink faster than the drain can empty it. As soon as the kitchen sink is full, water will flood the kitchen (lost packets).
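The sink analogy can be sketched as a small fluid simulation in Python. This is only a toy model under stated assumptions: traffic arrives at a constant rate, there is a single egress queue with tail drop, and the 1 MB buffer size is a made-up figure, not a Catalyst 3850 specification.

```python
def simulate_tail_drop(arrival_bps, drain_bps, buffer_bytes, duration_s, step_s=1e-5):
    """Fluid model of one egress queue with tail drop.

    Returns (time of first drop in seconds or None, fraction of offered bytes dropped).
    """
    queued = offered = dropped = 0.0
    first_drop_at = None
    for step in range(int(round(duration_s / step_s))):
        arriving = arrival_bps / 8 * step_s   # bytes from the faucet this step
        draining = drain_bps / 8 * step_s     # bytes the drain can remove this step
        offered += arriving
        queued = max(0.0, queued + arriving - draining)
        if queued > buffer_bytes:             # sink is full: the excess spills
            dropped += queued - buffer_bytes
            queued = buffer_bytes
            if first_drop_at is None:
                first_drop_at = (step + 1) * step_s
    return first_drop_at, (dropped / offered if offered else 0.0)


if __name__ == "__main__":
    BUFFER_BYTES = 1_000_000  # hypothetical buffer size, for illustration only
    for arrival in (0.5e9, 2e9, 10e9):
        first_drop, loss = simulate_tail_drop(arrival, 1e9, BUFFER_BYTES, duration_s=0.1)
        when = "never" if first_drop is None else f"{first_drop * 1e3:.2f} ms"
        print(f"{arrival / 1e9:>4.1f} Gbps in, 1 Gbps out: first drop {when}, loss {loss:.1%}")
```

In this model a bigger buffer only delays the first drop. Under sustained overload the dropped fraction tends towards 1 - drain/arrival regardless of buffer size, which is why shaping or buffer tuning alone cannot fix a sustained 10Gb-into-1Gb flow.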

Cisco has some rules of thumb regarding oversubscription: a 20:1 ratio on the access uplink in the direction of the distribution switch, and a 4:1 ratio on the distribution uplink in the direction of the core. Of course, this is not applicable to all environments: if there are 4 servers each forwarding 1 Gb of traffic and a single 1 Gb uplink to the core, 3/4 of the traffic from the servers will be dropped on the uplink.
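The arithmetic behind that last example can be written out as a short Python sketch. It assumes the worst case, where every host transmits at full line rate at the same time and the uplink has no buffering to speak of; the function name is hypothetical.

```python
def oversubscription(host_count, host_bps, uplink_bps):
    """Return (oversubscription ratio, worst-case fraction of traffic dropped).

    Worst case means every host sends at line rate towards the uplink at once.
    """
    offered_bps = host_count * host_bps
    ratio = offered_bps / uplink_bps
    dropped_fraction = max(0.0, 1.0 - uplink_bps / offered_bps)
    return ratio, dropped_fraction


if __name__ == "__main__":
    # Four servers at 1 Gb each behind one 1 Gb uplink, as in the example above.
    ratio, dropped = oversubscription(4, 1e9, 1e9)
    print(f"{ratio:.0f}:1 oversubscribed, up to {dropped:.0%} dropped")
```

The ratio only describes the worst case. How often a real network gets near it depends on the traffic pattern, which is why a 20:1 access layer can be fine while a 4:1 server uplink may not be.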
cfan73 (Author) Commented:
@Predrag Jovic

ALL apologies for the delayed response on this thread. I was out of the country the last several days...

I understand oversubscription (mentioning it a couple posts prior), and was simply trying to determine if there was a way to design a better solution, based on VLAN reqs or otherwise, that could potentially help the situation in this case, vs. the issue potentially just being a result of having a 1-Gbps optic in a 10-Gb SFP+ port, etc.

Sorry to persist on this one, but I'd appreciate any final advice on how to move forward with an appropriate design w/o having to recommend 10-Gbps infrastructure to the DMZ.

Thanks again
cfan73 (Author) Commented:
Apologies for a repost follow-up, but still hoping for final thoughts on this one.

Thanks again
I can only repeat what is already written above. There is no magician with a magic wand to solve the problem. There is no instant solution that fits all cases. You know where the congestion is, what your traffic patterns are, and what your options are.

If you have too much traffic loss, QoS can't really help (once the switch buffer is full, any new traffic can be dropped, and QoS can't change that); you need more bandwidth. If the traffic loss is not too big, then QoS might help to prioritize important traffic over less important traffic. It is that simple.
cfan73 (Author) Commented:
@Predrag Jovic

I want to thank you again for your patience. I was again traveling and unable to respond to this one in any kind of timely fashion. All apologies. I'm back now for the foreseeable future.

If I may, can I base my last couple questions down to a simplified network diagram, and I think this will get me over the hump. I’m really not looking for “quick answers” (so to speak), but rather to make sure I fully understand the problem we’re battling.

Take the following diagram, where we have an oversubscription (10G to 1G) situation on the switch identified as “Access.” Workstations are connected using 1-Gig links, but connectivity back to the collapsed core is 10G. So, a pretty typical one-site network design.

The workstation on the right has requested data off of servers in the DC, and it will take more time to read all of the requested data than will be allowed by the interface buffers on the “Access” 1-Gbps workstation port. The servers are sending data as fast as they can, which is carried over 10-Gig connections to the Core and out to the Access switch, at which there’s a potential problem, and we’d have our overruns, correct?

The above is a pretty standard network design, built for oversubscription (with the understanding that not all access hosts are sending data on a consistent basis). That said, we’re still hitting a bottleneck every time requested data is being sent to workstations from the DC (such as opening a sizable e-mail attachment).

Two questions:

•      Is there some reason the above situation is any different than the opening scenario? (For example, in the above case, the user-facing interfaces are 1-Gbps PHY, vs. the 10-Gig ports on the original case's switch w/ a 1-Gig SFP inserted.)

•      Would the fact that there are overruns cause any actual data loss (or cause a communication failure of some sort), vs. TCP just resending the lost packets?

Thanks again – I owe you a gift card if you PM me your e-mail/PayPal address.
The servers are sending data as fast as they can, which is carried over 10-Gig connections to the Core and out to the Access switch, at which there’s a potential problem, and we’d have our overruns, correct?
Is there some reason the above situation is any different than the opening scenario?
Saturation of links can always be expected, in all circumstances. The only questions are how often it happens, where it happens, whether it is problematic at that point, how to deal with it, etc.
For example, say we have a 48-port switch where all hosts are attached to the switch with 1Gb links and there is a 10Gb uplink (though it could still be just a 1Gb uplink).
If all hosts start to transmit at maximum throughput, the 10Gb uplink will be saturated. But don't forget that the same thing can also happen inside the switch itself (ports 1-47 sending traffic to port 48). We can read in the documentation what the maximum bus throughput inside the switch is, but if all that traffic goes to a single interface, the interface itself will not be able to buffer all of it (due to its limited buffer size) in order to deliver it to the end host (or to the next device upstream or downstream).
There are also microbursts: for example, all ports start transmitting a lot of data at once for just a few milliseconds. The buffer on the destination port can fill up, and a lot of traffic will be dropped.
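A rough Python sketch of the microburst case shows why these drops are easy to miss in monitoring. It assumes 47 senders bursting at line rate towards one 1Gb port for 5 ms, a made-up 1 MB egress buffer (not a figure from any datasheet), and a 1-second polling interval; all names are hypothetical.

```python
def burst_outcome(senders, sender_bps, egress_bps, buffer_bytes, burst_s, interval_s=1.0):
    """Return (bytes dropped during one burst, offered load averaged over the interval).

    The average is expressed as a fraction of the egress line rate, which is
    roughly what an interval-based utilization graph would suggest.
    """
    offered_bytes = senders * sender_bps / 8 * burst_s
    # The egress port can transmit at most this much while the burst lasts.
    sent_during_burst = min(offered_bytes, egress_bps / 8 * burst_s)
    excess_bytes = offered_bytes - sent_during_burst
    dropped_bytes = max(0.0, excess_bytes - buffer_bytes)  # buffer absorbs part of it
    average_load = offered_bytes * 8 / interval_s / egress_bps
    return dropped_bytes, average_load


if __name__ == "__main__":
    dropped, load = burst_outcome(47, 1e9, 1e9, 1_000_000, burst_s=0.005)
    print(f"dropped {dropped / 1e6:.2f} MB, average offered load {load:.1%} of line rate")
```

In this example the port looks only about a quarter loaded when averaged over a second, yet most of the burst is lost, so high discard counters alongside modest utilization graphs are consistent with microbursts.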
Would the fact that there are overruns cause any actual data loss (or cause a communication failure of some sort), vs. TCP just resending the lost packets?
It can cause all sorts of problems. For example, voice doesn't tolerate high data loss (and doesn't try to recover lost packets). An even worse situation is when QoS is not configured (or is misconfigured), in which case voice traffic can suffer high jitter on top of higher latency, which leads to poor voice quality.

There are a few different scenarios here:
1. UDP traffic will just be dropped and not recovered (unless the application has built-in recovery).
2. TCP traffic will most likely be recovered, but there will be a lot of packet retransmissions (there is also a limit to packet recovery, so TCP sessions can still be dropped; more on that in scenario 3). For fragmented packets it also means that if one of the fragments is lost, all of the fragments need to be resent. This leads to a situation where packet loss generates extra traffic (the same packets are sent multiple times) and additionally slows down all traffic.
3. One fact is obscured during congestion: UDP is the ruler of congested interfaces without QoS (or when UDP and TCP are mixed in the same QoS class).
   Each time a TCP packet is lost, TCP will reduce its window size, slow its transmission, etc. UDP, however, has no such control mechanism and will continue to be transmitted at the same rate. UDP will start to fill up the buffer, which leaves less chance for TCP packets to even get into the buffer, so even more TCP packets are dropped. That slows the TCP sessions down further, which can lead to TCP sessions timing out and being dropped.

So, point 3 is the reason for a recommendation that can typically be found in QoS documentation: do not mix UDP and TCP traffic in the same QoS class.
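The effect in point 3 can be illustrated with a deliberately simplified Python model. It is not a real TCP implementation: it assumes one flow that halves its rate after any congested round and otherwise adds a fixed step, one flow that sends at a constant rate no matter what, and drops that hit both in proportion to their load. All names and figures are hypothetical.

```python
def simulate_shared_queue(capacity_mbps, udp_mbps, rounds=10_000, tcp_step_mbps=10.0):
    """Toy model: one AIMD 'TCP' flow and one constant-rate 'UDP' flow share a link.

    Returns the average delivered rate as (tcp_mbps, udp_mbps).
    """
    tcp_rate = 1.0                       # TCP starts slowly and probes upward
    tcp_delivered = udp_delivered = 0.0
    for _ in range(rounds):
        offered = tcp_rate + udp_mbps
        if offered > capacity_mbps:
            # Congested round: assume drops hit both flows in proportion to their load.
            share = capacity_mbps / offered
            tcp_delivered += tcp_rate * share
            udp_delivered += udp_mbps * share
            tcp_rate = max(tcp_rate / 2, 1.0)   # TCP backs off; UDP does not
        else:
            tcp_delivered += tcp_rate
            udp_delivered += udp_mbps
            tcp_rate += tcp_step_mbps           # additive increase while there is no loss
    return tcp_delivered / rounds, udp_delivered / rounds


if __name__ == "__main__":
    for udp in (0.0, 900.0, 1200.0):
        tcp_avg, udp_avg = simulate_shared_queue(1000.0, udp)
        print(f"UDP offers {udp:6.0f} Mbps: TCP gets {tcp_avg:6.1f}, UDP gets {udp_avg:6.1f}")
```

Even in this crude model, the constant-rate flow keeps almost everything it sends while the adaptive flow is squeezed into whatever is left, and it is starved outright once the constant-rate flow alone exceeds the link. Separate QoS classes avoid that by giving TCP its own queue.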
QoS can help only up to a point (while the situation is still tolerable); beyond that point, additional throughput needs to be added. That fact is obvious on WAN interfaces: everyone understands why there is a need to increase internet speed from 1Mb to 10Mb. The same goes for congestion inside the LAN, but we are generally used to expecting that the LAN will perform well, and it is typically hard to explain that it is practically the same situation, just in a different place in the network.

cfan73 (Author) Commented:
@Predrag Jovic

Thanks again for all of your patience/help here. Again, provide your PayPal account, and I'll send you something.  :)