Output discards on Cisco 3750 and 6509 Ethernet ports

We are experiencing a lot of "output discards" on our core Cisco 6509 switch as well as on our Cisco 3750 switches at several remote sites. CenturyLink changed our CPE from a Cisco ME 3400 Metro Ethernet switch to a device called a "RAD" last August, and it is since then that we have noticed the drops. We have implemented QoS and also increased the "hold-queue out" parameters, to no avail. We can clear the errors, but they reappear shortly thereafter. I was wondering if anyone else is having the same problem or has a solution.

One thing I think may have contributed: around the same time, we changed two of our main applications to web-based apps. We may have a problem with the servers not being able to handle the traffic, hence the discards, although we are not seeing any excessive bandwidth being used. Any help or suggestions would be appreciated.
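For reference, the hold-queue change we applied was along these lines (a sketch; the interface and queue depth shown are illustrative, not our exact config):

! raise the output hold queue from the IOS default of 40 packets
interface GigabitEthernet1/0/1
 hold-queue 1000 out

It made no visible difference; the discard counters kept climbing after each clear.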
Olesurfdude Asked:

Zephyr ICT (Cloud Architect) Commented:
Are you saying that the errors (drops) are happening on the ports where servers are connected or on other connections?
Olesurfdude (Author) Commented:
They are on both servers and workstations.
Zephyr ICT (Cloud Architect) Commented:
Could it be an MTU issue? Tuning this can sometimes give better (or worse) results, depending on what you are seeing.
Are the ports configured for auto-negotiation? And what about the servers/workstations?
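One quick way to audit speed/duplex across the whole switch is the standard Catalyst status summary; an a- prefix in the Duplex/Speed columns means the value was auto-negotiated:

show interfaces status

Any port that is hard-set on one side but auto on the other is a duplex-mismatch candidate.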
Olesurfdude (Author) Commented:
The demarc device is called a "RAD" and was installed in August 2014. The MTU on the RAD is set to 9000; our Cisco equipment is set to the default of 1500. We had a discussion with CenturyLink about this, and they said it shouldn't matter: if the RAD sees 1500, then 1500 is sent out. All the servers and workstations are set to auto/auto. It is strange, though, that the JetDirect cards on the printers are hard-coded 100/full, 10/half, etc., and they have no output discards, just the other errors on the half-duplex links that we would expect.
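For what it's worth, this is how we verified the switch side (the system MTU on the 3750 is global, and changing it requires a reload; the interface name below is just an example):

show system mtu
show interfaces GigabitEthernet1/0/1 | include MTU

Both show the default 1500 on our gear.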
Zephyr ICT (Cloud Architect) Commented:
Hmmmm, is it possible to show me some of the port statistics of the impacted ports/connections?
For example:

sh int gi 1/0/1

and, if possible, the output of the following commands:

show interfaces switching
show interfaces stats
show ip traf
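The line to watch in the sh int output is the output drops counter; it looks something like this (the numbers here are invented for illustration, not from your switch):

  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 18532
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)

If Total output drops keeps climbing while the 5-minute output rate stays well under line rate, that points at microbursts rather than sustained congestion.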

Olesurfdude (Author) Commented:
Here you go. This is from a 3750 stack in the building next door that is connected via a routed fiber interface. There is no RAD device in between, just a direct fiber connection on int Gi3/0/1. This switch seems to have the most output drops on a regular basis. Thank you so much for taking the time to help with this.
Zephyr ICT (Cloud Architect) Commented:
No problem. I'm not seeing any attachments, though? Is it me, or did they get stripped?
Olesurfdude (Author) Commented:
Zephyr ICT (Cloud Architect) Commented:
I see you have probably all the servers connected to FastEthernet ports. Have you already tried moving one of the (most impacted) servers to a Gb port, to rule out output congestion? Or is the load not high enough for that to be the cause? It's not the line between the buildings, it seems ... that one is fine.

It could be caused by short traffic-bursts...
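Short bursts tend to hide under the default 5-minute load averages, which would explain why you don't see excessive bandwidth. You can shorten the averaging window on the suspect ports to make bursts more visible (interface name illustrative):

interface GigabitEthernet3/0/1
 ! compute the input/output rate counters over 30 seconds instead of 5 minutes
 load-interval 30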

Can you post the result of "show buffers" please...
Olesurfdude (Author) Commented:
There are no servers connected to that switch, just workstations and printers. But that stack is the one with the most drops. I can show you the 6509 errors; all the workstations are on Gb ports, as well as the servers. I have a seminar now; I will send the information when I return. I am sending this from my smartphone now. jc
Zephyr ICT (Cloud Architect) Commented:
Ok ... no problem... Thanks
Olesurfdude (Author) Commented:
Show-buffers-3660.rtf
Here you go.
Zephyr ICT (Cloud Architect) Commented:
Thanks. I see some buffer failures, which is not an immediate alarm but could indicate issues ... How is the CPU doing on the switches? Can you post the result of "sh proc cpu history"?

There might still be over-utilization on the ports coming from upstream ... Are the people using the workstations complaining about slowness or other issues?
Olesurfdude (Author) Commented:
sh-proc-cpu-history.rtf
Here you go.

We do have occasional slowness, mainly in a few of our applications. Our RIS program and a couple of others have slowness issues all the time. I was attributing the output errors to that; does that make sense?
Zephyr ICT (Cloud Architect) Commented:
Yes, the output drops are an indicator that something is wrong, and they can contribute to the slowness. The CPU is fine in general, but I do see spikes to 100% CPU use; I can't immediately say when exactly they happen, though. If that happens during business hours, it can indicate serious issues. You should do some spot checks during the busiest hours with:

show proc cpu sorted 5s


Pay attention to the overall CPU percentage, but also to whether you see interrupt time; at the top of the output there will be something like 10%/1%.
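For illustration, the first line of that output reads like this (numbers invented):

CPU utilization for five seconds: 10%/1%; one minute: 12%; five minutes: 9%

The second number of the first pair is time spent at interrupt level; if it is high relative to the total, the CPU is switching a lot of packets itself instead of the hardware doing it.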

and do the occasional

show proc cpu history


The output of the ports (sh int Fa##) where the servers (with the slowness) are connected would be interesting to see as well ... It's hard to pinpoint when you're working somewhat blind :-)

For completeness, can you also get me the following: "show platform port-asic stats drop"
Olesurfdude (Author) Commented:
That will take some time; I am going to have to do a port scan on the switch and get the port numbers.
Zephyr ICT (Cloud Architect) Commented:
No worries, it's nice to have ... Take your time; if you can get the other command, it might shed some more light on things.
I wish something would really pop out. Besides the dropped packets and buffer failures, which can indicate capacity issues, there's not much that really tells me what is wrong.
Olesurfdude (Author) Commented:
The "show platform port-asic stats drop" didn't work on the 6509, but it did on the 3750s. I did a "sh int #" on the 6509 core switch and on the GbEthernet stack of 3 switches that are connected to the 6509 via a 2 Gb port-channel.
GbEthernet-sh-int---port-asic-stats-drop
6509-sh-int-.rtf
3660-Sh-platform-port-asics-stats-drop.r
Zephyr ICT (Cloud Architect) Commented:
Thanks a lot. Yeah ... sorry, I forgot to mention that command doesn't work on the 6000 platform.
I'll go over the results.
Olesurfdude (Author) Commented:
Awesome, thank you very much!
Zephyr ICT (Cloud Architect) Commented:
Ok,

This part is from the stack (there are two files, or is it the same file?):
Supervisor TxQueue Drop Statistics
    Queue  0: 0
    Queue  1: 0
    Queue  2: 0
    Queue  3: 18025
    Queue  4: 0
    Queue  5: 0
    Queue  6: 0
    Queue  7: 0
    Queue  8: 16283
    Queue  9: 0
    Queue 10: 0
    Queue 11: 225
    Queue 12: 0
    Queue 13: 0
    Queue 14: 0
    Queue 15: 0


The numbers you see are the numbers of packets dropped before they reach the CPU. Queue 8 is broadcast, so you have quite a lot of broadcast on your network. The other busy queue (3) is routing, so routing seems to be suboptimal, or your stack can't quite keep up. Queue 11 is ICMP, though it's most likely multicast; we can ignore it a little at the moment, but it should be looked at later...
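If the broadcast volume turns out to be the problem, one stop-gap you could consider (my suggestion, not something from your configs; the port range and threshold are illustrative) is per-port storm control on the 3750 access ports:

interface range GigabitEthernet1/0/1 - 48
 ! suppress broadcast above 5% of port bandwidth and raise an SNMP trap when a storm is detected
 storm-control broadcast level 5.00
 storm-control action trap

Tune the level to your measured baseline so you don't clip legitimate traffic.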

Are VLANs in play in your network? If not, I'd think about introducing them to get the broadcast down ...
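A minimal sketch of what that segmentation looks like on a 3750 (the VLAN numbers, names, and port are illustrative):

vlan 20
 name WORKSTATIONS
vlan 30
 name PRINTERS
!
interface GigabitEthernet1/0/10
 ! put this access port in the workstation VLAN
 switchport mode access
 switchport access vlan 20

Each VLAN is its own broadcast domain, so chatter in one never reaches hosts in the other.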

I'll take a closer look at the output tomorrow, have to get some downtime soon :)
Zephyr ICT (Cloud Architect) Commented:
Forgot to mention: the issue seems more focused around the 3750 stack. How old are these devices? The things I see explain why the CPU sometimes spikes, and they can also be a cause of slowness ... especially if your devices have to deal with a lot of broadcast and multicast traffic...
Olesurfdude (Author) Commented:
Ok, thanks. Yes, we have several VLANs configured on the 6509 and trunked to the GbEthernet stack of 3750s. Ok, talk with you tomorrow!
Olesurfdude (Author) Commented:
The equipment is old. The 6509E was installed in August 2014 and was a refurb; I'm not sure how old it actually is. The 3750s are all at least 10 years old or older. They all have the latest code installed.
Zephyr ICT (Cloud Architect) Commented:
It will depend a little on the CPU. The drops aren't very dramatic yet, but over time they could get worse. It might be that the 3750s can't handle the load anymore, but that would mean there has been quite a load change on your network since they swapped the previous equipment for the RAD. Not uncommon, but you know your network better than I do :)
Olesurfdude (Author) Commented:
One thing that changed around the same time as the installation of the 6509E and the RAD devices is that two of our main high-use applications were changed to web applications. One of them used to be on its own VLAN and had its own workstations, separate from our MIS workstations. So now we have one more heavy app running on the main VLAN, which creates a lot more traffic, and it is also bursty, sending/retrieving large data files periodically. What do you think is an adequate number of workstations per VLAN?
Zephyr ICT (Cloud Architect) Commented:
Hmmm, difficult question; that's one I can only answer with "it depends", but let's put a number on it of about 100 clients, maybe ...

It would be wise to place the servers on a separate VLAN from your clients; the same goes if you have a lot of printers, and WiFi again is another VLAN candidate... Besides that, make sure you quiet down the more "chatty" protocols (e.g. AppleTalk, Bonjour, ...) by disabling unused protocols and services where possible.

The bursty traffic of the web apps might be the cause of the dropped packets, but that's just a guess, not a fact...
Olesurfdude (Author) Commented:
We have disabled AppleTalk, DECnet, etc. on the printers. Good point about the servers being in a different VLAN; most of them are, but there are also users on some of those VLANs. I am going to look at that a little closer.
You seem very knowledgeable about the Cisco show commands for queues etc. Where do you get your info/training? Those are things I would like to learn more about, troubleshooting problems like this. I got my CCNP in 2002 and don't remember going over any of this stuff!
Zephyr ICT (Cloud Architect) Commented:
Well ... it's mostly experience, learning some commands over the years (and keeping them close)... Most of these commands can be found online, though; they turn up with search terms like "Cisco troubleshoot cpu interrupts" or "Cisco troubleshoot dropped packets" ... And besides that, A LOT of reading ;-)

The most difficult part is the buffers; there is not much info to be found that really goes in-depth, and I still struggle sometimes to find definitive info on some of the things I come across.

I'd keep an eye on that 3750 stack; it seems to have some issues with the heavier bursty traffic. Also check out your routing: some indications point to the routing not being optimal. If I'm not mistaken, I saw EIGRP somewhere; dynamic routing is sweet, but if it detects a bad line or calculates that the better path is the long one, it can cause some headaches :-)

If re-designing or adjusting your broadcast domains doesn't seem to help much, you could try circumventing the 3750 stack while troubleshooting, maybe by connecting directly to the 6500 chassis ...
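A few quick EIGRP sanity checks while you're at it (standard IOS commands):

show ip eigrp neighbors
show ip eigrp topology
show ip route eigrp

Frequent neighbor resets in the first, or routes sitting in Active state in the second, would support the suboptimal-routing theory.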