WallaceLau

Intel Switch "Out of Pools" error

Hi,

We have some ancient Intel 510T switches that used to work great but are now having some problems.  The whole switch will randomly start dropping packets, and at the same time it will report a "Receive Discards - Out of Pools" error on only one particular port (it is one of the busiest ports because the load balancer is plugged in there).  Looking at the switch's manual, it says these errors "shows there are no memory pools left because there are so many frames stored."  And then the next line says: "Significance: The switch tries to cause collisions to increase the number of frames rejected; this gives the pools time to empty."

Since the timing of the packet loss corresponds to our load balancer log (about it getting disconnected from the firewall, likely the result of packet loss) as well as to the switch increasing the count of "Out of Pools Receive Discards", I presume they are related... However, what does that error really mean, what may have caused it, and how do I fix it?  Does it mean I am reaching the limit of this switch and will have to upgrade?  Would changing the switch mode from "cut-through" to "store and forward" help the issue?  I don't know if "out of pools" is standard networking-speak that has common solutions out there... especially since the manual's explanation sounded kinda vague to me.  Any advice will be greatly appreciated.

By the way, the port in question normally only puts out ~120 total packets per second, and when the error occurs the switch is usually not under load.  We have load-tested it to about 1800-2200 packets per second doing several large downloads at the same time, so it would seem to me that the problem is not throughput related... I mean, 120 packets per second should be tofu for these switches, no?
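
For a rough sense of scale, the packet rates above can be compared with the theoretical frame rate of a 100 Mbps Ethernet link. This is only a back-of-the-envelope sketch: the frame sizes are assumptions (the thread never states the actual packet sizes), and the function names are made up for illustration.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_frames_per_second(link_bps, frame_bytes):
    """Theoretical frame rate of a link; frame_bytes includes header and FCS."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def utilization(observed_pps, link_bps, frame_bytes):
    """Fraction of the theoretical frame rate that observed_pps represents."""
    return observed_pps / max_frames_per_second(link_bps, frame_bytes)


LINK_BPS = 100_000_000  # 100 Mbps port, as negotiated in this thread

# Assumed frame sizes: 64 bytes (minimum) and 1518 bytes (maximum untagged).
for frame_bytes in (64, 1518):
    ceiling = max_frames_per_second(LINK_BPS, frame_bytes)
    share = utilization(120, LINK_BPS, frame_bytes)
    print(f"{frame_bytes}-byte frames: ceiling {ceiling:,.0f} pps, "
          f"120 pps is {share:.2%} of that")
```

Even in the worst case of all maximum-size frames, 120 packets per second is under 2% of what a single 100 Mbps port can carry, which is consistent with the suspicion that raw throughput is not the cause.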

Thx.



Wallace
pseudocyber

Any chance of being able to call support on the switch?  What about known bugs and/or firmware upgrades?

Sounds like it might be some kind of DoS attack or something.  How about throwing a sniffer/packet capture on it to see what the traffic is?

From the sound of it, it sounds like store and forward actually might make things worse.  Although, I suppose it's something to try.

I thought I heard somewhere that Intel actually didn't make switches - that they're rebranded something else.  I'm not sure what they might be.  
Hi,

The 510T switch has an internal bandwidth of 800 Mbps. This means that if you load the switch with 15 x 100 Mbps, the internal bus will be overloaded (more bandwidth is forwarded into the switch than the switch can handle). The Out of Pools counter will only increment if, e.g., you are transmitting from a 100 Mbps port to a 10 Mbps destination port. In that case the internal bus will not be overloaded, but the pool buffer will fill up. Note: this only happens if flow control is disabled.
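
The fast-port-to-slow-port case can be illustrated with a toy model. This is only a sketch under stated assumptions: the pool size, the per-tick rates, and the function name are all made up, and it says nothing about how the 510T actually manages its buffers.

```python
def simulate_pool(pool_size, arrivals_per_tick, departures_per_tick, ticks):
    """Toy model of a fixed-size buffer pool feeding one egress port.

    Returns (frames still queued, frames discarded) after the given ticks.
    All parameters are hypothetical; none are taken from the 510T.
    """
    queued = 0
    discards = 0
    for _ in range(ticks):
        # The egress port drains first, limited by its own speed.
        queued -= min(queued, departures_per_tick)
        # Ingress then stores new frames; anything that does not fit is
        # discarded, which is what an "out of pools" counter would record.
        free = pool_size - queued
        accepted = min(arrivals_per_tick, free)
        discards += arrivals_per_tick - accepted
        queued += accepted
    return queued, discards


# Ingress ten times faster than egress (like 100 Mbps into 10 Mbps):
print(simulate_pool(pool_size=100, arrivals_per_tick=10,
                    departures_per_tick=1, ticks=20))

# Matched speeds: the pool never fills and nothing is discarded.
print(simulate_pool(pool_size=100, arrivals_per_tick=10,
                    departures_per_tick=10, ticks=20))
```

In the mismatched case the pool fills within a few ticks and every later tick discards frames, while matched rates keep the queue short. Flow control would avoid the discards by slowing the sender instead.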

http://www.intel.com/support/express/switches/10/sb/cs-014375.htm

customer support
see http://support.intel.com/support/9089.htm 
WallaceLau

ASKER

Thanks for the quick responses.

pseudocyber:  We have not ruled out DoS either; actually, on April 2nd we were under a fairly massive DDoS attack and our co-lo facility turned off ICMP on all of our IP addresses.  The packet drop issue started happening about a week or two after the attack.  I also agree that store-and-forward might be worse, as it will likely require more memory (since "Out of Pools" sounds like an out-of-memory error).  However, the switch is so old that I don't even know if Intel still supports it.  I do know the latest firmware was released years ago (the switches all have the latest firmware already).  Regarding sniffers, I don't even know if the switch supports forwarding all traffic to a specific port to be sniffed... at least I haven't found out how yet.  Since we won't install any software (including sniffers) on production equipment, I can't sniff locally on the load balancer either.  The only alternative is to schedule some downtime, unplug something, and route it through a dumb network hub so I can plug a sniffer in there...


Abs_jaipur,

Thanks for the technical notes and Intel support phone numbers; if I can't resolve it here I will give them a call (again, not sure if they still support it).  Regarding bandwidth, the switch is only used to serve web traffic and we only pay for 1 Mbps of bandwidth (although it is burstable to 100 Mbps).  Since all machines plugged into it are servers that do not generate traffic by themselves (all traffic is incoming requests), I don't think it is bandwidth related.  As I said, the packet count during the "packet loss" error period is only about 130 packets per second on the questionable port, and the rest of the ports have even less traffic.  The switch is also not showing oversized packets on its error counter.  Oh, and flow control is enabled on all ports, and they are all negotiated at 100 Mbps full-duplex.  Strange?

At this point I think the question is, can some kind of DoS attack cause the switch to behave like that?

Thx.



Wallace
SOLUTION
Sam Panwar
Turns out it may have very little to do with DoS... I brought in a little hub and moved all the public connections off the switch, so there is now nothing plugged into VLAN #1 (public side).  VLAN #2 remains unchanged (Web DMZ).  The switch is still going nuts once in a while, reporting the "Out of Pools" error when it is only serving internal traffic.  So the theory of someone sending a malformed or oversized packet to mess with the switch no longer applies.  (Unless those packets got through the firewall and still got forwarded to the switch... I can't imagine the PIX didn't catch them, though.)

I guess I need to start sniffing the port that is reporting the error...



W.
ASKER CERTIFIED SOLUTION
We still have no idea what is going on, but the problem seems to have gone away after we replaced the Intel switch with a Cisco 2924XL.  Oh well.



W.