
ARP collisions cause DoS

Posted on 2007-04-01
Last Modified: 2013-11-16
Hi. We host our company's live web applications at a third party's datacentre. We have a Cisco PIX 515E firewall at the datacentre that is the edge point of our equipment before it links to the ISP's equipment and from there to the internet. A few days ago the PIX suffered an almost total loss of connectivity for over an hour. This was the first time this has happened in the two years that we've hosted our systems there. The chain of events was roughly this:

22:23:13 - Cisco PIX 515E firewall starts reporting dozens of ARP request collisions and ARP response collisions on its external interface. For instance:

<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cb31 on interface outside
<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP response collision from x.x.x.x/yyyy.yyyy.cc31 on interface outside

(I have substituted x.x.x.x for the IP address of our PIX's outside interface and yyyy.yyyy for the first part of each MAC address reported in the log. The log entries are otherwise unchanged.)
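To compare how often each of the two MAC addresses appears across a full syslog export, the 405001 messages can be tallied with a short script. This is a minimal sketch, not a Cisco tool: the regular expression is written only against the message format in the excerpt above, and the sample lines use the documentation address 192.0.2.10 and made-up MACs (0011.2233.cb31, 0011.2233.cc31) as hypothetical stand-ins for the masked values.

```python
import re
from collections import Counter

# Written against the %PIX-4-405001 format in the log excerpt above.
COLLISION_RE = re.compile(
    r"%PIX-4-405001: Received ARP (request|response) collision from "
    r"(\S+)/([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}) on interface (\S+)",
    re.IGNORECASE,
)


def tally_collisions(lines):
    """Count 405001 messages per (ARP type, claimed IP, source MAC)."""
    counts = Counter()
    for line in lines:
        match = COLLISION_RE.search(line)
        if match:
            kind, ip, mac, _interface = match.groups()
            counts[(kind.lower(), ip, mac.lower())] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical stand-ins for the masked x.x.x.x / yyyy.yyyy values.
    sample = [
        "<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP request "
        "collision from 192.0.2.10/0011.2233.cb31 on interface outside",
        "<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP response "
        "collision from 192.0.2.10/0011.2233.cc31 on interface outside",
    ]
    for key, count in sorted(tally_collisions(sample).items()):
        print(key, count)
```

Lines that are not 405001 messages (such as the 106001/106023 denies) are simply skipped, so the whole log can be fed in unfiltered.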

The ARP collisions reported seemed to indicate a duplicate of our IP address (x.x.x.x) within the ISP's network. I managed to trace it back to their gateway (x.x.x.1) and reported this to them. A traceroute to the IP address in question from outside got as far as the ISP's router and timed out, but I think this normally happens anyway.
 
The problem disappeared at 23:27:33, in the midst of the ISP's investigations (deduced from the PIX log):
<164>Mar 26 2007 23:27:19: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cc31 on interface outside
<164>Mar 26 2007 23:27:19: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cb31 on interface outside
<162>Mar 26 2007 23:27:19: %PIX-2-106001: Inbound TCP connection denied from a.a.a.a/1312 to z1.z1.z1.z1/445 flags SYN  on interface outside
<164>Mar 26 2007 23:27:19: %PIX-4-106023: Deny tcp src outside:a.a.a.a/1313 dst inside:z2.z2.z2.z2/445 by access-group "PERMIT_INET_IN"
<164>Mar 26 2007 23:27:19: %PIX-4-106023: Deny tcp src outside:a.a.a.a/1316 dst DMZ:z3.z3.z3.z3/445 by access-group "PERMIT_INET_IN"

(a.a.a.a is a host on the ISP's network that I believe their technicians were using to try to connect to our equipment; that traffic is intentionally denied by me.
z1.z1.z1.z1, z2.z2.z2.z2 and z3.z3.z3.z3 are our web servers.)
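The end time above was deduced by reading the PIX log by hand. A sketch of the same deduction takes the earliest and latest 405001 timestamps. It assumes every syslog line starts with a `<priority>` tag followed by a timestamp in the `Mar 26 2007 23:27:05` form shown in the excerpts; the function name and sample lines are illustrative only.

```python
from datetime import datetime

# Timestamp layout used in the excerpts, e.g. "Mar 26 2007 23:27:05".
TS_FORMAT = "%b %d %Y %H:%M:%S"


def collision_window(lines):
    """Return (first, last) datetimes of 405001 messages, or None if there are none."""
    stamps = []
    for line in lines:
        if "%PIX-4-405001:" not in line:
            continue
        # Drop the "<164>" priority tag, then keep the text before ": %PIX".
        after_tag = line.split(">", 1)[1]
        stamp = after_tag.split(": %PIX", 1)[0]
        stamps.append(datetime.strptime(stamp, TS_FORMAT))
    if not stamps:
        return None
    return min(stamps), max(stamps)


if __name__ == "__main__":
    # Illustrative lines only; the first timestamp is the start time reported above.
    log = [
        "<164>Mar 26 2007 22:23:13: %PIX-4-405001: Received ARP request "
        "collision from 192.0.2.10/0011.2233.cb31 on interface outside",
        "<164>Mar 26 2007 23:27:19: %PIX-4-405001: Received ARP request "
        "collision from 192.0.2.10/0011.2233.cc31 on interface outside",
    ]
    first, last = collision_window(log)
    print(first, last, last - first)
```

Subtracting the two values gives the length of the collision episode, which can then be lined up against the outage window.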

I reloaded the PIX for peace of mind at 23:55:37, although it had been running happily for months without a reload.

Initially yyyy.yyyy.cc31 looked to me like a Broadcom card, perhaps in a Dell machine, but I then determined that it resolved to x.x.x.1, the next-hop router on the ISP side of our firewall. That MAC address also seemed to resolve to x.x.x.253, so it appears to be set up as a failover system. Our ISP say the traffic is nothing out of the ordinary, but I have searched all our PIX logs from the past two years and this traffic has not appeared before now. In the hour we lost service, the PIX logged almost nothing but this traffic. That suggests to me that the two are related, but the ISP deny this.
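Checking which addresses a single MAC is listed against can be done from the firewall's ARP table. The sketch below assumes each line of `show arp` output has the form `interface IP MAC [age]`; that layout, the helper names and the sample addresses are assumptions for illustration. A MAC mapped to several IPs, as yyyy.yyyy.cc31 was for x.x.x.1 and x.x.x.253, is consistent with a router interface holding more than one address (or answering by proxy ARP), but the table alone does not say which.

```python
from collections import defaultdict


def ips_per_mac(arp_lines):
    """Map each MAC address to the set of IPs it is listed against.

    Assumes lines of the form 'outside 192.0.2.1 0011.2233.cc31 120'
    (interface, IP, MAC, optional age); adjust if your output differs.
    """
    table = defaultdict(set)
    for line in arp_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        _interface, ip, mac = fields[:3]
        table[mac.lower()].add(ip)
    return dict(table)


def shared_macs(arp_lines):
    """Keep only MACs listed against more than one IP."""
    return {mac: ips for mac, ips in ips_per_mac(arp_lines).items() if len(ips) > 1}


if __name__ == "__main__":
    # Hypothetical ARP table: one MAC listed for both the gateway and .253.
    arp = [
        "outside 192.0.2.1 0011.2233.cc31 120",
        "outside 192.0.2.253 0011.2233.cc31 95",
        "outside 192.0.2.254 0011.2233.cb31 60",
    ]
    print(shared_macs(arp))
```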

I'm not saying that the ARP traffic isn't part of the normal working of the router failover monitoring process, but there could still have been some event that caused a huge increase in that traffic. Perhaps the ISP restarted one or more routers as part of the investigation, or unplugged and reattached a cable. Any information would be helpful, but none seems forthcoming from them other than that everything was normal.

The main issue is not apportioning blame as such, but determining whether the problem could be the fault of our firewall. Neither of the MAC addresses in the ARP collision messages in the PIX log relates to our equipment, but the IP address (x.x.x.x) does: as I say, it's the IP address of our outside interface. The way I see it, the outside interface is seeing conflicting data, but I need to confirm for sure that this confusion couldn't be caused by our PIX, because if it is, I need to take steps to stop it happening again.

I have done a lot of reading on this issue since the downtime, but I can't determine exactly what happened. The Cisco documentation for message 405001 mentions that the traffic could be legitimate, but legitimate or not, the traffic seems to be deadly for our firewall. Would these messages be caused by a normal failover config on our ISP's router? The ARP collision traffic we saw coincided with a DoS, so I'm convinced the two are related, as we've never seen that traffic on the PIX before in two years.
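Cisco's system log message reference describes 405001 as the firewall receiving an ARP packet whose MAC address differs from its existing ARP cache entry for that IP. The toy model below illustrates that condition only. It is a sketch, not the PIX's implementation: it ignores ageing, and the "static entry" for the firewall's own address and all sample addresses are assumptions. It shows why two foreign MACs both claiming the outside address would produce a steady stream of collisions rather than a single message.

```python
class ArpCache:
    """Toy ARP cache that reports a 'collision' when a known IP arrives with a different MAC.

    Illustrates the condition described for message 405001; not the PIX's
    actual implementation (no ageing, no rate limiting).
    """

    def __init__(self, static_entries=None):
        # Static entries (e.g. the firewall's own interface address) are never overwritten.
        self.static = dict(static_entries or {})
        self.dynamic = {}

    def observe(self, ip, mac):
        """Record an ARP sender IP/MAC pair and return True if it collides."""
        known = self.static.get(ip, self.dynamic.get(ip))
        collision = known is not None and known != mac
        if ip not in self.static:
            # Last sender wins, so two claimants keep flipping the entry.
            self.dynamic[ip] = mac
        return collision


if __name__ == "__main__":
    # Hypothetical addresses: 192.0.2.10 stands in for our outside IP.
    cache = ArpCache({"192.0.2.10": "00aa.bbcc.0001"})
    for mac in ["0011.2233.cb31", "0011.2233.cc31", "0011.2233.cb31"]:
        print(mac, cache.observe("192.0.2.10", mac))
```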

As far as I'm concerned the traffic comes from one of four sources:

1) A cable plugged incorrectly by the ISP
2) A faulty config or a malfunctioning of the ISP's router (not our PIX)
3) A piece of kit from another hosted company at the ISP which shares the router (x.x.x.1) with us
4) Our PIX causing confusion or having a faulty config

My primary concern is eliminating any possibility that the source of the problem is number 4).
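Possibility 4) can be partly checked from the log alone by comparing the MACs in the 405001 messages against the PIX's own interface MACs (for example as listed by `show interface`). In this sketch the own-MAC values and sample lines are hypothetical. An empty "ours" set shows only that the PIX was not the device claiming the address; it does not rule out the PIX's own ARP traffic prompting a misbehaving neighbour.

```python
import re

# Pulls the source MAC out of a %PIX-4-405001 message.
MAC_RE = re.compile(
    r"%PIX-4-405001: .*?/([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}) on interface",
    re.IGNORECASE,
)


def split_collision_macs(lines, own_macs):
    """Return (ours, foreign): collision MACs that do / do not belong to the firewall."""
    own = {mac.lower() for mac in own_macs}
    seen = set()
    for line in lines:
        match = MAC_RE.search(line)
        if match:
            seen.add(match.group(1).lower())
    return seen & own, seen - own


if __name__ == "__main__":
    # Hypothetical own-interface MAC and sample log lines.
    own_interface_macs = {"00aa.bbcc.0001"}
    log = [
        "<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP request "
        "collision from 192.0.2.10/0011.2233.cb31 on interface outside",
    ]
    print(split_collision_macs(log, own_interface_macs))
```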

Anyway, I'd be interested to know if anybody has a different opinion, as this has caused us a lot of hassle with some big clients and we are in danger of losing them. It's no doubt just one of those things, but I'd be very interested to find out what happened and why, so that we can avoid a repeat occurrence, which we absolutely cannot afford.
Question by:saville00

Accepted Solution

by: lrmoore
ID: 18833237
I'm almost certain that it is the ISP at fault here.
Ask them whether anyone else shares the same broadcast domain as your outside interface and their router. If so, any other firewall in that broadcast domain (another PIX, for example) will use proxy ARP to answer for all of the addresses within its interface subnet.
I've seen these ARP collisions when two PIXes were in failover mode and one of them had a NIC going bad...
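The proxy ARP scenario above can be modelled as a membership test: a device answering ARP for every address in its interface subnet will claim the outside IP whenever that IP falls inside the subnet and is not the device's own address. This is a simplified sketch under that assumption (real proxy ARP behaviour may also depend on routing and NAT configuration); the device names and addresses are hypothetical.

```python
import ipaddress


def proxy_arp_claimants(target_ip, proxy_arp_devices):
    """Names of devices that would answer ARP for target_ip.

    Simplified model: each device proxy-ARPs for every address in its
    interface subnet except its own address.
    """
    target = ipaddress.ip_address(target_ip)
    claimants = []
    for name, interface in proxy_arp_devices.items():
        iface = ipaddress.ip_interface(interface)
        if target in iface.network and target != iface.ip:
            claimants.append(name)
    return claimants


if __name__ == "__main__":
    # Hypothetical devices sharing one broadcast domain with our outside interface.
    devices = {
        "our-pix": "192.0.2.10/24",
        "other-customer-firewall": "192.0.2.20/24",
        "device-on-other-subnet": "198.51.100.5/24",
    }
    print(proxy_arp_claimants("192.0.2.10", devices))
```

Any name other than our own firewall in the result is a second device that would answer for our outside address, which is the situation that would surface on the PIX as 405001 collisions.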
 

Author Comment

by:saville00
ID: 18839628
Thank you very much. That is what I thought was happening, but with no access to the ISP's network to run any analysis, and with the ISP themselves in full denial mode, it's hard to be 100% sure on an issue such as this. Your opinion, as somebody whose comments I've read with respect over the years, counts for a lot with me.

The answer is accepted, but if anyone is still able to add comments, I'd be interested in any additional thoughts.

Thanks again