ARP collisions cause DoS

Posted on 2007-04-01
Last Modified: 2013-11-16
Hi. We host our company's live web applications at a third party's datacentre. We have a Cisco PIX 515E firewall at the datacentre that represents the edge point of our equipment before it links to the ISP's equipment and hence to the internet. A few days ago the PIX experienced an almost total loss of connectivity for over an hour. This was the first time this has occurred in the two years that we've hosted our systems there. The chain of events was roughly this:

22:23:13 - Cisco PIX 515E firewall starts reporting dozens of ARP request collisions and ARP response collisions on its external interface. For instance:

<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cb31 on interface outside
<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP response collision from x.x.x.x/yyyy.yyyy.cc31 on interface outside

(I have substituted x.x.x.x where the IP of our PIX's outside interface was, and yyyy.yyyy for the beginning of the MAC address reported in the log. Otherwise the log entries are unchanged.)
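For anyone wanting to tally these messages in their own logs, here is a minimal Python sketch. It assumes the syslog lines have exactly the layout quoted above; the function name `count_collisions` and the regular expression are my own illustration, not anything supplied by Cisco.

```python
import re
from collections import Counter

# Matches the 405001 lines quoted above. The IP and MAC are redacted in the
# post, so the pattern accepts any token without spaces or slashes.
COLLISION = re.compile(
    r"%PIX-4-405001: Received ARP (request|response) collision "
    r"from ([^/\s]+)/(\S+) on interface (\S+)"
)

def count_collisions(lines):
    """Count collision messages per (kind, MAC, interface)."""
    counts = Counter()
    for line in lines:
        match = COLLISION.search(line)
        if match:
            kind, _ip, mac, interface = match.groups()
            counts[(kind, mac, interface)] += 1
    return counts

sample = [
    "<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cb31 on interface outside",
    "<164>Mar 26 2007 23:27:05: %PIX-4-405001: Received ARP response collision from x.x.x.x/yyyy.yyyy.cc31 on interface outside",
    "<162>Mar 26 2007 23:27:19: %PIX-2-106001: Inbound TCP connection denied from a.a.a.a/1312 to z1.z1.z1.z1/445 flags SYN  on interface outside",
]

for key, n in sorted(count_collisions(sample).items()):
    print(key, n)
```

Grouping by MAC rather than by IP is deliberate: the IP is the same in every collision message, so the MAC is the only field that distinguishes the senders.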

The ARP collisions reported seemed to indicate a duplication of our IP address (x.x.x.x) within the ISP's network. I managed to trace it back to their gateway (x.x.x.1) and reported this to them. A traceroute to the IP address in question from outside got as far as the ISP's router and timed out, but I think this normally happens anyway.
The problem disappeared at 23:27:33 in the midst of the ISP's investigations (deduced from the PIX log):
<164>Mar 26 2007 23:27:19: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cc31 on interface outside
<164>Mar 26 2007 23:27:19: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cb31 on interface outside
<162>Mar 26 2007 23:27:19: %PIX-2-106001: Inbound TCP connection denied from a.a.a.a/1312 to z1.z1.z1.z1/445 flags SYN  on interface outside
<164>Mar 26 2007 23:27:19: %PIX-4-106023: Deny tcp src outside:a.a.a.a/1313 dst inside:z2.z2.z2.z2/445 by access-group "PERMIT_INET_IN"
<164>Mar 26 2007 23:27:19: %PIX-4-106023: Deny tcp src outside:a.a.a.a/1316 dst DMZ:z3.z3.z3.z3/445 by access-group "PERMIT_INET_IN"

(a.a.a.a is a host on the ISP's network which I believe their technicians were using to try to connect to our equipment; I deny such connections intentionally. z1.z1.z1.z1, z2.z2.z2.z2 and z3.z3.z3.z3 are our web servers.)

I reloaded the PIX for peace of mind at 23:55:37, although it had been running happily for months without being reloaded.

Initially yyyy.yyyy.cc31 seemed to me to belong to a Broadcom card, perhaps in a Dell machine, but then I determined that it resolved to x.x.x.1, the next-hop router from our firewall. That MAC address actually seemed to resolve to x.x.x.253 as well, so it appears to be set up as a failover system. Our ISP say that the traffic is nothing out of the ordinary, but I have searched all our PIX logs from the past two years and this traffic has not appeared before now. In the hour we lost service the PIX logged almost nothing but this traffic. That would seem to indicate that the two are related, but the ISP deny this.
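The claim that the PIX logged almost nothing but this traffic during the outage can be put into numbers. The following is only a sketch under the assumption that every log line starts with the `<nnn>Mon DD YYYY HH:MM:SS: %PIX-n-nnnnnn:` prefix seen in the excerpts; `collision_share_by_hour` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from datetime import datetime

# Prefix layout as it appears in the quoted lines, e.g.
# "<164>Mar 26 2007 23:27:19: %PIX-4-405001: ..."
PREFIX = re.compile(r"^<\d+>(\w{3} +\d+ \d{4} \d\d:\d\d:\d\d): %PIX-\d-(\d+):")

def collision_share_by_hour(lines):
    """Return {hour: (collision_count, total_count)} for parsable lines."""
    buckets = defaultdict(lambda: [0, 0])
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue  # skip anything that is not a PIX syslog line
        when = datetime.strptime(match.group(1), "%b %d %Y %H:%M:%S")
        hour = when.replace(minute=0, second=0)
        buckets[hour][1] += 1
        if match.group(2) == "405001":
            buckets[hour][0] += 1
    return {hour: tuple(pair) for hour, pair in buckets.items()}

log_excerpt = [
    "<164>Mar 26 2007 23:27:19: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cc31 on interface outside",
    "<164>Mar 26 2007 23:27:19: %PIX-4-405001: Received ARP request collision from x.x.x.x/yyyy.yyyy.cb31 on interface outside",
    "<162>Mar 26 2007 23:27:19: %PIX-2-106001: Inbound TCP connection denied from a.a.a.a/1312 to z1.z1.z1.z1/445 flags SYN  on interface outside",
    "<164>Mar 26 2007 23:27:19: %PIX-4-106023: Deny tcp src outside:a.a.a.a/1313 dst inside:z2.z2.z2.z2/445 by access-group \"PERMIT_INET_IN\"",
]

for hour, (collisions, total) in sorted(collision_share_by_hour(log_excerpt).items()):
    print(hour, f"{collisions}/{total} messages were ARP collisions")
```

Run over the full two years of logs, a table like this would show whether the collisions were truly absent before the outage and how sharply they began.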

I'm not saying that the ARP traffic isn't part of the normal working of the router failover monitoring process, but there could still have been some event that caused a huge increase in that traffic. Perhaps the ISP restarted one or more routers as part of the investigation, or unplugged and reattached a cable. Any information would be helpful, but none seems forthcoming from them, other than their statement that everything was normal.

The main issue is not apportioning blame as such, but determining whether the problem could be the fault of our firewall. Neither of the MAC addresses mentioned in the ARP collision messages in the PIX log relates to our equipment, but the IP address (x.x.x.x) does. As I say, it's the IP address of our outside interface. The way I see it, the outside interface is seeing conflicting data, but I need to confirm for sure that this confusion couldn't be caused by our PIX because, if it is, I need to take steps to avoid it happening again.

I have done a lot of reading up on this issue since the downtime but I can't seem to determine exactly what happened. The Cisco documentation for error 405001 mentions that the traffic could be legitimate, but whether legitimate or not, the traffic seems to be deadly for our firewall. Would these messages be caused by a normal failover config on our ISP's router? The ARP collision traffic we saw coincided with a DoS, so I'm sure the two are related, as we've never seen that traffic on the PIX before in two years.

As far as I'm concerned the traffic comes from one of four sources:

1) A cable plugged incorrectly by the ISP
2) A faulty config or a malfunctioning of the ISP's router (not our PIX)
3) A piece of kit from another hosted company at the ISP which shares the router (x.x.x.1) with us
4) Our PIX causing confusion or having a faulty config

My primary concern is eliminating any possibility that the source of the problem is number 4).

Anyway, I'd be interested to know if anybody has a different opinion, as this has caused us a lot of hassle with some big clients and we are in danger of losing them. It's just one of those things, no doubt, but I'd be very interested to find out what happened and why, so that we can avoid a repeat occurrence, which we absolutely cannot afford.
Question by: saville00

Accepted Solution

lrmoore earned 1000 total points
ID: 18833237
I'm almost certain that it is the ISP at fault here.
Ask them if there is anyone else that can possibly share the same broadcast domain with your outside interface and their router. If yes, then any other firewall, like another PIX for example, in that same broadcast domain will use proxy ARP to answer for all of the addresses within its interface subnet.
I've seen the ARP collisions when two PIXes were in failover mode and one of them had a NIC going bad...
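To illustrate the proxy ARP scenario described above, here is a toy Python model. It is not PIX code and makes no claim about how a real PIX decides when to proxy; the class name, the 192.0.2.0/24 documentation subnet and the MAC addresses are all hypothetical stand-ins for the redacted values.

```python
import ipaddress

class ProxyArpDevice:
    """Toy neighbour that answers ARP for every address in its interface subnet."""

    def __init__(self, mac, subnet):
        self.mac = mac
        self.subnet = ipaddress.ip_network(subnet)

    def arp_reply(self, target_ip):
        """Return (ip, mac) if this device would answer for target_ip, else None."""
        if ipaddress.ip_address(target_ip) in self.subnet:
            return (target_ip, self.mac)
        return None

def is_collision(own_ip, own_mac, reply):
    """A reply that claims our IP with somebody else's MAC is a conflict."""
    if reply is None:
        return False
    ip, mac = reply
    return ip == own_ip and mac != own_mac

# Hypothetical values: our outside interface and a neighbouring firewall
# that shares the same broadcast domain.
our_ip, our_mac = "192.0.2.10", "0000.5e00.5301"
neighbour = ProxyArpDevice(mac="0000.5e00.5302", subnet="192.0.2.0/24")

print(is_collision(our_ip, our_mac, neighbour.arp_reply(our_ip)))
print(is_collision(our_ip, our_mac, neighbour.arp_reply("198.51.100.7")))
```

In this model the neighbour answers for our address simply because it falls inside its subnet, which is the kind of conflict the ISP would need to rule out on their side of the link.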

Author Comment

ID: 18839628
Thank you very much. That is what I thought was happening, but with no access to the ISP's network to run any analysis, and with the ISP themselves in full denial mode, it's hard to be 100% sure on an issue such as this. Your opinion, as somebody whose comments I've read with respect over the years, counts for a lot with me.

The answer is accepted, but if anyone is still able to add comments I'd be interested in any additional thoughts.

Thanks again.
