billwharton

Weird problem

I am a network security engineer and when a problem was escalated to me, I didn't believe it at first but now after seeing it, I do.

My network is infected with bad IP addresses. We are using static IP addresses at the moment.
Suddenly, a user complains of not being able to access certain websites like yahoomail.com, optonline.net and probably 1-2 others.

I visit this user's desk and, to find out whether it's a computer problem or not, I plug in my laptop and use her IP address; the laptop shows the same problem. We have no gateway URL filtering product and use a PIX firewall to connect to the internet. I have rebooted all these network devices and the problem still exists.

The DNS resolution is working fine as I can see packets in my sniffer go to the right IP address but nothing comes back. This happens approximately every 2 weeks with a different computer each time and all I have to do to resolve it is give the computer another IP address.

This problem is not website-centric: it also happened while I was trying to access a share on a server. With IP 10.1.1.25, I couldn't access the share but with IP 10.1.1.26, I could. I gave the .25 IP to another computer and it too couldn't access the server. Of course, I did reboot the server and even flushed out its ARP entries, etc.

Can this be the weirdest virus ever? I don't think I can resolve this problem unless somebody else has faced it and solved the mystery.
lockdown

This sounds very strange.
Could the PIX have used all its licenses? The lowest-level PIX license is by IP address and not number of connections.
Are these computers attached to switches with any special settings? Was the computer moved, or the IP on the computer changed, before it stopped working?
Could it be a bad subnet mask somewhere? From a working computer, try to ping the IP that isn't working. See if something replies host/network unreachable; if so, a bad subnet mask is likely configured on that device.
Seems like a bad netmask - are you using DHCP? The firewall, if not configured accordingly, is not clever enough to be causing you this grief.

I would check your layer 2 network...
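The subnet-mask idea above can be checked on paper before touching any device. Here is a minimal sketch using Python's standard `ipaddress` module; the mask values are assumptions for illustration only (the thread never states the real mask), and `on_link` is a hypothetical helper name.

```python
import ipaddress

def on_link(local_ip, netmask, peer_ip):
    """Return True if peer_ip falls inside the subnet implied by
    local_ip and netmask, i.e. the host would ARP for it directly
    instead of sending the packet to its default gateway."""
    network = ipaddress.ip_interface(f"{local_ip}/{netmask}").network
    return ipaddress.ip_address(peer_ip) in network

# Assumed masks, for illustration only:
# with a /24 mask the peer is treated as local ...
print(on_link("10.1.1.25", "255.255.255.0", "10.1.1.200"))
# ... with a wrongly entered /25 mask the same peer looks remote.
print(on_link("10.1.1.25", "255.255.255.128", "10.1.1.200"))
```

If two devices disagree about whether a peer is on-link, one side may ARP directly while the other routes via the gateway, which can produce one-way traffic of the kind described.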
billwharton

ASKER

Thx lockdown & ferg but it cannot be any of those assumptions. I should have made it clear that this bad IP address fails to reach only a handful of websites but can access all other websites and other network resources just perfectly.

I have also rebooted all network devices on the way to the internet.
The firewall is configured just fine and so is everything else. We have to think way out of the box and still I don't foresee a solution.
Other than the fact that changing the IP fixes everything, it sounds like spyware/adware. Possibly the hosts file has been tampered with. Try running Ad-Aware and/or HijackThis.
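A quick way to rule the hosts file in or out is to scan it for the affected names. The following is only a sketch: `hosts_overrides` is a hypothetical helper, and the sample text is made up, not taken from any machine in this thread.

```python
def hosts_overrides(hosts_text, names):
    """Return {hostname: address} for every requested name that the
    given hosts-file text maps to an address."""
    wanted = {name.lower() for name in names}
    found = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, aliases = fields[0], fields[1:]
        for alias in aliases:
            if alias.lower() in wanted:
                found[alias.lower()] = address
    return found

# Made-up sample content for illustration.
sample = "127.0.0.1 localhost\n0.0.0.0 yahoomail.com  # suspicious entry\n"
print(hosts_overrides(sample, ["yahoomail.com", "optonline.net"]))
```

On Windows NT-family systems the file normally lives at `%SystemRoot%\system32\drivers\etc\hosts`; an empty result means the names are not being overridden there.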
Again, it's not a client problem at all. I connected my clean laptop to this same network using the same IP and faced the same problem. So, we have to think outside the client machine. There is nothing wrong with it.

I like this - smells like a real problem - can you post the data capture (ie sniffer, ethereal output etc) here for us to see?
You are on a private network; how do you manage NAT/masquerading to the Internet?
I would tend to think like lockdown, a problem at pix level - licenses ?
----- ferg-o
thanks for your interest. The next time this problem occurs, I am going to send you logs of the sniffer as well as pix debugs.


----- Mercantilum
Again, it cannot be a license problem because I can continue accessing additional websites, except for a handful. Also, the PIX has a UR (unrestricted) license and this is a very small network.

I am using NAT overload on the PIX to get to the internet. Please don't consider this to be a PIX problem.

I'd also like to add that these bad IPs become good in a few days' time.


SOLUTION
Rich Rumble

This solution is only available to members.
Yes, I think I will have to trace the actual packets now and see whether they are actually exiting my PIX and whether the webserver is replying or not.

However, it cannot be a layer 2 or MAC address problem since only certain websites cannot be accessed while the rest can be.
You'd be surprised... I've done some poisoning in the lab- it can cause very erratic behaviour. Wireless was the most sporadic and easiest to do-

-rich
Have you tried capturing the traffic on your network?  If you don't have access to a sniffer, you can try some freeware like what is available at www.ethereal.com.

Analyzing the traffic might pinpoint a problem area, or at least help you determine if the problem is internal or external.

Make sure you use a dumb-hub device or a promiscuous-mode setup to capture all the traffic...
ASKER CERTIFIED SOLUTION
This solution is only available to members.

Short of a sniffer capture we need a snoop or a tcpdump to see what the gateway is saying at syn/ack time...
joseph:

Usually, an IP goes good after being bad for a week's time.

Mostly it's the same websites like optonline.net (not optimumonline.com), yahoomail.com (not yahoo.com)

I cannot ping the website when it goes bad and cannot open any kind of http connection to it. I have even tried accessing these websites by IP address and not DNS name. The way I sometimes test is this. I simply do 'telnet yahoomail.com 80' which is as simple as accessing yahoomail.com on port 80. Now, if it's able to connect, it might not give you any output but the ms-dos window would stick there for at least 15-20 seconds.
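The telnet test described above can be scripted so that every result is classified the same way. This is a minimal sketch using only Python's standard `socket` module; `probe_tcp` is a hypothetical helper name and the 5-second default timeout is an arbitrary assumption.

```python
import socket

def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"   # three-way handshake completed
    except socket.timeout:
        return "timeout"         # SYN went out, nothing came back in time
    except ConnectionRefusedError:
        return "refused"         # the far end (or a device in between) sent a reset
    except OSError as exc:
        return f"error: {exc}"   # DNS failure, no route, and so on
```

Running `probe_tcp("yahoomail.com", 80)` from a bad IP and again from a good IP on the same segment would show whether the difference is a silent drop ("timeout") or an active rejection ("refused"), which points at different kinds of devices.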

Accessing folders on another server: the client and the server were on different VLANs separated by an Alcatel OmniSwitch/router.
They were also on different windows NT domains. With the bad IP, the client couldn't access shares on serverA but it COULD access shares on serverB. ServerA and B are on the same vlan and windows NT domain.

DNS is internal and the resolution works fine.

Unfortunately I am unable to do any testing right now because I don't have a single bad IP and am hoping to get one within the next 2 weeks.

Hoping to get a bad one?

It has to be some kind of routing or layer 2 issue - could you use netcat with -v -v switches to attempt to connect to the servers when you get a bad IP? I would really like to see the output...
Ok. I bet it is the Alcatel switch/router.
Since you have problems hitting the shared folder on the NT box on the other subnet, and to get to that subnet you do not go through the PIX, then it can't be the firewall.
How is your network wired? Is it client machines -- Alcatel switch/router -- Pix -- Internet? Sounds like it.
The Alcatel box would then be the only single point of commonality, so the fault has gotta exist there.
The next time this happens, check the ARP cache on the Alcatel. See if it has the MAC address of the client with the "bad" IP, and see in what state it is in.
I don't know what IOS the Alcatel uses, so I don't know the command syntax to check the ARP cache. But, it sounds like you could find this out yourself.
I am glad you did the Telnet test to websites on clients with bad IP addresses, as that rules out any silly IE problems!
BTW, what Alcatel model is in place? And what version of firmware/IOS does it have? I want to look this one up, to see how long it keeps ARP entries. The 7 days an IP being bad sure seems like a long time for an ARP problem on a router, and so I want to check.
So, I  guess I am starting to agree with richrumble on this, on it being a Layer2 problem, and it seems like the Alcatel is the culprit.
Please post when you have more data.
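Once the ARP cache has been read off the Alcatel (by whatever command it provides), comparing it against the known MAC of each client can be automated. A rough sketch follows; `normalize_mac` and `arp_mismatches` are hypothetical helpers, and all addresses in the example are made up.

```python
def normalize_mac(mac):
    """Reduce any common MAC notation (00-11-.., 00:11:.., 0011.2233..)
    to lowercase colon-separated form."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

def arp_mismatches(expected, observed):
    """expected: {ip: mac} taken from the clients' own NICs.
    observed: iterable of (ip, mac) pairs read from the device's ARP cache.
    Returns (ip, expected_mac, observed_mac) for every disagreement."""
    problems = []
    for ip, mac in observed:
        want = expected.get(ip)
        if want is not None and normalize_mac(want) != normalize_mac(mac):
            problems.append((ip, normalize_mac(want), normalize_mac(mac)))
    return problems

# Made-up addresses for illustration.
inventory = {"10.1.1.25": "00-11-22-33-44-55"}
cache = [("10.1.1.25", "00:11:22:33:44:99")]
print(arp_mismatches(inventory, cache))
```

A non-empty result for the bad IP would support the stale or conflicting ARP entry theory; an empty one would suggest looking elsewhere.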
Why would you consider it to be a layer2 problem when the problem persists only with a few websites? it's not like this bad IP address loses network connectivity all together.

I am going to check up on the alcatel anyways!
It's cause I've seen this happen once. A workgroup-level switch had a problem with its ARP entries for a specific IP address. The ARP entry for the MAC for the NIC with this one IP (can I use any more abbreviations????) was in some type of conflicted state. I don't remember what the message the switch (running the Cisco IOS) said exactly, so it might not have called it "conflicted". I know Windows uses "conflict" in its NetBIOS entries, but that is neither here nor there.
Anyway, I remember that connectivity from that IP to most resources was fine, but it had problems hitting very select places (some internal host), and those hosts could not hit it either. But other things seemed fine, PINGing and such. It was erratic, did not conform to anything regular, and was just the weirdest thing.
There was an IOS upgrade available, and I believe that that was the recommendation from Cisco (they were called in on support of this issue).
After the IOS upgrade, the issue was closed. We "think" that it was the switch, but we never were 100% comfortable with that. It hasn't happened again, so it must have been the switch, but still, it was really strange!
It didn't seem logical, and that is the problem. These IT computer issues we deal with every day actually ARE logical, once you start to think like the machines do!
But for this one, I couldn't get my mind to match it exactly.
But from what you have posted, your problem seems similar to that one incident. And we really thought it was an ARP issue, so that is why I am leaning toward that for you.
So, check out that Alcatel when another IP goes bad.
BTW, I forgot to ask earlier. When an IP goes bad, can you PING it from a working IP on the same segment? From the other segment? Can you map a drive to a bad IP machine when this happens?
Open a case with Alcatel... see if they know anything about a problem such as this- get the real experts on the case. Might think of trying Cisco too... they have been a leader in the field for some time, they may know of your problem.
If I follow correctly- this isn't tied to a certain port on the switch... you said IP- so I assume you unplug the host having the problem- then you plug your laptop in (not on the same port the PC was using) and give it the same IP... then the problems continue for your laptop... so that's layer 3- which still reeks of a bad ARP table or a memory leak perhaps.
Try your vendors- they know their own gear better than we do. Since you're giving the laptop the same IP, and it's having the same trouble getting to the same sites- that definitely seems like the ARP table. Cisco routers had quite a few ARP table overwrites in the past- Alcatel may have a similar vulnerability...
Good Luck
-rich

(Cisco vuln- PIX not affected- http://www.cisco.com/en/US/products/products_security_advisory09186a00800b113c.shtml)
======================
EE Administrator - please don't close this thread, it might take some time for me to come back to it as I don't know when my next BAD IP would appear. I am sure it would though.
======================

Joseph_Moore:
Ping to the bad IP - YES. Haven't tried mapping a drive to the bad IP.

richrumble, thanks for your input. I am looking forward to taking all your ideas and following them through. One stumbling block here is that I have to do all this remotely via terminal services, etc. I even have to change IPs remotely, so in case I get locked out when I do that, I have devised a script which takes the Windows machine back to its original IP the next morning so that the end user doesn't see any difference.






Hi :)

IP goes bad for a few days...can't connect to internet/local network.
The same IP becomes OK after a few days.
Doesn't this look like something is monitoring the IP for something and then when it finds it doing that 'something' it blocks it for a period of time? Are u sure u have nothing of this sort in place? Those IPs are blocked for sometime and then taken off the 'blocked list'.
If u change the IP of the specific machine then it works fine. Look for something that is configured to block BAD IPs.
Have you tried cleaning out the arp cache on the switch and router when this occurs?
Hey - i have the same problem with my pix.  websites like www.anbfinancial.com don't come up from behind the switch - and some things on yahoo's site don't work either - is happening at 2 pix locations for me now.

Has anyone solved this problem?
lol, no

i haven't got a chance to go and debug it..but this thread is full of great stuff.. Read some of their suggestions and implement them
none of the above worked - i went through this website a few times and others in the past have had the problem but the questions are always finished with "i solved it outside EE" - damnit post the solution :)
Nick, open a TAC case with cisco. If they can't solve it... who can?
-rich
I cannot believe that nobody has mentioned MTU. Verify your MTU settings on the PIX and any other network devices you go through (specifically your DSL or cable modem). An incorrect MTU setting will cause you to not be able to get to some websites. It seemed almost intermittent when I had this problem, which could explain your "bad IP".
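For anyone testing the MTU theory, the arithmetic behind the usual "largest ping that passes unfragmented" check looks like this. It is a small sketch that assumes plain IPv4 with no IP or TCP options; the 1492 figure is the typical PPPoE MTU and is used here only as an example, not as a value from this network.

```python
IPV4_HEADER = 20   # bytes, IPv4 header without options
ICMP_HEADER = 8    # bytes, ICMP echo header
TCP_HEADER = 20    # bytes, TCP header without options

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def tcp_mss(mtu):
    """Largest TCP segment payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - TCP_HEADER

for mtu in (1500, 1492):
    print(mtu, max_ping_payload(mtu), tcp_mss(mtu))
```

On Windows, `ping -f -l 1472 <host>` sends a 1472-byte payload with the don't-fragment bit set; if that fails while a smaller size succeeds, some hop on the path has an MTU below 1500.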
well, that's a good suggestion but what turned out to be the problem were bad switches. The switch vendor couldn't explain the behavior but simply asked me to upgrade the software images which helped.
classic cisco! At least that is their suggestion to every problem to us... upgrade the code. You can have your pts refunded if you wish by posting a question in the community support  https://www.experts-exchange.com/Community_Support/
-rich
Oops, the question is PAQ'd already- sorry about that... duhh
-rich
wasn't cisco.