leblanc

inconsistent traceroute

Attached is how my network is set up. It is all Catalyst 3560 layer 3 switches. I did a traceroute from server1 to 4.2.2.2 and something does not seem correct. On my first traceroute, I have 2 hops. On my second traceroute, I have 1 hop. I have been having problems downloading files from the Internet on the first try. I am just wondering if this is the cause and how to fix it. Any input will be greatly appreciated.
 
- 1st traceroute:
C:\Users\bo>tracert 4.2.2.2
Tracing route to b.resolvers.Level3.net [4.2.2.2]
over a maximum of 30 hops:

  1    <1 ms    <1 ms     1 ms  10.10.1.1
  2    <1 ms    <1 ms     1 ms  10.10.1.40  
  3    <1 ms    <1 ms     1 ms  *        *     ^C

- 2nd traceroute:
C:\Users\bo>tracert 4.2.2.2
Tracing route to b.resolvers.Level3.net [4.2.2.2]
over a maximum of 30 hops:

  1    <1 ms    <1 ms     1 ms  10.10.1.40
  2    <1 ms    <1 ms     1 ms  *        *     ^C
Capture.JPG
ASKER CERTIFIED SOLUTION
skullnobrains
leblanc (ASKER)

" in your case, level 3 most likely meaning you should not use the same ips on both side of any router." I am not sure I understand this. Where did you see the same IPs on both sides? Thanks
Simple: if 10.10.1.40 can be your first hop, you are on that network.

10.10.1.1 sent your machine an ICMP redirect telling it to use 10.10.1.40 directly as its gateway, because there is no point going through an extra router. This happens because 10.10.1.1 sees the firewall and your machine on the same network.
To make it simple, use 10.10.1.40 as your default gateway.

Job done! :-)
leblanc (ASKER)

The problem seems to be coming from the Websense (connected to the 3560). When I disable the Websense, everything works.
More than likely you are running some type of dynamic routing protocol and the Websense box is telling the 3560 to use it as the default route.    

Do you have access to the 3560?  What does its routing table look like?  Are you running a dynamic routing protocol?

My guess is your PC has the 3560 as the default route. The 3560 most likely has a default route to the "Internet", and when the Websense is up it tells the 3560 to use it (the Websense box) as the default route. Since the Websense box is on the same IP subnet as you, the 3560 issues an ICMP redirect to tell your PC to use the Websense.
leblanc (ASKER)

No dynamic routing protocol... the GW for the 3560 is the FW. The Websense is directly connected to the 3560.
With the Websense active, if you have access to the 3560, display the routing table. Then disable the Websense, wait a few minutes and display it again. See what the difference is.
leblanc (ASKER)

Good idea. I will try that. Thanks
Assuming that 10.10.1.1 is the Websense and .40 the Catalyst,
you need to use a different set of addresses between the Websense and the router.
For example, add 172.16.0.1 to the Websense and 172.16.0.2 to the Catalyst, and set 172.16.0.2 as the default route for the Websense. You probably should also delete the 10.10.1.40 address from the router so it is not that easy to bypass the Websense.

Likewise, do the reverse if the boxes are the other way round.

It would be better, but not required, to set up a dedicated VLAN for the communication between the Websense and the Catalyst.
skullnobrains brings up a good point, which leads to the question: which is the IP address of the 3560 and which is the IP address of the WebSense box?

With the WebSense box up and running can you post the routing table from your PC?
leblanc (ASKER)

The traceroute is from server1.
10.10.1.5: 3560
10.10.1.40: FW
10.10.1.15: Websense
10.10.1.1: virtual IP address for HSRP

Routing table from the 3560 (10.10.1.5):

S*    0.0.0.0/0 [1/0] via 10.10.1.40
       10.10.1.0/24 is variably subnetted, 2 subnets, 2 masks
C     10.10.1.0/24 is directly connected, vlan10
L     10.10.1.5/32 is directly connected, vlan10
       10.10.20.0/24 is variably subnetted, 2 subnets, 2 masks
C     10.10.20.0/24 is directly connected, vlan20
L     10.10.20.3/32 is directly connected, vlan20
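To read the table above, a router picks the longest prefix that matches the destination. The Python sketch below transcribes these routes and performs that lookup with the standard-library ipaddress module. The egress interface listed for the static default is an assumption: the next hop 10.10.1.40 is resolved through the connected 10.10.1.0/24 route, which is vlan10. What it illustrates is that traffic from a vlan10 host to 4.2.2.2 leaves the 3560 on the same VLAN it arrived on.

```python
import ipaddress

# Routes transcribed from the 3560's routing table above.
# Entry: (prefix, next hop or None if directly connected, egress interface).
ROUTES = [
    ("0.0.0.0/0", "10.10.1.40", "vlan10"),   # S* default; egress assumed via connected 10.10.1.0/24
    ("10.10.1.0/24", None, "vlan10"),        # C connected
    ("10.10.1.5/32", None, "vlan10"),        # L local address
    ("10.10.20.0/24", None, "vlan20"),       # C connected
    ("10.10.20.3/32", None, "vlan20"),       # L local address
]


def lookup(dst):
    """Return (prefix, next_hop, interface) of the longest matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop, iface)
        for prefix, hop, iface in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    net, hop, iface = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), hop, iface


print(lookup("4.2.2.2"))      # ('0.0.0.0/0', '10.10.1.40', 'vlan10')
print(lookup("10.10.20.50"))  # ('10.10.20.0/24', None, 'vlan20')
```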
What device has the HSRP address?

However, since ".40" and ".1" are in your subnet and ".40" is the firewall, that means the firewall knows how to get directly to your computer and every other computer in the 10.10.1.0/24 subnet.

So for any destination IP address that must go through the firewall, an ICMP redirect will be issued to say "go through '.40'".
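The redirect behaviour described here can be modelled as a handful of checks. As documented by Cisco, a router sends an ICMP redirect when the packet leaves on the same interface it arrived on, the source address is on the same subnet as the next hop, the packet is not source-routed, and redirects are enabled (the default). The sketch below is a simplified model of those checks, not IOS code; the host addresses 10.10.1.50 and 10.10.20.50 are hypothetical.

```python
import ipaddress

# Subnet per SVI, taken from the 3560 routing table in this thread.
SUBNETS = {"vlan10": "10.10.1.0/24", "vlan20": "10.10.20.0/24"}


def should_redirect(src, in_iface, out_iface, next_hop, subnets,
                    redirects_enabled=True, source_routed=False):
    """Model of the conditions under which a Cisco router sends an ICMP redirect."""
    if not redirects_enabled or source_routed:
        return False
    if in_iface != out_iface:          # packet must leave on the interface it arrived on
        return False
    subnet = ipaddress.ip_network(subnets[out_iface])
    # Source and next hop must share the same subnet.
    return (ipaddress.ip_address(src) in subnet
            and ipaddress.ip_address(next_hop) in subnet)


# Hypothetical hosts; the next hop is the static default via the firewall.
print(should_redirect("10.10.1.50", "vlan10", "vlan10", "10.10.1.40", SUBNETS))   # True
print(should_redirect("10.10.20.50", "vlan20", "vlan10", "10.10.1.40", SUBNETS))  # False
print(should_redirect("10.10.1.50", "vlan10", "vlan10", "10.10.1.40", SUBNETS,
                      redirects_enabled=False))                                   # False
```

In this model, redirects_enabled=False plays the role of the no ip redirects interface command suggested later in the thread: the packet is still forwarded, only the redirect message is suppressed. The second call also shows why a vlan20 host would not be redirected: its traffic enters on vlan20 and leaves on vlan10.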

Where and how Websense comes into play, I don't know; it should not. The only devices involved should be your computer, whatever hardware owns ".1", and the firewall.
leblanc (ASKER)

"What device has the HSRP address?" The answer can be found in the attached Capture.JPG file.
All hosts use 10.10.1.1 as their GW. The network devices use the FW (10.10.1.40) as their GW.
leblanc (ASKER)

"Since your PC is in the same subnet as 10.10.1.40, it will send a ICMP redirect to your PC to use it as the route to any IP host that does not have a more specific route.  "
I am not sure I understand this.

"why go through 10.10.1.1 and have it forward to 10.10.1.40". The reason is that we have many hosts with static IP addresses, so it would be painful to go to each host and change the GW from 10.10.1.1 to 10.10.1.40.

"Yes, physically you are still taking the same path," from where to where?

Thanks
So if you have lots of hosts using static IPs, why not just swap the HSRP address on the 3560s with the address of the router?
leblanc (ASKER)

The hosts are pointed to the HSRP virtual address on the two core 3560s. The router that you referred to is the L3 3560. So you are saying to move 10.10.1.1 to the L3 3560. I am not sure how this will work with HSRP. If I do so, I will not need HSRP and will just rely on spanning tree. I think the point is not to rely on STP. That is why I use HSRP.
HSRP and STP are completely different things. HSRP shouldn't be used instead of STP.

STP maintains a loop-free topology.
HSRP provides router redundancy.
I'm wondering now why you have HSRP at all, actually?!
leblanc (ASKER)

giltjr,
So in your suggestion, there is no need for HSRP. Correct? So you will have everything layer 2.

I've been told that the FW (it is a Fortigate) cannot have 2 interfaces to the internal network. So your suggestion will not work.

The plan was to use HSRP as GW redundancy for all internal hosts.
Giltjr is correct.

You only need HSRP for redundancy if you have two routers going to the same place.  Here, you don't! I know it looks like you do, but if the VLAN IDs in the diagram are correct the routing won't work properly.
leblanc (ASKER)

"You only need HSRP for redundancy if you have two routers going to the same place.  Here, you don't! ". I am not sure I understand this. The 2 cores are going to the same 3560.
- don't change the HSRP address if it is the gateway of all your machines.
this part works fine, apparently so just keep it

That's the bit that's causing the issue, if I'm correct, so it's not working fine.

The active 3560 HSRP switch is sending an ICMP redirect to the clients who are using .1 as their default gateway, sending them to .40.  This is breaking routing (or traffic-flow at least) from what I can determine.

I still can't see from looking at the diagram why HSRP is actually being used here though, as the IP ranges are the same, yet the VLANs are different, therefore they're not routing the same L2 segments.  If I am correct, and the VLAN IDs are correct in the diagram, this design is bad and HSRP should be removed!

Someone tell me I'm talking mumbo-jumbo?! :-)
Having an ICMP redirect does not prevent the network from working properly. I don't like it because it makes the network prone to failure from human error. For example, if the first router is removed, everything will continue working fine until one of the machines is rebooted. If a power failure occurs at that time, the whole network goes down. But it is a working situation nevertheless, as long as both routers are up.

The problem is that the upstream router of the HSRP set is in the same LAN.

I believe it is much simpler to move a single upstream router to a dedicated LAN than to make changes on all the machines of the initial LAN. Also, I believe HSRP is set up for a reason, and bypassing the HSRP set by using a different gateway will produce various points of failure that do not exist currently.

<off topic and flamish>
Then I pretty much agree that HSRP is not currently used anyway, and the network schema looks like one of the uselessly complicated, impossible-to-debug, error-prone, please-call-us-when-it-breaks-because-when-that-happens-there-is-no-way-you-can-repair-it-yourself idiotic solutions consultants tend to impose everywhere, because they don't want to look at what the client needs and would rather sell the exact same generic, barely working setup over and over again.

But then, we have no idea of the requirements.
</off>
:-D

I agree that ICMP redirect doesn't stop things working, but it appears that something within that process is breaking things.  All I'm saying is get rid of the redirect as it's unnecessary.

The easiest way to remedy things (assuming our previous assumptions are correct) is to swap the HSRP address with the router address. This would minimize the administrative overhead of making the changes, as the only other thing we'd need to do to get this to work would be to adjust a static route or two on the router.
I assume you mean swapping the HSRP address with the firewall's, because it would not solve anything otherwise. This seems sound as well, but risky: given the network schema, I would not vouch that all machines have direct access to the firewall in all situations, and there is likely no failover that way.

Maybe we'd get the big picture with a more complete schema. Given what we see, at least 2 routers are plain useless, so I'd assume they are needed for something that is not shown on the schema, and I'd rather not advise changing stuff in that situation.

But clearly, having the firewall in the same network as the hosts while the hosts are set up with a different gateway is the reason for the initial problem.

A simple solution with minimal impact on the existing setup may be to just add "no ip redirects" on the router(s).
I don't think that would work though, as you'd then stop people who are pointing to .1 from reaching the internet.

I'd have to lab that though to be sure.
Well, ultimately the question was why he was getting the results from traceroute he was seeing.  We have answered that and I hope in a way he understands.

I think we all agree his network is not the best design, but it works and should not cause any major problems.

If he would like some suggestion he can/should open a new question asking for design help.
@craigbeck: it does work. It just allows the useless hop to stay in use. Likewise, the hosts can be configured or firewalled so that they ignore those redirects. In that case, most routers will still forward the packets while sending redirects.
leblanc (ASKER)

Thank you for the active discussion. Sorry if I confused you. But here it is:

- vlan30 is a mistake. It should be vlan20 on core2.

- As far as HSRP is concerned, I will have to disagree that HSRP is not needed for redundancy in this case. core1 is configured as the primary for vlan10. Core2 is configured as the primary for vlan20. Now if core1 goes down, server1 traffic with GW 10.10.1.1 (vlan10) will go to core2 and still can access the internet. So that is what HSRP is doing, it gives the hosts a GW redundancy. Agree?

- From core1 and core2, their default GW is the FW because that is where they get out to the outside world.
- The Websense and switch2 exchange info via WCCP. The Websense has the GW as switch2 (10.10.1.5)
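The gateway failover described in the list above can be sketched as a per-group election. This is a simplified model, not Cisco's implementation: it applies HSRP's rule that the highest priority wins with the highest interface IP address breaking ties, assumes preemption is configured, and uses hypothetical core addresses and priorities (HSRP's default priority is 100).

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class HsrpMember:
    name: str
    ip: str              # hypothetical physical SVI address
    priority: int = 100  # HSRP default priority
    up: bool = True


def active_router(group):
    """Pick the active router: highest priority, ties broken by highest IP."""
    live = [m for m in group if m.up]
    if not live:
        return None
    return max(live, key=lambda m: (m.priority, ipaddress.ip_address(m.ip))).name


# Hypothetical addresses and priorities matching the described roles.
vlan10 = [HsrpMember("core1", "10.10.1.2", 110), HsrpMember("core2", "10.10.1.3")]
vlan20 = [HsrpMember("core1", "10.10.20.11"), HsrpMember("core2", "10.10.20.12", 110)]
print(active_router(vlan10), active_router(vlan20))  # core1 core2

# core1 fails: it drops out of both groups.
for group in (vlan10, vlan20):
    for member in group:
        if member.name == "core1":
            member.up = False
print(active_router(vlan10), active_router(vlan20))  # core2 core2
```

Whichever core is active for vlan10, the firewall at 10.10.1.40 is still on the same subnet as server1, so in this model HSRP failover on its own would not change whether the active core sends the redirect.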

Now I am a bit confused about what you said regarding the ICMP redirect. I am not sure I understand the correlation here. Maybe you can shed some light?
leblanc (ASKER)

skullnobrains,
" it should not be in the same LAN as the hosts or the router will send an ICMP redirect to tell those hosts they can reach the firewall directly."

Please tell me more about ICMP redirects. Is this something that is sent by default by the router? By router, you mean core1 and core2, correct? Will I see those ICMP redirects in Wireshark? What if the hosts know about the FW as the GW? Don't they still go through their configured GW?

Thanks
Yes, you will see this in a packet capture. Routers send this by default. This is done to reduce the number of IP hosts that need to handle the packets. Why go through 2 hops when you can go through 1?

By router we mean any box that can perform routing, so yes CORE1 and CORE2 as they are fully functional L3 (routers) devices.

If a device receives an ICMP redirect, it will start using what it is told.

They will NOT use the default router/GW if they are told to use a different router/GW.  The default router/GW is used when there is not a more specific route in the routing table.

When your computer gets an ICMP redirect, it will temporarily add a more specific route for the destination address, using the router/GW that the ICMP redirect tells it to use.
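The host-side behaviour can be sketched as a toy route cache: a default gateway plus temporary /32 routes learned from redirects. Under the assumption that the first tracert triggered the redirect, this reproduces the two traceroutes in the question (10.10.1.1 then 10.10.1.40, and later 10.10.1.40 alone). The class, its method names and the 600-second lifetime are hypothetical; real operating systems age redirect routes according to their own rules, and some can be configured to ignore redirects (modelled here by accept_redirects=False).

```python
import time


class HostRouteTable:
    """Toy model of a host's route choice: default gateway plus redirect-learned host routes."""

    def __init__(self, default_gw, accept_redirects=True, redirect_lifetime_s=600):
        self.default_gw = default_gw
        self.accept_redirects = accept_redirects
        self.redirect_lifetime_s = redirect_lifetime_s  # hypothetical ageing time
        self.host_routes = {}  # destination -> (gateway, expiry time)

    def on_icmp_redirect(self, dst, new_gw, now=None):
        """Install a temporary /32 route for dst via new_gw, unless redirects are ignored."""
        if not self.accept_redirects:
            return
        now = time.monotonic() if now is None else now
        self.host_routes[dst] = (new_gw, now + self.redirect_lifetime_s)

    def next_hop(self, dst, now=None):
        """A live /32 redirect route is more specific than 0.0.0.0/0, so it wins."""
        now = time.monotonic() if now is None else now
        entry = self.host_routes.get(dst)
        if entry is not None and entry[1] > now:
            return entry[0]
        self.host_routes.pop(dst, None)  # expired or absent: fall back to the default
        return self.default_gw


host = HostRouteTable(default_gw="10.10.1.1")
print(host.next_hop("4.2.2.2", now=0))    # 10.10.1.1  (first tracert: .1, then .40)
host.on_icmp_redirect("4.2.2.2", "10.10.1.40", now=0)
print(host.next_hop("4.2.2.2", now=1))    # 10.10.1.40 (second tracert: .40 first)
print(host.next_hop("4.2.2.2", now=601))  # 10.10.1.1  (redirect route aged out)
```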
leblanc (ASKER)

I appreciate the explanation. I understand the ICMP redirect now. But in my case, the extra hop will not be the cause for the inconsistent traceroute. It is just a performance issue, two hops instead of one hop. Agree?
leblanc (ASKER)

skullnobrains,

So the routing does not work correctly because of the icmp redirect issue and the GW assignment. However, it does not explain why I only have problems in my vlan10 and not on my vlan20. The traceroute for vlan20 is consistently the same.
leblanc (ASKER)

"your computer on VLAN 20 <---> 10.10.20.1 <-- 10.10.10.1 --> 10.10.10.40 <--> Internet" I am not sure why it has to go through 10.10.1.1 first. When the packet hits 10.10.20.1, it will use its default route to 10.10.1.40.

I am not sure I understand how an extra hop can mess up the routing for vlan10.
10.10.10.1 IS 10.10.20.1

giltjr is showing you the router you're going through.

When you're not on the 10.10.1.0/24 subnet, you don't see the ICMP redirect, because your traffic actually goes through the router instead of being told to go somewhere else.

We could go round and round here, but the answer to this question at the end of the day is to simply use the router as the default gateway, no matter how much of a nuisance it will be to change things, or to use a routed link between the core and the router and point everyone at the HSRP address (as they are already).
Or configure "no ip redirects" on the routers.

Would you try any of the proposed solutions?
Is there still something unclear concerning this problem?
leblanc (ASKER)

"simply use the router as the default gateway " By router, you meant the FW 10.10.1.40. Correct? If yes then HSRP is useless then.
leblanc (ASKER)

I did not expect this design to raise so many interesting problems. I did not design this network, BTW. I just got involved to fix it.

Also, there is no vlan20 on switch1. That was a mistake. switch1 is just an L2 switch.
For switch2, from my understanding, there are users from vlan10 and vlan20 connected to it, and they needed vlan20 for the WCCP redirect to work. That is why they have a trunk between switch2 and the 2 cores. Only the 2 cores are running HSRP.

I am thinking of moving all users for vlan10 and vlan20 to another access switch and making the link between switch2 and the 2 cores L3, as well as the link between switch2 and the FW (which is represented by the router symbol).

I am curious on craigbeck's solution with the new vlan for 2 cores and switch2. I am just wondering how this will solve the problem.

Thanks
It will solve the problem by removing the ICMP redirect and optimally routing traffic.
Exactly that giltjr!
leblanc (ASKER)

Right. L2 stops at the 2 cores.
I have to look into your suggestion of creating a vlan just for switch2 and the 2 cores. I don't have a good grasp on that yet.
leblanc (ASKER)

ACLs?

If your new VLAN suggestion is what makes switch2 the DMZ, then it is fine. But in this design, switch2 is not the DMZ. Everything behind the FW is considered internal and trusted. The DMZ is not considered in this design. Switch2 is the WAN distribution switch.
@skullnobrains - I think someone got out of bed on the wrong side!
You may also want to note that your understanding of the problem and proposed solution are obvious follow-ups of things that @giltjr and I posted beforehand, with added unrelated documentation.
Back up!  I'm not here to argue, but...

the firewall will try to directly speak to the machines that are not on his vlan
How?  If the firewall doesn't have an IP address on the same subnet as the destination it will send the traffic to the next-hop if one is configured, or nowhere if it doesn't - ALWAYS.
unless you know what you are doing, either stick to level 2 or stick to level 3. in your case, level 3 most likely meaning you should not use the same ips on both side of any router.
Firstly, it's LAYER, not LEVEL.  Secondly, it's impossible to assign the same IP to two different interfaces on the same router, or to separate interfaces on the same subnet.

STP plays a big part here, yet no-one else has even mentioned it!  That explains a thing or two...

There's obviously too many cooks in the kitchen, and for that reason - I'm out!
leblanc (ASKER)

Experts,

I think this is great to share our thoughts and see what may or may not work. Personally, it helps me to see the issue from different angles.
 
I am just a beginner when it comes to networking. I understand what an ACL is and what a DMZ is. If this is what your suggested design needs in order to work with the current setup, then I can understand. But in my opinion it is quite complex and adds processing work to switch2, and I am not sure I want to put my internal network behind a DMZ (that is just me).

Anyway, I think I can try to understand this but at the end of the day, I have to implement it to see if it is working. Thanks.
@giltjr: ++ for the clarification. I'll be waiting for answers as well.

<off topic @craigbeck>
I'm not trying to get into an argument either. Sorry, I'll keep the conversation technical. Please accept my apologies, as my comment was a little aggressive.

How?  If the firewall doesn't have an IP address on the same subnet as the destination it will send the traffic to the next-hop if one is configured, or nowhere if it doesn't - ALWAYS.

Exactly: it will look at its routing table and find it is supposed to have a direct link to the machine. It will issue an ARP request, won't get an answer because the host is in a different VLAN, and drop the packet because it cannot send it anywhere.

This is exactly why I insist that it is important to create a dedicated SUBNET. Using a separate VLAN only enforces the use of a different subnet, so the hosts cannot talk to the firewall directly and have to use the proper IP and default route.

Using several VLANs inside the same subnet is always a bad idea.
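The failure mode described in this post can be sketched as a next-hop decision on the firewall. The function below is a generic, hypothetical model of IP forwarding, not vendor code: if the destination falls inside a directly connected subnet, the firewall ARPs for the host itself, and because ARP requests are broadcasts, only hosts in the same VLAN can answer. The host addresses and the return route are hypothetical.

```python
import ipaddress


def firewall_next_step(dst, connected_subnet, arp_reachable, routes):
    """Hypothetical model of how a firewall/router handles a packet for dst."""
    addr = ipaddress.ip_address(dst)
    if addr in ipaddress.ip_network(connected_subnet):
        # Destination looks on-link: ARP for the host itself.
        # ARP requests are broadcasts, so only same-VLAN hosts can answer.
        return "delivered" if dst in arp_reachable else "dropped (no ARP reply)"
    # Otherwise use the most specific static route, if there is one.
    candidates = [(ipaddress.ip_network(p), gw) for p, gw in routes
                  if addr in ipaddress.ip_network(p)]
    if not candidates:
        return "dropped (no route)"
    return "forwarded to " + max(candidates, key=lambda c: c[0].prefixlen)[1]


SAME_VLAN = {"10.10.1.5", "10.10.1.15"}    # hypothetical hosts in the firewall's VLAN
ROUTES = [("10.10.20.0/24", "10.10.1.1")]  # hypothetical return route via the core

print(firewall_next_step("10.10.1.15", "10.10.1.0/24", SAME_VLAN, ROUTES))   # delivered
print(firewall_next_step("10.10.1.77", "10.10.1.0/24", SAME_VLAN, ROUTES))   # dropped (no ARP reply)
print(firewall_next_step("10.10.20.50", "10.10.1.0/24", SAME_VLAN, ROUTES))  # forwarded to 10.10.1.1
```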

---

I'd be quite happy to know why you think STP is related to this problem in any way.
</off>
No problem on my side: we don't have to agree, and it is probably interesting to have different views from different people.

Technically, I still think that losing the first connection but having something that works afterwards is totally inconsistent with STP-related problems.

I also think that this exact problem can easily be reproduced in a lab with a single router, no STP and no HSRP, so I see no reason to mix them in.

Removing the redirection or setting up a dedicated LAN segment for the communication with the firewall are both workable ideas. In the latter case, I forgot that we need a route on the firewall so it reaches the LAN through the router.

Removing the redirection can be tried in a matter of seconds. Adding an extra dedicated LAN can be done gradually: first make the router and firewall ping one another, then set up the firewall properly (rules and return route), and possibly use policy routes on the router to test the setup for a few machines before using it in production.
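The dedicated-segment option, including the return route on the firewall, can be checked with a short sketch before touching production. It reuses the 172.16.0.1/172.16.0.2 pair suggested earlier in the thread on a 172.16.0.0/30 transit subnet, but assigning .1 to the firewall and .2 to the core, and the ISP gateway 203.0.113.1 (a documentation address), are hypothetical. The assertion guards against the transit subnet overlapping a user LAN, which would recreate the shared-subnet problem.

```python
import ipaddress

TRANSIT = ipaddress.ip_network("172.16.0.0/30")  # hypothetical firewall <-> core link
LANS = [ipaddress.ip_network("10.10.1.0/24"), ipaddress.ip_network("10.10.20.0/24")]

# Hypothetical firewall routing table: connected transit, return routes via the core,
# default towards the ISP.
FW_ROUTES = [
    (TRANSIT, None),
    (LANS[0], "172.16.0.2"),
    (LANS[1], "172.16.0.2"),
    (ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1"),
]


def fw_next_hop(dst):
    """Longest-prefix match on the firewall table; None means directly connected."""
    addr = ipaddress.ip_address(dst)
    best = max((r for r in FW_ROUTES if addr in r[0]), key=lambda r: r[0].prefixlen)
    return best[1] or "connected"


# The transit link must not share a subnet with any user LAN.
assert not any(TRANSIT.overlaps(lan) for lan in LANS)

print(fw_next_hop("10.10.1.23"))  # 172.16.0.2
print(fw_next_hop("4.2.2.2"))     # 203.0.113.1
print(fw_next_hop("172.16.0.2"))  # connected
```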
@pitachip: feel free to post the solution you used and how it went. As you can see, none of us is definitive about what is good/best practice versus reasonable/overkill work, so we can learn from your experience.

Best regards to all
leblanc (ASKER)

skullnobrains,

All of your proposed solutions sound very interesting, but the workarounds add complexity. So I have decided to make everything layer 3 beyond the core switches. I am testing two scenarios, one with all static routes https://www.experts-exchange.com/questions/28282400/static-routes-redundancy.html and one with EIGRP. As you can see from my thread on the static routes, I am having an issue on the link between core1 and core2. The link is currently a trunk. I was thinking maybe I should make it layer 3, but I am not sure how HSRP will communicate between core1 and core2. Thanks
Thanks a lot for posting back some info; I'm following the other thread with some interest. I won't be participating though, since you have more than enough people on that thread already and I don't really believe such network designs to be good practice (overly complicated and difficult to maintain).

If my 2 cents are of interest to you:
- Avoid mixing layers 2 and 3 on the same equipment. Do it on an existing network in order to solve an existing specific problem, but don't start out with this unless you're used to working in such environments.
- Don't delegate the firewall's job to a router. As above, you might end up doing so in a couple of existing LANs that have huge internal traffic, in order to offload the firewall afterwards.
- Don't route when switching is enough. You'd usually use layer 3 when you need separate LANs for security reasons, when you experience latency over slow links, or when you have 10k Windows machines in the same LAN and broadcast traffic is starting to be a problem.
- Don't set up a network with asymmetric routes.
- Don't use 3 or 4 pieces of network equipment together to achieve redundancy that can easily be achieved with 2. If you can't do it with only 2 of them, then you're probably trying to set up routing where a bunch of switches would behave better (see above: using routing instead of switching, and possibly using routers where firewalls would be enough).

I hope you'll find a suitable solution.

Best regards
With all respect, the solution I've provided in the opposite thread isn't complicated.  It took me 10 mins to lab it!

Asymmetric routing isn't an issue here.  There are no firewalls between the users and the multiple paths so issues such as session tracking don't exist.  As long as it's not implemented in the wrong place it's perfectly acceptable and it's actually used all over the internet.

At some point the traffic needs to be routed, so I don't understand the comment regarding using routing when a bunch of switches would be better?!

Also, I must have missed the memo when Layer3 SWITCHES became obsolete?!
- avoid mixing layers 2 and 3 on the same equipment

Anyway, this question is closed...
craigbeck, I was not talking about your solution in the other post, just giving a bunch of regular advice which seemed relevant to this one. Like you said, this post is closed; let's not spam others.