inconsistent traceroute

Attached is how my network is set up. It is all Catalyst 3560 layer 3 switches. I did a traceroute from server1 to 4.2.2.2 and something does not seem correct. On my first traceroute, I have 2 hops. On my second traceroute, I have 1 hop. I have been having problems downloading files from the Internet on the first try. I am just wondering if this is the cause and how I can fix it. Any input will be greatly appreciated.
 
- 1st traceroute:
C:\Users\bo>tracert 4.2.2.2
Tracing route to b.resolvers.Level3.net [4.2.2.2]
over a maximum of 30 hops:

  1    <1 ms    <1 ms     1 ms  10.10.1.1
  2    <1 ms    <1 ms     1 ms  10.10.1.40  
  3    <1 ms    <1 ms     1 ms  *        *     ^C

- 2nd traceroute:
C:\Users\bo>tracert 4.2.2.2
Tracing route to b.resolvers.Level3.net [4.2.2.2]
over a maximum of 30 hops:

  1    <1 ms    <1 ms     1 ms  10.10.1.40
  2    <1 ms    <1 ms     1 ms  *        *     ^C
Capture.JPG
leblancAccountingAsked:
skullnobrainsCommented:
you have a mix of level 2 and 3 networking

the first trace produced an ICMP response telling your machine that 10.10.1.40 is directly reachable on server1's network.

your server automatically learnt the new route and bypassed the router (or, more precisely, traversed it as a layer 2 switch)

unless you know what you are doing, either stick to level 2 or stick to level 3. in your case, level 3 most likely means you should not use the same ips on both sides of any router.

it would be easier to help if you provided the ip addresses and netmasks. it is quite possible that your setup is ok, but that you used wide network masks such as /8 instead of /24
0

leblancAccountingAuthor Commented:
"in your case, level 3 most likely means you should not use the same ips on both sides of any router." I am not sure I understand this. Where did you see the same IPs on both sides? Thanks
0
skullnobrainsCommented:
simple. if 10.10.1.40 can be your first hop, you are in that network

10.10.1.1 sent an advertisement to your machine asking it to use 10.10.1.40 directly as a gateway, because there is no point in going through an extra router. this happens because 10.10.1.1 sees the same network on the firewall side and on your machine's side
0
giltjrCommented:
To try to clear up skullnobrains' explanation a little bit (at least I hope so).

It appears:

1) Your PC has 10.10.1.1 as the default route

2) That 10.10.1.40 is a router within your network

3) That there is a route on 10.10.1.1 that says to get to 4.2.2.2 go through 10.10.1.40.

So when your PC tries to get to 4.2.2.2 it tries 10.10.1.1 first, because that is (more than likely) the default router on your PC.  When your PC sends the packet to 10.10.1.1, that device looks in its route table and says, "Oh, to get to 4.2.2.2 I need to go through 10.10.1.40."

In order to make the network path the shortest, 10.10.1.1 sends what is called an ICMP redirect to your PC, telling it that if it wants to get to 4.2.2.2 it should really go through 10.10.1.40.

So your PC starts going through 10.10.1.40 until you either reboot the PC, flush your PC's routing table, or 10.10.1.40 has its route table updated and sends your PC another ICMP redirect.
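
A minimal Python sketch of the behaviour described above, assuming the host does a longest-prefix-match lookup and stores a redirect as a /32 host route (how Windows keeps it internally may differ). The `HostRoutes` class, its method names, and the address 192.0.2.10 are made up for illustration.

```python
import ipaddress


class HostRoutes:
    """Toy host routing table: the longest matching prefix wins."""

    def __init__(self, default_gateway):
        # 0.0.0.0/0 matches everything, so it is the route of last resort.
        self.routes = {ipaddress.ip_network("0.0.0.0/0"): default_gateway}

    def apply_redirect(self, destination, new_gateway):
        # Model an ICMP redirect as a host (/32) route for that destination.
        self.routes[ipaddress.ip_network(f"{destination}/32")] = new_gateway

    def flush_redirects(self):
        # Reboot or route flush: only the configured default route survives.
        default = ipaddress.ip_network("0.0.0.0/0")
        self.routes = {default: self.routes[default]}

    def next_hop(self, destination):
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.routes if addr in net]
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = HostRoutes("10.10.1.1")
first = table.next_hop("4.2.2.2")             # via the default gateway
table.apply_redirect("4.2.2.2", "10.10.1.40")
second = table.next_hop("4.2.2.2")            # now straight to 10.10.1.40
other = table.next_hop("192.0.2.10")          # other destinations unaffected
table.flush_redirects()
after_flush = table.next_hop("4.2.2.2")       # back to the default gateway
print(first, second, other, after_flush)
```

This matches the two traceroutes: the first run still goes through 10.10.1.1, the second one already uses the learned route.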
0
Craig BeckCommented:
To make it simple, use 10.10.1.40 as your default gateway.

Job done! :-)
0
leblancAccountingAuthor Commented:
The problem seems to be coming from the Websense (connected to the 3560). When I disable the Websense, everything works.
0
giltjrCommented:
More than likely you are running some type of dynamic routing protocol and the Websense box is telling the 3560 to use it as the default route.    

Do you have access to the 3560?  What does its routing table look like?  Are you running a dynamic routing protocol?

My guess is your PC has the 3560 as the default route.  The 3560 most likely has a default route to the "Internet", and when the Websense is up it tells the 3560 to use it (the Websense box) as the default route.  Since the Websense box is on the same IP subnet as you, the 3560 issues an ICMP redirect to tell your PC to use the Websense.
0
leblancAccountingAuthor Commented:
No dynamic routing protocol... the GW for the 3560 is the FW. The Websense is directly connected to the 3560.
0
giltjrCommented:
With the Websense active, if you have access to the 3560, display the routing table.  Then disable the Websense, wait a few minutes, and display it again.  See what the difference is.
0
leblancAccountingAuthor Commented:
Good idea. I will try that. Thanks
0
skullnobrainsCommented:
assuming that 10.10.1.1 must be the websense and .40 the catalyst,
you need to use a different set of addresses to go from the websense to the router.
for example, add 172.16.0.1 to the websense and 172.16.0.2 to the catalyst and set 172.16.0.2 as the default route for the websense. you probably also should delete the 10.10.1.40 address from the router so it is not that easy to bypass the websense

likewise do the reverse if the boxes are the other way round

it would be better but not required to setup a dedicated vlan for the communication between the websense and the catalyst
0
giltjrCommented:
skullnobrains brings up a good point which leads to the question, which is the IP address of the 3560 and which is the IP address of WebSense box?

With the WebSense box up and running can you post the routing table from your PC?
0
leblancAccountingAuthor Commented:
The traceroute is from server1.
10.10.1.5: 3560
10.10.1.40: FW
10.10.1.15: Websense
10.10.1.1: virtual IP address for HSRP

Routing table from the 3560 (10.10.1.5):

S*    0.0.0.0/0 [1/0] via 10.10.1.40
       10.10.1.0/24 is variably subnetted, 2 subnets, 2 masks
C     10.10.1.0/24 is directly connected, vlan10
L     10.10.1.5/32 is directly connected, vlan10
       10.10.20.0/24 is variably subnetted, 2 subnets, 2 masks
C     10.10.20.0/24 is directly connected, vlan20
L     10.10.20.3/32 is directly connected, vlan20
0
giltjrCommented:
What device has the HSRP address?

However, since ".40" and ".1" are in your subnet and ".40" is the firewall, that means the firewall knows how to get directly to your computer and every other computer in the 10.10.1.0/24 subnet.

So for any destination IP address that must go through the firewall, an ICMP redirect will be issued to say "go through '.40'".

Where and how Websense comes into play, I don't know; it should not.  It should just be your computer, whatever hardware owns ".1", and the firewall.
0
leblancAccountingAuthor Commented:
"What device has the HSRP address?" The answer can be found in my attached capture.jpg file.
All hosts have the GW as 10.10.1.1. The devices have GW as the FW (10.10.1.40)
0
giltjrCommented:
Sorry, somehow I missed the diagram to start with.  If I had seen that, this would have been resolved a lot quicker.

Your PC has default route of 10.10.1.1, which is your internal routers (the 3560's).
They have a default route of 10.10.1.40.

Since your PC is in the same subnet as 10.10.1.40, it will send an ICMP redirect to your PC to use it as the route to any IP host that does not have a more specific route.  

Think about it, from a layer 3 point of view, why go through 10.10.1.1 and have it forward to 10.10.1.40, when you can go straight to 10.10.1.40?

Yes, physically you are still taking the same path, but this is at layer 2,  not layer 3.  Just slightly less overhead because the 3560's are just looking at the MAC address, they don't need to do any routing.
0
leblancAccountingAuthor Commented:
"Since your PC is in the same subnet as 10.10.1.40, it will send an ICMP redirect to your PC to use it as the route to any IP host that does not have a more specific route."
I am not sure I understand this.

"why go through 10.10.1.1 and have it forward to 10.10.1.40". The reason is that we have many hosts with static IP addresses, so it would be painful to go to each host and change the GW from 10.10.1.1 to 10.10.1.40.

"Yes, physically you are still taking the same path," from where to where?

Thanks
0
Craig BeckCommented:
So if you have lots of hosts using static IPs, why not just swap the HSRP address on the 3560s with the address of the router?
0
leblancAccountingAuthor Commented:
The hosts are pointed to the HSRP virtual address on the two core 3560s. The router that you referred to is the L3 3560. So you are saying to move the 10.10.1.1 to the L3 3560.  I am not sure how this will work with HSRP. So if I do so, I will not need HSRP and just rely on spanning tree. I think the point is not to rely on STP. That is why I use HSRP.
0
Craig BeckCommented:
HSRP and STP are completely different things. HSRP shouldn't be used instead of STP.

STP maintains a loop-free topology.
HSRP provides router redundancy.
0
Craig BeckCommented:
I'm wondering now why you have HSRP at all, actually?!
0
giltjrCommented:
What I believe craigbeck is suggesting is make the HSRP address on the 3560's 10.10.1.40 and make the firewall 10.10.1.1.  He can correct me if I am wrong.

If you do that you will need to add a route to the firewall for 10.10.20.0/24 that points to 10.10.1.40 (the new HSRP address).

Not knowing exactly why your network is set up the way it is, if it were me and I had an unused Ethernet interface on the firewall, I would look at connecting the firewall directly to the two 3560's you have backing each other up.  That removes the 3560 the firewall is currently connected to as a single point of failure.  Create a new VLAN between the firewall and the 3560's to isolate the firewall from your internal network.  Something like

     /-------------> 3560 TOP <-------------\
    /                   /\                   \
FW <      VLAN 30       |  VLAN10 and VLAN20  > 3560 <------> Server
    \                   \/                   /
     \-------------> 3560 BOT <-------------/

VLAN 30 is the "inside" of the firewall and the "outside" of the 3560's.  

VLAN 10 and VLAN 20 are "inside" the 3560's.
0
leblancAccountingAuthor Commented:
giltjr,
So in your suggestion, there is no need for HSRP. Correct? So you will have everything layer 2.

I've been told that the FW (it is a Fortigate) cannot have 2 interfaces to the internal network. So your suggestion will not work.

The plan was to use HSRP as GW redundancy for all internal hosts.
0
Craig BeckCommented:
Giltjr is correct.

You only need HSRP for redundancy if you have two routers going to the same place.  Here, you don't! I know it looks like you do, but if the VLAN IDs in the diagram are correct the routing won't work properly.
0
leblancAccountingAuthor Commented:
"You only need HSRP for redundancy if you have two routers going to the same place.  Here, you don't! ". I am not sure I understand this. The 2 cores are going to the same 3560.
0
giltjrCommented:
-->  The 2 cores are going to the same 3560.

True, but with the way your network is set up you don't need redundancy at layer 3, only at layer 2.  Because the 3560 and the firewall are in the same VLAN they are all in the same layer 2 network.  

When 10.10.1.90 accesses the Internet it is using 10.10.1.40 as its router to the internet.  The traffic that flows through all of the 3560's is flowing through just as Ethernet frames only, not really IP packets.  You could change your default route on everything to 10.10.1.40 (as craigbeck suggested) remove the VLAN 3 interface from your two core 3560's and everything will still work.
0
skullnobrainsCommented:
i'm still not 100% sure i understand what you want, but here is my 2 cents :

- don't change anything on the machines if it's likely to take a long time.
- don't change the HSRP address if it is the gateway of all your machines.
this part works fine, apparently so just keep it

- change the address of both the websense and your firewall so they are outside of the 10.10.1.0/24 network.
- add a route on your router to make the gateway either of those. depending on your websense integration, you may want to route all the traffic through the websense or just let it sniff either network.

if you route through the websense, you need a dedicated lan for the router and the websense and a second dedicated lan for the websense and the firewall. the default gateway of the router is the websense. the gateway of the websense is the firewall. you can bypass the websense for some of the traffic using policy routing on the router.

--------

"Since your PC is in the same subnet as 10.10.1.40, it will send an ICMP redirect to your PC to use it as the route to any IP host that does not have a more specific route."
I am not sure I understand this.

if a router or any machine that acts as a router receives a packet on a specific interface+network combination and resends it on the same interface+network combo, it understands that this packet could have been sent directly and that going through itself is plain useless. so it sends the corresponding information to the sender : "hey, you're sending packets to me that i forward to router x but router  x is in the same network as us so why don't you send the packets directly to him instead of going through me"
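
The rule above can be sketched in a few lines of Python. This is a simplification that only checks the "same network in, same network out" condition; real routers apply extra conditions (for instance, redirects must be enabled on the interface). The function name and the host 10.10.20.50 are hypothetical.

```python
import ipaddress


def should_send_redirect(source_ip, next_hop_ip, ingress_network):
    """Return True when the sender could have reached next_hop directly.

    Simplified rule: the packet came from a host on ingress_network and is
    about to be forwarded to a next hop on that same network.
    """
    network = ipaddress.ip_network(ingress_network)
    return (ipaddress.ip_address(source_ip) in network
            and ipaddress.ip_address(next_hop_ip) in network)


# Host on vlan10: the firewall 10.10.1.40 sits on the same /24.
vlan10 = should_send_redirect("10.10.1.90", "10.10.1.40", "10.10.1.0/24")
# Host on vlan20: the firewall is not on 10.10.20.0/24, so no redirect.
vlan20 = should_send_redirect("10.10.20.50", "10.10.1.40", "10.10.20.0/24")
print(vlan10, vlan20)
```

With the addresses posted in this thread, only vlan10 hosts meet the condition.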
0
Craig BeckCommented:
- don't change the HSRP address if it is the gateway of all your machines.
this part works fine, apparently so just keep it

That's the bit that's causing the issue, if I'm correct, so it's not working fine.

The active 3560 HSRP switch is sending an ICMP redirect to the clients who are using .1 as their default gateway, sending them to .40.  This is breaking routing (or traffic-flow at least) from what I can determine.

I still can't see from looking at the diagram why HSRP is actually being used here though, as the IP ranges are the same, yet the VLANs are different, therefore they're not routing the same L2 segments.  If I am correct, and the VLAN IDs are correct in the diagram, this design is bad and HSRP should be removed!

Someone tell me I'm talking mumbo-jumbo?! :-)
0
skullnobrainsCommented:
having an icmp redirect does not prevent the network from working properly. i don't like it because it makes the network prone to fail on human errors. for example, if the first router is removed, everything will continue working fine until one of the machines is rebooted. if a power failure occurs at that time, the whole network goes down. but it is a working situation nevertheless, as long as both routers are up.

the problem is that the upstream router of the hsrp set is in the same LAN

i believe it is much simpler to move a single upstream router to a dedicated LAN rather than make changes on all the machines of the initial LAN. also i believe that HSRP is set for a reason and bypassing the hsrp set by using a different gateway will produce various points of failure that do not exist currently.

<off topic and flamish>
then i pretty much agree that hsrp is not currently used anyway, and the network schema looks like one of the uselessly complicated, impossible to debug, error-prone, and please-call-us-when-it-breaks-because-when-that-happens-there-is-no-way-you-can-repair-it-yourself idiotic solutions that consultants tend to impose everywhere because they don't want to try and look at what the client needs and rather tend to simply sell the exact same generic barely working setup over and over again

but then we have no idea of the requirements
</off>
0
giltjrCommented:
craigbeck.  I agree.  We must be missing something.  

pitachip, I just noticed you have a VLAN 30 already.  First, it is only on CORE2, so if you lose CORE2 you lose access to VLAN 30.  In my suggested change, the new VLAN you would create would be VLAN 40 instead of VLAN 30.  You would also need to add VLAN 30 to CORE1 if you really want CORE1 and CORE2 to back each other up.

Honestly I think you need to really look at your whole network and do a redesign.

Is this all the routers and switches you have?
0
Craig BeckCommented:
:-D

I agree that ICMP redirect doesn't stop things working, but it appears that something within that process is breaking things.  All I'm saying is get rid of the redirect as it's unnecessary.

The easiest way to remedy things (assuming our previous assumptions are correct) is to swap the HSRP address with the router address.  This would minimize the administrative overhead in making the changes, as the only other thing we'd need to do to get this to work would be to adjust a static route or two on the router.
0
skullnobrainsCommented:
i assume you mean swap the hsrp address with the firewall's, because it would not solve anything otherwise. this seems sound as well, but risky : given the network schema i would not vouch that all machines have direct access to the firewall in all situations, and there is likely no failover that way.

maybe we'd get the big picture with a more complete schema. given what we see, at least 2 routers are plain useless, so i'd assume they are needed for something that is not shown on the schema.... and i'd rather not advise changing stuff in that situation.

but clearly having the firewall in the same network as the hosts while the hosts are setup with a different gateway is the reason for the initial problem.

a simple solution with minimal impact on the existing setup may be to just add "no ip redirects" on the router(s)
0
Craig BeckCommented:
I don't think that would work though, as you'd then stop people who are pointing to .1 from reaching the internet.

I'd have to lab that though to be sure.
0
giltjrCommented:
Well, ultimately the question was why he was getting the results from traceroute he was seeing.  We have answered that and I hope in a way he understands.

I think we all agree his network is not the best design, but it works and should not cause any major problems.

If he would like some suggestion he can/should open a new question asking for design help.
0
skullnobrainsCommented:
@craigbeck : it does work. it just allows for the useless hop to stay in use. likewise the hosts can be instructed or firewalled in such a way that they ignore those redirects. most routers in that case will still forward the packets while sending redirects.
0
leblancAccountingAuthor Commented:
Thank you for the active discussion. Sorry if I confused you. But here it is:

- vlan30 is a mistake. It should be vlan20 on core2.

- As far as HSRP is concerned, I will have to disagree that HSRP is not needed for redundancy in this case. core1 is configured as the primary for vlan10. Core2 is configured as the primary for vlan20. Now if core1 goes down, server1 traffic with GW 10.10.1.1 (vlan10) will go to core2 and still can access the internet. So that is what HSRP is doing, it gives the hosts a GW redundancy. Agree?

- From core1 and core2, their default GW is the FW because that is where they get out to the outside world.
- The Websense and switch2 exchange info via WCCP. The Websense has the GW as switch2 (10.10.1.5)

Now I am a bit confused when you addressed the ICMP redirect. I am not sure I understand the correlation here. May be you can shed some light?
0
Craig BeckCommented:
So, as you've now cleared up the issue regarding VLAN30 that confirms that HSRP is indeed relevant.

Now that makes the solution even easier.  As I think giltjr said earlier in post:

http://www.experts-exchange.com/Networking/Network_Management/Network_Analysis/Q_28252621.html#a39574918

...just create a new VLAN and subnet for the link to the router on both cores, and send all traffic via the HSRP address from hosts.
0
skullnobrainsCommented:
- As far as HSRP is concerned, I will have to disagree that HSRP is not needed for redundancy in this case. core1 is configured as the primary for vlan10. Core2 is configured as the primary for vlan20. Now if core1 goes down, server1 traffic with GW 10.10.1.1 (vlan10) will go to core2 and still can access the internet. So that is what HSRP is doing, it gives the hosts a GW redundancy. Agree?

yes which is why you should just leave that part alone

- From core1 and core2, their default GW is the FW because that is where they get out to the outside world.

ok. then you need to move the firewall on a DIFFERENT NETWORK SEGMENT : it should not be in the same LAN as the hosts or the router will send an ICMP redirect to tell those hosts they can reach the firewall directly.

or you can keep your existing setup and instruct core1 and core2 not to send icmp redirects. this is not too bad a setup since you want the extra hop. the downside is that any user who knows the firewall address can set it as a gateway, effectively bypassing the websense. you can prevent this with an ACL that blocks direct communication with the firewall.

- The Websense and switch2 exchange info via WCCP. The Websense has the GW as switch2 (10.10.1.5)

if your websense is integrated using WCCP, its location should not matter since the traffic does not actually get routed through it. unless you perform some network sniffing with the websense, it is nevertheless better practice to isolate the WCCP link on a network segment of its own, but not required.
0
leblancAccountingAuthor Commented:
skullnobrains,
" it should not be in the same LAN as the hosts or the router will send an ICMP redirect to tell those hosts they can reach the firewall directly."

Please tell me more about ICMP redirect. Is this something that is sent by default from the router? By router, you meant core1 and core2. Correct? Will I see those ICMP redirect in wireshark? What if the hosts know about the FW as the GW, Don't they still go through their configured GW?

Thanks
0
giltjrCommented:
Yes, you will see this in a packet capture.  Routers send this by default.  This is done to reduce the number of IP hosts that need to handle the packets.  Why go through 2 hops when you can go through 1?

By router we mean any box that can perform routing, so yes, CORE1 and CORE2, as they are fully functional L3 (router) devices.

If a device receives an ICMP redirect, it will start using the gateway it is told to use.

It will NOT use the default router/GW if it is told to use a different router/GW.  The default router/GW is used when there is not a more specific route in the routing table.

When your computer gets an ICMP redirect, it will temporarily add a more specific route for the destination address, using the router/GW that it is told to use in the ICMP redirect.
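
For spotting these in a capture: an ICMP redirect is ICMP type 5 (RFC 792), so the Wireshark display filter `icmp.type == 5` should show them. As a rough sketch of what the message carries, the Python below decodes the first 8 bytes of an ICMP message and returns the code and the gateway the host is told to use. The function name and the hand-built sample bytes are illustrative, not taken from a real capture.

```python
import ipaddress
import struct

ICMP_REDIRECT = 5  # ICMP type for Redirect (RFC 792)


def parse_redirect(icmp_bytes):
    """Return (code, gateway) for an ICMP Redirect, or None otherwise."""
    if len(icmp_bytes) < 8:
        return None
    # Layout: type (1 byte), code (1), checksum (2), gateway address (4).
    icmp_type, code, _checksum, gateway = struct.unpack("!BBH4s", icmp_bytes[:8])
    if icmp_type != ICMP_REDIRECT:
        return None
    return code, str(ipaddress.ip_address(gateway))


# Hand-built header: type 5, code 1 (redirect for host), checksum left at 0
# for brevity, gateway 10.10.1.40.
sample = struct.pack("!BBH4s", 5, 1, 0, ipaddress.ip_address("10.10.1.40").packed)
result = parse_redirect(sample)

# An echo request (type 8) is not a redirect.
echo_request = struct.pack("!BBH4s", 8, 0, 0, bytes(4))
not_redirect = parse_redirect(echo_request)
print(result, not_redirect)
```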
0
leblancAccountingAuthor Commented:
I appreciate the explanation. I understand the ICMP redirect now. But in my case, the extra hop will not be the cause for the inconsistent traceroute. It is just a performance issue, two hops instead of one hop. Agree?
0
skullnobrainsCommented:
Please tell me more about ICMP redirect. Is this something that is sent by default from the router? By router, you meant core1 and core2. Correct? Will I see those ICMP redirect in wireshark? What if the hosts know about the FW as the GW, Don't they still go through their configured GW?

since the first part is already covered, i'll keep it short : yes, yes, and yes.

if the hosts are configured to use the firewall as the gateway, the firewall has no reason to send an icmp redirect because its own gateway is not in the same LAN as the hosts, so it has a reason to exist as a router.

if they are configured to use the coreX address, once they receive the ICMP redirect, they start using the firewall as a gateway when they want to reach the same host. this is the reason why you have a problem initially : they change gateways during the TCP handshake or the beginning of the download and the firewall does not see the packets it receives as part of the same connection and rejects them.

But in my case, the extra hop will not be the cause for the inconsistent traceroute

yes it is : the router understands it is useless as a hop and sends a redirect. so your first traceroute shows host->router->firewall->WAN but the second will show host->firewall->WAN. additionally, one of the subsequent hops will likely be hidden by the first traceroute.

It is just a performance issue, two hops instead of one hop
- true if you set "no ip redirects" on the routers
- false if you don't because the first instance of most connections will break which is your current situation.
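
A toy Python model of the failure described above, assuming the device doing stateful inspection only accepts mid-connection packets for flows whose SYN it saw. It is not how the Fortigate or the Websense is actually implemented; the class name and the port numbers are hypothetical.

```python
class StatefulFilter:
    """Toy stateful filter: only flows whose SYN it saw are allowed."""

    def __init__(self):
        self.flows = set()

    def accept(self, src, sport, dst, dport, syn=False):
        flow = (src, sport, dst, dport)
        if syn:
            self.flows.add(flow)   # new connection: remember it
            return True
        return flow in self.flows  # mid-flow packet: needs existing state


flow = ("10.10.1.90", 50000, "4.2.2.2", 80)

# Same device sees the whole flow: everything passes.
single_path = StatefulFilter()
ok_syn = single_path.accept(*flow, syn=True)
ok_data = single_path.accept(*flow)

# SYN tracked on the path via 10.10.1.1; later packets take the redirected
# path and hit a device (or state table) that never saw the SYN.
routed_path, redirected_path = StatefulFilter(), StatefulFilter()
routed_path.accept(*flow, syn=True)
dropped = not redirected_path.accept(*flow)
print(ok_syn, ok_data, dropped)
```

If the path changes after the SYN, the first attempt fails and a retry (which uses the learned route from the start) succeeds, which would fit the "fails on the first try" symptom.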
0
leblancAccountingAuthor Commented:
skullnobrains,

So the routing does not work correctly because of the icmp redirect issue and the GW assignment. However, it does not explain why I only have problems in my vlan10 and not on my vlan20. The traceroute for vlan20 is consistently the same.
0
giltjrCommented:
VLAN 20 works because your Internet router (the firewall, 10.10.1.40) does not have an interface on VLAN 20.

Your default route on VLAN 20 is 10.10.20.1.
In order to get to the "Internet" from 10.10.20.1, you must go through 10.10.1.1 and then 10.10.1.40.

Hosts in 10.10.20.0/24 can NOT talk directly to 10.10.1.40 because they are on two different IP networks.

So the L3 path for VLAN 20 is

"your computer on VLAN 20" <---> 10.10.20.1 <-- 10.10.1.1 --> 10.10.1.40 <--> Internet

I believe your problem understanding this is that you are looking at two different levels of networking (layer 2 and layer 3) as though they are the same, because they are on the same physical hardware and the same physical wires.  The layer 3 part is logically (virtually) different when processed.
0
leblancAccountingAuthor Commented:
"your computer on VLAN 20" <---> 10.10.20.1 <-- 10.10.1.1 --> 10.10.1.40 <--> Internet
I am not sure why it has to go through 10.10.1.1 first. When the packet hits 10.10.20.1, it will have a default route to 10.10.1.40.

I am not sure I understand how an extra hop can mess up the routing for vlan 10.
0
Craig BeckCommented:
10.10.1.1 IS 10.10.20.1: both addresses live on the same router.

giltjr is showing you the router you're going through.

When you're not on the 10.10.1.0/24 subnet you don't see the ICMP redirect, as your traffic is actually going through the router instead of being told to go somewhere else.

We could go round and round here but the answer to this question at the end of the day is to simply use the router as the default gateway no matter how much of a nuisance it will be to change things, or use a routed link between the core and the router and point everyone at the HSRP address (as they are already).
0
skullnobrainsCommented:
or set "no ip redirects" on the routers

would you try any of the proposed solutions ?
is there still something unclear concerning this problem ?
0
leblancAccountingAuthor Commented:
"simply use the router as the default gateway " By router, you meant the FW 10.10.1.40. Correct? If yes then HSRP is useless then.
0
skullnobrainsCommented:
which is the reason why i proposed 2 different solutions that do not involve removing HSRP. setting "no ip redirects" on the routers will not require any modification of your address plan. given your understandable reluctance to make big structural changes, this is likely your best course of action.
0
Craig BeckCommented:
"simply use the router as the default gateway " By router, you meant the FW 10.10.1.40. Correct? If yes then HSRP is useless then.
Why is it useless?  It's still ok to use it for the 10.10.20.0/24 subnet - just not for the internet-bound traffic.

However, I think you're misunderstanding how HSRP should be implemented here.  Your current HSRP implementation isn't very good if you want to provide redundancy to the internet router - in fact it's pointless.

At the end of the day you only have one internet router, so providing 2 paths to it is only ever going to give you L2 redundancy.

Also it's not clear to me which switches are running HSRP.  Are all switches running HSRP, or just Core1 and Core2?  If it is only the core switches running HSRP, why do you have SVIs for the 10.10.20.0/24 subnet on the other two switches?  Why do you have a default-gateway AND a default route on one switch?

It all just doesn't make any sense.  I'd redesign it slightly.

I don't think that turning off ICMP redirects will be a good idea.  It may work and solve the issue but it will mean that all traffic will have to consume bandwidth on a link which it otherwise wouldn't need to traverse.  This isn't desirable and if you have a busy network it'll cause a different problem to what you're facing now.

HSRP doesn't work too well with ICMP redirects, so this is probably why you're seeing lost packets.

http://www.cisco.com/en/US/docs/ios-xml/ios/ipapp_fhrp/configuration/15-s/fhp-hsrp-icmp.html?referring_site=bodynav

So, off the top of my head here's a list of what I'd do...

1] Remove the unnecessary SVIs from Switch1 and Switch2.  You only need management IP addresses on those switches.

2] Create a new VLAN on Core1, Core2 and Switch2 (not Switch1) for the link to the Router and add it to the existing trunks.

3] Configure a SVI for the new VLAN on Core1 and Core2 and give each a new IP address (a /28 range will be enough).

4] Configure HSRP between the two Cores on the new SVI and designate one as the primary HSRP router.  Make sure this is the same as the LAN side of the Core and use tracking to ensure the same Core is routing traffic both ways.

5] Reconfigure the router's interface to have an IP address on the new subnet you just created.

6] Configure a static route to point all internal traffic to the HSRP address on the new SVI at the core.

7] Reconfigure the default static route on the core to point all outbound traffic to the new IP address of the router.


That will mean you still have HSRP for the VLANs and internet traffic, but traffic-flow will be a bit more efficient and the ICMP redirect won't cause an issue.
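
To check that a /28 is big enough for step 3, here is a small Python sketch using the standard `ipaddress` module. The subnet 10.10.99.0/28 and the role-to-address assignment are hypothetical and not taken from this thread; use whatever free range fits your address plan.

```python
import ipaddress

# Hypothetical transit subnet for the new VLAN between the cores and the router.
transit = ipaddress.ip_network("10.10.99.0/28")
hosts = list(transit.hosts())  # usable addresses, network/broadcast excluded

# One possible assignment: HSRP virtual IP, one SVI per core, and the router.
plan = {
    "hsrp_virtual": hosts[0],
    "core1_svi": hosts[1],
    "core2_svi": hosts[2],
    "router": hosts[3],
}

print(len(hosts), "usable addresses in", transit)
for role, address in plan.items():
    print(f"{role:13} {address}/{transit.prefixlen}")
```

Four addresses are needed and a /28 leaves room to spare.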
0
skullnobrainsCommented:
looks like a complicated but working solution.

if you don't mind, i'd like to discuss these 2 points :

I don't think that turning off ICMP redirects will be a good idea.  It may work and solve the issue but it will mean that all traffic will have to consume bandwidth on a link which it otherwise wouldn't need to traverse.  This isn't desirable and if you have a busy network it'll cause a different problem to what you're facing now.

i pretty much agree that it is better to do otherwise.

you are not consuming any extra bandwidth in this setup, because the same routers would be traversed at level 2 as if they were switches. the extra processing is just the routing, which is quite negligible.

nevertheless many networks are setup in that way and work fine. would you please elaborate which kind of problems you expect ?

HSRP doesn't work too well with ICMP redirects, so this is probably why you're seeing lost packets.

it's not lost packets, but lost or impossible to establish TCP connections. these are due to the fortinet (or possibly a router along the way before that) rejecting packets because the SYN went through another route.

-----

if "no ip redirects" is not desirable, i'd suggest something simpler

- set an extra address on the fortigate, let's say 192.168.0.1/24 (i'm not using 10.x because i want to make sure it is a different network and you don't provide network masks).
- set an extra address on the HSRP set of routers 192.168.0.x/24
- change the route on the routers to use 192.168.0.1 instead of 10.10.1.40
it is better but not required to use a dedicated vlan for the new network segment.
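
a quick way to sanity-check the "different network" point, sketched in Python with the standard ipaddress module. the masks on the 10.x side are assumptions (they were never given in this thread), and overlaps_any is just a hypothetical helper name:

```python
import ipaddress

def overlaps_any(candidate, existing):
    """Return True if the candidate subnet overlaps any existing subnet."""
    cand = ipaddress.ip_network(candidate)
    return any(cand.overlaps(ipaddress.ip_network(net)) for net in existing)

# 10.10.1.0/24 and 10.0.0.0/8 are ASSUMED masks for the existing LAN.
# 192.168.0.0/24 is safe whichever of the two is really in use.
print(overlaps_any("192.168.0.0/24", ["10.10.1.0/24", "10.0.0.0/8"]))  # False

# A new segment inside 10.x is only safe if the existing mask is narrow.
print(overlaps_any("10.20.0.0/24", ["10.10.1.0/24"]))  # False
print(overlaps_any("10.20.0.0/24", ["10.0.0.0/8"]))    # True
```

this is why picking something outside 10.x avoids having to guess the current masks.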
Craig BeckCommented:
Well, actually it is lost packets...

http://www.cisco.com/en/US/docs/ios-xml/ios/ipapp_fhrp/configuration/15-s/fhp-hsrp-icmp.html?referring_site=bodynav

HSRP filters ICMP redirects by default.  There are issues with lost packets due to this filtering.  The solution to this is detailed here...

https://supportforums.cisco.com/docs/DOC-5555

By disabling ICMP redirect you would be consuming extra bandwidth on at least one extra link if your traffic is coming from the switch which wasn't the active HSRP forwarder.  Think about it...

With ICMP Redirect (when it works):
The active HSRP forwarder is Core1.  You are connected to Core2 and want to get to the internet.  Your initial packet goes from Core2 across the inter-core link to Core1.  Core1 will send an ICMP redirect to you to tell you to go straight to the router.  Every subsequent packet goes straight from Core2 to the router via switch2.

Without ICMP Redirect:
The active HSRP forwarder is Core1.  You are connected to Core2 and want to get to the internet.  Your initial packet goes from Core2 across the inter-core link to Core1 and is then sent to the router.  No ICMP redirect is received so every subsequent packet will also go across the link to Core1 then up to the router.

That is in an optimal scenario too.  If STP isn't configured correctly the traffic could come back across the inter-core link and then up to the router.  That would be a nightmare.  At least with a routed link to the router you alleviate the STP element.

Really, I think it is definitely better to use a routed link to the router here if HSRP is going to be used.
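
To illustrate why a separate segment towards the router makes the redirect go away, here is a minimal Python sketch of the usual redirect condition: the packet leaves on the interface it arrived on, and both the sender and the next hop sit on that interface's subnet. The /24 mask, the host 10.10.1.50, the 192.168.0.1 transit address and the interface names are assumptions for illustration only, and real IOS applies further checks (HSRP filtering among them).

```python
import ipaddress

def should_send_redirect(src_ip, in_iface, out_iface, next_hop, in_iface_network):
    """Simplified ICMP redirect test for a routed packet."""
    net = ipaddress.ip_network(in_iface_network)
    same_interface = in_iface == out_iface
    src_on_link = ipaddress.ip_address(src_ip) in net
    next_hop_on_link = ipaddress.ip_address(next_hop) in net
    return same_interface and src_on_link and next_hop_on_link

# Current design (assumed /24): the host and the router at 10.10.1.40
# share a subnet, so the Core routes the packet back out the same SVI.
print(should_send_redirect("10.10.1.50", "Vlan1", "Vlan1",
                           "10.10.1.40", "10.10.1.0/24"))   # True

# With a dedicated transit VLAN (hypothetical Vlan99, 192.168.0.0/24)
# the packet leaves on a different interface, so no redirect is sent.
print(should_send_redirect("10.10.1.50", "Vlan1", "Vlan99",
                           "192.168.0.1", "10.10.1.0/24"))  # False
```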
leblancAccountingAuthor Commented:
I did not realize this design raised so many interesting problems. I did not design this network BTW. I just got involved to fix it.

Also, there is no vlan20 on switch1. That was a mistake. switch1 is just a L2 switch.
For switch2, from my understanding, there are users from vlan10 and vlan20 connected to it and they needed vlan20 for wccp redirect to work. That is why they have a trunk between switch2 and the 2 cores. Only the 2 cores are running HSRP.

I am thinking of moving all users in vlan10 and vlan20 to another access switch and making the link between switch2 and the 2 cores L3, as well as the link between switch2 and the FW (which is represented by the router symbol).

I am curious about craigbeck's solution with the new vlan for the 2 cores and switch2. I am just wondering how this will solve the problem.

Thanks
Craig BeckCommented:
It will solve the problem by removing the ICMP redirect and optimally routing traffic.
giltjrCommented:
My suggestion here:

http://www.experts-exchange.com/Networking/Network_Management/Network_Analysis/Q_28252621.html#a39574918

It is basically the same thing craigbeck is suggesting.

It creates a L2 and L3 network that sits between your internal network and the Internet.  Not only will the ICMP re-direct go away, but it creates more separation between your internal network and the Internet.
Craig BeckCommented:
Exactly that giltjr!
leblancAccountingAuthor Commented:
Right. L2 stops at the 2 cores.
I have to look into your suggestion of creating a vlan just for switch2 and the 2 cores. I don't have a good grasp on that yet.
giltjrCommented:
By using ACL's on your 2 cores you are creating a DMZ  

Internet <--> FW <---> DMZ <---> CORE <---> Inside

The DMZ is the new VLAN that connects the FW and CORE.
leblancAccountingAuthor Commented:
ACLs?

If your new VLAN suggestion is what makes switch2 the DMZ then it is fine. But in this design, switch2 is not the DMZ. Everything behind that FW is considered internal and trusted. The DMZ is not considered in this design. Switch2 is the distribution WAN.
giltjrCommented:
Don't take this the wrong way, but what is your experience? An ACL is an Access Control List, used on many routers and depending on the router is the same thing as a rule or policy on a firewall.  Some firewalls even call them access control lists.

Everything behind any firewall is considered "inside", but what you end up with is "2 insides"

Firewall < -- "Inside #1 (a.k.a. DMZ) " --> L3 switch with ACL's <-- "Inside #2" ---> CORE's.

If you can't afford or don't really need a 2nd firewall, the above is one of the two typical setups for a DMZ.  The other typical setup is called a 3 legged dog, where you have a second interface on the firewall that is the DMZ.

When you have the 3 legged dog, the firewall is normally the default router for the whole network.
Craig BeckCommented:
Ah I think I see the confusion with the DMZ issue here (although someone correct me if I'm wrong)...

@giltjr - contrary to the network diagram, I don't think switch2 is actually operating as L3.  Therefore pitachip is questioning the thoughts around the DMZ.

What I was suggesting is to make a new L2 segment which is trunked between Core1, Core2 and Switch2 for the link to the router, so switch2 is just a switch which allows the router to see the new SVI on Core1 and 2.

You can't use a L3 port for this as that would remove the redundancy element, so it has to be trunked to an SVI via a switch.
skullnobrainsCommented:
@craigbeck, your doc does not say that at all. here is the relevant part

ICMP Redirects to Non-HSRP Devices

ICMP redirects to devices not running HSRP on their local interface are permitted. No redundancy is lost if hosts learn the real IP address of non-HSRP devices.

you may also want to note that your understanding of the problem and proposed solution are obvious follow-ups of things that @giltjr and myself posted beforehand, with added unrelated documentation.

---

@pitachip

like both @giltjr and myself suggested, you need to dedicate a network segment to the communication between the core routers and the firewall.

using a different vlan but failing to dedicate addresses outside of the 10.10.1.0/24 network will only result in a non working setup

the ciscos should not send icmp redirects in that case because the interfaces will be different. the firewall will try to directly speak to the machines that are not on his vlan so the return path will just be broken. if they do send the redirect, the machines will not be able to hit the firewall for the same reason.
Craig BeckCommented:
@skullnobrains - I think someone got out of bed on the wrong side!
"you may also want to note that your understanding of the problem and proposed solution are obvious follow-ups of things that @giltjr and myself posted beforehand, with added unrelated documentation."
Back up!  I'm not here to argue, but...

"the firewall will try to directly speak to the machines that are not on his vlan"
How?  If the firewall doesn't have an IP address on the same subnet as the destination it will send the traffic to the next-hop if one is configured, or nowhere if it doesn't - ALWAYS.
"unless you know what you are doing, either stick to level 2 or stick to level 3. in your case, level 3 most likely meaning you should not use the same ips on both side of any router."
Firstly, it's LAYER, not LEVEL.  Secondly, it's impossible to assign the same IP to two different interfaces on the same router, or to separate interfaces on the same subnet.

STP plays a big part here, yet no-one else has even mentioned it!  That explains a thing or two...

There's obviously too many cooks in the kitchen, and for that reason - I'm out!
leblancAccountingAuthor Commented:
Experts,

I think this is great to share our thoughts and see what may or may not work. Personally, it helps me to see the issue from different angles.
 
I am just a beginner when it comes to networking. I understand what an ACL is and what the DMZ is. If this is your suggested design to make it work with the current design, then I can understand. But in my opinion it is quite complex and adds processing work to switch2. I am also not sure if I want to put my internal network behind a DMZ (that is just me).

Anyway, I think I can try to understand this but at the end of the day, I have to implement it to see if it is working. Thanks.
giltjrCommented:
O.K. Your original question/issue was about the inconsistent traceroute.


So the questions to you are:

1)  Do you understand why you were seeing the route change?
2) Do you want to leave your network as is and accept the ICMP redirect on your VLAN1?
3) Or do you want a little help and a few suggestions on how to re-design your network to make it a little simpler and get rid of the ICMP redirects?

If you want #3, then we need to understand what you would like to accomplish and how you think your network may grow.

Example:

1) You don't want a DMZ.  
2) You have two IP subnets (VLAN's) and that is all you plan to have for the next 5 years.
3) You have 40 host devices (servers, clients, printers, routers, and switches) and the current plan will increase that to 60 over the next 5 years.

Or:

1) You want a DMZ.
2) You plan to grow from 40 network hosts to 400 over the next 5 years.
3) You plan to add VLAN's and IP subnets as the network grows.
skullnobrainsCommented:
@giltjr : ++ for the clarification. i'll be waiting for answers as well.

<off topic @craigbeck>
i'm not trying to get into an argument either. sorry, i'll keep the conversation technical. please accept apologies as my comment was a little aggressive.

"How?  If the firewall doesn't have an IP address on the same subnet as the destination it will send the traffic to the next-hop if one is configured, or nowhere if it doesn't - ALWAYS."

exactly : it will look at its routing table, and find it is supposed to have a direct link to the machine. it will issue an ARP request, won't get an answer because the host is in a different vlan, and drop the packet because it cannot send it anywhere.

this is exactly the reason why i insist on the fact it is important to create a dedicated SUBNET. using a separate VLAN only enforces the use of a different subnet so the hosts cannot communicate directly with the firewall by using the proper ip and default route.

using several vlans inside the same subnet is always a bad idea.
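
a small Python sketch of that lookup, to make the point concrete. it is a simplified longest-prefix match, not what any particular firewall really runs. the /24 mask on 10.10.1.0, the host 10.10.1.50 and the gateway 192.168.0.2 (standing in for the HSRP address on the new segment) are assumptions for illustration:

```python
import ipaddress

def forwarding_decision(dst_ip, connected, static_routes):
    """Pick the most specific matching route.

    connected: list of directly attached subnets.
    static_routes: dict mapping subnet -> next-hop address.
    """
    dst = ipaddress.ip_address(dst_ip)
    candidates = [(ipaddress.ip_network(n), None) for n in connected]
    candidates += [(ipaddress.ip_network(n), gw) for n, gw in static_routes.items()]
    matching = [c for c in candidates if dst in c[0]]
    if not matching:
        return ("drop", None)
    _, gateway = max(matching, key=lambda c: c[0].prefixlen)
    if gateway is None:
        # Destination looks on-link: the device ARPs for it directly.
        return ("arp-direct", dst_ip)
    return ("via-gateway", gateway)

# New VLAN but firewall still addressed in 10.10.1.0/24: it ARPs for a
# host that sits in another VLAN and never gets an answer.
print(forwarding_decision("10.10.1.50", ["10.10.1.0/24"], {}))
# ('arp-direct', '10.10.1.50')

# Dedicated subnet plus a return route: traffic goes back via the cores.
print(forwarding_decision("10.10.1.50", ["192.168.0.0/24"],
                          {"10.10.1.0/24": "192.168.0.2"}))
# ('via-gateway', '192.168.0.2')
```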

---

i'd be quite happy to know why you think STP is related to this problem in any way
<off>
Craig BeckCommented:
STP is completely related as there are two paths to the router from Core1 and Core2.  They are Layer-2 paths, not Layer-3, so STP is completely relevant here.

If STP is configured correctly/optimally there should be one link to Switch2 which is blocking traffic, otherwise there would be a loop and all traffic would get lost, not just routed traffic.  Therefore it's completely possible that if Core1 is the active forwarder but the link from Core1 to Switch2 is blocked by STP (which by my guess it probably is), the traffic would have no choice but to traverse the following path:

Host ---> Core2 ---> Core1 ---> Core2 ---> Switch2 ---> Router

I've seen this hundreds of times, especially where people use two 6500 cores but no VSS.  They think it's acceptable to configure one core as the STP root for all VLANs, yet make the SVIs for each VLAN staggered.  This creates mayhem, as well as congestion (which is another cause of lost packets).

For example:

Core1 is the active forwarder for VLANs 10,20,30,40,50
Core2 is the active forwarder for VLANs 60,70,80,90,100
Core1 is the STP root for all VLANs.

In a STP configuration which is default on all other switches this would be bad as traffic for VLANs routed via Core2 would probably have to come to Core1 first, then cross the link to Core2, then come back to Core1 on the return-path.  Traffic on VLANs 10,20,30,40,50 would be ok though as their L2 path is already optimal.

Obviously it's not always like that, but quite often it's not as easy as just leaving STP alone and accepting the defaults.
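
As a rough illustration of the example above, this Python sketch lists the VLANs whose HSRP active forwarder differs from their STP root. The function name and the dictionaries are hypothetical, and in practice the data would have to be collected from the switches themselves.

```python
def misaligned_vlans(hsrp_active, stp_root):
    """Return the VLANs whose HSRP active Core is not also their STP root.

    A VLAN with no STP root entry is reported as misaligned too.
    """
    return sorted(vlan for vlan, core in hsrp_active.items()
                  if stp_root.get(vlan) != core)

# Values taken from the example: staggered SVIs, a single STP root.
hsrp_active = {vlan: "Core1" for vlan in (10, 20, 30, 40, 50)}
hsrp_active.update({vlan: "Core2" for vlan in (60, 70, 80, 90, 100)})
stp_root = {vlan: "Core1" for vlan in hsrp_active}

print(misaligned_vlans(hsrp_active, stp_root))  # [60, 70, 80, 90, 100]
```

The VLANs it reports are the ones whose traffic is likely to take the longer L2 path before it is routed.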

All I'm saying is that there's more to it than just the obvious.  Even though this is related to HSRP, there are still other components to consider which could cause an issue.


Going back to what you said, I apologise, I read your comment regarding the firewall in the wrong context.  I agree, if you don't change the subnet it will not work.  I also agree that using multiple subnets in the same L2 domain is a bad idea.

Don't get me wrong here, I'm not trying to steal anyone's thunder.  I was actually providing backup to both of the suggestions made prior to my comment.  If I disagree on certain aspects I don't mean to sound aggressive so I'll apologise again if that's how it appeared.
skullnobrainsCommented:
no problem on my side : we don't have to agree and it is probably interesting to have different views from different people.

technically, i still think that losing the first connection, but having something that works afterwards is totally inconsistent with stp related problems

i also think that this exact problem can easily be reproduced in a lab with a single router, no stp, and no hsrp, so i see no reason to mix them in.

removing the redirection, or setting up a dedicated lan segment for the communication with the firewall are both working ideas. in the latter case, i forgot that we need a route on the firewall so it reaches the lan through the router

removing the redirection can be tried in a matter of seconds. adding an extra dedicated lan can be done gradually by first making the router and firewall ping one another, then setup the firewall properly (rules and return route), and possibly use policy routes on the router to test the setup for a few machines before using it in production.
skullnobrainsCommented:
@pitachip : feel free to post the solution you used and how it went. as you can see, none of us is pretty definitive on what is good/best practice with reasonable/overkill work so we can learn from your experience.

best regards to all
leblancAccountingAuthor Commented:
skullnobrains,

All of your proposed solutions sound very interesting. But the work-around adds complexity. So I have decided to make everything layer 3 beyond the core switches. I am testing two scenarios, one with all static routes http://www.experts-exchange.com/Networking/Network_Management/Network_Design_and_Methodology/Q_28282400.html and one with EIGRP. As you can see from my thread for the static routes, I am having an issue on the link between core1 and core2. The link is currently a trunk. I was thinking maybe I should make it a layer 3 link, but I am not sure how HSRP will communicate between core1 and core2. Thanks
skullnobrainsCommented:
thanks a lot for posting back some info, i'm following the other thread with some interest. i won't be participating though since you have more than enough people on that thread already and i don't really believe such network designs to be good practice (overly complicated and difficult to maintain)

if my 2 cents is of interest to you
- avoid mixing layers 2 and 3 on the same equipment. do it on an existing network in order to solve an existing specific problem but don't start up with this unless you're used to working in such environments.
- don't delegate the firewall's job to a router. like before you might end up doing so in a couple of existing LANs that have huge internal traffic in order to ease the firewall afterwards.
- don't route when switching is enough. you'd usually use layer 3 when you need separate LANs for security reasons, or you experience latency over a slow link, or you have 10k windows machines in the same LAN and the broadcast traffic is starting to be a problem.
- don't set up a network with asymmetric routes
- don't use 3 or 4 network devices together in order to achieve redundancy that can easily be achieved with 2. if you can't do it with only 2 of them, then you're probably trying to set up routing where a bunch of switches would behave better ( see above : using routing instead of switching and possibly using the routers where firewalls would be enough )

i hope you'll find a suitable solution

best regards
Craig BeckCommented:
With all respect, the solution I've provided in the other thread isn't complicated.  It took me 10 mins to lab it!

Asymmetric routing isn't an issue here.  There are no firewalls between the users and the multiple paths so issues such as session tracking don't exist.  As long as it's not implemented in the wrong place it's perfectly acceptable and it's actually used all over the internet.

At some point the traffic needs to be routed, so I don't understand the comment regarding using routing when a bunch of switches would be better?!

Also, I must have missed the memo when Layer3 SWITCHES became obsolete?!
"- avoid mixing layers 2 and 3 on the same equipment"

Anyway, this question is closed...
skullnobrainsCommented:
craigbeck, i was not talking about your solution in the other post. just a bunch of regular advice which seemed relevant to this one. like you said, this post is closed. let's not spam others.