(Hopefully) Simple question about site to site traffic with Juniper Netscreens + MPLS

PRNtom
Hello all-

This has been a problem I've been sitting on for a few months now, due to scheduling difficulties with the customer and their MPLS/ISP vendor.  The customer has a very simple network, with a Juniper Netscreen SSG-5 at the main office, and another at their satellite office.  They used to have a Netscreen-Netscreen VPN between the two sites, which worked fine.  Now, they have an MPLS link (higher bandwidth) which also works fine.

What we are trying to accomplish is setting up a failover from the MPLS to the VPN for site to site traffic, should the MPLS go down.  Currently, we have the MPLS router on the same subnet as the LAN (10.0.0.0/24), plugged into bgroup0 on the Juniper (for those not familiar with the SSG line, the device can group ports in the same security zone together into a 'bgroup', which functions like a mini switch made up of the physical ports on the device assigned to that zone).  Firewall is at 10.0.0.1 and the MPLS router is at 10.0.0.2, static route entry to push traffic across to the other office, etc.  Very simple.

In order to do failover though, I need to be able to fail over to another interface (rather than just an IP).  Since the MPLS was just acting as a LAN device in bgroup0 instead of being plugged into its own interface, we couldn't do this.  So, we decided to break off one port on the Netscreen, add in a separate dummy subnet (10.0.1.0/24), and change the MPLS router's IP and hang it off of this interface.  The same was done at the other site. (Satellite MPLS subnet 10.0.2.0/24 and satellite office LAN subnet 10.0.3.0/24)

So now, the traffic looks like this (sorry for the crappy Visio):

http://i.imgur.com/yd0w8.jpg

This also connects up fine, and the physical path is the same, there's just the additional logical hop of the two extra subnets.  Static routes were put into place for these.  
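To make the extra logical hops concrete, here is a rough Python sketch of the static routing described above. The four subnets come from the post; the device names and the idea of modelling each box as a small route table are assumptions for illustration only, and the carrier's MPLS cloud is collapsed into a single hop between the two MPLS routers.

```python
import ipaddress

# Hypothetical route tables: (destination network, next device).
# None means the network is directly connected to that device.
ROUTES = {
    "main-ssg": [("10.0.0.0/24", None), ("10.0.1.0/24", None),
                 ("10.0.2.0/24", "main-mpls"), ("10.0.3.0/24", "main-mpls")],
    "main-mpls": [("10.0.1.0/24", None), ("10.0.0.0/24", "main-ssg"),
                  ("10.0.2.0/24", "sat-mpls"), ("10.0.3.0/24", "sat-mpls")],
    "sat-mpls": [("10.0.2.0/24", None), ("10.0.3.0/24", "sat-ssg"),
                 ("10.0.0.0/24", "main-mpls"), ("10.0.1.0/24", "main-mpls")],
    "sat-ssg": [("10.0.3.0/24", None), ("10.0.2.0/24", None),
                ("10.0.0.0/24", "sat-mpls"), ("10.0.1.0/24", "sat-mpls")],
}


def trace(first_hop, destination, max_hops=16):
    """Follow the static routes device by device until the destination
    network is directly connected, and return the devices traversed."""
    dst = ipaddress.ip_address(destination)
    path = [first_hop]
    device = first_hop
    for _ in range(max_hops):
        matches = [(ipaddress.ip_network(net), nxt)
                   for net, nxt in ROUTES[device]
                   if dst in ipaddress.ip_network(net)]
        if not matches:
            raise LookupError(f"{device} has no route to {destination}")
        # Longest prefix wins, as in an ordinary routing table.
        _, next_device = max(matches, key=lambda m: m[0].prefixlen)
        if next_device is None:
            return path
        path.append(next_device)
        device = next_device
    raise RuntimeError("possible routing loop")


print(trace("main-ssg", "10.0.3.50"))
# ['main-ssg', 'main-mpls', 'sat-mpls', 'sat-ssg']
```

A missing return route on any one of the four devices shows up here as a `LookupError`, which is one quick way to sanity-check that both directions have been covered.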

This configuration worked fine... at first.  We could ping between the two sites, and everything seemed ok.  However, once we got into more application-level stuff, such as Outlook, and opening up files across the MPLS, we started getting what seemed to be some kind of disconnect or drop issue.  I could browse a UNC share to a Windows file server for example, and it would go a few levels down, but then stop and not do anything... almost like it was timing out, but not.  No actual error message.  Then in Outlook it would open and connect, but then say a minute later that it was disconnected from the server.  

Basically, everything became 'flaky' in a way that is hard to describe.  It was like timeout/packet loss/dropping without causing the actual error messages you might expect to see.

I feel like I am dealing with a TTL or some other timing/packet life sort of thing, but I am not well versed in this, and admittedly grasping at straws.
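As a rough check on the TTL idea, the arithmetic can be sketched in a few lines of Python. Each router that forwards a packet subtracts 1 from its TTL, and Windows hosts commonly start at 128. The number of hops inside the carrier's network is not given in the post, so the value below is purely an assumption.

```python
def remaining_ttl(initial_ttl, router_hops):
    """TTL left on arrival; every forwarding router subtracts 1."""
    return max(initial_ttl - router_hops, 0)


WINDOWS_DEFAULT_TTL = 128   # common default on Windows hosts
KNOWN_HOPS = 4              # SSG-5 and MPLS router at each site
ASSUMED_CARRIER_HOPS = 5    # hypothetical; not stated in the post

print(remaining_ttl(WINDOWS_DEFAULT_TTL, KNOWN_HOPS + ASSUMED_CARRIER_HOPS))
# 119
```

Under these assumptions the two extra subnet hops cost only two units of TTL, so a packet would still arrive with plenty to spare.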

I put in a ticket with Juniper support about it, and they refused to even speculate or give me an idea of some things to check, and wanted about a million debug and log dump things from me to even start looking at it.  I was hopeful since the customer's network is so simple, but apparently that's not how they operate.

The rub is that we need to coordinate myself, the customer's IT guy, and the customer's MPLS vendor all to be on a conference call at the same time to change the setup on all of the devices to the desired (but as yet non-functioning) configuration so that Juniper can collect these logs.  That change creates problems between the two sites, so they have to revert it before too long.  So, in order to get help from Juniper, I am looking at having to coordinate myself, the customer, the MPLS vendor, AND Juniper tech support all at a specific after-hours time.  Ugh.

I know this is a lot of detail, I will be happy to clarify anything.

**Short version (can't blame you):  Site to site MPLS traffic doesn't work properly after adding in two extra subnet hops, can't figure out why.**

Well, to be honest, I didn't read the whole thing after looking at the diagram.
First off, I'd say you're making the MPLS connection more complicated than it needs to be.
I'm using the same technology with the following differences:

1) Just plug the MPLS routers into the LANs.
2) Set a route in each SSG-5 (as the default gateway) that points to the MPLS router's IP on the LAN for packets destined for the remote LAN.

i.e.
10.0.0.0/24 to 10.0.3.2 (at the satellite office, or whatever IP you assign the MPLS router)
and
10.0.3.0/24 to 10.0.0.2 (at the main office, or whatever IP you assign the MPLS router)

3) Turn off stateful packet inspection on the SSG-5 on the LAN/Trust side, because otherwise returning packets for the MPLS link will be dropped.

Then, presumably the MPLS routers already know how to route to the subnets.
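The reason behind step 3 can be illustrated with a toy model in Python. With the MPLS router sitting on the LAN, a connection opened from the remote site is delivered to the local host directly by the MPLS router, so the SSG-5 never sees the opening SYN. The host's reply goes to its default gateway (the SSG-5), which then sees a SYN-ACK for a session it knows nothing about. The class below is a deliberately simplified stand-in, not ScreenOS behaviour, and the host addresses are made up for the example.

```python
class ToyStatefulFirewall:
    """Toy model only: a flow is allowed once its opening SYN was seen."""

    def __init__(self, stateful=True):
        self.stateful = stateful
        self.sessions = set()

    def forward(self, src, dst, flags):
        flow = frozenset((src, dst))
        if flags == "SYN":
            self.sessions.add(flow)
            return True
        # Non-SYN packets pass only if the flow is known,
        # unless stateful checking is switched off.
        return (not self.stateful) or flow in self.sessions


# The remote host 10.0.3.50 opened the connection; its SYN reached
# 10.0.0.25 via the MPLS router on the LAN, bypassing the firewall.
# Only the reply is handed to the firewall.
strict = ToyStatefulFirewall(stateful=True)
print(strict.forward("10.0.0.25", "10.0.3.50", "SYN-ACK"))   # False

relaxed = ToyStatefulFirewall(stateful=False)
print(relaxed.forward("10.0.0.25", "10.0.3.50", "SYN-ACK"))  # True
```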

Also, I suppose that the MPLS routers provide a "connect" function or you wouldn't need them, right?

OK.  Now I'll go back and read the rest....

I'll stick with what I said for now.

Author

Commented:
fmarshall:

Thanks for the reply!  That is fine for basic connectivity, but what we're trying to accomplish is failover from the MPLS to the firewall-to-firewall VPN should the MPLS go down.  In order to do this (on the Netscreens at least), the MPLS has to be on its own interface on the Netscreen (AS FAR AS I KNOW!!).  Hopefully this is more clear :)
Well, I was afraid there was a good reason for doing it that way.

I wonder if there isn't another way to do the failover.  
You only need to switch the inter-site traffic.

The way I've been doing something similar is this way:

SSG-5 provides fallback internet connection
MPLS gets to primary internet connection at another site through another SSG-5 there.
RV042 is MPLS link AND is primary internet connection interface AND failover to SSG-5 for internet.
The failover works but doesn't like to switch back automatically.

So, I guess you could do this:
SSG-5 to primary internet connection.
RV042 is the default gateway.
RV042 to MPLS link with failover to SSG-5/VPN.
That's almost the same as what I have working now.
But the failover mechanism is in the MPLS router instead of the SSG-5.

So, it looks like this:

LAN <> RV042 <>WAN1 to MPLS
                            <>WAN2 to SSG-5 LAN<>SSG-5 Internet Connection

You might do this:
LAN <> RV042 <>WAN1 to MPLS
                            <>WAN2 to SSG-5 LAN 2<>SSG-5 Internet Connection
         <>                                     SSG-5 LAN 1<>

You could possibly do this a couple of ways:
In my arrangement, the RV042 is the default LAN gateway.
Because of the failover "stickiness" I've changed from failover to load balancing with a lower metric on the MPLS connection for the default route 0.0.0.0.
Seems like you could do the same thing for the inter-site traffic: define the remote LAN route twice, once over each connection, with the lower metric on the MPLS connection.
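The metric idea can be sketched roughly like this in Python. It is only a model of "lowest metric among the links that are up", not the RV042's or the SSG-5's actual selection logic, and the `Route` type, link names, and metric values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Route:
    destination: str
    via: str
    metric: int
    link_up: bool = True


def pick_route(routes):
    """Return the usable route with the lowest metric, or None."""
    usable = [r for r in routes if r.link_up]
    if not usable:
        return None
    return min(usable, key=lambda r: r.metric)


mpls = Route("10.0.3.0/24", via="mpls", metric=1)
vpn = Route("10.0.3.0/24", via="vpn", metric=10)

print(pick_route([mpls, vpn]).via)   # mpls
mpls.link_up = False
print(pick_route([mpls, vpn]).via)   # vpn
mpls.link_up = True
print(pick_route([mpls, vpn]).via)   # mpls
```

Because the choice is re-evaluated on every lookup, traffic returns to the MPLS link as soon as it is up again, which is the behaviour the sticky failover mode was not giving.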

It's not so clear to me if you keep the SSG-5 as the default gateway.  This is how I used to have it set up and the SSG-5 would route over to the RV042 for the MPLS subnets.  That was before I implemented the redundancy.
I wonder if you could do this:
SSG-5 as default gateway.
Route remote subnet to MPLS router (if it's an RV042 or other failover/load balancing device).
MPLS router fail over to its WAN2 connected to LAN 2 in SSG-5.
Obviously you would not want the first subnet route in the SSG-5 LAN 1 to affect traffic coming into LAN 2 or there would be a loop.
So, since I've not tried this I can't say if it's a good idea or not.
I think you'd probably want to use load balancing in the RV042 like I'm doing because of the stickiness of failover.  But that may be acceptable.
But this approach seems hokey to just keep the same default interface device.

The practicality of any of this may depend on the MPLS router you're using or would use.
I wish I could give a higher recommendation to the RV042.  It doesn't do everything I'd like but it's pretty good for what it does do.

Anyway, I hope it generates some ideas for you.
Commented:
Customer is moving to Sonicwall because they couldn't do things exactly the way they wanted.  Ugh.

Author

Commented:
Question no longer relevant
