Mr Anderson

asked on

Windows routing

Hello.
I have a weird case routing traffic over a site-to-site VPN connection.

On site A I have 10.10.61.0/24 with a pfSense firewall/OpenVPN client connecting to site B (10.10.60.0/24), which runs OpenVPN Access Server.
Site A and B can communicate both ways as intended.
One server (Windows 2012 R2) on site B needs a different default gateway than the OpenVPN Access Server, so I have added a route for traffic to site A (route add -p 10.10.61.0 mask 255.255.255.0 10.10.60.1), where 10.10.60.1 is the OpenVPN Access Server.
When this route is added, the server on site B can ping site A, but site A cannot ping the server. However, if I leave a ping running from the server to site A while I ping the server from site A, it works!?
As a test, I also changed the default gateway on the server, and then everything works.
Dan Craciun

>>needs to have a different gateway

Is that by any chance on the 10.10.60.0/24 network? That would explain the behavior. While the ping is running, all traffic between the server and the machine at site a will go through 10.10.60.1.

But not by default. If you ping from site b to site a it goes through 10.10.60.1, but the reply comes back from the other gateway, so it's dropped.
Do a port mirror on the pfsense on both sides and capture the traffic, to test.

HTH,
Dan
Mr Anderson

ASKER

Yes, the Windows server has IP 10.10.60.13 with default gateway 10.10.60.2, and I want all 10.10.61.0/24 traffic to go through the VPN server at 10.10.60.1.
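The intended behaviour can be sketched as a longest-prefix route lookup. This is only an illustration of how the server should pick a next hop, assuming the two routes named above; the `next_hop` helper and the test address 192.0.2.1 are hypothetical, and the on-link route for 10.10.60.0/24 is left out for brevity.

```python
import ipaddress

# Routes on the Windows server as described in the thread:
# (destination network, gateway).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "10.10.60.2"),      # default gateway
    (ipaddress.ip_network("10.10.61.0/24"), "10.10.60.1"),  # static route to site A
]

def next_hop(destination, routes=ROUTES):
    """Return the gateway of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in routes if addr in net]
    # The longest prefix wins, as in a normal IP routing lookup.
    _, gateway = max(matches, key=lambda item: item[0].prefixlen)
    return gateway

print(next_hop("10.10.61.10"))  # site A host -> 10.10.60.1
print(next_hop("192.0.2.1"))    # anything else -> 10.10.60.2
```

Any source address that is not covered by the 10.10.61.0/24 route is answered via the default gateway, which is worth keeping in mind when checking which address the incoming packets actually carry.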
OK. Change the default route (10.10.60.2) to be from another class and it will work as you want.
OK, but can I use two IP addresses on the same NIC? 10.10.60.13 as a secondary IP without a gateway, and a new IP, e.g. 10.10.62.13 with gateway 10.10.62.1, for other traffic?
In theory yes. You can have a secondary IP on an interface from another class.

Just out of curiosity: if it's a server, it has at least 2 physical NICs. Why don't you simply use the second one?
It's a VM, so I can add a NIC; I was just not sure if it was needed.
To test your theory I changed the IP on the server to 192.168.60.13 with gateway 192.168.60.1 and added secondary IP 10.10.60.13 with the route for 10.10.61.0/24 as before. Same result.
OK, then it's time for tcpdump/Wireshark. Check how the ICMP replies return.
Also tested with 2 NICs. Same problem.
Have you the ability to configure a VPN Keep-Alive function at each end of the tunnel?  If you can, this will maintain an active tunnel and your problem should disappear.
The VPN tunnel is active and working for everyone whose default gateway is the OpenVPN Access Server, so I don't think that's the problem.
Going on your description, your routes must be right, as you would never be able to get pings going in both directions if they were not.  The only thing that I can think is happening is: if the VPN drops, it is easily re-established when pinging in one direction, but not in the other direction.  This suggests to me that the issue is not route related, but an issue with tunnel establishment in one direction.  Forcing the tunnel to remain up despite there being no traffic traversing it should resolve such a problem.  If you can't do this, switching the VPN initiation from standard mode to aggressive mode at the end with the issue may help.

If this issue turns out not to be related to the status of the tunnel, then the only thing I can think of is: Your cached routing table at the problem end is faulting, but this is corrected when receiving traffic in the reverse direction.  Though not impossible, this would be unusual.
The VPN tunnel is up all the time, as many other users use this connection constantly, both ways.
It's only this one server, which doesn't have the same default gateway, that doesn't work.
If I change the default gateway it works within a second; if I change it back, it stops working.
"Your cached routing table at the problem end is faulting, but this is corrected when receiving traffic in the reverse direction.  Though not impossible, this would be unusual"

A reboot of the server should fix this?
Nope. It's the table in the router, not the server.
BTW, what does your ARP table look like on the pfsense?
There is only a pfSense on site A. In its ARP table I can see 44 10.10.61.x addresses and 2 addresses on the WAN interface.
Switching the default gateway on the Windows server always works right away, which makes me believe the problem is on the Windows server?
A reboot shouldn't make a difference, but by all means try it.

Examine the routing table on the server at the problem end.  If you can examine the routing table on the gateway also.   Dan may have a point!
Screenshot from Wireshark on the Windows server. A ping from 10.10.61.10 to 10.10.60.13 gets no reply. If I start pinging 10.10.61.10 from 10.10.60.13 while the ping from 10.10.61.10 to 10.10.60.13 is still running, this is the result in Wireshark:
User generated image
I have tried many reboots; it was just to see if we could then rule out the cache.
The following command will flush the arp cache

netsh interface ip delete arpcache
Clearing the ARP cache did not help.
Here you can see no reply at step 1; starting a ping from the other server at step 2 makes both servers reply at step 3.
While pings are answered in both directions, other traffic, e.g. RDP, works only one way.

User generated image
Installed a new test Windows 2008 server; same problem.
I have also tested OpenVPN Access Server in both NAT (default) and routing mode.
Would it be possible to ask you to produce a diagram of this, just so I'm sure the picture I have in my head is correct?
Here is a simple diagram; I think you get the idea.
On server 3 I have added a route for 10.10.61.0/24 via 10.10.60.1 so it can use the site-to-site connection. It's not possible to merge 10.10.60.1 and 10.10.60.2, which would have been the easy way, because everything works if I change server 3's gateway to 10.10.60.1.

So in this setup server 3 can always ping server 1, but server 1 cannot ping server 3 until server 3 pings server 1.
User generated image
I think your problem is the OpenVPN machine.

When you ping server1 from server3, the first thing server 3 does is fire an ARP request for 10.10.60.1, so it knows where to send the ping.
The result is that in the Debian machine's ARP table the correct MAC is assigned to the IP 10.10.60.13 and all works well.

The problem is that the ARP table entries expire, by design. After 5-30 seconds with no packets from 10.10.60.13, the entry is deleted.
When 10.10.60.1 next receives an ICMP request for 10.10.60.13, it checks its ARP table, no mention of 10.10.60.13 there so it replies with no host found with that IP.

It's the client's job to send ARP requests to the router and keep the ARP table populated, not the router's.

You could leave a ping running every 10 seconds to 10.10.60.1 on server3 (or a ping to 10.10.60.13 from the OpenVPN machine) to keep the ARP entry populated correctly. Or set up a static ARP entry on the OpenVPN Debian machine.
Ok.  I've been back through your posts while looking at the diagram.  I have one question:  You have drawn one line from site a that splits into 2 and goes to both firewalls at site b.  I assume this represents the sites' access to the Internet.  But do both firewalls exit to the Internet via the same router and circuit?
Hi. Yes, but with different public IPs.
Ok.  Just to counter other suggestions on this and get things straight in my head.  You've made the statement that all devices at site a can communicate successfully with devices at site b when their gateways are set to 10.10.61.1 and 10.10.60.1 respectively and the same is true in the reverse direction.  If this is true then the issue cannot reside on any other device other than server 3.  Do you agree so far?
Yes! That's why I'm lost 😉
Ok.  And you can ping consistently from server 1 on site a to server 3 on site b without interruption, but you cannot do the reverse unless there are active ICMP packets coming in the other direction?
Just for fun, run this in the Debian router/server, while you can ping from site 1 to site 2 and when you cannot:
sudo ip -s neighbor list

It should show you all known hosts.
I will test it tomorrow. I'm not in the office today.
Can you post a screenshot of the complete routing table of Server 3?
Here is the route print for 10.10.60.13

User generated image
ip -s neighbor list (while i can not ping 10.10.60.13)
10.10.60.12 dev eth1 lladdr 62:1b:02:ce:05:a1 ref 10 used 7/7/7 probes 1 REACHABLE
10.10.60.13 dev eth1 lladdr 76:2d:7a:cf:ea:ff ref 3 used 4/51/4 probes 1 DELAY
10.10.60.254 dev eth1 lladdr 72:5f:bb:80:a8:99 ref 6 used 6/6/6 probes 1 REACHABLE
10.10.60.10 dev eth1 lladdr 02:4b:5c:78:9e:95 ref 69 used 0/29/3 probes 1 DELAY
10.10.60.11 dev eth1 lladdr 26:73:9c:4e:62:12 ref 8 used 9/4/4 probes 1 REACHABLE
x.x.x.x dev eth2 lladdr 00:00:0c:9f:f0:00 ref 107 used 83/0/83 probes 1 REACHABLE (public ip)

ip -s neighbor list (while i can ping 10.10.60.13)
10.10.60.12 dev eth1 lladdr 62:1b:02:ce:05:a1 ref 10 used 20/20/20 probes 1 REACHABLE
10.10.60.13 dev eth1 lladdr 76:2d:7a:cf:ea:ff ref 4 used 4/3/3 probes 1 REACHABLE
10.10.60.254 dev eth1 lladdr 72:5f:bb:80:a8:99 ref 6 used 20/19/19 probes 1 REACHABLE
10.10.60.10 dev eth1 lladdr 02:4b:5c:78:9e:95 ref 74 used 10/9/9 probes 1 REACHABLE
10.10.60.11 dev eth1 lladdr 26:73:9c:4e:62:12 ref 9 used 11/9/9 probes 1 REACHABLE
x.x.x.x dev eth2 lladdr 00:00:0c:9f:f0:00 ref 118 used 2/0/2 probes 1 REACHABLE (public ip)
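To make the difference between the two snapshots easier to spot, here is a small sketch that pulls the neighbor state out of each line and lists the entries that are not REACHABLE. It assumes the `ip -s neighbor list` line layout shown above; the function names are hypothetical, and the sample data is the first snapshot (taken while the ping to 10.10.60.13 failed), with the masked public address left out.

```python
# First snapshot from the thread: taken while 10.10.60.13 could not be pinged.
SNAPSHOT = """\
10.10.60.12 dev eth1 lladdr 62:1b:02:ce:05:a1 ref 10 used 7/7/7 probes 1 REACHABLE
10.10.60.13 dev eth1 lladdr 76:2d:7a:cf:ea:ff ref 3 used 4/51/4 probes 1 DELAY
10.10.60.254 dev eth1 lladdr 72:5f:bb:80:a8:99 ref 6 used 6/6/6 probes 1 REACHABLE
10.10.60.10 dev eth1 lladdr 02:4b:5c:78:9e:95 ref 69 used 0/29/3 probes 1 DELAY
10.10.60.11 dev eth1 lladdr 26:73:9c:4e:62:12 ref 8 used 9/4/4 probes 1 REACHABLE
"""

def neighbor_states(text):
    """Map each neighbor IP to a (MAC address, state) pair."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if "lladdr" not in fields:
            continue  # skip blank lines and entries without a link-layer address
        mac = fields[fields.index("lladdr") + 1]
        states[fields[0]] = (mac, fields[-1])  # the state is the last field
    return states

def not_reachable(text):
    """Return the IPs whose neighbor state is anything other than REACHABLE."""
    return [ip for ip, (_, state) in neighbor_states(text).items()
            if state != "REACHABLE"]

print(not_reachable(SNAPSHOT))
```

Note that 10.10.60.10, which works fine according to the thread, is also in DELAY in this snapshot, so the state on its own does not single out 10.10.60.13.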
Can you explain why the default gateway is set to 192.168.60.1?
10.10.60.13 dev eth1 lladdr 76:2d:7a:cf:ea:ff ref 3 used 4/51/4 probes 1 DELAY

That means arp request scheduled, but no reply yet.

Can you run this on the Debian router while the ping is not working?
arping -i eth1 10.10.60.13
David Needham: it was to test a default gateway on a different subnet than the VPN gateway.
Dan Craciun: just a moment
Here is the output of arping; remember this is on the same LAN (Debian fw/gw and 10.10.60.13).

arping -i eth1 10.10.60.13
ARPING 10.10.60.13
60 bytes from 76:2d:7a:cf:ea:ff (10.10.60.13): index=0 time=810.207 usec
60 bytes from 76:2d:7a:cf:ea:ff (10.10.60.13): index=1 time=353.597 usec
60 bytes from 76:2d:7a:cf:ea:ff (10.10.60.13): index=2 time=367.540 usec
60 bytes from 76:2d:7a:cf:ea:ff (10.10.60.13): index=3 time=298.054 usec
SOLUTION
Dan Craciun
Pinging 10.10.60.13 from the Debian gw doesn't help; still no reply from the server when pinging from 10.10.61.x.
I still do not believe that this is anything to do with the firewall.  If it was, then you would have the same issue with any device on network b pinging back to a device on network a.  As I've understood it, this is not the case.  Can you please confirm?
Yes, 10.10.60.13 can always reach 10.10.61.x.
10.10.61.x can ping 10.10.60.13 only while 10.10.60.13 is pinging the device that is pinging it.

Even while ping (-t) works, no other traffic (for example RDP) works from 10.10.61.x to 10.10.60.13, but the other way it works just fine.
What if you try and ping from 10.10.61.10 to 10.10.60.10 and vice versa.  Do you find the same scenario, or slightly different?
As 10.10.60.10 has 10.10.60.1 as its default gateway, it always works, as it does with 10.10.60.13 if I change its default gateway to 10.10.60.1 (rather than just having a manual route for 10.10.61.0/24).
Ok.  Can you explain the reasons why 10.10.60.13 needs to exit via 10.10.60.2?  Does it have a public facing service running on it?
Yes. Public services and to separate some services.
Ok.  Two thoughts occur to me.  

I can't say that I've ever tried this, but have you tried removing the default gateway defined in the NIC and placing a static route with a metric for 0.0.0.0 0.0.0.0?

If you have and you get no joy, or it does not resolve the problem, have you considered multi-homing server 3 ( 2 nics ) defining one on the 10.10.60.x network and creating a separate network for the link to the Debian firewall ( obviously reconfiguring the Lan port of the firewall along the way ), then if necessary enter required static routes?
Yes, I tried removing the default gateway and, as you suggest, setting a 0.0.0.0 route with a higher metric, but it gives the same result as having a default gateway.

Actually, server 10.10.60.13 has 2 NICs now for testing, but same result there too.
SOLUTION
Just by running "route print", as in the screenshot earlier today.

Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
       10.10.61.0    255.255.255.0       10.10.60.1       1
          0.0.0.0          0.0.0.0     192.168.60.1  Default

Where 192.168.60.13 is the interface for all other traffic.
On the Debian fw/gw, OpenVPN Access Server creates this network adapter with address 172.27.224.1. Could it be that the Windows server must have a route for this too?
User generated image
Ok.  We're clearly missing the issue and going around in circles.  I have a solution, but looking at your setup, it may not be an option because of budget restraints.  Here goes though:

I have seen similar setups in schools that I've been asked to go in and help.  The first thing I have done is replace the firewall/s with dedicated firewall appliance/s.  In your case if this was an option, you could consolidate the services running on the OpenVPN and Firewall servers on one site and replace the pfsense server at the other.  You would then have only one gateway to each network.  A vpn could be setup between the two firewall appliances and each firewall could be configured to replicate what you have now.  

I did pretty much exactly this for a school in Cheshire about 18 months ago.  Their budget was tight, but I managed to pick up 2 decent second-user SonicWalls very cheaply off eBay.  There are a couple of pitfalls to avoid, if you're considering going down this route, but I'll willingly advise on how to avoid these.

I appreciate that this won't be the response that you're looking for, but I assure you that such a solution will work for you.
ASKER CERTIFIED SOLUTION
Well I'm glad that you've solved it, but I'm not sure why this would be.
You have both given good help; how should I accept a solution to do it the right way? Thanks for all the input!
David, I'm not sure myself...
I'm not sure that either of us helped you really! :-)
Well, it's helpful to get others' input :)
If you're accessing EE through the full site and not an App on a mobile device, you should have an option to accept one of the posts as a solution and another as an assisted solution.  I'm not sure that EE will let you do it, but your own post should go down as the Accepted Solution! :-)
>>route add -p 172.0.0.0 mask 255.0.0.0 10.10.60.1

Careful about that. 172.0.0.0/8 is not private. 172.16.0.0/12 is.

route add -p 172.16.0.0 mask 255.240.0.0 10.10.60.1
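The numbers in that correction can be checked with Python's standard `ipaddress` module. This is just a quick verification sketch; the variable names are made up for illustration, and 172.27.224.1 is the OpenVPN adapter address mentioned earlier in the thread.

```python
import ipaddress

private_block = ipaddress.ip_network("172.16.0.0/12")  # the private range
too_wide = ipaddress.ip_network("172.0.0.0/8")         # what the quoted route covered

# Dotted netmask to use with "route add ... mask ..." for a /12.
print(private_block.netmask)                # 255.240.0.0

# First and last address of the private block.
print(private_block[0], private_block[-1])  # 172.16.0.0 172.31.255.255

# The OpenVPN adapter address falls inside the private block.
vpn_adapter = ipaddress.ip_address("172.27.224.1")
print(vpn_adapter in private_block)         # True

# The /8 route covers 16 times as many addresses as the private /12.
print(too_wide.num_addresses // private_block.num_addresses)  # 16
```

So the narrower route still covers the OpenVPN adapter address while leaving the public parts of 172.0.0.0/8 on the default gateway.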
Agree, updated :)
My own research solved it