Solved

Stumped troubleshooting ipsec vpn

Posted on 2014-01-27
Last Modified: 2014-01-28
I'm stumped troubleshooting this vpn connection.

Scope of this question is to get ping replies working end to end across an ipsec vpn tunnel (site to site).  currently, the tunnel connects, i can send a ping thru the tunnel from building1 to the datacenter, the ping is received at the datacenter, the datacenter host replies, but the reply never gets back to the building1 side that initiated the ping.

the setup:

building1:
cisco rvs4000 vpn router
lan: 10.100.1.0/24
wan static ip, we'll call it a.b.c.d
upstream gateway a.b.c.z  (cox cable)

vpn setup:
ipsec tunnel
local address a.b.c.d
local group subnet 10.100.1.0 / 24
remote group ip z.x.c.v  (ie: public ip of datacenter)
remote group subnet: 172.16.1.0 / 24
keying mode: ike with preshared key
phase 1 encryption: 3des
auth: md5
group: 1024bit (ie group2)
key life 28800 sec

phase 2 encryption 3des
auth md5
PFS disable
preshared key: password (or whatever, it matches the remote endpoint)
group: 1024bit
key lifetime 28800

datacenter setup:
paloalto pan-4050 router
wan:  vpn endpoint z.x.c.v (public ip)
lan: 172.16.1.0/24
upstream gateway:  z.x.c.z (datacenter core switch, expedient colo)
vpn setup identical to above

the vpn tunnel DOES connect, shows "up"

if i initiate a ping from 10.100.1.5 to 172.16.1.70... with wireshark running on both machines:
10.100.1.5 sends the packet to 10.100.1.254 (the cisco)
z.x.c.v (datacenter wan) receives the encapsulated packet and decrypts it, routes it to 172.16.1.70
on 172.16.1.70, wireshark sees the ping from 10.100.1.5
172.16.1.70 replies to 10.100.1.5
172.16.1.65 (paloalto inside interface) receives it and encapsulates it for a.b.c.d (building1 wan)
z.x.c.v does forward it to z.x.c.z (upstream device) as seen by port-mirroring the wan uplink
it never arrives at a.b.c.d (building1 wan, cisco rvs router)

if i initiate a ping from 172.16.1.70 destined for 10.100.1.5:
the inside interface of the paloalto receives it, packs it up and forwards it to the upstream gateway (as seen on the wire, port mirroring the wan uplink).
no traffic is received at building1

troubleshooting:  

i've had paloalto support in their device for a week, they've proved beyond all doubt that the traffic is being handled properly and being passed upstream correctly

i've replaced the router at building1 (changed from a netgear vpn router, to a cisco vpn router).  both the netgear, and the cisco, have the identical symptoms.  tunnel connects, traffic gets from building1 to the datacenter, but not back.

interesting point:
when i traceroute from building1 (10.100.1.5) to google (8.8.8.8) my first hop is as expected my internal gateway (10.100.1.254, the cisco).  but, the very next hop is 10.16.72.1 (14ms, assume not my cable modem).  the next hop after that is NOT a.b.c.z (upstream public gateway), it is something completely different (but still on cox network)

i've tried asking cox what the heck is 10.16.72.1 and to check my cable modem routing table to make sure it's correct... but the best they could do for me is tell me to reboot my cable modem and router.
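
For what it's worth, 10.16.72.1 sits inside 10.0.0.0/8, which is RFC 1918 private address space, so it cannot be a public gateway. Most likely it is a cox-internal hop, though that is an assumption since cox never confirmed it. A minimal Python sketch (standard library only) that flags private hops in a traceroute; the hop list is just the addresses mentioned above:

```python
import ipaddress

def private_hops(hops):
    """Return the hops that fall inside private (non-public) address space."""
    return [h for h in hops if ipaddress.ip_address(h).is_private]

# Addresses taken from the traceroute described above.
hops = ["10.100.1.254", "10.16.72.1", "8.8.8.8"]
print(private_hops(hops))
```

A private hop on the provider side only says the provider numbers its own gear that way; by itself it does not show that anything is being tunnelled or re-wrapped.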

the physical wan port of the cisco at building1, is directly connected to the one and only ethernet port on the cable modem.  nothing is in between.

so, i need ideas as to why the return traffic can't get back.
Question by:FocIS
10 Comments
 
LVL 68

Expert Comment

by:Qlemo
ID: 39813150
Compare the traceroutes to each other's public IPs. And how do you know a.b.c.d does not get traffic back? Did you use a WAN link mirror with WireShark here, too?

And make sure the paloalto device does not try to create another tunnel because of some parameter mismatch (local and remote subnets in VPN, for example).
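
One way to check for that kind of mismatch is to write down the local and remote subnets configured on each end and confirm they mirror each other exactly. A minimal Python sketch of the check; the `proxy_ids_mirror` helper and the dictionaries are hypothetical, the good values come from the setup in the question, and the "bad" remote subnet is an invented typo for illustration only:

```python
import ipaddress

def proxy_ids_mirror(side_a, side_b):
    """True when side_a's local/remote subnets are the exact mirror of side_b's."""
    a_local = ipaddress.ip_network(side_a["local"])
    a_remote = ipaddress.ip_network(side_a["remote"])
    b_local = ipaddress.ip_network(side_b["local"])
    b_remote = ipaddress.ip_network(side_b["remote"])
    return a_local == b_remote and a_remote == b_local

building1 = {"local": "10.100.1.0/24", "remote": "172.16.1.0/24"}
datacenter_ok = {"local": "172.16.1.0/24", "remote": "10.100.1.0/24"}
# Hypothetical typo in the remote subnet, not taken from the real config.
datacenter_bad = {"local": "172.16.1.0/24", "remote": "10.100.2.0/24"}

print(proxy_ids_mirror(building1, datacenter_ok))
print(proxy_ids_mirror(building1, datacenter_bad))
```

Run it once per tunnel defined on the firewall, so that a second tunnel with overlapping or wrong subnets stands out.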
 
LVL 2

Author Comment

by:FocIS
ID: 39813405
Thanks for the reply!
In the paloalto, the debug logs are very explicit, i'm certain it sends it out the correct tunnel.  The port mirror on the paloalto side does show esp packets going to the building1 wan

I can't port-mirror (yet) at the building1 side though i should be able to tomorrow.  

The symptoms are, as seen from 172.16.1.70, the pings are sent and sent and sent with no replies.

Similarly, as seen from 10.100.1.5, those packets are sent and sent and sent and never receive replies, BUT i see matching pings on the destination of 172.16.1.70.   so i see the ping hit the destination, the destination replies, but the replies never make it back to 10.100.1.5

the traceroutes are similar but different:

from building1 to datacenter:
Tracing route to z.x.c.v over a maximum of 30 hops

  1     3 ms     1 ms     1 ms  10.100.1.254
  2    17 ms    11 ms    13 ms  10.16.72.1  <-- not sure what this is
  3    10 ms     9 ms     9 ms  ip98-173-132-214.cl.ri.cox.net [98.173.132.214] <-- not our ip or upstream gw
  4    12 ms    10 ms    11 ms  ip98-173-132-222.cl.ri.cox.net [98.173.132.222]
  5    65 ms    38 ms    49 ms  68.1.4.246
  6   109 ms   207 ms   222 ms  te6-3.ar3.DCA3.gblx.net [67.17.134.45]
  7    65 ms   107 ms    75 ms  CONTINENTAL-BROADBAND.Te6-2.ar5.CHI1.gblx.net [207.138.128.70]
  8    71 ms    67 ms    68 ms  te1-2.4006.cr2.350ec.chcgil.e-xpedient.com [216.130.11.134]
  9    78 ms    72 ms    73 ms  te1-4.4005.cr1.strlng.clevoh.e-xpedient.com [216.130.11.130]
 10    74 ms    69 ms    76 ms  te1-2.4002.cr2.strlng.clevoh.e-xpedient.com [216.130.12.134]
 11    67 ms    74 ms    74 ms  te2-7-1.4007.151-core.expedient.com [216.130.12.202]
 12    72 ms    67 ms    70 ms  z.x.c.v
Trace complete.


from datacenter:
Tracing route to a.b.c.d [68.99.x.x]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  paloalto [172.16.1.65]
  2     1 ms    <1 ms    <1 ms  z.x.c.z  (upstream gateway)
  3    11 ms    11 ms    11 ms  te1-3.4007.cr2.strlng.clevoh.e-xpedient.com [216.130.12.201]
  4    11 ms    11 ms    11 ms  te1-2.4002.cr1.strlng.clevoh.e-xpedient.com [216.130.12.133]
  5    11 ms    11 ms    11 ms  te1-4.4005.cr2.350ec.chcgil.e-xpedient.com [216.130.11.129]
  6    11 ms    11 ms    11 ms  te1-2.4006.cr1.350ec.chcgil.e-xpedient.com [216.130.11.133]
  7    11 ms    11 ms    11 ms  te6-2.ar5.chi1.gblx.net [207.138.128.69]
  8    31 ms    31 ms    31 ms  cox-com.ethernet15-2.ar6.dal2.gblx.net [64.215.187.2]
  9    57 ms    76 ms   114 ms  clvdhdrj01-xe000.0.rd.cl.cox.net [68.1.1.94]
 10    57 ms    57 ms    57 ms  ip98-173-132-221.cl.ri.cox.net [98.173.132.221]
 11    57 ms    57 ms    56 ms  ip98-173-132-217.cl.ri.cox.net [98.173.132.217]
 12    70 ms    68 ms    73 ms  a.b.c.d [our static ip 68.99.x.x]
Trace complete.
 
LVL 68

Expert Comment

by:Qlemo
ID: 39813761
Ok, that tells us the packets should flow between both routers. ESP packets might get filtered on their way back, though. Really difficult to tell. Also, there could be an MTU mismatch leading to excess fragmentation - or requiring fragmentation, but not doing that. Depending on firmware bugs this might be an issue even if the correct MTU is only a few bytes different (we often see issues when 1500 bytes is set where 1492 should be used).
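
To put rough numbers on the MTU point: ESP in tunnel mode adds its own headers, so the largest inner packet that fits without fragmentation is noticeably smaller than the link MTU. A worked sketch in Python, assuming IPv4, 3DES-CBC (8-byte IV, 8-byte blocks), HMAC-MD5-96 (12-byte ICV) and no NAT-T/UDP encapsulation; the `max_inner_packet` helper is hypothetical and real devices may reserve more:

```python
def max_inner_packet(path_mtu, outer_ip=20, esp_header=8, iv=8, icv=12, block=8):
    """Largest inner IP packet that fits in one ESP tunnel-mode packet.

    Defaults assume IPv4, 3DES-CBC and HMAC-MD5-96 without NAT-T.
    """
    room = path_mtu - outer_ip - esp_header - iv - icv
    # Inner packet + padding + 2 trailer bytes (pad length, next header)
    # must fill a whole number of cipher blocks.
    encrypted = (room // block) * block
    return encrypted - 2

for mtu in (1500, 1492):
    print(mtu, max_inner_packet(mtu))
```

Under these assumptions a 1500-byte path carries inner packets up to 1446 bytes and a 1492-byte path up to 1438 bytes, so default-sized pings are far below either limit.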
 
LVL 2

Author Comment

by:FocIS
ID: 39813974
Good catch qlemo, i was in a hurry and missed that ip address :)

The cisco at the building1 side was already 1492 mtu, but the datacenter side was 1500, so i just changed that to 1492.

having saved the changes, the tunnel remains fully connected but the esp packets of the "ping reply" don't appear to reach back to building1.

when we had a netgear vpn router at building1 last week, it had a built-in packet scanner with download to wireshark for review - we could see the ping request leaving, but never saw the ping reply come back.  

i'll hook up a port mirror device at building1 on tuesday and see what can be seen (still have one in place at the datacenter)

happy to try any other ideas at all, and to provide more info if it helps

 
LVL 2

Author Comment

by:FocIS
ID: 39813978
i wanted to mention some more aspects:

when i initiate a ping from 172.16.1.70 directly to 10.100.1.5, the route is there, the datacenter router passes esp packets upstream (that's as far as we can track it) but the "ping request" never makes it to 10.100.1.5 (as viewed on the nic of 10.100.1.5)

also, we have an identical cisco rvs4000 router on another cox cable modem in the building (different account, different static ip) with identical vpn settings (tailored to the different static cox ip address), and THAT tunnel connects, and pings pass in both directions

i've requested that the datacenter isp capture our packets on our upstream gateway but their answer was (rightfully so) "no way, that's a core switch" - all i can assume is since the packets left our rack in good health, they should be leaving the building too.

what do you think about the "odd" 2nd hop leaving building1?  is that some sort of internal vpn between cleveland and rhode island (cl.ri.cox.net gets from cleveland to RI for the cox pop)?  i wonder if it is a cox tunnel, and if that's double/triple wrapping the packets and killing the crypto (yet it works on the way out from building1 to the datacenter)
 
LVL 68

Accepted Solution

by:Qlemo (earned 500 total points)
ID: 39814416
I don't think the 2nd hop is wrapping, just routing. If it did anything, the tunnel would not come up.
Regarding the Netgear packet capture - did you see both unencrypted and encrypted traffic, or only the unencrypted one? Because I still think something with the VPN settings is not correct. Maybe the other tunnel is getting the reply traffic in error?
 
LVL 2

Author Comment

by:FocIS
ID: 39815197
Good thought with the two tunnels - i've just deleted the second tunnel (which worked, but with the "wrong" network).

I'll post some sanitized screenshots of the settings here
building1.png
datacenter.png
 
LVL 2

Author Closing Comment

by:FocIS
ID: 39815220
oh wow, i think that actually did it - there was some confusion between "tunnel.2" and "tunnel.3" - when i went to delete tunnel.3 at your suggestion, i noticed the wrong private ip block in tunnel.2

pings in both directions are finally replying all the way thru the tunnel now!
 
LVL 68

Expert Comment

by:Qlemo
ID: 39815364
Great! The private networks exchanged in IPSec are often used to map traffic to a tunnel. Though that should not be necessary for reply traffic - a stateful firewall should bypass any rules, as long as the corresponding session exists.
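
As a rough illustration of that mapping, here is a minimal Python sketch of a selector lookup: each tunnel has a local and a remote subnet, and a packet goes to the first tunnel whose pair covers its source and destination. The `pick_tunnel` helper is hypothetical, and the "wrong" subnet is an invented value, since the thread does not say which block tunnel.2 actually had:

```python
import ipaddress

def pick_tunnel(tunnels, src, dst):
    """Return the name of the first tunnel whose local/remote subnets cover src/dst."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for name, (local, remote) in tunnels.items():
        if src_ip in ipaddress.ip_network(local) and dst_ip in ipaddress.ip_network(remote):
            return name
    return None

# Hypothetical misconfiguration: remote subnet on tunnel.2 is off by one block.
before = {"tunnel.2": ("172.16.1.0/24", "10.100.2.0/24")}
# After the fix: remote subnet matches the building1 lan.
after = {"tunnel.2": ("172.16.1.0/24", "10.100.1.0/24")}

print(pick_tunnel(before, "172.16.1.70", "10.100.1.5"))
print(pick_tunnel(after, "172.16.1.70", "10.100.1.5"))
```

With the wrong block no tunnel (or a different tunnel) claims the traffic headed for 10.100.1.5; how a given firewall actually selects the tunnel depends on its routing and policy model, so treat this only as a picture of the idea.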