We have a site-to-site VPN tunnel that has performed well for 4 years. Traffic has increased this week, and select devices are now unable to reliably access the tunnel for periods ranging from several minutes to several hours, while other devices continue to connect across it.
The VPN tunnel is used to access a terminal server at a remote site from handheld computers running Windows CE. We typically have 12 devices deployed. Currently we have 18 devices deployed for a 2-week project.
During peak times (more users connected to the RDP server), select devices are unable to connect. Ping loss from an affected device fluctuates anywhere between 0% and 100%. Users can sometimes connect to the RDP server for a few minutes before being disconnected again.
Each occurrence seems to last between 10 and 120 minutes.
I have taken packet captures at the ASA and see that both ICMP and RDP packets are arriving on the inside interface - the portable computer having the problem is transmitting correctly.
My question is how to verify that the ASA is encapsulating these packets and sending them out the outside interface reliably. I have taken packet captures on the outside interface, but I do not know of a way to match the encapsulated packets to those originating from the problem computer.
I have reviewed the output of show crypto ipsec sa:
#pkts encaps: 9228711, #pkts encrypt: 9228711, #pkts digest: 9228711
#pkts decaps: 9482440, #pkts decrypt: 9360557, #pkts verify: 9360557
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 9228711, #pkts comp failed: 0, #pkts decomp failed: 0
#pre-frag successes: 0, #pre-frag failures: 0, #fragments created: 0
#PMTUs sent: 0, #PMTUs rcvd: 0, #decapsulated frgs needing reassembly: 0
#TFC rcvd: 0, #TFC sent: 0
#Valid ICMP Errors rcvd: 0, #Invalid ICMP Errors rcvd: 0
#send errors: 0, #recv errors: 121883
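As a side check on the counters above, the gap between #pkts decaps and #pkts decrypt can be compared with #recv errors. The short Python sketch below does that arithmetic with the values copied from the output; the dictionary keys and function names are made up for illustration. The counters are cumulative and apply to inbound traffic, so the result suggests where to look rather than explaining the outbound loss.

```python
# Counters copied from the "show crypto ipsec sa" output above.
# The dictionary keys are illustrative names, not ASA identifiers.
counters = {
    "pkts_encaps": 9228711,
    "pkts_encrypt": 9228711,
    "pkts_decaps": 9482440,
    "pkts_decrypt": 9360557,
    "send_errors": 0,
    "recv_errors": 121883,
}


def undecrypted(c):
    """Inbound packets counted as decapsulated but not as decrypted."""
    return c["pkts_decaps"] - c["pkts_decrypt"]


def recv_error_rate(c):
    """Receive errors as a fraction of all decapsulated packets."""
    return c["recv_errors"] / c["pkts_decaps"]


gap = undecrypted(counters)
print("decaps - decrypt:", gap)
print("matches recv errors:", gap == counters["recv_errors"])
print("recv error rate: {:.2%}".format(recv_error_rate(counters)))
```

The gap is 121883, the same as #recv errors, while the outbound side shows encaps equal to encrypt and zero send errors. Taking two snapshots of these counters during a failure period and comparing them would show whether the receive errors are still increasing.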
I have tried debug crypto ipsec 1 and did not see any errors.
The uptime on the device is 4 years. The tunnel has been configured since the firewall was installed (keepalives=yes).
I asked for packet captures from the peer firewall. They are not receiving the packets for the failed pings. I do not think this is ISP related, since other devices are working without problems at the same time. I am also confused as to why only select computers experience the issue and why it then clears up on its own.
Here is the information for a failed packet:
Frame 2892: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Encapsulation type: Ethernet (1)
Arrival Time: Sep 20, 2019 15:05:35.939969000 Eastern Daylight Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1569006335.939969000 seconds
[Time delta from previous captured frame: 0.039549000 seconds]
[Time delta from previous displayed frame: 0.039549000 seconds]
[Time since reference or first frame: 768.823551000 seconds]
Frame Number: 2892
Frame Length: 78 bytes (624 bits)
Capture Length: 78 bytes (624 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ethertype:vlan:ethertype:ip:icmp:data]
[Coloring Rule Name: ICMP]
[Coloring Rule String: icmp || icmpv6]
Internet Protocol Version 4, Src: 192.168.1.204, Dst: 10.0.0.71
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x1c (DSCP: Unknown, ECN: Not-ECT)
Total Length: 60
Identification: 0x6dcc (28108)
Time to live: 32
Protocol: ICMP (1)
Header checksum: 0x55ec [validation disabled]
[Header checksum status: Unverified]
Internet Control Message Protocol
Type: 8 (Echo (ping) request)
Checksum: 0x3759 [correct]
[Checksum Status: Good]
Identifier (BE): 512 (0x0200)
Identifier (LE): 2 (0x0002)
Sequence number (BE): 5123 (0x1403)
Sequence number (LE): 788 (0x0314)
[Response frame: 2894]
Data (32 bytes)
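Because ESP hides the inner addresses, one rough way to match a packet like this one to the outside capture is by size and timing: a 60-byte inner IP packet should produce an ESP packet of a predictable length. The Python sketch below estimates that length for ESP tunnel mode. The function name is hypothetical, and the two transform sets shown (AES-CBC and 3DES-CBC, each with HMAC-SHA1-96) are assumptions, since the tunnel's actual transform set is not listed here. It assumes no NAT-T (which would add 8 bytes of UDP header) and minimal padding.

```python
def esp_tunnel_packet_len(inner_len, block_size, iv_len, icv_len):
    """Estimate the outer IPv4 total length of an ESP tunnel-mode packet.

    inner_len  -- total length of the original (inner) IP packet in bytes
    block_size -- cipher block size in bytes
    iv_len     -- initialization vector length in bytes
    icv_len    -- integrity check value length in bytes
    """
    outer_ip = 20     # outer IPv4 header without options
    esp_header = 8    # SPI (4 bytes) + sequence number (4 bytes)
    esp_trailer = 2   # pad length (1 byte) + next header (1 byte)

    # Inner packet plus trailer is padded up to a multiple of the block size.
    unpadded = inner_len + esp_trailer
    padded = -(-unpadded // block_size) * block_size

    return outer_ip + esp_header + iv_len + padded + icv_len


# Inner packet from the dissection above: IP "Total Length: 60".
INNER_LEN = 60

# Assumed transform sets; replace with the values the tunnel really uses.
aes_cbc_sha1 = esp_tunnel_packet_len(INNER_LEN, block_size=16, iv_len=16, icv_len=12)
tdes_cbc_sha1 = esp_tunnel_packet_len(INNER_LEN, block_size=8, iv_len=8, icv_len=12)

print("AES-CBC + HMAC-SHA1-96 :", aes_cbc_sha1, "bytes")
print("3DES-CBC + HMAC-SHA1-96:", tdes_cbc_sha1, "bytes")
```

Under those assumptions the ping would leave the outside interface as an ESP packet with an IP length of 120 bytes (AES-CBC) or 112 bytes (3DES-CBC), plus the Ethernet header. Filtering the outside capture for ESP packets of that length to the peer address, a few milliseconds after each inside echo request, gives an approximate match. Sending pings with an unusual payload size from the problem device would make the matching packets easier to pick out.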
I did not notice any difference between these ICMP packets and those from computers that did not have the issue. There does not seem to be any pattern to which computers are affected, other than that they seem to be computers that were idle and then tried to connect during a period of high use.
Any advice on how to obtain additional information about this problem would be appreciated.
I ran show memory and show cpu and saw under 50% utilization, but I did not run them during today's problem period. I will check again the next time the issue happens.