VPN troubleshooting

I'm troubleshooting a VPN connection using Check Point.  The connection keeps dropping from time to time.  Here are the logs from the Check Point firewall:
[28 Oct 7:13:54] IKE tunnel disconnected, error code=-1000. Reason: Site is not responding.
[28 Oct 7:13:54] Client state is connected
[28 Oct 7:13:54] Tunnel (2) disconnected. State is connected. Trying to reconnect.
[28 Oct 7:14:22] IKE connection failed, error code=-1000. Reason: Site is not responding.
[28 Oct 7:14:22] Client state is reconnecting
[28 Oct 7:14:22] Reconnect failed. trying again (2)
[28 Oct 7:15:20] IKE connection failed, error code=-1000. Reason: Site is not responding.
[28 Oct 7:15:20] Client state is reconnecting
[28 Oct 7:15:20] Reconnect failed. trying again (2)
[28 Oct 7:16:05] IKE connection failed, error code=-1000. Reason: Site is not responding.
[28 Oct 7:16:05] Client state is reconnecting
[28 Oct 7:16:05] Reconnect failed. trying again (2)
[28 Oct 7:16:23] IKE connection failed, error code=-1000. Reason: Site is not responding.
[28 Oct 7:16:23] Client state is reconnecting
[28 Oct 7:16:23] Reconnect failed. trying again (2)
[28 Oct 7:17:02] IKE connection failed, error code=-1000. Reason: Site is not responding.
[28 Oct 7:17:02] Client state is reconnecting
[28 Oct 7:17:02] Reconnect failed. trying again (2)

Can someone get me started on the troubleshooting?  What is happening and how can I fix it?
Thanks
Ted James, Asked:
John, Business Consultant (Owner), Commented:
"IKE connection failed, error code=-1000. Reason: Site is not responding."

That is the beginning.  You are not getting to Phase 1 or Phase 2.

Is this Site to Site or Client to Site?

Make sure the Internet connections are very stable, that each end knows the external IP of the other end, and that the subnets are different for each end. Mode should be Main for site to site, not Aggressive.
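The subnet part of that check is easy to script. Below is a minimal sketch using Python's standard `ipaddress` module. The site names and subnets are made-up examples, not values from this thread; substitute the encryption domains actually configured on each tunnel.

```python
import ipaddress
from itertools import combinations


def find_overlaps(subnets):
    """Return pairs of site names whose subnets overlap.

    subnets: dict mapping a site name to a CIDR string.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical encryption domains, for illustration only.
sites = {
    "hq": "10.0.0.0/16",
    "branch-a": "192.168.10.0/24",
    "branch-b": "10.0.5.0/24",
}

print(find_overlaps(sites))  # [('branch-b', 'hq')]
```

Any pair it reports is worth a closer look, since overlapping subnets on two ends of a tunnel make routing ambiguous.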
Ted James, Author, Commented:
Is the problem on the other end?  I don't have a view into the other side.
John, Business Consultant (Owner), Commented:
It could be, based on the message. You are not getting a connection, or not a solid one when it does connect.

Can someone at the other end reset the Modem and reset the VPN box?
bbao, IT Consultant, Commented:
it means the VPN service at the other end is not being reached. technically, its VPN port(s) are not being reached. it could be an IP routing issue, port forwarding issue, or firewall policy issue.

basically, double check both sides. it could be an issue on your side as well, e.g. a wrong host name or IP address given for the VPN server.
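The "wrong host name or IP address" case can be ruled out quickly by checking that each configured peer name resolves to the address you expect. This is only a sketch: the host name and addresses are placeholders, and the `resolver` parameter is a hypothetical hook added here so the check can be tried without network access.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def check_peer(hostname, expected_ip, resolver=resolve_ipv4):
    """Return True if the peer name resolves to the address we expect."""
    try:
        return expected_ip in resolver(hostname)
    except socket.gaierror:
        # The name did not resolve at all.
        return False


# Offline demonstration with a stub resolver (placeholder values).
stub = lambda name: {"203.0.113.10"}
print(check_peer("vpn.example.com", "203.0.113.10", resolver=stub))  # True
print(check_peer("vpn.example.com", "198.51.100.7", resolver=stub))  # False
```

Calling `check_peer(name, ip)` without the `resolver` argument uses real DNS. A mismatch only shows that the name and the configured address disagree; it says nothing about routing, port forwarding, or firewall policy.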
Ted James, Author, Commented:
Unfortunately there are several "other ends".  This local Checkpoint FW is terminating about 18 VPNs.  Eight of them are having this problem.  I think it is my end, even though that error message says that the other end is not responding.  That is throwing me off.
Eight of these VPNs are exhibiting this flapping while the other ten are solid.  Thoughts?
John, Business Consultant (Owner), Commented:
In a hardware VPN box, each other end should terminate in its own end on your box.

Try deleting one or two problem profiles on your box, reset your modem and router and then rebuild the problem profiles to see if they work.  I have done this before.
Ted James, Author, Commented:
In each case they are able to stand up the VPN.  But then they get dropped.  Happens several times.  Started happening last week.
John, Business Consultant (Owner), Commented:
If some tunnels work solidly and some do not, it seems unlikely to be the box. Possible but not likely. That is why I suggest rebuilding profiles.  Resetting the modem is just something I routinely do in these circumstances.

You may want to plan to replace the VPN box at your end and make sure the replacement has enough overall VPN throughput.
John, Business Consultant (Owner), Commented:
Also make sure the VPN box at the problem end is compatible with your main office VPN.

DO upgrade firmware on ALL these machines. Big job for 19 VPN boxes but you need to do this and that could help you solve this issue.
bbao, IT Consultant, Commented:
"This local Checkpoint FW is terminating about 18 VPNs.  Eight of them are having this problem."

if you have a local firewall in place, then its firewall logs and settings should be reviewed.

a simple way is to try establishing a VPN connection then immediately check the logs, or better enable real-time connection monitor if possible.

if you see "denied" connection(s) for your outgoing VPN connection, then that must be one of the reasons that caused the problem.
Ted James, Author, Commented:
All very good ideas.  Some of which I will have to schedule "downtime" as the users are 24/7.

Couple other thoughts:
1.  Could the firewall licenses (encryption licenses or Firewall license) be expiring?  Causing the tunnels to go down for a couple minutes?

2.  Some users claim the drops happen at roughly the same time as each other.  I haven't verified this, but maybe it reflects exceeding a certain limit?  Or a throughput issue?

3. Not a thought but another impediment to my troubleshooting...  My access to my CP SmartConsole is now being rejected.  When I first logged on yesterday (first time ever) I had to "verify" a fingerprint.  Ignorantly I said "yes".  Apparently I must have been wrong I guess because now I can't get authenticated to get back in.  Is there something I am missing or more I need to do, or I am just fat-fingering it?
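Point 2 can be verified once drop times have been collected per tunnel. The sketch below groups drop events that fall within a short window of each other and reports the groups that involve more than one tunnel. The tunnel names, dates, and the 60-second window are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def clustered_drops(events, window=timedelta(seconds=60), min_tunnels=2):
    """Find drops on different tunnels that happen close together.

    events: iterable of (tunnel_name, datetime) pairs.
    Returns a list of (first_drop_time, sorted_tunnel_names) for every group
    of events within `window` of the group's first event that involves at
    least `min_tunnels` distinct tunnels.
    """
    groups = []
    current = []
    for name, ts in sorted(events, key=lambda e: e[1]):
        if current and ts - current[0][1] > window:
            groups.append(current)
            current = []
        current.append((name, ts))
    if current:
        groups.append(current)

    result = []
    for group in groups:
        names = sorted({name for name, _ in group})
        if len(names) >= min_tunnels:
            result.append((group[0][1], names))
    return result


# Hypothetical drop times; the year and tunnel names are placeholders.
events = [
    ("site-a", datetime(2000, 10, 28, 7, 13, 54)),
    ("site-b", datetime(2000, 10, 28, 7, 14, 10)),
    ("site-c", datetime(2000, 10, 28, 7, 14, 30)),
    ("site-a", datetime(2000, 10, 28, 9, 40, 2)),
]

for start, tunnels in clustered_drops(events):
    print(start, tunnels)
```

If most drops land in shared groups, a common cause on the local side (load, a limit, the uplink) becomes more plausible. If they are scattered, per-tunnel settings are the better place to look.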
John, Business Consultant (Owner), Commented:
1. I think a firewall would either work or not work.

2. Drops at the same time might point to throughput or congestion, but that would affect random endpoints.
John, Business Consultant (Owner), Commented:
I also want to underscore the point about updating firmware. I think that may wind up being very helpful.
Ted James, Author, Commented:
For the firewall firmware, yes?
It makes sense since the firewalls haven't been touched in over a year, and problems are only surfacing now.
John, Business Consultant (Owner), Commented:
Yes to firmware.
The dropouts happen at random times, and the failures are limited to specific sites and not others.
arnold, Commented:
Confirm the client/server settings dealing with key lifetimes.
It sounds like a mismatch, where the client has the lower number.
For example, the key lifetime on the server side is 8 hours while on the client it is 6. When the client approaches the six-hour limit, it tries to renegotiate the link; the server, still having two more hours for the session, ignores the renewal requests.
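A lifetime mismatch like that is simple to spot once both ends' values are written down side by side. Here is a rough sketch, assuming you can obtain the configured lifetimes in seconds for each end. The phase names and the Phase 1 value are made up; the 8-hour and 6-hour figures are the ones from the example above.

```python
def lifetime_mismatches(local, remote):
    """Return {phase: (local_seconds, remote_seconds)} where the two ends differ.

    local, remote: dicts mapping a phase name to its lifetime in seconds.
    Phases missing from either side are skipped.
    """
    return {
        phase: (local[phase], remote[phase])
        for phase in local
        if phase in remote and local[phase] != remote[phase]
    }


# Hypothetical settings: 8 hours on one end, 6 hours on the other.
server = {"phase1": 86400, "phase2": 8 * 3600}
client = {"phase1": 86400, "phase2": 6 * 3600}

print(lifetime_mismatches(server, client))  # {'phase2': (28800, 21600)}
```

Running this per tunnel for the eight flapping peers and the ten stable ones would show whether the mismatch theory lines up with the tunnels that actually drop.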


Your log only deals with when the drop occurs. You need to look at the event when the VPN connection successfully connected.
Then look at how long it takes before phase 2 tries to renegotiate.

Presumably if you forcefully disconnect the VPN, and then initiate the connection, the VPN will set up.
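Measuring the time between a successful connect and the next drop can be scripted against the client log. The sketch below parses the `[28 Oct 7:13:54]` timestamp format from the log posted in the question. The "Tunnel established" line is hypothetical, since the posted excerpt contains no connect event, and the year is a placeholder because the log does not record one; adjust `up_marker` to whatever your log actually prints.

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^\[(\d{1,2} \w{3} \d{1,2}:\d{2}:\d{2})\] (.*)$")


def parse_line(line, year):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %d %b %H:%M:%S")
    return stamp, match.group(2)


def uptimes(lines, up_marker, down_marker, year):
    """Return the time between each 'up' line and the next 'down' line."""
    result = []
    up_at = None
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        stamp, message = parsed
        if up_marker in message:
            up_at = stamp
        elif down_marker in message and up_at is not None:
            result.append(stamp - up_at)
            up_at = None
    return result


log = [
    "[28 Oct 1:13:50] Tunnel established",  # hypothetical connect line
    "[28 Oct 7:13:54] IKE tunnel disconnected, error code=-1000. Reason: Site is not responding.",
]

for delta in uptimes(log, "Tunnel established", "IKE tunnel disconnected", year=2000):
    print(delta)
```

Uptimes that repeatedly come out close to a round figure, such as six or eight hours, would support the key lifetime theory.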
Qlemo, "Batchelor", Developer and EE Topic Advisor, Commented:
If the lifetimes are different, no forced reconnect is available: the delete message would not be sent to the other site (assuming it is a Phase 1 issue, and it looks that way).
But I agree different lifetimes could be the reason.
Sadly, details about a failing connection are usually only available at the receiving site (the responder), not the initiator. Hence debugging often requires seeing both sides of the connection at the same time, if necessary using e.g. TeamViewer to the remote site to get access to the gateway.
arnold, Commented:
Qlemo, I meant a forced disconnect of your own end; I'm not sure where you got a forced reconnect from.
I.e., commonly phase 1 is still connected during the attempt to confirm that phase 2 can still continue.

Once you tell your side to bring down the VPN, the VPN on the other side gets dropped. A new connection will then set up.
Qlemo, "Batchelor", Developer and EE Topic Advisor, Commented:
arnold, you are correct if P2 is the issue. But it looks like P1 is failing here (because of "IKE").
Ted James, Author, Commented:
Can anyone point me to a good, detailed IPsec troubleshooting guide that is not geared to a specific product?  Cisco et al. have VPN troubleshooting guides, but they are geared towards commands and logs specific to the product.  I'd like a generic troubleshooting list.

(Many of my endpoints (far end) are not Checkpoint, I don't even know for sure what endpoint they have)

thx
John, Business Consultant (Owner), Commented:
Here is a somewhat generic document to help troubleshoot the logs produced when trying to connect.

https://campus.barracuda.com/product/cloudgenfirewall/doc/73719167/ipsec-ikev1-log-messages-and-troubleshooting

Google for:   ipsec VPN generic troubleshooting guides

Pick ones you like.

Ted James, Author, Commented:
Fixed my CP SmartConsole issues.  Turns out my permissions were not upgraded.

So all good suggestions.  Thank you!
In summary, some things to look at:
1.  Firmware upgrade (at a later time during scheduled maintenance)
2.  Check for inconsistent key lifetimes between both ends.
3.  Look at logs at other end (both sides view).  Going to be difficult because the other end person would probably not be technical enough and I don't own that termination point.
4.  Reconstruct profiles. (John can you be more specific?  Is this user profiles?  tunnel endpoint profiles? What are we talking about? etc.)

Also, what about the possibility of the far end being on a wireless network?

Thanks in advance.  We are meeting tomorrow to discuss our strategy.
John, Business Consultant (Owner), Commented:
"Reconstruct profiles... tunnel endpoint profiles?"

Yes. Tunnel Endpoint profiles.

"Also, what about the possibility of the far end being on a wireless network?"

So long as the wireless base has a static IP on the network and is handing out wireless addresses on that network, that should not be a problem.
Qlemo, "Batchelor", Developer and EE Topic Advisor, Commented:
Good find, John, that Barracuda manual explains common log messages well enough for most devices.
Ted James, Author, Commented:
Thank you all.  We are scheduling a firmware upgrade in the next couple of weeks.
It's due for one anyway.
Ted James, Author, Commented:
Thank you all.  Very helpful.  I'm not very familiar with the scoring system so I hope I didn't slight anybody.  I'll reach back when we complete the upgrade.
John, Business Consultant (Owner), Commented:
You are very welcome and I was happy to assist you.