Telnet Dropping

I have a problem with telnet sessions to a vendor's mainframe dropping after 20 minutes of idle time.  Users leave the telnet connection open and idle, and right at the 20-minute mark the session drops.

There is a point-to-point T1 from our router to their managed router.  So basically the workstation goes to a switch, hits the default gateway, and passes through the firewall to the router, where it travels across the T1 to their managed router and on to their mainframe.

Here are the details:
•  Users are using telnet via Rumba, etc
•  The telnet apps have a keep alive set
•  Windows XP workstations have had a registry setting changed to allow idle times of up to 2 hours
•  Windows Firewall on the workstations is disabled
•  There is only 1 router with access lists and telnet is not restricted
•  The firewall shows the connection as NOT being blocked
•  Called another vendor (expert) on our firewall, same results after they tested (not blocked)
•  Reduced hops from that building to the comm room from 4 to 2
•  Network was replaced with 1 Gbps switches with a 1 Gbps fiber uplink to the comm room

The only setting in the firewall that can potentially disconnect an idle telnet session after 10 minutes is an aggressive aging timer placed on the telnet protocol that marks a connection for deletion after 10 minutes of idle time.  If the firewall has exceeded its memory or connection threshold it will drop those connections marked for deletion.  Our firewall has been operating well under its thresholds.

So what can I be missing?  I am hoping this is something so simple that I have overlooked it while looking for more complex causes.

Is there a connection timeout set for your firewall connection?
Does a direct telnet to this device from its own subnet yield the same results?

captclamAuthor Commented:
There are connection timeouts; however, they are set to 2 hours.  Also, the applications are sending a keep-alive heartbeat that is set to around 10 minutes.  There is no reason the firewall would drop this connection other than the aggressive aging timer.

I don't have the option of testing from the same subnet; the mainframe is owned by a vendor and is offsite.  However, they have said that other clients don't have this problem using similar methods.  I am starting to think this is a mainframe timer issue.  If my users log in and don't use specific mainframe apps, the connection doesn't drop after 20 minutes; it stays connected to their mainframe.  But if they go into certain apps (within the mainframe) using the same telnet session, it will drop 20 minutes after entering those apps.  So if you are like me, you immediately think it is a mainframe security setting or timer; however, I believe the same user profiles, applications, etc. were in use before the change.  So I am trying to sort out any possibilities.
My best guess is you have the firewall timing out a connection that appears to be idle.
OR wait... are any of your routers performing NAT?
Any type of address translation or NAT requires a translations table.

Some types of NAT will result in timeouts if the connection is idle.
You think the application is sending keepalives of some type, but have you actually sniffed the connection or otherwise verified that this is occurring, that keepalives are being sent, and that they are traversing all the way from client to server (or server to client)?

Are you sure keepalives are sent by the app [or client], often enough, even when it's idle?

Some types of keepalive schemes only work if both ends of the connection support and have the functionality enabled.
Some telnet clients might not support the keepalive options.

Some operating systems may not even support the BSD socket level SO_KEEPALIVE option,  if that is what you are referring to.
Or more likely -- by default, a keepalive is not sent until a socket has been idle for at least 2 hours -- see

What type of keepalives are being used,  Network/SOCKET level keepalives,  or  keepalive messages within the telnet protocol itself?

I would consider the possibility, that if a Socket level keepalive is being used, the firewall may actually be ignoring those.
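To make the socket-level case concrete, here is a minimal Python sketch of what "SO_KEEPALIVE with a shorter idle timer" looks like from the client side. It is only an illustration: the function names and the timer values (300 s idle, 5 s interval, 10 probes) are assumptions, and the per-socket tuning options are not available on every platform, in which case the system-wide defaults (often 2 hours) still apply.

```python
import socket

def enable_keepalive(sock, idle_s=300, interval_s=5, probes=10):
    """Turn on socket-level keepalive and tune timers where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket options exist on Linux; other platforms may lack
    # some of them, so each one is applied only if Python exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock

def keepalive_enabled(sock):
    """Report whether SO_KEEPALIVE is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

A telnet client that never sets this option sends nothing while idle, no matter what the operating system timers are set to, which is one reason to check the wire rather than the settings.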

Also,  perhaps some applications support the keepalive facility and some do not.

I would suggest inserting a packet sniffer such as Wireshark and verifying that there actually is keepalive traffic, using a mix of connections, even when a session is idle.
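Once a capture exists, the check boils down to looking for long silences on the connection. The sketch below assumes you have exported the packet timestamps for one telnet session (in seconds) from the capture; the function name, the sample timestamps, and the 900-second threshold are all hypothetical.

```python
def idle_gaps(timestamps, threshold_s):
    """Return (start, end, gap) for consecutive packets more than threshold_s apart."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap > threshold_s:
            gaps.append((earlier, later, gap))
    return gaps

# Hypothetical capture: user activity, two keepalives 600 s apart, then silence.
packet_times = [0.0, 2.5, 602.5, 1202.5, 2500.0]
for start, end, gap in idle_gaps(packet_times, threshold_s=900):
    print(f"no packets for {gap:.0f} s between t={start:.1f} and t={end:.1f}")
```

If a session that is supposed to send keepalives every 10 minutes shows gaps longer than that, the keepalives are either not being sent or not reaching the capture point.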

captclamAuthor Commented:
I think I figured it out.  First you have to know that the process changed.  We had an SNA server onsite that connected to a mainframe via managed routers over a PPP T1.  The process was changed so that the workstations go directly to the mainframe, cutting out the old SNA server.  That change added the Security Gateway to the workstations' path.  The old SNA server was connected directly to the managed router, bypassing the firewall; the workstations, however, go through the default gateway, which sends them through the firewall.

Problem #1 Telnet Display Sessions (this makes no sense to me)
When connected to the SNA server, the users could leave sessions idle indefinitely and they would stay connected with no issues.  Now that the sessions go over IP through the firewall, I had to alter 3 registry settings to send a TCP keep-alive packet every 5 minutes, with a 5-second interval between retries and up to 10 attempts before ending the session.  This seems to have fixed the display sessions, which had been timing out exactly 20 minutes after going idle.
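The arithmetic behind that fix can be sketched as follows. The post does not name the three registry values; on Windows XP the usual candidates are KeepAliveTime, KeepAliveInterval and TcpMaxDataRetransmissions, but treat that mapping, the function names, and the assumption that something on the path ages out sessions after 20 idle minutes as illustrative only.

```python
def first_probe_beats_timeout(keepalive_time_s, idle_timeout_s):
    """True if the first keepalive goes out before the idle session is aged out."""
    return keepalive_time_s < idle_timeout_s

def seconds_until_drop(keepalive_time_s, probe_interval_s, max_probes):
    """Approximate idle time before the client gives up on a peer that never answers."""
    return keepalive_time_s + probe_interval_s * max_probes

IDLE_TIMEOUT_S = 20 * 60  # the observed 20-minute drop

print(first_probe_beats_timeout(5 * 60, IDLE_TIMEOUT_S))       # True: 5-minute keepalive
print(first_probe_beats_timeout(2 * 60 * 60, IDLE_TIMEOUT_S))  # False: 2-hour default
print(seconds_until_drop(5 * 60, 5, 10))                       # 350
```

With the common 2-hour default the first keepalive would arrive long after a 20-minute idle timer had fired, which is consistent with the sessions dropping before the registry change and surviving after it.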

Problem #2 Printer Sessions Dropping
When connected, the printer sessions would drop after an undetermined amount of idle time.  As it turns out, I had to go into the Security Gateway and create my own Telnet TCP service.  When I created the new TCP service, I removed any timers on that protocol.  Then I checked NAT and the global properties for any additional timers.  There were none.  After these changes were committed, the issue stopped occurring.  The frustrating part of this problem was the inconsistent timing of the session disconnects.  Even more frustrating was the Security Gateway reporting no issues even when being debugged.  Neither I nor the product support specialist was able to determine that timers were affecting Telnet to begin with.

I am not yet sure whether Problem #1 would also have been resolved by the new Telnet service that I created.  I have yet to test that.  When I do, I'll post the result.  One other interesting fact: on one station that runs both printer sessions and display sessions, once I had the registry settings in place the telnet display sessions worked, but the printer (telnet) sessions would still time out.
