Solved

Telnet connection over the internet drops after idle

Posted on 2006-11-05
Last Modified: 2013-11-29
11/5/06
8:10pm

I am having a lot of trouble with Telnet sessions over the internet. After 20 minutes of idle time the connection is often dropped. This is a major problem because of lost data entry productivity for many users. The users are at several locations with a PIX vpn to corporate. All are vpn over DSL. Previously this happened occasionally, now it happens very frequently. Now users at 3 of the 4 branch locations are dropping several times a day, it happens after 20 min. or so of idle time.

I have tried doing some diagnostics, testing, research, etc.
Testing results:
Telnet connections are very reliable on a LAN or a decent point-to-point WAN (such as fractional T1); idle time is not a problem, and connections will stay up for 2+ hours at least. The internet connections are unpredictable - anywhere from 20 min. to 2.5 hours of idle time will cause a disconnect. It is variable: sometimes the connection will drop 3 times in a row after 30 min. idle, then it will stay connected 3 times in a row with 30 min. idle. The testing was done over the internet with and without vpn involvement with no noticeable difference - frequent and unpredictable drops after 30+ min. of idle time with variable patterns. A ping test with 10,000 cycles showed no relation to the drops; pings were 99% successful with occasional slow times but a decent average (times varied from 40ms to 1400ms with an average of about 70ms). Packet tracing did not show anything significant (not to me anyway) except:
after a certain amount of idle time a keystroke (usually the letter "c" but anything causes the same result) from the client causes a RST from the server. Then the client is disconnected from the session. netstat shows the session is gone on the client but netstat on the server shows the connection "established".

I have duplicated these results consistently on several LANS using different DSL providers for both client and server, with and without vpn's, using different Telnet client software and different server software.

Any suggestions would be appreciated.

George
0
Comment
Question by:George46227
36 Comments
 
LVL 6

Accepted Solution

by:
bmedward earned 300 total points
ID: 17878780
Do you see similar telnet session drops on clients when some kind of 'keep-alive' is being used?  Also, are all of the clients on wired networks, or are any of the remote LANs wireless?

As I am sure that you can attest to, telnet sessions are inherently different from pings - each ping only lasts as long as is needed to reach the destination and return back.  The telnet session has to have good connectivity from when the session is originated until it is closed.  Not all telnet clients are equal in terms of being able to handle errors - Wavelink (www.wavelink.com) has a good enterprise level client (demo available) for most platforms.  For some situations, Wavelink (and others) offer proxy session controllers or gateway applications that are much more tolerant of network errors.  

On the network traces, were you able to determine if the packet was being re-sent (from either the client or server) prior to the RST?  Keep in mind that the reset will usually be generated when the connection tries to re-establish even though the failure happened much earlier.  Also, have you compared traces of the same telnet session from both the client's side and the server's?  The re-try should be easily identifiable through ethereal (ethereal.com) or other network trace utilities.  There will be a sequence of (usually around 6) re-transmits of the same packet, and the time between re-transmits will double with each one.  
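
To show what that doubling pattern would look like in a trace, here is a minimal sketch that prints a retransmit timeline. The 3-second starting timeout and the count of 6 are assumptions for illustration only; a real TCP stack derives its first timeout from the measured round-trip time and may cap the maximum wait.

```python
def retransmit_schedule(initial_rto=3.0, retries=6):
    """Return (attempt, wait_seconds, elapsed_seconds) for each retransmit.

    initial_rto and retries are illustrative assumptions, not values taken
    from any particular client or server in this thread.
    """
    schedule = []
    wait = initial_rto
    elapsed = 0.0
    for attempt in range(1, retries + 1):
        elapsed += wait
        schedule.append((attempt, wait, elapsed))
        wait *= 2  # exponential backoff: the timeout doubles each time
    return schedule


for attempt, wait, elapsed in retransmit_schedule():
    print(f"retransmit {attempt}: {elapsed:.0f}s after the original "
          f"(waited {wait:.0f}s)")
```

If the deltas between identical packets in a capture grow roughly like this, the sender was retrying into a dead path before the reset showed up.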

Is there any chance of getting the application to operate through a web interface or other transaction oriented client?  If it is determined that you cannot keep a reliable session it may be easier to change how the data entry is being transmitted.
0
 
LVL 13

Assisted Solution

by:prashsax
prashsax earned 200 total points
ID: 17885022
The PIX has a time limit on NAT sessions. So if you are using dynamic NAT, it is possible that the PIX will clear the NAT session after that idle time limit.

Are you using natting in between?

If yes, you can try and do a static nat for just one client and see what happens.
0
 

Author Comment

by:George46227
ID: 17885299
bmedward:

Tell me more about the "keep-alive". None of the telnet clients I use offer a keep alive option (at least not that I can find). I tried setting up keep alive settings in the registry (win98 SE test pc) but it didn't help - in fact when I sniff the session I see no evidence of any keep-alive packets.

I do have one network that is wireless DSL, the others are hard-wired telecom (POTS) line DSL. Maybe the wireless DSL is a little worse than the others but they are all dropping a lot.

Can you explain - why does the telnet session need to have good connectivity from the beginning to the end? I don't see any background (polling or whatever) activity during the idle time either on LAN or internet.

I don't see any evidence of either failed packets or re-transmission. Re-transmission would presume something failed - but I can't see where anything failed. The session works perfectly until it sits idle for a while, then a keystroke from the client causes the RST disconnection. I am not a sniffer expert so maybe I am missing something in the logs or just mis-interpreting what I see.

No - there is no chance whatsoever of changing data platforms. A ton of money was spent on the applications which are only accessible through telnet sessions. We have to live with whatever happens (or maybe go crazy trying to get any work done!).

prashsax:

one of the networks I am having trouble with is using PIX for vpn between 5 remote branches, 3 branches are complaining of drops, the other 2 are not. The vpn is all hard-wired internet access (DSL) at corp and branches.

I am also having the same problem with another network which is not using pix or any vpn. Some of the remote users use outbound NAT (client) to "inbound NAT" port forward to port 23(server). Other users are outbound thru a proxy server (client) to "inbound NAT" with port FW (server). All seem to be having the same problem.

I am suspicious of the NAT because all systems have some type of NAT involved - inbound or outbound or both. I wonder if the DSL modem or the NAT routers are "timing-out" or discarding the connections, maybe clearing the NAT port-mapping tables but I have not found any evidence of this (again maybe my sniffer skills are not too good).

Maybe I should try testing public ip-to-public ip telnet with no NAT or proxy involved, that would be more like the LAN sessions - LAN sessions never go down even with very long idle times.

Thanks
George




0
 
LVL 6

Expert Comment

by:bmedward
ID: 17885489
A keep-alive signal could be any type of activity that makes sure that there is some TCP/IP communication going on.  For experimental purposes, set up the server to ping a client (or set of clients) every 4 minutes or so.  This communication will reset the timeout counters for most devices in route between the client and server.  Note that if the NAT or PIX has a maximum session time limit, it would not be reset by keep-alive data.
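
Besides an external ping, a program that owns the socket can ask the operating system to send TCP keep-alive probes itself. The following is a minimal sketch, assuming you control the client or server code (the stock telnet clients discussed here may not offer this). The 240-second idle value mirrors the "every 4 minutes or so" suggestion; the other numbers are arbitrary assumptions. OS-wide keep-alive timers generally only take effect for sockets that have requested keep-alive in this way.

```python
import socket


def enable_keepalive(sock, idle_s=240, interval_s=30, probes=5):
    """Turn on TCP keep-alive probes for an already-created socket.

    idle_s, interval_s and probes are assumed values for this sketch.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timing options are platform-specific; where one is missing
    # or rejected, the OS-wide keep-alive timers apply instead.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass  # option exists in Python but is not supported by this OS
    return sock


s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print("keep-alive enabled:", bool(s.getsockopt(socket.SOL_SOCKET,
                                               socket.SO_KEEPALIVE)))
s.close()
```

As noted above, this only refreshes idle timers along the path; it would not help against a device that enforces a maximum session lifetime.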

As for the duration of a connection, a Ping transaction completes very quickly.  Usually it is only a matter of milliseconds from the start to end of a Ping cycle.   Telnet sessions, on the other hand, can easily span several days.  Often times, if the lower layers of the network stack drop – even momentarily – the session will be forced to reset by the client or another device (router, access point, etc) along the route.

Another method to test if this is a maximum time-limit issue - have some end users manually stop and re-establish their connections (re-boot pc / dsl modem) if necessary.  Track how long the failures are from the end user's start or re-connect time.  This may be a better metric for this type of issue as their idle time until disconnection will be variable.

0
 

Author Comment

by:George46227
ID: 17886674
11/6/06
9:40pm

bmedward:

I appreciate your advice but I am past the point of doing general diagnostics, intruding into the user's environment, etc. No matter how much data I collect about the problem it won't matter unless it leads to a solution or at least a definitive cause. I have spent a lot of time collecting data, testing, talking to users, etc. I need to tell the management at this point either-
it's the nature of the beast - telnet over internet is just flaky
or
I have a very good reason to believe what the cause is, what it will take to fix it, downtime, cost, etc.
I have used up all my "maybe it's this or maybe it's that, let's try this idea and see what happens"

I am leaning toward option #1 - this is what it is and nothing significant can be done short of ripping out the networks and a complete re-do of the system, maybe go to T1 internet instead of DSL? they won't like it and probably won't do it - "too expensive, how do we know it will even fix the problem, etc."

I would really like to at least have an intelligent understanding of the technical side of the problem, at least I could give an explanation. Like - why does a LAN connection never go down? why does the server send an immediate RST as soon as I press a key after some idle time? Obviously the server and client are communicating at that point because I get the RST!

Interesting note - if we state that the session goes down during idle time because of a "bad" internet connection, or network equipment, switch, router or whatever-
I can unplug the cat5 cable from my pc during a live telnet session on the LAN, have a long period of idle time, plug the cable back in and resume my session without a RST disconnection! As I said earlier the packet sniff shows no activity during idle time (whether plugged in or un-plugged) - so how does the server even know the network has been down (because the cat5 was un-plugged) for a long time? it resumes the session with no problem!

I do feel like something is actively terminating the session ( not just a unstable or slow/congested internet connection), I just can't find it - I don't think it is the client or the server. Something which is part of the internet connections and is common to several different locations, different ISP's, different DSL modems and routers, etc. My guess is it will turn out to be NAT-related and possibly not even solve-able. Some locations are using PIX for NAT and VPN, others are using low-end NAT DSL routers. I don't see any time-out values on the DSL NAT devices. The PIX are managed by a 3rd party telecom/voice/data provider which I have no access to - the guy says he has checked out the PIX and it's set up good without any time-outs.

Thanks for the advice and support
George
0
 
LVL 6

Expert Comment

by:bmedward
ID: 17890287
When troubleshooting wireless systems, I had similar problems with telnet sessions.  In my case, a mobile computer would suspend or roam out of range for a period of time.  One problem arose from the ARP cache timeout in wireless access points - devices would be flushed after 7 (?) minutes of idle time.  When the mobile device re-connected, the AP had to refresh its ARP cache, re-discover a path to the server, and in most configurations would force a reset of any telnet sessions.  If this were a wired environment and some infrastructure device were behaving the same way as these access points, an occasional ping would reset the ARP timeout counter.  However, it could just as likely be any of a thousand other things.  

Wavelink's TermProxy may work in this setting. (Note, I do not work for Wavelink)
http://www.wavelink.com/wavelink/emulators/wavelink_termproxy.aspx
This software works by establishing the telnet sessions on an intermediary computer and communicating with the clients over a more robust protocol.  

In my experience, they have been good about providing demo software.
0
 

Author Comment

by:George46227
ID: 17892550
11/7/06
3:05pm

bmedward:

thanks for the advice, I may look into it - if management decides the solution involves spending money on software/hardware, technical service time (me) to install/config, etc.

Today I am testing:
client: win98 se using built-in telnet with a public IP thru T1
server: w2k3 srv built-in telnet service with a public ip thru DSL
           - I am not sure if the server goes straight out thru the DSL device or passes thru the pix first; I do know it is not part of a vpn and it is not using NAT

I really thought this might work well but it works just as bad if not worse!!!
Sometimes I get "Connection... lost" after only 10 minutes idle without even touching the keyboard!!!
the server will just send a RST out of the blue for no apparent reason, the client is not doing anything just idling - then boom here comes a RST
On another occasion I seemed to stay connected for over an hour (no "Connection...lost") but when I tried to send a keystroke nothing whatsoever happened!!! The client netstat said "Established" but the server netstat showed no connection at all (checked it after the keystroke didn't respond). This time I could see in the trace the client re-sending 10 times with no response from the server - maybe because I have MaxDataRetries in the reg set to = 10. Never got a RST or anything from the server, the client eventually dis-connected itself.

I am afraid the ISP may be doing something which interferes with the session but I'm not sure what that could be.

I am thinking of posting part of my sniffer log to see if I am missing something. I hope I don't get yelled at by EE Admins! (like the time I posted a HJT log!). I will try to make it very small.

Thanks
George



0
 
LVL 6

Expert Comment

by:bmedward
ID: 17892738
If you do want to include sniffer data, posting the 4 to 10 packets from the telnet session up to, and including the RST, should be sufficient.  Be sure to identify the client and the server, and filter out any sensitive data.  The telnet data is likely to be less important than the packet overhead data - sequence number, fragmenting, retries, window size.

For fun, you could also try NetCat as a telnet client - you can get a windows version here http://www.vulnwatch.org/netcat/ .

The format to run this from a client computer would be 'nc -t hostname 23'.  

Also note that you can enable some level of logging with Windows default telnet clients - newer ones at least.

Good luck.
0
 

Author Comment

by:George46227
ID: 17894993
11/7/06
9:00pm

Would netcat be of any diagnostic value? I have used it before as a telnet server. Win98 telnet logging doesn't do much - is 2k or xp any better?

I believe someone with better packet sniffer skills and packet-level knowledge of tcp might be able to see an error which causes the RST, it's too subtle for me to see - I don't know and understand the header fields that well, how the tcp error-control process really works, etc.

I will try to post part of a log tomorrow.

Thanks
George
0
 

Author Comment

by:George46227
ID: 17898246
11/8/06
9:20am

Here is a log. The ip's have been changed, I have included the headers and minimal data:

---------------------------------------------------------------------------------
#68       Receive time: 8826.923 (delta = 1.813)  packet length: 55    received length: 55  
Ethernet:   (00a024f0746c -> 00a0c81bde63)  type: IP(0x800)
Internet:   64.199.1.1 -> 209.254.1.1   hl: 5  ver: 4  tos: 00  len: 41  id: 0xfed7  fragoff: 0  flags: 0x2  ttl: 128  prot: TCP(6)  xsum: 0x1641
TCP: 1104 -> telnet(23)  seq: 00e35fdd  ack: acbfc076  win: 8102  hl: 5   xsum: 0x7411  urg: 0  flags: <ACK><PUSH>
data (1/1): d
---------------------------------------------------------------------------------
#69       Receive time: 8827.051 (delta = 0.128)  packet length: 60    received length: 60  
Ethernet:   (00a0c81bde63 -> 00a024f0746c)  type: IP(0x800)
Internet:  209.254.1.1 -> 64.199.1.1    hl: 5  ver: 4  tos: 00  len: 41  id: 0xf91f  fragoff: 0  flags: 0x2  ttl: 122  prot: TCP(6)  xsum: 0x21f9
TCP: telnet(23) -> 1104  seq: acbfc076  ack: 00e35fde  win: 65473  hl: 5   xsum: 0x93f4  urg: 0  flags: <ACK><PUSH>
data (1/1): d
---------------------------------------------------------------------------------
#70       Receive time: 8827.053 (delta = 0.002)  packet length: 55    received length: 55  
Ethernet:   (00a024f0746c -> 00a0c81bde63)  type: IP(0x800)
Internet:   64.199.1.1 -> 209.254.1.1   hl: 5  ver: 4  tos: 00  len: 41  id: 0xffd7  fragoff: 0  flags: 0x2  ttl: 128  prot: TCP(6)  xsum: 0x1541
TCP: 1104 -> telnet(23)  seq: 00e35fde  ack: acbfc077  win: 8101  hl: 5   xsum: 0x6f10  urg: 0  flags: <ACK><PUSH>
data (1/1): i
---------------------------------------------------------------------------------
#71       Receive time: 8827.190 (delta = 0.137)  packet length: 60    received length: 60  
Ethernet:   (00a0c81bde63 -> 00a024f0746c)  type: IP(0x800)
Internet:  209.254.1.1 -> 64.199.1.1    hl: 5  ver: 4  tos: 00  len: 41  id: 0xf920  fragoff: 0  flags: 0x2  ttl: 122  prot: TCP(6)  xsum: 0x21f8
TCP: telnet(23) -> 1104  seq: acbfc077  ack: 00e35fdf  win: 65472  hl: 5   xsum: 0x8ef3  urg: 0  flags: <ACK><PUSH>
data (1/1): i
---------------------------------------------------------------------------------
#72       Receive time: 8827.193 (delta = 0.003)  packet length: 55    received length: 55  
Ethernet:   (00a024f0746c -> 00a0c81bde63)  type: IP(0x800)
Internet:   64.199.1.1 -> 209.254.1.1   hl: 5  ver: 4  tos: 00  len: 41  id: 0xd8  fragoff: 0  flags: 0x2  ttl: 128  prot: TCP(6)  xsum: 0x1441
TCP: 1104 -> telnet(23)  seq: 00e35fdf  ack: acbfc078  win: 8100  hl: 5   xsum: 0x660f  urg: 0  flags: <ACK><PUSH>
data (1/1): r
---------------------------------------------------------------------------------
#73       Receive time: 8827.345 (delta = 0.152)  packet length: 60    received length: 60  
Ethernet:   (00a0c81bde63 -> 00a024f0746c)  type: IP(0x800)
Internet:  209.254.1.1 -> 64.199.1.1    hl: 5  ver: 4  tos: 00  len: 41  id: 0xf921  fragoff: 0  flags: 0x2  ttl: 122  prot: TCP(6)  xsum: 0x21f7
TCP: telnet(23) -> 1104  seq: acbfc078  ack: 00e35fe0  win: 65471  hl: 5   xsum: 0x85f2  urg: 0  flags: <ACK><PUSH>
data (1/1): r
---------------------------------------------------------------------------------
#74       Receive time: 8827.540 (delta = 0.195)  packet length: 54    received length: 54  
Ethernet:   (00a024f0746c -> 00a0c81bde63)  type: IP(0x800)
Internet:   64.199.1.1 -> 209.254.1.1   hl: 5  ver: 4  tos: 00  len: 40  id: 0x1d8  fragoff: 0  flags: 0x2  ttl: 128  prot: TCP(6)  xsum: 0x1342
TCP: 1104 -> telnet(23)  seq: 00e35fe0  ack: acbfc079  win: 8099  hl: 5   xsum: 0xd817  urg: 0  flags: <ACK>
---------------------------------------------------------------------------------
#75       Receive time: 8827.711 (delta = 0.171)  packet length: 56    received length: 56  
Ethernet:   (00a024f0746c -> 00a0c81bde63)  type: IP(0x800)
Internet:   64.199.1.1 -> 209.254.1.1   hl: 5  ver: 4  tos: 00  len: 42  id: 0x2d8  fragoff: 0  flags: 0x2  ttl: 128  prot: TCP(6)  xsum: 0x1240
TCP: 1104 -> telnet(23)  seq: 00e35fe0  ack: acbfc079  win: 8099  hl: 5   xsum: 0xcb03  urg: 0  flags: <ACK><PUSH>
data (2/2): ..
---------------------------------------------------------------------------------
#76       Receive time: 8827.860 (delta = 0.149)  packet length: 694   received length: 694
Ethernet:   (00a0c81bde63 -> 00a024f0746c)  type: IP(0x800)
Internet:  209.254.1.1 -> 64.199.1.1    hl: 5  ver: 4  tos: 00  len: 680  id: 0xf923  fragoff: 0  flags: 0x2  ttl: 122  prot: TCP(6)  xsum: 0x1f76
TCP: telnet(23) -> 1104  seq: acbfc079  ack: 00e35fe2  win: 65469  hl: 5   xsum: 0x293d  urg: 0  flags: <ACK><PUSH>
data (60/640): .[5;2HVolume in drive C has no label..[6;2HVolume Serial Num
---------------------------------------------------------------------------------
#77       Receive time: 8828.040 (delta = 0.180)  packet length: 54    received length: 54  
Ethernet:   (00a024f0746c -> 00a0c81bde63)  type: IP(0x800)
Internet:   64.199.1.1 -> 209.254.1.1   hl: 5  ver: 4  tos: 00  len: 40  id: 0x3d8  fragoff: 0  flags: 0x2  ttl: 128  prot: TCP(6)  xsum: 0x1142
TCP: 1104 -> telnet(23)  seq: 00e35fe2  ack: acbfc2f9  win: 7459  hl: 5   xsum: 0xd815  urg: 0  flags: <ACK>
---------------------------------------------------------------------------------
#78       Receive time: 9429.445 (delta = 601.405)  packet length: 60    received length: 60  
Ethernet:   (00a0c81bde63 -> 00a024f0746c)  type: IP(0x800)
Internet:  209.254.1.1 -> 64.199.1.1    hl: 5  ver: 4  tos: 00  len: 40  id: 0x9b57  fragoff: 0  flags: 00  ttl: 254  prot: TCP(6)  xsum: 0x3bc2
TCP: telnet(23) -> 1104  seq: acbfc2f9  ack: ----  win: 0  hl: 5   xsum: 0x560a  urg: 0  flags: <RST>
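
One detail worth noting in the trace above: the server's data packets (#69, #71, #73, #76) arrive with ttl: 122, while the RST in #78 arrives with ttl: 254 and IP flags 00 instead of 0x2. A TTL that differs from the rest of the stream is often a hint that the packet was generated by some other device on the path, although it is not proof. A minimal sketch of that check, with the values copied by hand from the trace (the function name and data layout are made up for illustration):

```python
from collections import Counter, defaultdict

# (packet number, source IP, IP TTL, TCP flags) copied from the trace above
PACKETS = [
    (69, "209.254.1.1", 122, "ACK PUSH"),
    (71, "209.254.1.1", 122, "ACK PUSH"),
    (73, "209.254.1.1", 122, "ACK PUSH"),
    (76, "209.254.1.1", 122, "ACK PUSH"),
    (78, "209.254.1.1", 254, "RST"),
]


def suspicious_resets(packets):
    """Return RSTs whose TTL differs from the usual TTL of that source.

    Each result is (packet number, source, usual TTL, TTL of the RST).
    """
    seen = defaultdict(Counter)  # source IP -> count of each TTL observed
    flagged = []
    for number, src, ttl, flags in packets:
        if "RST" in flags.split():
            usual = seen[src].most_common(1)
            if usual and usual[0][0] != ttl:
                flagged.append((number, src, usual[0][0], ttl))
        else:
            seen[src][ttl] += 1
    return flagged


print(suspicious_resets(PACKETS))
```

A capture taken on the server side at the same moment would show whether the server itself ever sent that RST.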


Thanks
George
0
 

Author Comment

by:George46227
ID: 17907091
11/9/06
11:00am

possible new relevant info:

I now have sniffers running on the client and the server-
client win98 se telnet with public IP thru T1 (no nat, no proxy)
server w2k telnet server behind NAT/port forwarded

in this test the session is dropping after 10-15 min idle without any keystroke entered (after the idle time) - spontaneous "Connection to host lost" on the client

both sniff logs show RST in the last packet - client sends RST to the server and server send RST to the client! How can each machine RST each other? RST terminates the session immediately - right? Makes me think something in the "middle" is sending the RST to both machines??!!

George
0
 
LVL 6

Expert Comment

by:bmedward
ID: 17907877
I wouldn't be alarmed by the mutual RST's - I would have expected the client to send a FIN, but I think that responding to a RST with an RST is a form of ACK'ing the RST.  Interesting behavior with your test - does it reliably behave this same way? Also, are you logged in? - if your client is just sitting at a login prompt, the server will usually reset the connection after a login timeout period.  

This could be a re-try issue - the time difference of almost exactly 600 seconds (5 min) seems like it would originate from a software controlled parameter, not just random chance.  

Do you have any other telnet servers to test a WAN client connection to?  It would be good to know if the same client to a different server (Solaris, AIX, AS/400, Linux, Cisco Router, or other) acts differently.

Here is some decent info from MS on Win2k network parameters.  Much of this should be portable to Win98.   Check out the section labeled "Transmission Control Protocol (TCP)" and the registry configurable parameters in Appendix A.  Adjusting the client's tcpWindowSize to a fixed value could be a solution - this seemed pretty erratic in the trace segment.  
http://www.microsoft.com/technet/itsolutions/network/deploy/depovg/tcpip2k.mspx

Have you tried PuTTY's telnet client (free)? It looks like it has a built-in option for keep-alives.
They also have some FAQ info for dropped telnet sessions. http://www.chiark.greenend.org.uk/~sgtatham/putty/faq.html#faq-idleout
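
For a sense of what an application-level keep-alive can look like on a Telnet connection, here is a generic sketch (not a description of how any particular client implements it). RFC 854 defines a No Operation command, sent as the two bytes IAC (255) and NOP (241), which the other end should ignore but which still counts as traffic to devices along the path. The class name and the 240-second interval are assumptions; the interval just needs to be shorter than the shortest idle timeout on the path.

```python
import time

IAC = 255   # Telnet "Interpret As Command" byte (RFC 854)
NOP = 241   # Telnet "No Operation" command (RFC 854)
TELNET_NOP = bytes([IAC, NOP])


class IdleKeepAlive:
    """Decide when an idle Telnet connection is due for a no-op.

    interval_s is an assumed value; the clock is injectable for testing.
    """

    def __init__(self, interval_s=240, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.last_activity = clock()

    def touch(self):
        """Call whenever real data is sent or received."""
        self.last_activity = self.clock()

    def poll(self):
        """Return the bytes to send now: a NOP if idle too long, else b''."""
        if self.clock() - self.last_activity >= self.interval_s:
            self.touch()
            return TELNET_NOP
        return b""
```

A client loop would call `touch()` on every keystroke or received byte, call `poll()` periodically, and write any returned bytes to the socket.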

This could get ugly quickly, especially if you have to use production clients for testing and cannot easily reproduce the fault.  Researching all of the factors that could be contributing to this failure would be a very time consuming (expensive) process.  Furthermore, you may still find out that the corrective action is beyond your control.   Make sure that the powers-that-be are kept in the loop and have some grasp of what you are up against.  As a spare-time troubleshooting activity, this could span a few (more) months.
0
 
LVL 6

Expert Comment

by:bmedward
ID: 17907984
Couple more links on the PIX angle - no real answers that I saw from these.

http://www.velocityreviews.com/forums/t30867-pix-vpn-telnet-problem.html
http://www.velocityreviews.com/forums/t34032-intermittent-dropped-telnet-connection-through-vpn.html

Cisco example configuring the telnet session timeout - I would expect the dropped session pattern to be much more identifiable if this were the issue.

http://www.cisco.com/en/US/products/hw/vpndevc/ps2030/products_configuration_example09186a0080624e19.shtml
0
 

Author Comment

by:George46227
ID: 17909698
11/09/06
4:15pm

Thanks for the ideas.

The log above I posted is from the client side, I just today was able to sniff both sides (see my recent post above about the RST's on both sides). Sounds like you think the RST's on both sides is normal?

The log was-
Win98se telnet client thru Adtran T1 box public ip no nat/no proxy
W2k telnet server thru Netopia DSL modem public ip no nat/no proxy (although I am not sure how the boxes are wired - it could be that the server passes thru the pix first before it gets to the DSL modem or it could go from the switch to the DSL modem)

I have several servers I can test - w2k, w2k3, IBM AS400
clients will be windows telnet and AS400 emulation (mostly IBM Client Access, I could also use Netmanage ViewNow)
I have a variety of networks on client and server sides (I have remote control of desktops inside the LAN also) - T1, DSL, wireless DSL; proxy servers, NAT, pix, Netgear vpn routers

I can't say anything yet about the RST-to-RST being consistent on other systems.

I connect to the server, login with no problem, then run a command - I use "dir"
then I just leave it alone for a while
sometimes it spontaneously drops - the "Connection to host lost" window pops up on the client without any keystroke entered
other times nothing happens until I enter a keystroke - I attempt a "cls", soon as I hit the "c" I get the pop up "Connection to host lost", I don't have to hit the enter key - just press "c"

the time period of 600 seconds I believe is 10 minutes - not 5

I have done some windows client to AS400 testing with similar results (unfortunately I have no access to the 400 for sniffing on the server side, checking the logs, changing the ip config, etc.)

LAN and point-to-point WAN (not internet) telnet is very stable, never goes down. I can approximate the problem by pulling the cat5 cable out (client) and hitting a keystroke - the session drops after a short while. I'm not sure what the sniff would look like in that situation (I assume the client tries to send a RST then disconnects the session; the server can't do a RST or anything since the client cat5 cable is un-plugged, it doesn't know anything is going on).

Thanks
George



0
 

Author Comment

by:George46227
ID: 17927466
11/12/06
7:55pm

Update:

after some further testing I have some interesting results:

client w98 se telnet behind DLink DSL router over cable (NAT)
server w2k srv telnet public IP DSL (no NAT/no proxy)

I see something like this:
client log
192.168.1.1:1025 > 209.1.1.1:23
209.1.1.1:23 > 192.168.1.1:1025
server log
70.1.1.1:60100 > 209.1.1.1:23
209.1.1.1:23 > 70.1.1.1:60100

note the server sees the public IP and port of the client's NAT router which is different than what the actual win98 client is using

note what happens after approximately 10 - 12 minutes of idle time:
server log
70.1.1.1:60101 > 209.1.1.1:23
209.1.1.1:23 > 70.1.1.1:60101 RST
client log
192.168.1.1:1025 > 209.1.1.1:23
209.1.1.1:23 > 192.168.1.1:1025 RST

the client's NAT router has changed the port from 60100 to 60101!! When the server sees a connection from port 60101 it does not recognize it as an established connection - it's looking for port 60100. So it sends RST to port 60101 which is now mapped to the client port 1025. The client gets the RST and the connection is ended.
netstat on the server shows 70.1.1.1:60100 ESTABLISHED - still connected and listening, no connection is shown for 70.1.1.1:60101
netstat on the client shows no connection to 209.1.1.1:23

Since most of the users are behind NAT I think this may be what is causing the dropped connections.
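
The sequence described above can be reproduced with a toy model. This is only a sketch of the suspected behavior: the class names, the 600-second timeout and the starting port are made up for illustration, and real routers do not necessarily allocate ports this way.

```python
class PatRouter:
    """Toy model of a PAT router that forgets idle port mappings.

    idle_timeout_s and first_port are assumed values for illustration.
    """

    def __init__(self, idle_timeout_s=600, first_port=60100):
        self.idle_timeout_s = idle_timeout_s
        self.next_port = first_port
        self.table = {}  # (private ip, private port) -> [public port, last used]

    def translate(self, private_ip, private_port, now):
        key = (private_ip, private_port)
        entry = self.table.get(key)
        if entry is None or now - entry[1] > self.idle_timeout_s:
            # Mapping missing or expired: hand out a fresh public port.
            entry = [self.next_port, now]
            self.next_port += 1
            self.table[key] = entry
        entry[1] = now
        return entry[0]


class Server:
    """Knows only the (ip, port) pairs that completed a handshake."""

    def __init__(self):
        self.established = set()

    def syn(self, ip, port):
        self.established.add((ip, port))

    def data(self, ip, port):
        return "ACK" if (ip, port) in self.established else "RST"


router, server = PatRouter(), Server()
port = router.translate("192.168.1.1", 1025, now=0)
server.syn("70.1.1.1", port)                      # session opens on 60100
port = router.translate("192.168.1.1", 1025, now=60)
print(port, server.data("70.1.1.1", port))        # still mapped: 60100 ACK
port = router.translate("192.168.1.1", 1025, now=760)  # 700 s of idle time
print(port, server.data("70.1.1.1", port))        # remapped: 60101 RST
print(sorted(server.established))                 # server still lists 60100
```

In this model the client's own port never changes, the server keeps its old entry (like the ESTABLISHED line in netstat), and any traffic that refreshes the mapping before the timeout keeps the original public port alive.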

note: the logs I posted earlier on 11/8 do not show this behavior,  I will have to go back and re-check the logs and maybe re-do the test, the earlier tests were using client logs only, I only recently got a couple of servers setup with logging. The server logging is the only place the problem shows up (the change of the client's NAT port ).

George
0
 
LVL 6

Expert Comment

by:bmedward
ID: 17927782
The client port number stepping up by one is an indication that the client (or NAT/PAT) has reset one session and is trying to establish a new session - unless the telnet client is terribly messed up, it would not try to change ports mid-session.  

It looks like this is going through port translation, not just NAT.  When remote clients connect, are they always connecting through VPN tunnel to corporate network, and issued an internal IP address (if so, are they being NAT/PAT -ed at this point)?  Or, can public PC's talk to the telnet server directly?  
0
 

Author Comment

by:George46227
ID: 17932207
11/13/06
1:35pm

Yes I understand. Maybe I didn't explain it well.

It is not the telnet client that is changing the port - client port on the pc stays the same. It is the telnet client's NAT router that is re-setting the source port and changing the source port to a different number. Yes I think the NAT router is basically starting a "new" session ( a new tcp session not a new telnet session).

I am using the term NAT in the generic sense as everyone seems to do, although technically you are correct - it is really PAT, there is only 1 public IP, the NAT/PAT DSL router maps ports for each connection.

This is my test environment, I do not have direct access to the production environments. One of the main telnet servers having this idle/disconnection problem is inside a pix vpn tunnel, the clients at the remote branches are also inside the pix vpn, all connected by some type of DSL internet. Each pix device at the different locations provides internet access for web, email, etc. using NAT/PAT and also provides the vpn tunnel.

It is a pix router-to-router vpn over the internet, the clients do not get issued an internal ip. Each location has its own ip subnet like 192.168.100, 192.168.101, etc. It's all handled by the pix vpn setup - the telnet server points to the inside address of the local pix as the default gateway, the branch clients point to the inside address of their local pix as the default gateway.

We don't use any pc's with public ip's.

Thanks
George

0
 
LVL 13

Expert Comment

by:prashsax
ID: 17933988
Yes, you are correct, DSL routers generally do PAT.

Now, can you find a connection timeout setting in the DSL router configuration?

What DSL router do you have right now? What make and model?

0
 

Author Comment

by:George46227
ID: 17935104
11/13/06
7:30pm

No - I don't see anything in the routers setup about time-out, but I will check again - I am familiar with several makes and models, have set them up myself, have never seen any type of relevant time-out config, I fear it is hard-coded built-in.

One of the routers is a DI-604 Ethernet Broadband Router (according to the admin page). It does not have any relevant time-out setting, I checked all the admin pages.
Also there is an Airlink101 4-Port Internet Broadband Router (according to the admin page). It also does not have any relevant time-out setting on the admin pages.

Both of the above routers exhibit the time-out client source-port change behavior. It does not seem to occur on the server side even when the same router is used on the server (reversing the telnet client and server): the problem is when the telnet client is behind a NAT/PAT router; a telnet server behind NAT/PAT does not seem to be a problem.

I do not typically see the behavior when the client has a public IP (no NAT, no Proxy) - but there is one exception which is similar but not exactly the same. My Adtran T1 box will kill idle telnet sessions predictably after 10 minutes. It does not change the NAT/PAT port - it does not do any NAT, the client port stays the same. The logs indicate that the client and the server will always receive a RST from each other after 10 min. idle - but the logs do not show either client or server sending any RST!! And the disconnect is spontaneous - it does not require a keystroke from the client. My guess is the Adtran is sending RST to the client and the server - but impersonating the client and server IP's and ports!!

Tomorrow I am going to do some more tests involving Windows ICS and also pix NAT.

Thanks
George
0
 
LVL 6

Expert Comment

by:bmedward
ID: 17938640
evil computers
0
 

Author Comment

by:George46227
ID: 18112658
12/10/06
9:50pm

I have not abandoned the question, I would like to post some test results
for the benefit of others to see. I hope to do this in the next few days,
sometime this week.

George

0
 

Author Comment

by:George46227
ID: 18198526
12/26/06
11:25am

I would like to present some testing results which may be useful to others who have a similar problem. I will post more details when I have time to organize the data:

Telnet testing of idle sessions summary:

1. Telnet internal LAN connections (client and server on same LAN with no NAT no Proxy between the client and server) are very stable
2. Telnet external Internet connections (client and server on the public internet with no NAT no Proxy between the client and server) are usually very stable with at least one exception
-telnet client with a public ip T1 line Adtran Total Access 912 is consistently dis-connected after 10 minutes idle; the client and server both receive a RST but neither sends a RST, presumption is the Adtran is sending the RST to both ends
3. Telnet behind a Proxy Server (telnet client local ip behind Proxy Server public ip, telnet server with public ip) is very stable
4. Telnet behind NAT (telnet client local ip behind NAT router public ip, telnet server with public ip) is often un-stable
-after some period of idle time the NAT router causes dis-connection; the NAT router connects to the server on some source port to establish the session (ex. 1024); after some idle time the port (ex. 1024) is deleted by the NAT router; when the client tries to resume the session the NAT router uses a new source port (ex. 1025); the server does not recognize port 1025 - no session has been established to port 1025, it has a session with port 1024 not 1025, the server sends RST to the client which dis-connects the session
-this seems to only be a problem when the client is behind NAT router; server behind NAT router does not seem to be a problem
-the DSL modem does not seem to be the problem, telnet clients with a public ip going thru the same DSL modem do not have the problem, only clients using a NAT router have the problem (the NAT router is going thru the same DSL modem)
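
The port-change sequence described in item 4 can be sketched as a toy simulation. This is only an illustrative model, not a capture from any of the routers tested: the class names, the client address 192.168.0.10:3000 and the 1200-second (20 min.) idle timeout are all assumptions chosen to mirror the symptoms reported in this thread.

```python
class PatRouter:
    """Toy PAT router that forgets a port mapping after an idle timeout (hypothetical model)."""

    def __init__(self, idle_timeout_s, first_port=1024):
        self.idle_timeout_s = idle_timeout_s
        self.next_port = first_port
        self.mappings = {}  # client (ip, port) -> [public_port, last_seen]

    def translate(self, client, now):
        """Return the public source port used for this client at time `now` (seconds)."""
        entry = self.mappings.get(client)
        if entry is not None and now - entry[1] > self.idle_timeout_s:
            entry = None  # mapping was deleted while the session sat idle
        if entry is None:
            entry = [self.next_port, now]  # allocate a fresh source port
            self.next_port += 1
            self.mappings[client] = entry
        entry[1] = now
        return entry[0]


class TelnetServer:
    """Toy server that only accepts data on source ports it has a session for."""

    def __init__(self):
        self.sessions = set()

    def syn(self, src_port):
        self.sessions.add(src_port)

    def data(self, src_port):
        return "ACK" if src_port in self.sessions else "RST"


def simulate(idle_s, timeout_s=1200):
    """Open a session at t=0, send one keystroke after idle_s seconds, return the server reply."""
    router = PatRouter(timeout_s)
    server = TelnetServer()
    client = ("192.168.0.10", 3000)  # assumed private client address
    server.syn(router.translate(client, now=0))
    return server.data(router.translate(client, now=idle_s))


print(simulate(600))   # keystroke after 10 min. idle
print(simulate(1800))  # keystroke after 30 min. idle
```

In this model the keystroke sent after the mapping has expired leaves the router from port 1025 instead of 1024, so the server answers with RST, which matches the observation that the server still lists the old session as "established".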

George

0
 
LVL 6

Expert Comment

by:bmedward
ID: 18198565
These are some good observations - sounds like you've been keeping busy!  I always find it frustrating when I have to engineer around the undocumented features of segments that I have no control over.  Evil computers.

Have you settled on a solution that fits your technical and financial needs?
0
 

Expert Comment

by:bimmerman
ID: 18634449
I am currently testing a new setting with Client Access Express to AS400. Telnet connections over a firewall and a router were dropped without any message on client side - just a black screen.
If you use Client Access Express as telnet client, here is what I am trying as of now on some of our workstations: in the .ws file (which in fact is an .ini file for the telnet session) there is a section called [Telnet5250] under which I added the following line:

KeepAlive=Y
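
For readers who are not using Client Access, the general idea of a keepalive can be sketched with standard TCP socket options. This is an assumption-laden illustration only: the thread does not say how the KeepAlive=Y setting is implemented inside Client Access Express, and the function name and the 300/60/5 values below are hypothetical. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are platform-specific, so the sketch sets them only where Python exposes them.

```python
import socket


def open_keepalive_socket(idle_s=300, interval_s=60, probes=5):
    """Create a TCP socket with keepalive probes enabled (illustrative values)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timers are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


sock = open_keepalive_socket()
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

The point of the probes is that periodic traffic on the idle session should refresh the NAT/PAT mapping before the router deletes it, provided the probe interval is shorter than the router's idle timeout.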
0
 

Author Comment

by:George46227
ID: 18656730
3/5/07
2:30pm

A few last comments:

see the above post from me for previous detail (12/26/06 11:25am)

the problem seems to be:
NAT/PAT which re-sets the client source port after a period of inactivity; this causes the server to fail to recognize the session as an established session, so the server sends RST, which dis-connects the client

in general order of stability from high (best) to low (worst):
the proxy server seems to be the most stable - proxy tested was Fortech Proxy Plus on NT4 and Win98 SE, both machines had a public ip with DSL, server usually did not cause disconnects after long periods of inactivity, did see one case of a FIN apparently sent by the proxy after a long period of inactivity (this was on a Win98 also running Internet Connection Sharing-NAT, so hard to say for sure whether the ICS-NAT may have caused it instead of the proxy), proxy was especially stable on NT4
PIX NAT-PAT - stability good, RST seen after long inactivity time (60 to 90 min. range), I don't have access to the configuration so maybe this is configurable?
Win98 ICS-NAT - fairly stable, RST seen after a long inactivity time
LinkSys NAT-PAT DSL router - no RST seen but the connection just fails/hangs after a period of inactivity
Airlink NAT-PAT DSL router - RST seen after period of inactivity
Dlink NAT-PAT DSL router - RST seen after inactivity, sometimes only 15 minutes
Adtran T1 box - worst stability, RST sent to both the client and the server ALWAYS every 10 minutes on schedule!
LAN connections last forever with no problem
WAN connections using DSL without NAT-PAT last forever, more or less
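
One way to read the list above is as a set of idle limits that any keepalive interval would have to beat. A rough sketch follows, using only the figures reported in this thread (10 min. for the Adtran, as little as 15 min. for the Dlink, 60 min. as the low end for the PIX). The function name and the 0.5 safety factor are assumptions for illustration, not a tested recommendation.

```python
# Shortest idle times (minutes) at which drops were reported in this thread.
OBSERVED_IDLE_LIMITS_MIN = {
    "Adtran T1 box": 10,
    "Dlink NAT-PAT DSL router": 15,
    "PIX NAT-PAT": 60,
}


def safe_keepalive_minutes(limits, safety_factor=0.5):
    """Pick an interval comfortably below the shortest observed idle limit."""
    return min(limits.values()) * safety_factor


print(safe_keepalive_minutes(OBSERVED_IDLE_LIMITS_MIN))
```

With these figures the sketch suggests sending something at least every 5 minutes. Devices whose limits were not measured (LinkSys, Airlink, Win98 ICS) are left out rather than guessed.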

George
0
 

Author Comment

by:George46227
ID: 18656759
3/5/07
2:40pm

bimmerman

how did your test come out? I am also using IBM Client Access to 400. The "KeepAlive=Y"??

George
0
 

Author Comment

by:George46227
ID: 18656919
3/5/07
3:00pm

I am going to close
Although no specific solution was presented - I am awarding points based on response and effort

Thanks to everyone
George
0
 

Expert Comment

by:bimmerman
ID: 18657103
Three days as of now and it still works fine. I am talking about workstations that use the connection for a full work day, 8am to 4pm.
0
 

Author Comment

by:George46227
ID: 18657664
3/5/07
4:50pm

You are using IBM Client Access to connect to IBM AS400? What ver - I am using mostly V4R5, I think the 400 is V5R1 or 2

What type of problem were you having, what symptom or error?

Has this config solved your problem? I tried to search the IBM help files and web site docs but couldn't find anything useful. There was a tool called Comm Power Tool or something but it required V5R1, even using V5R1 for testing it didn't solve the problem (OS was w98 se, maybe it needed w2k or xp?)

let me know how it is working, I have a better understanding of the cause of my problem but still no solution (the guys in charge of the network equipment - DSL routers, PIX's, etc. insist the problem is not with any of the network LAN/internet/vpn equipment)

George
0
 

Expert Comment

by:bimmerman
ID: 18663283
I am using Client Access Express to connect to V5R1.

My problem was that connections were dropped without any message (Windows or CA) on the workstations and no traces in the logs on the AS400. An important thing that led me to think it was a timed-out connection was that the workstations were disconnected at random and not all at the same time.

And yes, the setting works for me.

Here's the document I have found:

http://207.181.121.77:999/George46227/as400.pdf
0
 

Author Comment

by:George46227
ID: 18663991
3/6/07
1:15pm

bimmerman:

Please keep in touch, post updates if things are working or not, if this appears to be a long-term permanent solution.

I have been working on this problem almost 6 months with no real effective solution, just crummy work-arounds that haven't really done much good. I have searched everywhere, Google, IBM web site, etc. with no answer.

I will try to implement this config tomorrow, I will let you know how it works out.

George


0
 

Expert Comment

by:bimmerman
ID: 18664200
Save the pdf I posted above cause I will remove it after you confirm you have it saved.
0
 

Author Comment

by:George46227
ID: 18671793
3/7/07
12:10pm

bimmerman:

I have saved the posted document from your link

thanks
George

0
 

Author Comment

by:George46227
ID: 18679739
3/8/07
10:45am
bimmerman

do you have any other info on the KeepAlive? is it configurable? How often is it sent? is it just an ACK? Did you have to re-config your MS TCP - registry settings for KeepAlive, etc.?

George
0
 

Expert Comment

by:bimmerman
ID: 18679867
All I did was to add KeepAlive=Y in [Telnet5250] section of the .ws config file for Client Access Express.

A note of importance maybe: In my case, when pinging the AS400, the replies are in the low 60s for TTL as compared to other OS servers which are around 125.
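
The TTL difference noted above is most likely just a difference in the starting TTL each operating system uses, not a sign of a longer path. A small sketch of that arithmetic, assuming the commonly used initial TTL values of 64, 128 and 255; the observed value 61 stands in for "low 60s" and is illustrative, not a measurement from this thread.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)  # assumed typical starting values


def estimate_hops(observed_ttl):
    """Guess the sender's initial TTL and the number of router hops crossed."""
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial, initial - observed_ttl


print(estimate_hops(61))   # reply in the "low 60s"
print(estimate_hops(125))  # reply "around 125"
```

Under these assumptions both replies crossed about the same number of hops, so the TTL values alone probably do not explain why one kind of server drops idle sessions and another does not.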
0
