AIX 5.3 and OpenSSH 4.3: SSH client's terminal randomly slows down on screen refresh
Hello,
I was wondering if anyone could help me out here. From time to time, "random" users report that their SSH connection (SSH2) to our main server, which runs AIX 5.3 with OpenSSH 4.3, slows down to the point where screen refresh is no longer usable, and they have to terminate the connection and log back in.
The same users in the office work fine; it just happens randomly, to random users...
Anyone experienced such a thing before on AIX or OpenSSH? Anything I should look for when diagnosing such an issue?
We are able to view the logged-in user's session and see exactly what they see on their terminal. When they move through their menus, it updates right away on our end (so we know the server's responsiveness is good), but the end user's screen takes at least 20-30 seconds to update.
There is no network loss, and network speeds are good.
Also, attached is the SSH client's log. I also had Wireshark running in the background and can attach those captures as well.
Thanks for your help.
Log from the SSH Client the user is using:
Initiating SSH session at Tue Jul 07 14:45:09 2009
Attempting login as user davsm, protocol 2
Remote protocol version 2.0,remote software version OpenSSH_4.3
match: OpenSSH_4.3 pat OpenSSH*
Remote is NON-HPN aware
datafellows = 2000000
Local version string SSH-2.0-SecureNetTerm-3.1
SSH2_MSG_KEXINIT sent
SSH2_MSG_KEXINIT received
kex: server->client aes256-cbc hmac-sha1 none
kex: client->server aes256-cbc hmac-sha1 none
Selected Kex Method = diffie-hellman-group-exchange-sha1
hostkeyalg = ssh-dss
SSH2_MSG_KEX_DH_GEX_REQUEST(1024<4096<8192) sent
expecting SSH2_MSG_KEX_DH_GEX_GROUP
SSH2_MSG_KEX_DH_GEX_INIT sent
expecting SSH2_MSG_KEX_DH_GEX_REPLY
Host '172.16.4.14' is known and matches the DSA host key.
Found key in C:\Users\Davsm\AppData\Roaming\InterSoft Common\known_hosts:2
verify_host_key_callback, host accepted
SSH2_MSG_NEWKEYS sent
expecting SSH2_MSG_NEWKEYS
SSH2_MSG_NEWKEYS received
ssh_kex2 complete, hostOK=1
send SSH2_MSG_SERVICE_REQUEST
service_accept: ssh-userauth
got SSH2_MSG_SERVICE_ACCEPT
authentications that can continue: publickey,gssapi-with-mic,password,keyboard-interactive
next auth method to try is password
ssh-userauth2 successful: method password
Authentication complete.
channel 0: rfd=-1, wfd=-1, new [client-session]
Entering interactive session.
NetWatchProc is running
Starting client_init_dispatch, compat20 = 1
client_init id 0
Requesting pty.
client session id: id 0
channel 0: open confirm rwindow 0 rmax 32768
Adjusting channel 0 by 131072, new = 131072
It seems that there's a network problem, since you can see that menu items are accessed quickly at your end but not on the client side.
It might be a wrongly set MTU on one of your Ethernet adapters or switch ports. That causes lots of retries during packet exchange, resulting in slow traffic and occasional hangs depending on the packet size.
Cheers,
K.
mirde (Author) commented:
If it were a wrongly set MTU, why would it be so random? Initially it works well when the user connects; about 10 minutes later, it slows down to the point where it's barely usable. The screen updates become way too slow.
It might be a problem with Vista or the SNP package on Windows. Use Linux, or learn to use netsh to disable TCP autotuning.
I don't think it is random. It just happens when the displayed contents reach the MTU size and packets get fragmented. This is what happens when there's an MTU error.
To verify this you can use ping with larger packets. Just check whether the system gets responses with packet sizes near the MTU value.
Try ping -s with sizes 1000, 1400, 1500, and 1600 to see if there's a communication problem at these sizes.
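One detail that may help when picking sizes: ping -s sets the number of data bytes, and the packet on the wire is 28 bytes larger (a 20-byte IPv4 header without options plus an 8-byte ICMP header). So -s 1500 already exceeds a 1500-byte MTU, and the largest payload that fits is 1472. Below is a minimal sketch that only prints the probe commands for review rather than running them. The function names are made up for illustration, the count of 3 is an arbitrary choice, and the address is the server shown in the client log.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one IPv4 packet of the given MTU.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP header.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print (do not run) ping commands around the MTU boundary.
# $1 = target host, $2 = MTU to test against.
print_ping_probes() {
    target=$1
    fit=$(payload_for_mtu "$2")      # largest payload that still fits
    over=$(( fit + 1 ))              # first payload that no longer fits
    for size in 1000 1400 "$fit" "$over"; do
        echo "ping -c 3 -s $size $target"
    done
}

print_ping_probes 172.16.4.14 1500
```

If the probe at 1472 gets replies and larger sizes do not, that would point at an MTU or fragmentation problem somewhere on the path. On the Windows side the size option is -l rather than -s, and -f sets the don't-fragment flag.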
You mentioned earlier that you can see the menus being sent correctly on the AIX side, which indicates that there's a problem somewhere along the path. It can be a wrong MTU size, a broken switch port, a malfunctioning patch cord, etc.
You can trace it by pinging from both sides with packet sizes near the maximum MTU of 1500, and you can disable MTU discovery with these commands:
# no -o udp_pmtu_discover=0
# no -o tcp_pmtu_discover=0
Then, to make it permanent, add these to the end of /etc/rc.net. Go to the very bottom and place these commands between the last if .. fi stanza and the unset statement.
You can display current PMTU values using the command:
pmtu display
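Since the no -o changes above only last until reboot unless they are added to /etc/rc.net, experimenting is a matter of switching the two tunables off and back on. A small sketch of a helper that prints the commands for either state so they can be reviewed before pasting into a root shell; the function name is hypothetical, and the assumption that the previous value was 1 should be confirmed first by running no -o tcp_pmtu_discover without a value, which displays the current setting.

```shell
#!/bin/sh
# Print the commands that switch path MTU discovery off (0) or on (1).
# Nothing is executed here; the output is meant to be reviewed first.
pmtu_discover_cmds() {
    case $1 in
        0|1) ;;
        *) echo "usage: pmtu_discover_cmds 0|1" >&2; return 1 ;;
    esac
    for tunable in udp_pmtu_discover tcp_pmtu_discover; do
        echo "no -o $tunable=$1"
    done
}

pmtu_discover_cmds 0   # commands to disable discovery for the test
pmtu_discover_cmds 1   # commands to restore it (assumes 1 was the old value)
```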
mirde (Author) commented:
Would having the wrong PMTU set cause a consistent slowdown in a session? The users reporting this issue have been experiencing it "randomly" and not consistently, but definitely a couple of times per day.
They are connecting using an SSH client on Vista Enterprise clients.
It would happen when a transmitted packet is near the MTU size. When a packet is larger than the MTU, it has to be fragmented. Starting with AIX 5.3, fragmentation is prohibited. The result is a communication hang, so they need to reset their connection and start again.
So they will not notice this if all the packets being sent are a couple of bytes, such as when a small menu is updated and the data is smaller than the MTU size. Everything will work. But when the packets are bigger, meaning they carry more data such as a full-screen output, they reach the physical size limit and have to be fragmented; since AIX denies packet fragments by default, the communication stalls at that point. So it is normal that they stall at a random point.
Besides, since MTU discovery is dynamic, sometimes they may get a smaller value depending on the network equipment located in between, and sometimes the discovery algorithm may settle on a larger MTU value over some other equipment, which adds to the randomness.
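To put rough numbers on the small-menu versus full-screen difference, here is a back-of-the-envelope sketch. It assumes 20-byte IPv4 and 20-byte TCP headers with no options, ignores the overhead SSH adds for encryption and MAC, and uses an 80x24 terminal (1920 characters) as an illustrative full-screen redraw. The function name is hypothetical.

```shell
#!/bin/sh
# How many TCP packets are needed to carry BYTES of data on a path
# with the given MTU. Assumes 40 bytes of IPv4 + TCP headers (no options).
# $1 = bytes of data, $2 = MTU
segments_needed() {
    mss=$(( $2 - 40 ))                    # payload room per packet
    echo $(( ($1 + mss - 1) / mss ))      # round up
}

segments_needed 200 1500     # small menu update
segments_needed 1920 1500    # full 80x24 screen redraw
```

With a 1500-byte MTU the small update fits in a single, mostly empty packet, while the full redraw needs two packets, the first of which is full-size. Only full-size packets run into trouble if some hop on the path has a smaller MTU, which would fit the pattern of sessions working until a large screen update comes along.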
I have been saying the same thing from the very beginning. I've suggested that you check and diagnose your network using:
- ping packets at sizes near your MTU value; this is a non-disruptive operation.
- I've also told you how to disable this behavior, both temporarily and permanently. This is another non-disruptive operation: you can easily experiment with the settings, and there's no need to make them permanent if you don't like the results.
- I've told you how to display your current PMTU values for AIX 5.3. This is an informative command.
So none of the commands I've suggested here will have a negative effect on your operations.
Please try these commands and let me know the results; otherwise we'll get a deadlock here while the bottleneck on your traffic keeps your users stalled.
mirde (Author) commented:
Thanks KaremE, for the very descriptive comment. I plan to test this tomorrow while in the office and see what results I get.
You had already said that you could observe nothing wrong at the server: it transmits the menus as it should, while the client does not receive all the packets that were sent.
This alone clearly indicates that there's a problem during communication. It could be packet fragmentation, or some software such as a personal firewall (Norton, etc.) blocking part of the communication.