  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 992
  • Last Modified:

AIX 5.3 and OpenSSH 4.3, SSH Client's terminal randomly slows down on screen refresh.


I was wondering if anyone could help me out here. From time to time, "random" users report that their SSH connection (SSH2) to our main server, which runs AIX 5.3 with OpenSSH 4.3, slows down to the point where screen refresh makes it unusable, and they have to terminate their connection and log back in.

The same users in the office work fine, just randomly happens to random users...

Anyone experienced such a thing before on AIX or OpenSSH? Anything I should look for when diagnosing such an issue?

We are able to view the logged-in user's session and see exactly what they see on their terminal. When they move through their menus, the screen updates right away on our end (so we know the server is responsive), but on the end user's screen it takes at least 20-30 seconds.

There is no network loss, and network speeds are good.

Also attached is the SSH client's log. I had Wireshark running in the background as well and can attach those captures too.

Thanks for your help.
Log from the SSH Client the user is using:
Initiating SSH session at Tue Jul 07 14:45:09 2009
Attempting login as user davsm, protocol 2
Remote protocol version 2.0,remote software version OpenSSH_4.3
match: OpenSSH_4.3 pat OpenSSH*
Remote is NON-HPN aware
datafellows = 2000000
Local version string SSH-2.0-SecureNetTerm-3.1
kex: server->client aes256-cbc hmac-sha1 none
kex: client->server aes256-cbc hmac-sha1 none
Selected Kex Method = diffie-hellman-group-exchange-sha1
hostkeyalg = ssh-dss
SSH2_MSG_KEX_DH_GEX_REQUEST(1024<4096<8192) sent
Host '' is known and matches the DSA host key.
Found key in C:\Users\Davsm\AppData\Roaming\InterSoft Common\known_hosts:2
verify_host_key_callback, host accepted
expecting SSH2_MSG_NEWKEYS
ssh_kex2 complete, hostOK=1
service_accept: ssh-userauth
authentications that can continue: publickey,gssapi-with-mic,password,keyboard-interactive
next auth method to try is password
ssh-userauth2 successful: method password
Authentication complete.
channel 0: rfd=-1, wfd=-1, new [client-session]
Entering interactive session.
NetWatchProc is running
Starting client_init_dispatch, compat20 = 1
client_init id 0
Requesting pty.
client session id: id 0
channel 0: open confirm rwindow 0 rmax 32768
Adjusting channel 0 by 131072, new = 131072

1 Solution
Kerem ERSOYPresidentCommented:

It seems that there's a network problem, since you can see that menu items are quickly accessed at your end but not on the client side.

It might be a wrongly set MTU on either of your Ethernet adapters or switch ports. That would cause lots of retries on packet exchange, leading to slow traffic and occasional hangs depending on the packet size.

mirdeAuthor Commented:
If it was a wrong MTU setting, why would it be so random? Initially it works well when the user connects; about 10 minutes later, it slows down to the point where it's barely usable. The screen updates become way too slow.
It might be a problem with Vista or the SNP package on Windows. Use Linux, or learn to use netsh to disable TCP autotuning.
Kerem ERSOYPresidentCommented:
I don't think it is random. It just happens when the displayed contents reach the MTU size and packets get fragmented. This is what happens when there's an MTU error.

To verify this you can use ping with larger packets. Just check if the system can get responses with packet sizes near the MTU values.

Try with ping -s 1000 and 1400, 1500, 1600 to see if there's a communication problem with these sizes.
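A note on the sizes suggested above: the number passed to ping is the ICMP payload, not the whole packet. Assuming IPv4 with no options, a 20-byte IP header and an 8-byte ICMP header are added on top, so the largest payload that fits a 1500-byte MTU without fragmenting is 1472. The helper name below is only illustrative.

```shell
# icmp_payload_for_mtu MTU
# Prints the largest ICMP echo payload that fits in one packet of size MTU.
# Assumes IPv4 without options: 20-byte IP header + 8-byte ICMP header.
icmp_payload_for_mtu() {
    mtu=$1
    echo $(( mtu - 20 - 8 ))
}

icmp_payload_for_mtu 1500   # prints 1472
icmp_payload_for_mtu 1400   # prints 1372
```

So `ping -s 1500` already produces a 1528-byte packet, which has to be fragmented on standard Ethernet. On a Windows client the size flag is `-l` rather than `-s`, and `-f` sets the don't-fragment bit.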
Kerem ERSOYPresidentCommented:
You said earlier that you can see the menus are sent correctly on the AIX side, which indicates that there's a problem during the conversation. It can be a wrong MTU size, a broken switch port, a malfunctioning patch cord, etc.
/usr/sbin/no -o tcp_pmtu_discover=0
is likely to get past PMTU blackhole problems.
(add to rc.tcpip for permanent effect)
mirdeAuthor Commented:
Is there a way to figure out if PMTU black hole problems exist within the network? A way to test for it?
Kerem ERSOYPresidentCommented:
You can trace them by pinging from both sides with packet sizes near the maximum MTU of 1500, and you can disable MTU discovery with these commands:

# no -o udp_pmtu_discover=0
# no -o tcp_pmtu_discover=0

To make it permanent, add these to the end of /etc/rc.net. Go to the very bottom and place these commands between the last if .. fi stanza and the unset statement.

Save and exit.
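One way to test for a PMTU black hole from a client is to send echo requests with the don't-fragment bit set and search for the largest payload that still gets a reply. The following is a minimal sketch, not a definitive tool. It assumes a Linux test machine with iputils ping (`-M do` sets DF) and a hypothetical `TARGET_HOST` variable; the function names are made up for illustration. If ICMP echo is filtered somewhere on the path, the result is inconclusive.

```shell
# probe SIZE
# Returns 0 if an echo request with SIZE payload bytes and DF set is answered.
# Assumption: Linux iputils ping. On Windows the equivalent is: ping -f -l SIZE host
probe() {
    ping -c 1 -W 2 -M do -s "$1" "$TARGET_HOST" >/dev/null 2>&1
}

# largest_unfragmented_payload LOW HIGH
# Binary search for the largest payload in [LOW, HIGH] that probe accepts.
# Prints that size, or returns 1 if even LOW gets no reply.
# Assumes that if a size fails, every larger size fails too.
largest_unfragmented_payload() {
    low=$1
    high=$2
    probe "$low" || return 1
    while [ "$low" -lt "$high" ]; do
        mid=$(( (low + high + 1) / 2 ))   # round up so the loop always makes progress
        if probe "$mid"; then
            low=$mid
        else
            high=$(( mid - 1 ))
        fi
    done
    echo "$low"
}
```

For example, `TARGET_HOST=aixserver; largest_unfragmented_payload 500 1472` (with a made-up host name) prints the largest payload that passed. Adding 28 bytes of IP and ICMP headers gives an estimate of the path MTU. A result well below 1472 on an all-Ethernet path, combined with no ICMP "fragmentation needed" replies in a packet capture, would point toward a black hole.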

Kerem ERSOYPresidentCommented:
You can display current PMTU values using the command:

pmtu display

mirdeAuthor Commented:
Would having a wrong PMTU set cause a consistent slowdown in a session? The users reporting this issue have been experiencing it "randomly" and not consistently, but definitely a couple of times per day.
They are connecting using an SSH client on Vista Enterprise clients.
Kerem ERSOYPresidentCommented:
It would happen when a packet is transmitted near the MTU limit. When a packet is larger than the MTU, it will be fragmented. Starting with AIX 5.3, fragmentation is prohibited. The result is a communication hang, so they will need to reset their connection and restart.

They will not notice this if all the packets they are sending are a couple of bytes, such as when a small menu that is smaller than the MTU size gets updated. Everything will work. But when the packets are bigger, carrying more data such as a full-screen output, they will reach the physical size limit and be fragmented, and since AIX denies packet fragments by default, the communication will stall at that point. So it is normal that they stall at a random point.

Besides, since MTU discovery is dynamic, sometimes they may get a smaller value depending on the network equipment in between, but sometimes the discovery algorithm may settle on a larger MTU value over some other equipment, which adds to the randomness.
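The reasoning above can be illustrated with some rough arithmetic. This is only a sketch under stated assumptions: IPv4 and TCP headers of 20 bytes each with no options, SSH encryption and MAC overhead ignored, and a hypothetical path whose MTU (1400) is smaller than what the sender assumes (1500). The function names are invented for the example.

```shell
# largest_packet PAYLOAD SENDER_MTU
# Prints the size of the biggest IP packet TCP would send for PAYLOAD bytes,
# given the MSS derived from SENDER_MTU (MTU minus 40 bytes of headers).
largest_packet() {
    payload=$1
    mss=$(( $2 - 40 ))
    if [ "$payload" -gt "$mss" ]; then
        seg=$mss          # payload is split; the first segments are full-size
    else
        seg=$payload      # payload fits in a single small segment
    fi
    echo $(( seg + 40 ))
}

# fits_path PAYLOAD SENDER_MTU PATH_MTU
# Prints "fits" if the biggest packet passes the path unfragmented, else "too big".
fits_path() {
    pkt=$(largest_packet "$1" "$2")
    if [ "$pkt" -le "$3" ]; then
        echo "fits"
    else
        echo "too big"
    fi
}

fits_path 60 1500 1400     # a short menu update: prints "fits"
fits_path 1920 1500 1400   # a full 80x24 screen of text: prints "too big"
```

Under these assumptions, short updates travel in small packets that pass any hop, while a full-screen repaint fills a 1500-byte packet that the narrower hop cannot forward without fragmenting it.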

I have been saying the same thing from the very beginning. I've suggested that you check and diagnose your network using:
- ping packets at sizes near your MTU value. This is a non-disruptive operation.
- I've also told you how to disable this behavior temporarily and permanently. This is another non-disruptive operation, and you can easily experiment with the settings; there's no need to make them permanent if you don't like the results.
- I've told you how to display your current PMTU values on AIX 5.3. This is an informative command.

So none of the commands I've suggested here will have a negative effect on your operations.

Please try these commands and let me know the results; otherwise we'll get a deadlock here while the bottleneck on your traffic keeps your users stalled.
mirdeAuthor Commented:
Thanks KaremE, for the very descriptive comment. I plan to test this tomorrow while in the office and see what results I get.
Kerem ERSOYPresidentCommented:
> Having wrong PMTU set, would this cause consistent slowdown in a session?

This is the typical behavior with wrong MTU setting.
Kerem ERSOYPresidentCommented:
You'd already said that you could observe that there's nothing wrong at the server and that it is transmitting the menus as it should, while the client could not receive all the sent packets.

This alone clearly indicates that there's a problem during communication. It could be packet fragmentation, or some software such as a personal firewall (Norton, etc.) blocking some communication.
Kerem ERSOYPresidentCommented:
You're welcome. Just take your time and let me know the outcome.
It is a problem with Vista's TCP/IP autotuning. It's the new kid on the block; nothing can help on the AIX side.

netsh interface tcp set global autotuninglevel=disabled
mirdeAuthor Commented:
After disabling Vista's TCP/IP auto tuning, the users have reported that their SSH connection has not been dropping.

I'm continuing to monitor this, but hopefully this is the fix.

Thanks all for your input.
BTW, you never said the connection ever dropped by itself...
