Solved

AIX 5.3 and OpenSSH 4.3, SSH Client's terminal randomly slows down on screen refresh.

Posted on 2009-07-07
Last Modified: 2013-11-17
Hello,

I was wondering if anyone could help me out here. From time to time, "random" users report that their SSH2 connection to our main server (AIX 5.3, OpenSSH 4.3) slows down until it is no longer usable (screen refresh becomes extremely slow), and they have to terminate the connection and log back in.

The same users work fine in the office; it just happens randomly to random users...

Has anyone experienced something like this before on AIX or OpenSSH? Is there anything I should look for when diagnosing such an issue?

We are able to vision onto the logged-in user's session and see exactly what they see on their terminal. When they move through their menus, the screen updates right away on our end (so we know the server is responsive), but on the end user's screen it takes at least 20-30 seconds.

There is no network loss, and network speeds are good.

Also attached is the SSH client's log. I also had Wireshark running in the background and can attach those captures as well.

Thanks for your help.
Log from the SSH Client the user is using:
 
Initiating SSH session at Tue Jul 07 14:45:09 2009
Attempting login as user davsm, protocol 2
Remote protocol version 2.0,remote software version OpenSSH_4.3
match: OpenSSH_4.3 pat OpenSSH*
Remote is NON-HPN aware
datafellows = 2000000
Local version string SSH-2.0-SecureNetTerm-3.1
SSH2_MSG_KEXINIT sent
SSH2_MSG_KEXINIT received
kex: server->client aes256-cbc hmac-sha1 none
kex: client->server aes256-cbc hmac-sha1 none
Selected Kex Method = diffie-hellman-group-exchange-sha1
hostkeyalg = ssh-dss
SSH2_MSG_KEX_DH_GEX_REQUEST(1024<4096<8192) sent
expecting SSH2_MSG_KEX_DH_GEX_GROUP
SSH2_MSG_KEX_DH_GEX_INIT sent
expecting SSH2_MSG_KEX_DH_GEX_REPLY
Host '172.16.4.14' is known and matches the DSA host key.
Found key in C:\Users\Davsm\AppData\Roaming\InterSoft Common\known_hosts:2
verify_host_key_callback, host accepted
SSH2_MSG_NEWKEYS sent
expecting SSH2_MSG_NEWKEYS
SSH2_MSG_NEWKEYS received
ssh_kex2 complete, hostOK=1
send SSH2_MSG_SERVICE_REQUEST
service_accept: ssh-userauth
got SSH2_MSG_SERVICE_ACCEPT
authentications that can continue: publickey,gssapi-with-mic,password,keyboard-interactive
next auth method to try is password
ssh-userauth2 successful: method password
Authentication complete.
channel 0: rfd=-1, wfd=-1, new [client-session]
Entering interactive session.
NetWatchProc is running
Starting client_init_dispatch, compat20 = 1
client_init id 0
Requesting pty.
client session id: id 0
channel 0: open confirm rwindow 0 rmax 32768
Adjusting channel 0 by 131072, new = 131072


Question by:mirde
18 Comments
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24799840
Hi,

It seems that there's a network problem, since you can see that the menus update quickly on your end but not on the client side.

It might be a wrongly set MTU on one of your Ethernet adapters or switch ports. That would cause many retransmissions during packet exchange, resulting in slow traffic and occasional hangs depending on the packet size.

Cheers,
K.
 

Author Comment

by:mirde
ID: 24805281
If the MTU were set wrong, why would it be so random? It works fine when the user first connects; about 10 minutes later it slows down to the point where it's barely usable. The screen updates become way too slow.
 
LVL 62

Expert Comment

by:gheist
ID: 24832029
It might be a problem with Vista or the SNP (Scalable Networking Pack) package on Windows. Use Linux, or learn to use netsh to disable TCP autotuning.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24832829
I don't think it is random. It happens when the displayed content reaches the MTU size and packets get fragmented. This is what happens when there's an MTU error.

To verify this you can use ping with larger packets. Just check if the system can get responses with packet sizes near the MTU values.

Try with ping -s 1000 and 1400, 1500, 1600 to see if there's a communication problem with these sizes.
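To make the size test above easier to read, here is a minimal sketch of the arithmetic behind "packet sizes near the MTU". It assumes IPv4 with a plain 20-byte header (no options) and an 8-byte ICMP echo header, and that `ping -s` sets the number of data bytes, as it does for the AIX and Linux `ping`. The function names are illustrative only.

```python
# Sketch: which ping payload sizes fit in a single IPv4 packet for a given MTU.
# Assumes a 20-byte IPv4 header (no options) and an 8-byte ICMP echo header.
IPV4_HEADER = 20
ICMP_HEADER = 8


def max_unfragmented_payload(mtu: int) -> int:
    """Largest ICMP echo payload (the value given to ping -s) that fits in one packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def needs_fragmentation(payload: int, mtu: int) -> bool:
    """True if an echo request carrying this payload is larger than the link MTU."""
    return payload > max_unfragmented_payload(mtu)


if __name__ == "__main__":
    mtu = 1500  # standard Ethernet MTU
    for size in (1000, 1400, 1472, 1473, 1500, 1600):
        status = "fragmented" if needs_fragmentation(size, mtu) else "single packet"
        print(f"ping -s {size}: {status}")
```

On a 1500-byte Ethernet path the boundary is 1472/1473 bytes of payload, so `-s 1500` and `-s 1600` are fragmented even on a healthy network. A failure only at those sizes points to fragments being dropped, while a failure well below 1472 points to a smaller MTU somewhere on the path.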
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24832831
You told us earlier that you can see the menus are sent correctly on the AIX side, which indicates there's a problem somewhere along the path. It could be a wrong MTU size, a broken switch port, a malfunctioning patch cord, etc.
 
LVL 62

Expert Comment

by:gheist
ID: 24835806
/usr/sbin/no -o tcp_pmtu_discover=0
is likely to get past PMTU blackhole problems
(add it to rc.tcpip for a permanent effect).
 

Author Comment

by:mirde
ID: 24845784
Is there a way to figure out whether PMTU blackhole problems exist within the network? A way to test for it?
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24845890
You can trace them by pinging from both sides with packet sizes near the maximum MTU of 1500, and you can disable MTU discovery with these commands:

# no -o udp_pmtu_discover=0
# no -o tcp_pmtu_discover=0

To make it permanent, add these to the end of /etc/rc.net. Go to the very bottom and place these commands between the last if ... fi stanza and the unset statement.

Save and exit.

Cheers,
K
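"Pinging with sizes near the MTU" can be done systematically as a binary search for the largest payload that still gets a reply. The sketch below is a hypothetical helper, not an AIX tool. It takes a `probe(size)` callback that you would back with a real ping that has Don't Fragment set (for example Windows `ping -f -l <size>`). Here `simulated_probe` is a made-up stand-in that pretends some hop has a 1400-byte MTU, and 28 is the assumed IPv4 + ICMP header overhead.

```python
# Sketch: binary-search the largest payload that passes, given a probe callback.
from typing import Callable


def largest_passing_payload(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes the probe is monotonic: if a size passes, every smaller size passes.
    Returns low - 1 if even `low` fails.
    """
    best = low - 1
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid        # mid got through; look for something larger
            low = mid + 1
        else:
            high = mid - 1    # mid was lost; look for something smaller
    return best


def simulated_probe(path_mtu: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for a DF-set ping: passes if payload + 28 <= path MTU."""
    return lambda payload: payload + 28 <= path_mtu


if __name__ == "__main__":
    probe = simulated_probe(path_mtu=1400)  # pretend one hop in the middle has MTU 1400
    payload = largest_passing_payload(probe, 0, 1472)
    print(payload, "bytes of payload ->", payload + 28, "byte path MTU")
```

If the result is well below 1472 from one side but not the other, or replies above that size silently vanish instead of producing an ICMP "fragmentation needed" error, that is the usual signature of a PMTU blackhole. Ordinary packet loss breaks the monotonic assumption, so repeat each probe a few times in practice.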
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24845915
You can display current PMTU values using the command:

pmtu display

 

Author Comment

by:mirde
ID: 24845924
Would a wrong PMTU setting cause a consistent slowdown in a session? The users reporting this issue have been experiencing it "randomly", not consistently, but definitely a couple of times per day.
They are connecting using an SSH client on Vista Enterprise clients.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24846033
It would happen when a packet's size approaches the MTU. When a packet is larger than the MTU, it will be fragmented. Starting with AIX 5.3, fragmentation is prohibited, so the communication hangs and the users need to reset their connection and start over.

So they will not notice anything while the packets they send are only a few bytes, such as a small menu update that is smaller than the MTU; everything works. But when the packets are bigger, for example a full-screen output, they reach the physical size limit and get fragmented, and since AIX denies packet fragments by default, the communication stalls at that point. So it is normal that the sessions stall at seemingly random points.

Besides, since MTU discovery is dynamic, it may sometimes settle on a smaller value depending on the network equipment in between, and at other times the discovery algorithm may settle on a larger MTU value over some other equipment. This adds to the randomness.
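The "small menu update fits, full-screen redraw does not" argument above comes down to IPv4 fragmentation arithmetic. This sketch only illustrates how a datagram of a given size would be split for a given MTU under standard IPv4 rules (20-byte header without options, fragment payloads in multiples of 8 bytes because the offset field counts 8-byte units). It does not model how AIX treats the fragments, and `fragment_sizes` is an illustrative name.

```python
# Sketch of IPv4 fragmentation arithmetic: how an IP payload splits for a given MTU.
IPV4_HEADER = 20  # assumes no IP options


def fragment_sizes(payload_len: int, mtu: int) -> list[int]:
    """Return the number of payload bytes carried by each fragment.

    Every fragment except the last must carry a multiple of 8 bytes,
    because the fragment offset field is expressed in 8-byte units.
    """
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    remaining = payload_len
    while remaining > per_fragment:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)  # last (or only) fragment carries whatever is left
    return sizes


if __name__ == "__main__":
    print(fragment_sizes(200, 1500))   # small update: a single packet
    print(fragment_sizes(3000, 1500))  # large redraw: three fragments
    print(fragment_sizes(1480, 1400))  # fits on Ethernet, but not over a 1400-byte hop
```

The last call shows why the result can depend on the path: a datagram that travels unfragmented over a 1500-byte link needs two fragments (1376 + 104 bytes) once a 1400-byte hop is involved.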

I have been saying the same thing from the very beginning. I've suggested that you check and diagnose your network using:
- ping packets at sizes near your MTU value. This is a non-disruptive operation.
- I've also told you how to disable this behavior temporarily and permanently. This is another non-disruptive operation; you can easily experiment with the settings, and there's no need to make them permanent if you don't like the results.
- I've told you how to display your current PMTU values on AIX 5.3. This is an informational command.

So none of the commands I've suggested here will have a negative effect on your operations.

Please try these commands and let me know the results; otherwise we'll be deadlocked here while the bottleneck in your traffic keeps your users stalled.
 

Author Comment

by:mirde
ID: 24846081
Thanks, KaremE, for the very descriptive comment. I plan to test this tomorrow in the office and see what results I get.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24846095
> Having wrong PMTU set, would this cause consistent slowdown in a session?

This is the typical behavior with wrong MTU setting.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24846100
You already said that you could observe that nothing is wrong on the server: it transmits the menus as it should, while the client does not receive all of the packets sent.

This alone clearly indicates a problem in the communication path. It could be packet fragmentation, or software such as a personal firewall (Norton, etc.) blocking some of the traffic.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24846102
You're welcome; just take your time and let me know the outcome.
 
LVL 62

Accepted Solution

by: gheist (earned 250 total points)
ID: 24846922
It is a problem with Vista's TCP/IP autotuning, the new kid on the block. Nothing on the AIX side can help.

netsh interface tcp set global autotuninglevel=disabled
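The thread does not explain why Vista's autotuning caused the stalls. One commonly cited mechanism, offered here only as background and not confirmed in this thread, is that autotuning advertises large receive windows using TCP window scaling (RFC 7323). The 16-bit window field is shifted by a scale factor agreed in the SYN/SYN-ACK, and a router or firewall that does not apply that factor misjudges the window. The sketch shows only the arithmetic; the function names and the 1 MiB window are hypothetical.

```python
# Sketch of TCP window scaling arithmetic (RFC 7323). Background only; this thread
# does not confirm that window scaling caused the stalls.
MAX_SHIFT = 14      # RFC 7323 limits the window scale shift count to 14
MAX_FIELD = 0xFFFF  # the TCP header's window field is 16 bits wide


def shift_needed(window_bytes: int) -> int:
    """Smallest shift count that lets window_bytes fit in the 16-bit field."""
    shift = 0
    while (window_bytes >> shift) > MAX_FIELD:
        shift += 1
    if shift > MAX_SHIFT:
        raise ValueError("window too large for TCP window scaling")
    return shift


def advertised_field(window_bytes: int, shift: int) -> int:
    """Value a sender actually writes into the 16-bit window field."""
    return window_bytes >> shift


def window_seen(field: int, shift: int) -> int:
    """Window a receiver (or middlebox) computes from the field and the shift it assumes."""
    return field << shift


if __name__ == "__main__":
    window = 1024 * 1024  # hypothetical 1 MiB receive window
    shift = shift_needed(window)
    field = advertised_field(window, shift)
    print("shift:", shift, "field:", field)
    print("peer honoring the shift sees:", window_seen(field, shift))
    print("device ignoring the shift sees:", window_seen(field, 0))
```

With a 1 MiB window the sender writes 32768 with a shift of 5. A device that ignores the shift believes the window is only 32 KiB, which is the kind of mismatch that can make a stateful device drop or delay in-window traffic. Disabling autotuning avoids advertising such large scaled windows.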
 

Author Comment

by:mirde
ID: 24880131
After disabling Vista's TCP/IP auto tuning, the users have reported that their SSH connection has not been dropping.

Continuing to monitor this, but hopefully that is the fix.

Thanks all for your input.
 
LVL 62

Expert Comment

by:gheist
ID: 24887498
BTW, you never said the connection ever dropped by itself...