IBM i Connectivity Issues

We have an odd issue on one of our IBM i servers. Every Thursday at 5:00 PM, our handheld devices appear to lose connectivity to the IBM i server. Our services seem to go up and down for about 20 minutes and then stabilize. When I look at QHST, I see a lot of devices disconnecting. I don't see the IBM i itself being unresponsive, but I don't know that for sure.

I was wondering if there are any detailed TCP/IP logs somewhere that I could use to verify whether the server itself lost connectivity during that timeframe, or whether the OTHER devices connected to it lost connectivity. I have included a couple of snapshots to show what I am seeing around that 5:00 PM timeframe.

Matthew Roessner, Senior Systems Programmer, Asked:

Gary Patterson, VP Technology / Senior Consultant, Commented:
How do the handheld devices connect?  Are they making connections to an IBM-supplied service (database server, for example), or to a third-party or in-house program via sockets, perhaps?

Do any other IP devices lose connections during this period?

Is the R2 timer message above associated with a handheld connection?  The message indicates that a connection had been opened, that some IBM i program was attempting to re-send TCP PDUs to a remote device, and that the expected response was not received.  The TCP stack keeps trying to deliver the PDU a fixed number of times (16 times by default), then gives up and issues the TCP2617 message.

Note the direction: this is a problem sending from IBM i to the handheld device(s).  I'd start my troubleshooting at the handhelds.  During this period, can you PING from the handhelds to the IBM i?  How about another device on the same network?  Can you ping from the IBM i to one of the handhelds?
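
PING tests run by hand are easy to miss when the window is only about 20 minutes long, so it may help to leave a timestamped probe running across the Thursday 5:00 PM period. Below is a minimal Python sketch of that idea, not a definitive tool. It tests TCP connects rather than ICMP, and it is meant to run from a PC on the same network segment as the handhelds. The address 192.0.2.10 and port 23 (Telnet) in the usage comment are placeholders, since the thread does not say which service the handhelds use.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and return (ok, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.1f} ms"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable networks.
        return False, f"{type(exc).__name__}: {exc}"


def format_result(when, host, port, ok, detail):
    """Build one log line with a timestamp that can be lined up against QHST."""
    status = "OK  " if ok else "FAIL"
    return f"{when.isoformat(timespec='seconds')} {status} {host}:{port} {detail}"


def monitor(host, port, interval=5.0, count=10, timeout=3.0):
    """Probe repeatedly, print each result, and return the list of outcomes."""
    results = []
    for i in range(count):
        ok, detail = probe(host, port, timeout)
        print(format_result(datetime.now(), host, port, ok, detail), flush=True)
        results.append(ok)
        if i < count - 1:
            time.sleep(interval)
    return results


# Example (placeholder address and port, roughly 40 minutes of probing):
# monitor("192.0.2.10", 23, interval=5.0, count=480)
```

If the probe keeps succeeding from a wired PC while the handhelds drop, that points toward the wireless side or the handhelds themselves rather than the IBM i.
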
Matthew Roessner, Senior Systems Programmer, Author Commented:
Thanks for the suggestions.   We will go down that road...

Is there anywhere on the IBM i side that we could look to see whether network connectivity was lost for a given time period?  We searched through QHST, but I didn't know if there was anything else we could look at to verify the network connectivity.
Gary Patterson, VP Technology / Senior Consultant, Commented:
Well, if you mean something like an Ethernet line failure, sure: you'd see that on the QSYSOPR message queue and perhaps in QHST.  It really depends on your definition of "network connectivity lost".

Basically hardware can detect a communications loss right here:

IBM i (Network adapter) <-patch cable-> (Switch port / Router Port).  

If this connection goes down: patch cable unplugged, or switch/router powered down or failing port, you'll see warnings about loss of Ethernet connectivity.  Past that first connection, you have to use diagnostic tools (PING, traceroute), or access intermediate devices to isolate a communications loss.

If something happens farther down the network chain, it doesn't manifest in a way that can be directly detected.  Instead, you just don't get responses back from devices that are farther away, past whatever the failing link is.  TCP protocol is designed to automatically re-try sending PDUs that aren't acknowledged within a specified amount of time, and after a configured number of retries, you get a timeout error.  That's what you are getting.
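
One practical consequence of this retry behavior is that the timeout message is logged some time after the path actually failed, because every retry has to expire first. The Python sketch below illustrates that with a generic doubling-with-cap backoff. Only the retry count of 16 comes from this thread. The initial timeout, the cap, and the doubling rule itself are illustrative assumptions, not the actual IBM i timer values.

```python
def retransmit_schedule(initial_timeout, max_retries, max_timeout):
    """Return the wait (in seconds) before each retry, doubling up to a cap."""
    waits = []
    timeout = initial_timeout
    for _ in range(max_retries):
        waits.append(min(timeout, max_timeout))
        timeout *= 2
    return waits


def time_until_give_up(initial_timeout, max_retries, max_timeout):
    """Total time spent retrying before the sender reports a timeout."""
    return sum(retransmit_schedule(initial_timeout, max_retries, max_timeout))


# Illustrative values only: 1 s initial timeout, 16 retries, 60 s cap.
total = time_until_give_up(1.0, 16, 60.0)
print(f"Sender gives up roughly {total / 60:.1f} minutes after the first unanswered send")
```

When comparing timestamps, look for the network event somewhat before the time of the timeout message, not at the same instant.
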

The problem could be at the remote endpoint, or at any intermediate device.  With this specific error, though, it probably isn't on the IBM i.

Since the error message indicates that the IBM i is repeatedly trying to contact the remote device, and not getting a response, there are several possible scenarios:

TCP PDU leaves IBM i, and is dropped or temporarily undeliverable by some intermediate device.  That would be logged on the intermediate device (firewall, switch, router) that dropped the PDU, assuming logging of those events was enabled on the device.

TCP PDU leaves IBM i, traverses intermediate network to handheld, and handheld does not respond - maybe due to a hardware problem (powered off), or a software problem.  Depending on the device and application, you might see a log on the handheld that reflects an error.

TCP PDU leaves IBM i, traverses intermediate network to handheld, received by handheld, handheld responds, but response is dropped or temporarily undeliverable by some intermediate device.   That would be logged on the intermediate device (firewall, switch, router) that dropped the PDU, assuming logging of those events was enabled on the device.
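
To tell these scenarios apart, it can help to line up the IBM i timeout times against whatever the switches, firewalls, access points, or handhelds logged. The Python sketch below shows one way to do that, assuming the timestamps have already been extracted from each log into Python datetime values. The function name, the two-minute look-back window, and the sample events are all hypothetical.

```python
from datetime import datetime, timedelta


def events_near(timeouts, device_events, window_seconds=120):
    """For each timeout, list device events logged within the window before it.

    timeouts: list of datetime values taken from the IBM i messages.
    device_events: list of (datetime, text) pairs from another device's log.
    """
    window = timedelta(seconds=window_seconds)
    matches = {}
    for t in timeouts:
        # Look backwards only: the timeout is reported after retries run out.
        matches[t] = [
            (when, text)
            for when, text in device_events
            if t - window <= when <= t
        ]
    return matches


# Hypothetical sample data for illustration.
timeouts = [datetime(2024, 1, 4, 17, 2, 0)]
device_events = [
    (datetime(2024, 1, 4, 16, 0, 0), "config saved"),
    (datetime(2024, 1, 4, 17, 0, 30), "uplink port state change"),
]
for t, found in events_near(timeouts, device_events).items():
    for when, text in found:
        print(f"timeout at {t} preceded by '{text}' at {when}")
```

Clocks on the different devices may not be synchronized, so widen the window if nothing matches.
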
