sriram

Getting lwp_park() ETIME Error 62 and process is hung

Hello,
I am running a multi-threaded process on Solaris 10, compiled with Sun Studio 12. Sometimes the process hangs. I looked at the output of "truss -p <PID>", which shows strange errors like this:
truss -p 1739
/6:     nanosleep(0xFAC7BBE0, 0xFAC7BBD8) (sleeping...)
/7:     pollsys(0xFAB7BC30, 1, 0x00000000, 0x00000000) (sleeping...)
/9:     lwp_park(0xFA97BD78, 0)         (sleeping...)
/5:     lwp_park(0xFAD7BD80, 0)         (sleeping...)
/11:    lwp_park(0xFA77BD78, 0)         (sleeping...)
/10:    lwp_park(0xFA87BD78, 0)         (sleeping...)
/2:     nanosleep(0xFB07BE58, 0xFB07BE50) (sleeping...)
/3:     nanosleep(0xFAF7BE58, 0xFAF7BE50) (sleeping...)
/4:     lwp_park(0x00000000, 0)         (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/8:     lwp_park(0xFAA7BD78, 0)         (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/5:     lwp_park(0xFAD7BD80, 0)                         Err#62 ETIME
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/8:     lwp_park(0xFAA7BD78, 0)                         Err#62 ETIME
/5:     lwp_park(0xFAD7BD80, 0)         (sleeping...)
/11:    lwp_park(0xFA77BD78, 0)                         Err#62 ETIME
/8:     lwp_park(0xFAA7BD78, 0)         (sleeping...)
/11:    lwp_park(0xFA77BD78, 0)         (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/9:     lwp_park(0xFA97BD78, 0)                         Err#62 ETIME
/10:    lwp_park(0xFA87BD78, 0)                         Err#62 ETIME
/9:     lwp_park(0xFA97BD78, 0)         (sleeping...)
/10:    lwp_park(0xFA87BD78, 0)         (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000)  = 0
/5:     lwp_park(0xFAD7BD80, 0)                         Err#62 ETIME
/1:     pollsys(0xFFBFEF48, 6, 0xFFBFF0D0, 0x00000000) (sleeping...)
/8:     lwp_park(0xFAA7BD78, 0)                         Err#62 ETIME
/5:     lwp_park(0xFAD7BD80, 0)         (sleeping...)
/11:    lwp_park(0xFA77BD78, 0)                         Err#62 ETIME

cup

Haven't used Solaris for a long time, so please forgive any inaccuracies. truss just tells you which system calls are being made (similar to procmon on Windows). Try pstack; it will tell you where every thread is.

The example in http://publib.boulder.ibm.com/httpserv/ihsdiag/get_backtrace.html shows you how to use pstack with a core dump, including how to find the bad guy.  Using pstack on a pid is similar.
ASKER CERTIFIED SOLUTION
Rowley
SOLUTION
Brian Utterback

ASKER

Thank you all for the wonderful suggestions. I tried "pstack <pid>"; it clearly shows our application failing while sending a packet out using the TCP/IP ::send(...) system call. The application emitting the errors is the TCP/IP server; clients connect to it for service and then disconnect.

The server runs for a while and then starts emitting ETIME errors (pstack shows it in TCP/IP send()). Any help finding the issue would be appreciated.
Any examples for writing an effective TCP/IP send() on Solaris 10 would be greatly appreciated.

I have never used dtrace; let me try it.

Thanks again.
That depends on your setup.  send and recv are just the end bits.  It is normally the initial bits that may be dodgy.  

1) What does the TCP initialization look like?  What setsockopt and ioctl/fcntl calls are you using (the equivalent of ioctlsocket on Windows)?
2) Is the thread that is failing using a blocking send or a non-blocking send?  Basically, are you sitting there waiting until it has finished, or are you polling?
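
Since an example of a robust send() was asked for, here is a minimal sketch that behaves sensibly on both blocking and non-blocking sockets: it loops over partial writes, retries on EINTR, and waits with poll() when the socket buffer is full. send_all, SEND_FLAGS and the timeout_ms parameter are hypothetical names chosen for illustration, and the sketch assumes a connected stream socket.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Suppress SIGPIPE per call where the platform defines MSG_NOSIGNAL.
   Assumption: where it is missing, the program ignores SIGPIPE itself. */
#ifdef MSG_NOSIGNAL
#define SEND_FLAGS MSG_NOSIGNAL
#else
#define SEND_FLAGS 0
#endif

/* Send exactly len bytes on fd.
   Returns 0 on success, otherwise an errno value; ETIMEDOUT means the
   socket did not become writable within timeout_ms milliseconds. */
int send_all(int fd, const void *buf, size_t len, int timeout_ms)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, SEND_FLAGS);

        if (n >= 0) {                 /* partial or complete write */
            p   += n;
            len -= (size_t)n;
            continue;
        }
        if (errno == EINTR)           /* interrupted by a signal: retry */
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Non-blocking socket with a full buffer: wait until writable. */
            struct pollfd pfd;
            int rc;

            pfd.fd = fd;
            pfd.events = POLLOUT;
            pfd.revents = 0;
            rc = poll(&pfd, 1, timeout_ms);
            if (rc > 0 || (rc == -1 && errno == EINTR))
                continue;             /* writable, error pending, or interrupted */
            return rc == 0 ? ETIMEDOUT : errno;
        }
        return errno;                 /* EPIPE, ECONNRESET, and so on */
    }
    return 0;
}
```

A caller would use it as `if ((err = send_all(fd, msg, msglen, 5000)) != 0)` and close the connection on failure; a client that has already disconnected normally surfaces here as EPIPE or ECONNRESET. On Solaris the socket calls need -lsocket -lnsl at link time.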
The send library call never returns ETIME, so I think you are misinterpreting something. The method of obtaining the error is a little convoluted, and if you are not careful you can get the wrong value. If, for instance, your code treats a return of 0 as an error, then you are likely to report the error code from the last failing system call, which from the truss output appears to be the lwp_park calls. Perhaps you could post the actual output from pstack?
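
To illustrate the point about picking up the wrong error value: errno only means something immediately after a call has returned -1. A minimal sketch of a wrapper that reports only the error from its own send() call follows; send_checked is a hypothetical name, not part of any library.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 0 on success and stores the byte count in *sent.
   On failure returns the errno value that this send() call set. */
int send_checked(int fd, const void *buf, size_t len, ssize_t *sent)
{
    ssize_t n = send(fd, buf, len, 0);

    if (n == -1) {
        int err = errno;   /* save at once; later calls may overwrite errno */
        *sent = 0;
        return err;
    }

    /* n >= 0 is success. errno may still hold a stale value from an
       earlier call in this thread (ETIME, for example), so it is not read. */
    *sent = n;
    return 0;
}
```

Code that instead tests `n <= 0`, or reads errno after some other call has run, can end up logging an ETIME that send() never produced.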

ASKER

Thank You...