sanjaynrai

asked on

Very high number of CLOSE_WAIT connection on Hadoop NameNode servers

We are seeing a high number of TCP CLOSE_WAIT socket connections on Hortonworks (HDP 2.6.4) NameNode and Metastore servers.
We frequently have a very high number of CLOSE_WAIT socket connections on the Hadoop servers; as a result, Hadoop services become unavailable on the NameNode servers. This happens after heavy ingestion of data into the cluster. I then need to reboot the affected servers and restart the cluster.
I tried resetting the values of several TCP attributes on the servers, but this did not solve the problem.
Using lsof | grep CLOSE_WAIT, I can identify the processes that hold CLOSE_WAIT socket connections. I killed those processes and tried to restart the Hadoop services, but this also did not solve the problem.
I have monitored the number of CLOSE_WAIT socket connections on the servers; whenever it keeps rising, that is a sign that the Hadoop services on the NameNode are going to go down within a couple of minutes.
Any ideas for solving this issue are welcome.
noci

More info on TCP/IP here: https://en.wikipedia.org/wiki/Transmission_Control_Protocol

CLOSE_WAIT is the state a socket is in because the remote system sent a FIN and this system did not (yet)...
(A close can be one-sided, which is often done in web services: the browser closes its sending channel through shutdown().)
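
To make the one-sided close concrete, here is a minimal sketch on the loopback interface. It is only an illustration, not Hadoop code: the function name half_close_demo and the "bye" payload are made up for this example. The client half-closes with shutdown(), the accepting side reads end-of-file, and until that side calls close() its socket sits in CLOSE_WAIT.

```python
import socket

def half_close_demo():
    """Return (eof_seen, reply) from a loopback half-close exchange."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    server_side.settimeout(5)  # avoid hanging forever if something goes wrong

    # The client closes only its sending direction; this sends a FIN.
    client.shutdown(socket.SHUT_WR)

    # The accepting side reads end-of-file (empty bytes) once the FIN arrives.
    eof_seen = server_side.recv(1024) == b""

    # From here until server_side.close() is called, the accepting side's
    # socket is in CLOSE_WAIT. It can still send in the other direction.
    server_side.sendall(b"bye")
    reply = client.recv(1024)

    # close() is what moves the socket out of CLOSE_WAIT; a program that
    # skips this step leaks the socket.
    server_side.close()
    client.close()
    listener.close()
    return eof_seen, reply
```

A process that accumulates CLOSE_WAIT sockets is typically one that has seen the end-of-file condition (or never read it) and has not called close() on its own end.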


The following command should always work to see whether there are sockets in CLOSE_WAIT:
netstat -antp | grep CLOSE

It will also show which program has the socket; you then need to check in that program why it didn't close the socket (a socket leak). It should suffice to stop and start the offending program. (The socket still needs to wait in TIME_WAIT until all the dust from the close settles.)
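
If the netstat listing is long, a small script can total the CLOSE_WAIT sockets per owning program. The following is a rough sketch that assumes the usual Linux netstat -antp column layout (state in the sixth column, PID/program in the seventh). The function name count_close_wait and every address, port, and PID in SAMPLE are hypothetical and only there to illustrate the parsing.

```python
from collections import Counter

# Hypothetical sample in the usual Linux `netstat -antp` layout.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address     Foreign Address   State       PID/Program name
tcp        0      0 0.0.0.0:8020      0.0.0.0:*         LISTEN      1234/java
tcp        1      0 10.0.0.5:8020     10.0.0.9:51234    CLOSE_WAIT  1234/java
tcp        1      0 10.0.0.5:8020     10.0.0.9:51240    CLOSE_WAIT  1234/java
tcp        0      0 10.0.0.5:9083     10.0.0.7:40022    ESTABLISHED 5678/java
tcp        1      0 10.0.0.5:9083     10.0.0.7:40030    CLOSE_WAIT  5678/java
"""

def count_close_wait(netstat_output):
    """Count CLOSE_WAIT sockets per 'PID/Program' entry in netstat text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "CLOSE_WAIT":
            continue
        # The owner column is missing or "-" without sufficient privileges.
        owner = fields[6] if len(fields) > 6 else "-"
        counts[owner] += 1
    return counts
```

On a live server the input could come from subprocess.run(["netstat", "-antp"], capture_output=True, text=True).stdout, run with enough privileges to see the owning processes. A count that keeps growing for one PID points at the program leaking sockets.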