Solved

Solaris command or utility to monitor number of open file descriptors and sockets

Posted on 2006-06-12
Last Modified: 2013-12-27

Can somebody tell me the best way to monitor the open sockets and file descriptors on a machine?  I am running into a situation where I think my machine is getting maxed out on its file descriptors (the limit is set to 256), but I want to monitor this before I start tuning the kernel.  Can somebody give me some advice on how to do this?  I know that I can use netstat to look at open sockets, but that doesn't give me the combined total of sockets and file descriptors.

Also, I noticed that netstat sometimes repeats information for a socket.  Does anybody know how to prevent that from happening?

Thanks.
Question by:mromeo
13 Comments
 
LVL 38

Expert Comment

by:yuzh
ID: 16890907
You can use "lsof" to do the job. Download the lsof binary package from:
http://sunfreeware.com/

man lsof (after installation)
to learn more details.

For tuning:
http://www.princeton.edu/~psg/unix/Solaris/troubleshoot/kerntune.html
 

Author Comment

by:mromeo
ID: 16895622
This is supposed to list the open files on a system.  If I type ulimit -a, it reports that my max file descriptors is 256. Yet if I type lsof | wc -l there are over 4000 entries.  So what is the real max on the number of file descriptors?  I assume that I am interpreting something incorrectly.  Can someone please explain the max file descriptors to me and how I can know if I'm hitting that limit?  Thanks.

 
LVL 38

Expert Comment

by:yuzh
ID: 16899222
lsof lists all open files, including files that do not use a file descriptor, such as current working directories and memory-mapped library files.
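
To make the difference concrete, here is a minimal sketch (not taken from the lsof docs) that counts only the rows of lsof output whose FD column is an actual descriptor number. The function name count_fd_rows and the PID 1234 are made up for illustration, and it assumes the default lsof column layout, where FD is the fourth field and the command name contains no spaces.

```shell
# Count only the lsof rows whose FD column starts with a descriptor number
# (e.g. "3u", "10r"), skipping cwd, txt, mem and similar non-descriptor rows.
# Reads lsof output on stdin. Assumption: FD is the 4th whitespace-separated
# field, as in lsof's default output.
count_fd_rows() {
    awk '$4 ~ /^[0-9]+/ { n++ } END { print n + 0 }'
}

# Usage (hypothetical PID):
#   lsof -p 1234 | count_fd_rows
```

On Solaris, pfiles <pid> or ls /proc/<pid>/fd | wc -l should give a comparable per-process count without lsof.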

Please have a look at the following docs to  learn more details:

http://technopark02.blogspot.com/2005/05/solaris-32-bits-fopen-and-max-number.html
http://www.netadmintools.com/art295.html

and
http://sial.org/howto/debug/unix/lsof/
 
LVL 10

Expert Comment

by:Nukfror
ID: 16914948
Assuming you run lsof as root or lsof is SUID root, lsof shows all open files for the entire system, for ALL running processes.  ulimit -a shows the PER-PROCESS limit for the user running the command.  So if you personally have 1000 running applications and your file descriptor rlimit is set to 256, you could theoretically have 256 open file descriptors for each of your 1000 applications.

So you are trying to compare a micro view with a macro view, which doesn't work.
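
A rough way to get the per-process (micro) view next to the limit: the sketch below prints a descriptor count for each process by counting the entries under /proc/<pid>/fd. The function name fd_counts is made up, and it assumes a POSIX-style shell (ksh or bash rather than the old Solaris /bin/sh) and enough privileges to read other users' /proc entries.

```shell
# Print "PID COUNT" for every process directory under a /proc-style root,
# largest descriptor count first.
# Assumption: each process exposes its open descriptors as entries in
# <root>/<pid>/fd. The root defaults to /proc and is a parameter only so
# the function can be pointed at another directory.
fd_counts() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -d "$dir/fd" ] || continue
        # tr strips the padding some wc implementations add
        n=$(ls "$dir/fd" 2>/dev/null | wc -l | tr -d ' ')
        echo "${dir##*/} $n"
    done | sort -k2,2nr
}

# Usage:
#   fd_counts | head      # the ten processes holding the most descriptors
#   ulimit -n             # per-process soft limit for the current shell
```

If the top counts sit close to the ulimit -n value (or to what plimit <pid> reports for an already running process on Solaris), that process is a candidate for hitting the limit.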
 

Author Comment

by:mromeo
ID: 16915014
I thought this would be easy.  This system is losing socket connections w/o errors and I don't really know what else to look at.  I am following the theory that it is running low on system resources, as it is a very busy machine, but finding the right tools to monitor the machine is not so easy.  

Any suggestions are appreciated.
Thanks.
 
LVL 10

Expert Comment

by:Nukfror
ID: 16915616
You need to be more specific.  What's losing socket connections, i.e. which application is losing them?  Solaris the OS can handle tens of thousands of sockets.

If the machine is busy it could be something else.  Is the machine page thrashing or swapping?

Run the vmstat command and look at the W column (swapped) and the SR column (scan rate).  A positive W column means applications have been physically swapped out of memory at some point in the past.  A positive SR column means you are under memory pressure.

The R column (jobs that are runnable but can't run because the CPUs are too busy at the moment) may be another interesting column.  The old rule of thumb still pretty much holds: once it reaches 4x the CPU core count, you need to start looking at your environment.  Higher than this and you have a machine that's probably not sized correctly for the workload being thrown at it.
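
The vmstat checks above could be scripted roughly like this. It's only a sketch: flag_vmstat is a made-up name, the columns (r, w, sr) are looked up by name from the Solaris vmstat header line, and the 4x-per-core threshold is just the rule of thumb mentioned above. On Solaris you may need nawk or /usr/xpg4/bin/awk in place of awk for the -v option.

```shell
# Read vmstat output on stdin and print one line for each sample that shows
# swapped-out threads (w > 0), page scanning (sr > 0), or a run queue above
# 4x the CPU count passed as $1 (defaults to 1).
# Assumption: the header row starts with "r" and names the w and sr columns,
# as Solaris vmstat does. Other layouts produce no output.
flag_vmstat() {
    ncpu=${1:-1}
    awk -v ncpu="$ncpu" '
        $1 == "r" {                          # header row: map names to columns
            for (i = 1; i <= NF; i++)
                if (!($i in col)) col[$i] = i
            next
        }
        ("sr" in col) && $1 ~ /^[0-9]+$/ {   # data row
            n++
            msg = ""
            if ($(col["w"])  > 0)        msg = msg " swapped-out"
            if ($(col["sr"]) > 0)        msg = msg " scanning"
            if ($(col["r"])  > 4 * ncpu) msg = msg " run-queue"
            if (msg != "") print "sample " n ":" msg
        }'
}

# Usage (hypothetical 8-core box, twelve 5-second samples):
#   vmstat 5 12 | flag_vmstat 8
```

Keep in mind that the first line vmstat prints is an average since boot, so a flag on sample 1 says less about what is happening right now.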

Check your network statistics and see if you are getting lots of network errors:  netstat -in.  

A remote possibility is running out of swap space (but if this were true, you would be having more than just lost socket issues).  Run swap -l; this will show physical swap and whether you're running into a consumption issue.  This relates to both a swap/page thrash situation and applications that are consuming all your tmpfs space.  You can put limits on how big you let your /tmp or even /var/run directories get.  See the man page for mount_tmpfs.
 

Author Comment

by:mromeo
ID: 16915766
Some proprietary and 3rd party applications are losing their socket connections at about the same time every night.  The only way to recover is to restart these programs.  It is very hard to pinpoint the exact time and sequence of events, but I am trying to write a script that will help gather some statistics.  I have added your suggestions above.  I am going to run lsof, swap, netstat, and vmstat.  I'm also going to use lsof and netstat to try to get the number of socket connections and their state.
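
For the socket-state part, something along these lines might do. It's a sketch under assumptions: count_tcp_states is a made-up helper name, and it assumes the connection state is the last field on each netstat line, as it is in Solaris netstat -an -P tcp output.

```shell
# Tally TCP connections by state from netstat output on stdin and print
# "COUNT STATE" lines, most common state first.
# Assumption: the state is the last field and is written in capitals
# (ESTABLISHED, TIME_WAIT, ...); header lines fail the pattern and are skipped.
count_tcp_states() {
    awk '$NF ~ /^[A-Z][A-Z_0-9]+$/ { c[$NF]++ }
         END { for (s in c) print c[s], s }' | sort -rn
}

# Usage on Solaris (-P tcp restricts the listing to TCP):
#   netstat -an -P tcp | count_tcp_states
```

Logging this output with a timestamp every few minutes around the time the connections drop would show whether one state (for example CLOSE_WAIT or TIME_WAIT) is piling up.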
 
LVL 10

Expert Comment

by:Nukfror
ID: 16916406
This could be network related as well.  You might want to use a packet sniffer close to the time the sockets drop to see if something is coming in from the remote side closing down the connection(s).  
 

Author Comment

by:mromeo
ID: 17592222
I was able to do this monitoring using sar -r 10 100.  The last value in the list was what I was looking for.
 
LVL 20

Expert Comment

by:Venabili
ID: 17592310
Changed recommendation: PAQ - refund
 

Accepted Solution

by:
CetusMOD earned 0 total points
ID: 17631255
PAQed with points refunded (200)

CetusMOD
Community Support Moderator