Suggestions to find a CentOS 5.10 memory leak on ML350 G4

Identical hardware, i.e. ML350 G4s. The one running CentOS 5.9 doesn't miss a beat, BUT the two running CentOS 5.10 (two different machines) have memory leaks. The same software runs on all 3 machines. The machines will respond to ping, but the applications stop running and you can't log on, VNC, etc. onto the servers. It doesn't seem to make a difference whether you log off or just lock the session.
Neale2210 Asked:

Gerwin Jansen, EE MVE, Topic Advisor Commented:
You could do some basic debugging to find out which process is using more memory.

A cron script that runs ps or top in batch mode every 5 minutes should give you an idea what is happening. Redirect output to a log file and analyze the log file after some time.

Example with top:
*/5 * * * * /bin/top -b -n 1 -o \%MEM >> /tmp/toplog.txt

Seth Simmons, Sr. Systems Administrator Commented:
"two different machines have memory leaks"

what tells you that you have memory leaks?
just having processes not respond doesn't necessarily mean something is leaking memory; could be multiple possibilities
Sandy Commented:
Can you paste the output of free -m?

Also, please check which processes are consuming the most memory and let us know the percentages.

TY/SA
noci, Software Engineer Commented:
Running top and sorting on M (Shift+M in interactive mode sorts by memory) might help too.
Neale2210, Author Commented:
Sorry all, I've been busy. I double-checked that the firmware was up to date on everything (the RAID controller was 2 updates old on one server) and swapped out the memory. From the top:
Gerwin... I'll try that idea.
Seth... if your logs suddenly stop on the main program you're running, either you've got a hardware fault or a program misbehaving. After eliminating possible hardware issues, you then have to look at software.
Sandy... free -m wasn't that useful, but I will post it, as you may be looking for something different.
noci... fair enough.
Gerwin Jansen, EE MVE, Topic Advisor Commented:
@noci - my batch mode top command is sorting on M(emory) :)
noci, Software Engineer Commented:
Does it show growth in the usage of any process? Or cache, for that matter?
Gerwin Jansen, EE MVE, Topic Advisor Commented:
@noci - Growth: yes. Any process: if run as root, yes. Cache: I don't think so, but top is just a start (other tools like vmstat will show it).
Neale2210, Author Commented:
top -b -n 1 -o %MEM gives a usage warning: unknown argument 'o'
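
Where the installed top rejects -o, as reported here, ps can produce a memory-sorted snapshot instead. The following is a minimal sketch, assuming a procps ps that supports --sort (GNU-style long options); the log path /tmp/pslog.txt and the 15-process cutoff are arbitrary choices for illustration, not something from the thread.

```shell
#!/bin/sh
# Append a timestamped, memory-sorted process snapshot to a log file.
# LOG and the 15-process cutoff are illustrative assumptions.
LOG="${LOG:-/tmp/pslog.txt}"

snapshot_mem() {
    # $1 = log file to append to
    {
        # Marker line so individual snapshots can be told apart later.
        echo "===== $(date '+%Y-%m-%d %H:%M:%S')"
        # Largest resident set size (RSS, in KiB) first; header plus 15 rows.
        ps -eo pid,user,rss,vsz,pmem,comm --sort=-rss | head -n 16
    } >> "$1"
}

snapshot_mem "$LOG"
```

Saved as a script and called from cron every 5 minutes, this gives the same kind of log as the top approach. A process whose RSS column keeps climbing from one snapshot to the next is the first candidate for a leak.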
Neale2210, Author Commented:
             total       used       free     shared    buffers     cached
Mem:          3546       1434       2111          0        237        764
-/+ buffers/cache:        432       3113

for 1 of the machines
Seth Simmons, Sr. Systems Administrator Commented:
what is the last line of your free output?  you only pasted 2 of 3 lines
Neale2210, Author Commented:
Swap:         9339          0       9339
Neale2210, Author Commented:
for the 2nd machine:
             total       used       free     shared    buffers     cached
Mem:          3546       1522       2024          0        218        902
-/+ buffers/cache:        401       3145
Swap:         4094          0       4094
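
The "-/+ buffers/cache" row in this free output can be cross-checked by hand: memory held by applications is "used" minus buffers and cache, and memory available to applications is "free" plus buffers and cache. A small sketch using the first machine's figures (in MB, copied from the posts above):

```shell
# Figures in MB from the first machine's free -m output.
total=3546; used=1434; free=2111; buffers=237; cached=764

# Memory actually held by applications: used minus reclaimable buffers/cache.
app_used=$((used - buffers - cached))
# Memory available to applications: free plus reclaimable buffers/cache.
app_free=$((free + buffers + cached))

echo "app_used=${app_used} app_free=${app_free}"
```

The results (433 and 3112) are within 1 MB of the 432 and 3113 that free printed; the small difference is presumably rounding, since each column is rounded to whole megabytes separately. Either way, applications held only about 12% of RAM at the moment these snapshots were taken.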
Seth Simmons, Sr. Systems Administrator Commented:
ok...your swap partition isn't being used, which is good
if there was a memory leak i would expect physical memory to be exhausted and swap space utilized, but i'm not seeing that
one place where i worked before, we had an application that did file processing, but if it came across a file of a certain size or with data it didn't like, it would leak memory until physical memory and swap space were exhausted; then the OOM killer would appear.  the vendor acknowledged the issue and we put in a 2 GB per-process limit on memory usage to mitigate it
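
The post does not say how that 2 GB limit was applied. One common way to cap a single process is ulimit -v in a wrapper, sketched below under that assumption. Note that ulimit -v limits virtual address space (in KiB), which is not the same as resident memory; run_limited is a hypothetical helper name.

```shell
# Cap virtual memory for one command at 2 GB (ulimit -v takes KiB).
LIMIT_KB=$((2 * 1024 * 1024))

run_limited() {
    # Usage: run_limited command [args...]
    # The subshell keeps the limit from affecting the calling shell.
    (
        ulimit -v "$LIMIT_KB" || exit 1
        exec "$@"
    )
}

# Demonstration: print the limit as seen inside the limited process.
run_limited sh -c 'ulimit -v'
```

With a cap like this, a leaking process fails its own allocations once it reaches the limit instead of dragging the whole machine into swap and the OOM killer.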

doesn't seem to be the case here; not yet convinced of a memory leak
have you looked at syslog for anything that might help when this happens?
have you worked with the vendor at all for possible troubleshooting options for the application?
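
For the syslog check, one quick thing to look for is OOM-killer activity around the time of a freeze. A minimal sketch, assuming syslog goes to /var/log/messages (the usual location on CentOS); find_oom is a hypothetical helper name and the search patterns are a guess at typical kernel wording, so widen them if nothing matches.

```shell
# Search a syslog file for signs of the kernel OOM killer.
find_oom() {
    # $1 = log file (defaults to /var/log/messages)
    grep -iE 'out of memory|oom-killer|oom_kill' "${1:-/var/log/messages}"
}

LOGFILE="${LOGFILE:-/var/log/messages}"
if [ -r "$LOGFILE" ]; then
    find_oom "$LOGFILE" || echo "no OOM-killer messages in $LOGFILE"
else
    echo "cannot read $LOGFILE (try running as root)"
fi
```

No matches would fit the free output above, where swap was untouched, and would point away from memory exhaustion as the cause of the freezes.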
Neale2210, Author Commented:
OK, I've just been seeing whether the firmware update to the RAID controller made a difference, along with the change of memory.  The net difference in time before freezing was a gain of 3 days, probably attributable to the increase in memory.  So now I'm left with software, because I have an identical machine running CentOS 5.9 with the same program, CommuniGate 6.0.5, and no freezing.

Gerwin... I still haven't had a correction to your 'top' command to get memory stats.
Gerwin Jansen, EE MVE, Topic Advisor Commented:
I tested with another Linux distribution. To sort on memory usage in CentOS (6):

top -b -n 1 -a

Can you try the above? If it works:
*/5 * * * * /bin/top -b -n 1 -a >> /tmp/toplog.txt
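
Once /tmp/toplog.txt has collected a few hours of snapshots, the memory history of a single process can be pulled out with awk. This is a rough sketch, assuming top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and the usual "top - HH:MM:SS up ..." header at the start of each snapshot; mem_history is a hypothetical helper name.

```shell
# Print time, PID, RES and %MEM for one command across all logged snapshots.
mem_history() {
    # $1 = command name as shown in top's COMMAND column, $2 = log file
    awk -v name="$1" '
        # Snapshot header, e.g. "top - 10:15:01 up 3 days, ...": remember the time.
        $1 == "top" && $2 == "-" { stamp = $3 }
        # Process row for the requested command: field 6 is RES, field 10 is %MEM.
        $1 ~ /^[0-9]+$/ && $12 == name { print stamp, $1, $6, $10 }
    ' "$2"
}

# Example call (replace PROC with the name top shows for the suspect program):
# mem_history "$PROC" /tmp/toplog.txt
```

If the RES value for the same PID grows steadily from snapshot to snapshot and never comes back down, that process is worth raising with the vendor.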

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.