How to trace the cause of Out Of Memory issues

Is there a way to trace what is causing out-of-memory kernel panics? The server keeps rebooting.
montypy Asked:
montypy Author Commented:
anyone?
jmcg Owner Commented:
If the log files and panic messages don't give you any clues, you could try running a tool like top or vmstat to see if you can spot a runaway process as the culprit. Once you can identify a particular program as the cause, it should be possible to drill down and find out why it's misbehaving. Depending on what's going wrong, allocating a much larger pagefile may slow the runaway and delay the crash so as to give you more of a window to catch the leaker, if it is a leak.
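If you'd rather log the biggest memory users over time than watch top by hand, here is a minimal sketch in Python. It assumes a Linux /proc filesystem; the function names (rss_kib, process_name, top_memory_users) are made up for illustration and are not part of any tool mentioned above. It reads the VmRSS line from each /proc/<pid>/status file and prints the largest resident processes.

```python
import os
import re


def rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def process_name(status_text):
    """Return the process name from /proc/<pid>/status text, or '?'."""
    match = re.search(r"^Name:\s+(.*)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else "?"


def top_memory_users(proc_root="/proc", limit=5):
    """Return up to `limit` (rss_kib, pid, name) tuples, largest RSS first."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited or is not readable
        rss = rss_kib(text)
        if rss is not None:  # kernel threads have no VmRSS line
            usage.append((rss, int(entry), process_name(text)))
    return sorted(usage, reverse=True)[:limit]


if __name__ == "__main__":
    for rss, pid, name in top_memory_users():
        print(f"{rss:>10} KiB  pid={pid:<7} {name}")
```

Run from cron or a loop with output appended to a file, this can leave a record of which process was growing right before the crash.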

Knowing that your server's configuration is adequate for the workload is sometimes hard. If you can run the server image in a virtual machine, more tools can be brought to bear. If it originated as a pre-packaged VM, it's possible the default configuration is just too stingy for it to do its job: e.g. the Turnkey Linux packages, at least the ones I've tried, start off using just a quarter-GB of RAM. This may have to be increased if the workload they're doing turns out to require more RAM.
arnold Commented:
A panic usually means there are faulty memory modules.
I.e., there is a point where the consumed memory reaches the faulty modules, triggering the kernel panic.

Is this a new or an old server? That bears on whether the voltage across the memory modules is marginal: below what is required but still within the 5-10% tolerance, until at some point it dips below that threshold, destabilizing the data in memory and triggering a kernel panic.

Please post the event from /var/log/messages dealing with the panic event as well as 10-20 lines preceding it.
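To pull that excerpt out of a large log, a rough Python sketch like the one below may help. The marker strings ("Kernel panic", "Out of memory") and the function name panic_context are assumptions chosen for illustration; adjust the markers to whatever your panic message actually says. Keep in mind that a hard panic may never reach the on-disk log, in which case the console output is the only record.

```python
from collections import deque


def panic_context(lines, before=20, markers=("Kernel panic", "Out of memory")):
    """Return a list of (preceding_lines, matching_line) pairs.

    `lines` is any iterable of log lines; `before` is how many earlier
    lines to keep as context for each match (matching is case-insensitive).
    """
    recent = deque(maxlen=before)  # rolling window of earlier lines
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        if any(marker.lower() in line.lower() for marker in markers):
            results.append((list(recent), line))
        recent.append(line)
    return results


if __name__ == "__main__":
    import sys

    # Default path taken from the request above; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for context, hit in panic_context(fh):
            print("\n".join(context))
            print(">>> " + hit)
            print("-" * 40)
```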

Please include details of what this server does, its functions, etc.

The simplest thing is to run memtest during bootup; installs often include this test. See if the hardware includes a memory test as well.
jmcg Owner Commented:
This raises a question. If you are getting a panic that's specifically labeled "out of memory", then I think it's unlikely to be caused by faulty memory modules. Possible, but unlikely.

A few questions that need to be answered: 32-bit or 64-bit kernel? Physical or VM? How much physical RAM is installed in the server (or virtual RAM allocated to the VM)? What's the exact text of the panic message? Can you show the tail end of /var/log/messages from the point of the panic backwards?
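A couple of those facts can be collected with a short Python sketch, assuming a Linux system with /proc/meminfo. The helper name mem_total_kib is hypothetical. The machine string reported by platform.machine() (for example x86_64 versus i686) is only a hint at whether the kernel is 64-bit or 32-bit, and this sketch does not try to tell a physical host from a VM.

```python
import platform
import re


def mem_total_kib(meminfo_text):
    """Return the MemTotal value in KiB from /proc/meminfo text, or None."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print("machine:", platform.machine())
    print("kernel release:", platform.release())
    with open("/proc/meminfo") as fh:
        total = mem_total_kib(fh.read())
    if total is None:
        print("MemTotal: not found")
    else:
        print(f"MemTotal: {total / 1024 / 1024:.1f} GiB")
```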

Since the server is crashing more-or-less continually, it doesn't hurt to run memory diagnostics for a few rounds. I'm just a bit dubious that you'll find the source of this problem in that direction.
montypy Author Commented:
It turned out to be memcache not initializing at boot. This is a physical server with plenty of memory.

jmcg Owner Commented:
So, when setting this server up, was some step skipped that would have caused memcached to be started up and initialized? There's a sort of loose coupling that I suppose someone might miss.
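One way to catch a skipped startup step earlier is a simple post-boot check that memcached is actually accepting connections. The Python sketch below assumes memcached listens on its default TCP port 11211 on localhost, which may not match this server's configuration; the function name is_listening is made up for illustration.

```python
import socket


def is_listening(host="127.0.0.1", port=11211, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # 11211 is memcached's default port; change it if your setup differs.
    state = "accepting connections" if is_listening() else "NOT reachable"
    print(f"memcached on 127.0.0.1:11211 is {state}")
```

This only shows that something is listening on the port, not that the cache is healthy, so treat it as a first sanity check.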

Glad you were able to resolve it. You can choose whether you want to ask for the question to be deleted or to accept your ultimate resolution as a 0-point solution.
montypy Author Commented:
I guess it was skipped, and yes, very glad we caught it.
montypy Author Commented:
Found the answer.