ESX 3.5 Locking up


I installed two ESX 3.5 Update 2 servers at the same time a few months ago. They are both DL380 G5s with 14 GB RAM and 2 x quad-core processors, and both are attached to an MSA500.

ESX02, as I called it, locks up once a week. The guests are inaccessible and the Infrastructure Client cannot connect to the server. The only way I can get it back up is to connect to iLO and reboot it. The server is by no means stressed and has plenty of available resources.

I managed to see what was on the console when this happened. It read:
"kernel 2.4.21-57.ELvmnix on an i686 you probably have a hardware problem with your RAM chips. Please consult hardware error logs"

I booted the server off a diagnostic CD and ran a memory test, and as usual it gave the all-clear.

I am looking for a way to get the server to send out logs, or to see what is happening in the background when this happens. I also have the HP Insight Management Agents installed, and iLO doesn't show any errors when the server is locked up.

Help much appreciated!
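One way to get the host to "send out logs", as asked above, is to forward its syslog to another machine so that whatever is logged up to the moment of the hang survives the reboot. This is a minimal sketch under stated assumptions: it assumes the ESX 3.5 service console's syslog daemon can be pointed at a remote host (on sysklogd-style systems that is a line such as `*.* @192.0.2.10` in /etc/syslog.conf, followed by a syslog restart; the exact ESX steps and any service console firewall change should be checked against VMware's documentation). The receiver below runs on a separate machine; `parse_pri`, `serve`, and the log file name are hypothetical helpers written for illustration. Note that a hard lock-up caused by a memory fault may stop the host before the final message is sent.

```python
import socket
from datetime import datetime


def parse_pri(message: str):
    """Split a BSD-syslog (RFC 3164) line into (facility, severity, text).

    The leading "<PRI>" field encodes facility * 8 + severity.
    Returns (None, None, message) if no valid PRI field is present.
    """
    if message.startswith("<"):
        end = message.find(">")
        # PRI is 1 to 3 digits between "<" and ">"
        if 1 < end <= 4 and message[1:end].isdigit():
            pri = int(message[1:end])
            return pri // 8, pri % 8, message[end + 1:]
    return None, None, message


def serve(host: str = "0.0.0.0", port: int = 514,
          logfile: str = "esx02-syslog.log") -> None:
    """Receive syslog datagrams over UDP and append them to a local file.

    Hypothetical helper: runs until interrupted. Binding port 514 usually
    needs root/administrator rights.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    with open(logfile, "a", encoding="utf-8") as out:
        while True:
            data, (addr, _port) = sock.recvfrom(8192)
            facility, severity, text = parse_pri(
                data.decode("utf-8", errors="replace"))
            out.write(f"{datetime.now().isoformat()} {addr} "
                      f"fac={facility} sev={severity} {text.rstrip()}\n")
            # Flush every line so nothing is lost if this process is killed
            out.flush()
```

Calling `serve()` on the log host and then waiting for the next lock-up should leave the last kernel messages from ESX02 in the log file, assuming the forwarding side is configured as described.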


I was told by VMware tech support a while back that those HP Management Agents can sometimes cause problems.

I haven't always had good luck testing RAM with software tools like the diagnostic disks. Definitely try swapping it out. It's an easy fix if it is indeed the RAM.
I second azjeep: this is probably a hardware issue, and your memory is the likely culprit. Checking your memory DIMMs before going into production is very important.

davewex (Author) commented:
I installed the agents after I started getting the issue. I am going to run Memtest on it this evening, as I used HP diagnostics last time. I guess you can't help further and it's down to trial and error... I was just hoping for an easy fix.

Thanks anyway.
It doesn't get much easier than swapping out a couple of DIMMs ;)
davewex (Author) commented:
There is 14 GB of RAM in this server, I don't have that much spare, and the issue only arises once every few weeks.

Memtest found an error with the RAM in DIMM A, so I have replaced it and all seems well.

