Vince Janisse


ESXi 5.5.0, 1623387 - HP Proliant DL380P G8 server

Hi, I have a customer with an HP ProLiant DL380p server running ESXi 5.5.0, 1623387. Everything was fine until we upgraded from 64 GB of memory to 128 GB (16 x 8 GB DIMMs). The server still runs fine, but iLO and the front-panel lights always show memory errors.

They said the 4 DIMMs closest to CPU1 and CPU2 should be 1333 MHz 1.35 V and the 4 DIMMs furthest from CPU1 and CPU2 should be 1600 MHz 1.5 V (something to do with a ranking system). I think the DIMMs should all be the same speed and voltage. (See the attached picture of how the memory is currently configured.)

I am trying to get HP to send me 8 1333 MHz 1.35 V DIMMs to replace the 1600 MHz 1.5 V DIMMs.
[Attached image: current memory configuration]

I would appreciate others' thoughts on this.

Also, they want me to update the drivers on the ESXi OS (HP always wants all firmware and drivers up to date before they replace hardware). They sent me this link: http://vibsdepot.hpe.com. Has anyone ever updated HP drivers on ESXi? I am not 100% sure how to go about it, and I find the link rather confusing. It looks like an entire ESXi image for the whole OS, not just the drivers. This is what the HP tech sent in an email: "I have listed a web site for VMware / HPE ProLiant software drivers. Please review the current VMware drivers and apply the updates http://vibsdepot.hpe.com." But I do not see where I would download just the drivers.
ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)

Vince Janisse

ASKER

Thanks, Andrew.

The memory has been replaced several times (sorry, I forgot to mention that), and they also replaced the mainboard. But the memory has displayed errors ever since the upgrade to 128 GB. If I reboot, the errors sometimes go away and then come back, or show up on DIMMs in different slots; it is very flaky. The HP firmware is also up to date. The only thing that is not up to date is the ESXi OS. The errors almost always seem to show up on the 1600 MHz DIMMs. I don't think the ESXi version would cause the memory to fail with weird, fluctuating errors.

I think the mixed speeds are causing the problem, but I cannot say for sure. An HP tech has come onsite twice and not solved the problem. They now have a level 3 tech involved, and he wants the ESXi OS updated.

I do not use vCenter; I set up everything from the vSphere Client.

Sorry, I am new to ESXi and kind of learning on the fly. I would appreciate a good resource for maintaining ESXi and keeping it up to date.

So I can remote in after hours, shut down the VMs, and do it over SSH? How do you enter maintenance mode from SSH?

Then I would just type the commands you mentioned in your response above and wait for it to finish? Should I do this onsite, or is doing it remotely over the internet fine?

Also, the HP level 3 tech said the iLO logs have been showing errors from CPU2, so possibly a CPU problem.
But the iLO system information always says the CPUs are fine. I only see memory errors, and now I see a NIC link down, but there are no network problems and I am only using one NIC.
Memory errors are due to the host hardware and have nothing to do with ESXi. Keep the pressure on HPE to fix it. This is NOT correct.

In cases like this with HPE and Dell, we've had new motherboards, new CPUs, new servers, and complete banks of memory swapped out until the issue was fixed.

Yes, all of it can be done remotely. Whether you do depends on your comfort level and confidence; maybe the first time you should be onsite. We've been doing this remotely for 14+ years now!
As for HPE, it is standard support practice to ask for the latest version, but the OS has nothing to do with WHY iLO reports it as degraded!

It also depends on how much remote access you have, e.g. full iLO is brilliant, just as if you were there at the console.

To enter maintenance mode:

vim-cmd hostsvc/maintenance_mode_enter


To exit maintenance mode:

vim-cmd hostsvc/maintenance_mode_exit
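
For anyone following along, the whole round trip from an SSH session might look roughly like the sketch below. It is only an outline under a few assumptions: the `run` helper and the `DRY_RUN` variable are made up for illustration (by default the script just prints each command so you can read it through first), it uses `vim-cmd` for both entering and exiting, and it assumes `esxcli system maintenanceMode get` is available on your build to confirm the state. The VMs still have to be shut down first; the script only lists them.

```shell
#!/bin/sh
# Sketch only: maintenance-mode round trip from an SSH session on the host.
# DRY_RUN is a hypothetical switch; it defaults to 1, so commands are
# printed rather than executed. Set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. List the registered VMs; each one must be shut down beforehand.
run vim-cmd vmsvc/getallvms

# 2. Enter maintenance mode.
run vim-cmd hostsvc/maintenance_mode_enter

# 3. Confirm the host state (assumes this esxcli namespace exists on the build).
run esxcli system maintenanceMode get

# ... apply the update here and reboot if it asks for one ...

# 4. Leave maintenance mode once the host is back up.
run vim-cmd hostsvc/maintenance_mode_exit
```

Run it once as-is to see the commands, then either type them by hand or rerun with `DRY_RUN=0` once you are comfortable with each step.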

Thanks again, Andrew.

I am going to get the hardware stable before trying any upgrades. I will post a follow-up once it is resolved. Do you back up the VMs and the ESXi OS before upgrading or updating?
You should have valid backups of all your VMs before any production changes.

The ESXi OS does not need to be backed up.
Do you require any additional help to close this question and select a solution?