Diagnosing mystery fan failure

VMware displayed a fan hardware error message, but the HP ProLiant never shows any errors when running the full hardware diagnostics at boot.  We have had periodic server shutdowns a couple of times a year, again with no diagnostic errors, but this one corrupted a couple of virtual machines so badly that we had to perform a Veeam restore.  The message below is from VMware about a hardware fan issue, but I don't understand why I can't find any event logs or diagnostics that show a fan error too.  I pulled off the chassis panel and all four fans are blowing hard, with no signs of slowing down.  I really don't want this type of corruption to happen again and cause catastrophic damage to the most important application in their environment.  Any ideas of what needs replacing?  Is this masking a motherboard issue?  I am purchasing four non-HP fans as a start, since HP has stopped selling them for this particular ProLiant unit.

Dr. Klahn (Principal Software Engineer) commented:
It's hard to tell without being there and looking at the unit, but the fan reading of "0 unspecified" suggests any of the following:

  • The fan in question (whatever one it might be) has failed and is not rotating
  • The fan's tachometer lead has broken
  • The tachometer circuitry in the fan has failed
  • A non-tachometer fan has been installed where there should be a tachometer fan
  • VMware is misconfigured and reading fan info from fan monitoring hardware that is not present
  • VMware is misconfigured and reading fan info for a fan that was never installed

Since VMware is not telling you which fan, it could be a CPU fan, a case fan or a graphics card fan.  Further, the presence of monitoring hardware on the motherboard capable of monitoring N fans does not require N fans to be installed!  Software looking at monitoring hardware must be configurable to know which inputs are valid and which are not if it is going to produce sensible alerts, and good monitoring software should be configurable to report "Center case fan on rear panel" rather than "Fan 4."
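To make the point about configurable monitoring concrete, here is a minimal Python sketch of alerting logic that knows which inputs are populated and what each one is called. Everything in it is hypothetical: the channel numbers, the labels, the `FAN_INPUTS` table and the 500 RPM threshold are illustrative assumptions, not values read from a ProLiant or from VMware.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FanInput:
    label: str             # human-readable location, shown in alerts
    populated: bool        # is a fan physically installed on this input?
    has_tach: bool = True  # does the installed fan report its speed?


# Hypothetical map of monitoring-chip inputs to real fans (assumed layout).
FAN_INPUTS: Dict[int, FanInput] = {
    1: FanInput("CPU fan", populated=True),
    2: FanInput("Front case fan", populated=True),
    3: FanInput("Unused header", populated=False),
    4: FanInput("Center case fan on rear panel", populated=True),
}

MIN_RPM = 500  # assumed minimum healthy speed, not a vendor figure


def evaluate(readings: Dict[int, Optional[int]]) -> List[str]:
    """Return alerts only for inputs that are configured as valid."""
    alerts: List[str] = []
    for channel, fan in sorted(FAN_INPUTS.items()):
        # Skip inputs with no fan, or with a fan that cannot report RPM.
        if not fan.populated or not fan.has_tach:
            continue
        rpm = readings.get(channel)
        if rpm is None:
            alerts.append(f"{fan.label}: no reading (sensor missing or unreadable)")
        elif rpm == 0:
            alerts.append(
                f"{fan.label}: 0 RPM (fan stopped, or tachometer lead/circuit failed)"
            )
        elif rpm < MIN_RPM:
            alerts.append(f"{fan.label}: {rpm} RPM is below the {MIN_RPM} RPM minimum")
    return alerts


if __name__ == "__main__":
    # Channel 3 reads 0 but is unpopulated, so it raises no alert.
    for line in evaluate({1: 2400, 2: 0, 3: 0, 4: 300}):
        print(line)
```

A zero on an unpopulated input is ignored, while a zero on a populated input is flagged by name. That is the difference between a phantom alert and a useful one.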

I suggest getting a copy of Open Hardware Monitor, checking all the fans and their speeds, and seeing if that suggests anything.  It reports on just about everything installed in a system that (a) is monitored and (b) can fail.

The image below is Open Hardware Monitor, taken from a recent discussion on a thermal issue.

[Image: Open Hardware Monitor screenshot]
Andrew Hancock (VMware vExpert / EE MVE^2) (VMware and Virtualization Consultant) commented:
Are you using the OEM HPE version of ESXi?

This could be a firmware issue, or phantom errors in ESXi.
snoopaloop (Author) commented:
How would I check for the HPE version of ESXi?  I installed Open Hardware Monitor.  I don't see the fans.  What's next?

Andrew Hancock (VMware vExpert / EE MVE^2) (VMware and Virtualization Consultant) commented:
What did you install?

The Summary page might state OEM.
snoopaloop (Author) commented:
I installed the Open Hardware Monitor program.  I'm not sure how to proceed after that.
snoopaloop (Author) commented:
I ended up adding an exhaust fan to the server.  So far so good.
