Diagnosing mystery fan failure

snoopaloop
VMware displayed a fan hardware error message, but the HP ProLiant never shows any diagnostic errors when running the full hardware diagnostic at boot.  We have had periodic server shutdowns a couple of times a year, again with no diagnostic errors, but this particular one corrupted a couple of virtual machines so badly that we had to perform a Veeam restore.  Anyway, the message below is from VMware about a hardware fan issue, but I don't understand why I can't get any event logs or diagnostics to reveal a fan error too.  I pulled off the chassis panel and all four fans are blowing hard, with no signs of slowing down.  I really don't want this type of corruption to happen again and cause catastrophic damage to the most important application in their environment.  Any ideas of what needs replacing?  Is this masking a motherboard issue?  I am purchasing four non-HP fans as a start, since HP has stopped selling fans for this particular ProLiant unit.


1547573535989.PNG
Dr. Klahn, Principal Software Engineer

Commented:
It's hard to tell without being there and looking at the unit, but the fan reading of "0 unspecified" suggests any of the following:

  • The fan in question (whatever one it might be) has failed and is not rotating
  • The fan's tachometer lead has broken
  • The tachometer circuitry in the fan has failed
  • A non-tachometer fan has been installed where there should be a tachometer fan
  • VMware is misconfigured and reading fan info from fan monitoring hardware that is not present
  • VMware is misconfigured and reading fan info for a fan that was never installed

Since VMware is not telling you which fan, it could be a CPU fan, a case fan or a graphics card fan.  Further, the presence of monitoring hardware on the motherboard capable of monitoring N fans does not require N fans to be installed!  Software looking at monitoring hardware must be configurable to know which inputs are valid and which are not if it is going to produce sensible alerts, and good monitoring software should be configurable to report "Center case fan on rear panel" rather than "Fan 4."
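As a rough illustration of that last point, here is a minimal Python sketch of a monitoring configuration that records which fan inputs are actually populated and gives each one a descriptive label. Everything in it is an assumption made for the example: the sensor numbers, the labels, the `FanInput` type and the `describe_reading` helper are hypothetical and do not come from VMware, HP or Open Hardware Monitor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FanInput:
    """One tachometer input on the monitoring hardware (hypothetical model)."""
    label: str                  # human-readable location of the fan
    installed: bool             # is a fan actually fitted on this input?
    has_tachometer: bool = True # does the fitted fan report its speed?


# Hypothetical layout: four inputs on the board, only three populated.
FAN_INPUTS = {
    1: FanInput("CPU 1 fan", installed=True),
    2: FanInput("CPU 2 fan", installed=True),
    3: FanInput("Center case fan on rear panel", installed=True),
    4: FanInput("Spare header (no fan fitted)", installed=False),
}


def describe_reading(sensor_id: int, rpm: Optional[int]) -> Optional[str]:
    """Return an alert string for a suspicious reading, or None if nothing to report."""
    fan = FAN_INPUTS.get(sensor_id)
    if fan is None:
        # The software is reading an input it knows nothing about.
        return f"Sensor {sensor_id}: not in the configuration, cannot interpret reading"
    if not fan.installed or not fan.has_tachometer:
        # A zero reading is expected here, so it must not raise an alert.
        return None
    if rpm is None or rpm == 0:
        return (f"{fan.label}: no tachometer signal "
                "(fan stopped, broken tach lead, or failed tach circuit)")
    return None
```

With a table like this, `describe_reading(4, 0)` stays silent because nothing is fitted on input 4, while `describe_reading(3, 0)` names the rear-panel case fan instead of reporting an anonymous "Fan 3."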

I suggest getting a copy of Open Hardware Monitor, checking all the fans and their speeds, and seeing whether that suggests anything.  It reports on just about everything installed in a system that (a) is monitored and (b) can fail.

The image below is Open Hardware Monitor, taken from a recent discussion on a thermal issue.

Open Hardware Monitor
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017

Commented:
Are you using the OEM HPE version of ESXi?

This could be a firmware issue, or phantom errors in ESXi.

Author

Commented:
How would I check for the HPE version of ESXi?  I installed Open Hardware Monitor.  I don't see the fans.  What's next?

Capture.PNG

Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant

Commented:
What did you install?

On the host's Summary page, the image name might state that it is an OEM build.
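One way to make that check concrete: the image profile name of an installed ESXi build (shown, for example, by running `esxcli software profile get` on the host) usually starts with the vendor's name when it is a vendor-customized image. The Python sketch below only tests a name you have already copied from the host. The helper `looks_like_oem_image`, the default prefixes and the sample names are assumptions for illustration, not an official naming rule.

```python
def looks_like_oem_image(profile_name: str,
                         vendor_prefixes: tuple = ("HPE-", "HP-")) -> bool:
    """Guess whether an ESXi image profile name belongs to a vendor-customized image.

    This is a heuristic on the name only; the prefixes are assumptions.
    """
    name = profile_name.strip().upper()
    return name.startswith(tuple(p.upper() for p in vendor_prefixes))


# Hypothetical sample names, for illustration only.
print(looks_like_oem_image("HPE-ESXi-custom-image-example"))  # True
print(looks_like_oem_image("ESXi-standard-image-example"))    # False
```

If the name does not carry a vendor prefix, the host is probably running the generic VMware image, which may lack the vendor's hardware monitoring components.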

Author

Commented:
I installed the Open Hardware Monitor program.  I'm not sure how to proceed after that.
I ended up adding an exhaust fan to the server.  So far so good.
