  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 445

Dell R610 - "Current capacity of the battery is below threshold" and unexpected reboot

What should I do?
Change the battery?
Which battery: the one for the RAID controller?

event logs
0
Asked by: dr_fred
11 Solutions
 
Scott Silva (Network Administrator) commented:
It does point to the battery on the PERC controller, but I have never seen that force a reboot...

You can either change the battery or turn off caching on the controller...
2
 
Rich Weissler (Professional Troublemaker^h^h^h^h^hshooter) commented:
Agree with Scott Silva. If you decide to run the PERC controller without a good battery, you would want to disable the write cache (Write Back cache) on the controller. You don't need the battery if the cache is only storing reads from the drive(s).

Looking at the timing:
16:07:24 -- system recovered from a crash.
16:08:50 -- Critical firmware error identified in PERC Controller
16:08:54 -- Battery identified low
16:08:55 -- The controller switched from WB (Write Back) to WT (Write Through)

The controller has already made the necessary cache changes, based on the condition of the battery.  The battery could be a symptom, or unrelated to the underlying problem, which appears to be a firmware error.  There is a CHANCE that the underlying problem is a known, fixed issue which DELL has already fixed in a firmware upgrade, but there is also a CHANCE that you have a hardware problem-- and attempting to flash the firmware could fail.
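
To make the ordering concrete, here is a minimal Python sketch that computes how long after the crash recovery each of the other events was logged. The timestamps are the four listed above; the labels are paraphrased, and the helper name `seconds_since_first` is just an illustrative choice, not part of any Dell tool.

```python
from datetime import datetime

# Timestamps copied from the timeline above; labels are paraphrased.
EVENTS = [
    ("16:07:24", "system recovered from a crash"),
    ("16:08:50", "critical firmware error in PERC controller"),
    ("16:08:54", "battery identified low"),
    ("16:08:55", "cache policy switched from WB to WT"),
]


def seconds_since_first(events):
    """Return (offset_seconds, label) pairs relative to the first event."""
    parsed = [(datetime.strptime(t, "%H:%M:%S"), label) for t, label in events]
    start = parsed[0][0]
    return [(int((ts - start).total_seconds()), label) for ts, label in parsed]


for offset, label in seconds_since_first(EVENTS):
    print(f"+{offset:>3}s  {label}")
```

The firmware error, the low-battery message, and the WB-to-WT switch all land within about five seconds of each other, roughly a minute and a half after the crash recovery. That fits the battery message being logged as part of controller start-up rather than before the crash.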

What version firmware do you currently have on that PERC controller?  (Has SUU been run on there somewhat regularly to keep it up to date?)
2
 
dr_fred (Author) commented:
Will this happen again on the next battery check?
(Every 90 days, I heard.)
0

 
Gerald Connolly commented:
I think it checks more often than that, and of course you will get this error if you do not replace the battery.

With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.
2
 
dr_fred (Author) commented:
> With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.

Ok, but will the server reboot now when it "fixed" the problem automatically?

Decommission is not too far away for this server.
0
 
andyalder commented:
Dell haven't documented it, but VMware have: it's the PERC firmware.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011987

Note that "line 156" is the same in your error and in the VMware document. The OSs are completely different, but the line number matches because the OS error log is just reporting what the controller firmware had in its internal log, and the RAID controller error log is the same whatever the OS.

It may well be the battery state change that's causing the error. It's not meant to crash the server, but unexpected things can happen if the firmware has a bug.
2
 
Rich Weissler (Professional Troublemaker^h^h^h^h^hshooter) commented:
> "[...]big hit in performance[...]"
The value of 'big' in this case depends on how the system is used.  It absolutely will degrade performance, but there are use cases in which the degradation will be slight enough not to be noticed.  That said, I wouldn't focus on the battery in this case at the moment.

> Ok, but will the server reboot now when it "fixed" the problem automatically?
Steps were taken to prevent data loss caused by a failing battery in the PERC controller.  But full props to andyalder -- I'd say 95%+ that you need a firmware update on the PERC controller to prevent server reboots caused by this problem.
After you fix that, see how your performance is.  If it is acceptable, you might be able to limp along without replacing the battery.  (But if you have a write-heavy workload, and/or performance is unacceptable, investing in a replacement battery might be justified.  It's also not impossible that the existing firmware is preventing the battery from recharging correctly... answer stays the same though: fix the firmware first.)
2
 
dr_fred (Author) commented:
How come this did not happen in the past?
Is it that the outdated firmware can't cope with a bad battery?
0
 
andyalder commented:
I don't have the source code, but normally such bugs are triggered by a combination of things, for example: if the battery state transitions, and it's Friday, and IOPS are high, then crash. dlethe may be able to isolate the exact combination that makes it crash, as he's worked on the firmware.
1
 
dr_fred (Author) commented:
Wow. I mean, doesn't the fact that it first happened when the server was five years old speak against a firmware update solving it?
0
 
Rich Weissler (Professional Troublemaker^h^h^h^h^hshooter) commented:
As andyalder indicated, some of these bugs only manifest when specific sets of conditions are present.  The fact that you haven't experienced this particular bug in the past five years does not decrease the probability that the underlying cause is a firmware bug.  It isn't impossible that it might not manifest again for five years.  (But it's also not impossible that you'll experience it four times in the next three days...)
1
 
Rich Weissler (Professional Troublemaker^h^h^h^h^hshooter) commented:
The upshot is -- the firmware flaw is the cause of the reboot.  The message about the low battery is 'normal' after a reboot, because the battery condition is checked immediately after a reboot.  It is possible that the low battery condition is one of the conditions that cause the firmware to hit the bug and crash, but the battery condition itself shouldn't be the problem.
1
 
andyalder commented:
The battery may have been bad for years. As you say, it will always log that condition on bootup, whether or not it is related to why the server crashed.
2
 
Gerald Connolly commented:
Hey Rich, going from cached mode to pass-through mode nearly always causes a big hit on performance. Even if your read/write ratio is low, reads then have to go to disk rather than be satisfied out of cache, so slow!
1
 
Rich Weissler (Professional Troublemaker^h^h^h^h^hshooter) commented:
WT (Write Thru) continues to use cache for read operations.  To avoid data loss from dirty cache buffers which aren't protected by a battery, write operations go to disk.  (I was working from memory... what I should have said was that the cache write policy is determined separately from the read policy.  In this case, the changes made as the battery is detected as degraded or failed are only to the write policy.  The read policy is unchanged.  More information about the cache policies for Dell's RAID controllers is here.)
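
As a rough illustration of that separation, here is a toy Python model. It is not Dell's firmware logic or any real API: the `CachePolicy` class, the `on_battery_degraded` function, and the "read-ahead" value are hypothetical names chosen for the sketch. It only shows the idea that a degraded battery flips the write policy from WB to WT and leaves the read policy untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CachePolicy:
    """Toy model: read and write policies are independent settings."""
    read_policy: str   # illustrative value, e.g. "read-ahead"
    write_policy: str  # "WB" (Write Back) or "WT" (Write Through)


def on_battery_degraded(policy: CachePolicy) -> CachePolicy:
    """Hypothetical reaction to a bad battery: only the write policy changes."""
    return replace(policy, write_policy="WT")


before = CachePolicy(read_policy="read-ahead", write_policy="WB")
after = on_battery_degraded(before)

print("before:", before)
print("after: ", after)
```

In this model reads can still be served from cache after the switch; only writes stop being acknowledged from unprotected cache.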
1
 
dr_fred (Author) commented:
Thx guys, the www is amazing :)
0
Question has a verified solution.
