Solved

Dell R610 - "Current capacity of the battery is below threshold" and unexpected reboot

Posted on 2016-11-03
Last Modified: 2016-11-22
What to do?
Change battery?
What battery - for raid controller?

event logs
Question by:dr_fred
16 Comments
 
LVL 9

Assisted Solution

by:Scott Silva
Scott Silva earned 83 total points
ID: 41872606
It does point to the battery on the PERC controller, but I have never seen that force a reboot...

You can either change battery or turn off caching on the controller...
 
LVL 29

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41873858
(Agree with Scott Silva.)  If you decide to run the PERC controller without a good battery, you would want to disable the WRITE cache (or Writeback cache) on the controller.  You don't need the battery if the cache is only storing reads from the drive(s).

Looking at the timing:
16:07:24 -- system recovered from a crash.
16:08:50 -- Critical firmware error identified in PERC Controller
16:08:54 -- Battery identified low
16:08:55 -- The controller switched from WB (Write Back) to WT (Write Through)
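
To make the spacing of those events easier to see, here is a minimal Python sketch that computes the gap between each entry. The timestamps and messages are copied from the list above; the function name `seconds_between` is just an illustrative choice.

```python
from datetime import datetime

# Timestamps and messages copied from the timeline above.
EVENTS = [
    ("16:07:24", "system recovered from a crash"),
    ("16:08:50", "critical firmware error identified in PERC controller"),
    ("16:08:54", "battery identified low"),
    ("16:08:55", "controller switched from WB to WT"),
]

def seconds_between(events):
    """Return (message, seconds since the previous event) for each event."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in events]
    gaps = [0] + [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return [(msg, gap) for (_, msg), gap in zip(events, gaps)]

for msg, gap in seconds_between(EVENTS):
    print(f"+{gap:>3}s  {msg}")
```

The firmware error, the low-battery message and the policy change all land within about five seconds of each other, roughly a minute and a half after the crash recovery was logged.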

The controller has already made the necessary cache changes, based on the condition of the battery.  The battery could be a symptom, or unrelated to the underlying problem, which appears to be a firmware error.  There is a CHANCE that the underlying problem is a known issue which DELL has already fixed in a firmware upgrade, but there is also a CHANCE that you have a hardware problem, in which case attempting to flash the firmware could fail.

What version firmware do you currently have on that PERC controller?  (Has SUU been run on there somewhat regularly to keep it up to date?)
 

Author Comment

by:dr_fred
ID: 41873957
Will this happen again on the next battery check?
 (90 days I heard)
 
LVL 16

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 83 total points
ID: 41873997
I think it checks more often than that, and of course you will get this error if you do not replace the battery.

With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.
 

Author Comment

by:dr_fred
ID: 41874009
> With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.

Ok, but will the server reboot now when it "fixed" the problem automatically?

Decommission is not too far away for this server.
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 83 total points
ID: 41874018
Dell haven't documented it but VMware have, it's the PERC firmware.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011987

Note that "line 156" is the same in your error and the VMware document.  Although the OSs are completely different, the line number is the same because the OS error log is just reporting what the firmware on the controller had in its internal log, and the RAID controller error log is the same whatever the OS.

It may well be the battery state change that's causing the error.  It's not meant to crash the server, but unexpected things can happen if firmware is written wrong.
 
LVL 29

Accepted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41874096
> "[...]big hit in performance[...]"
The value of 'big' in this case depends on how the system is used.  It absolutely will degrade performance, but there are use cases in which the degradation will be slight enough not to be noticed.  That said, I wouldn't focus on the battery in this case at the moment.

> Ok, but will the server reboot now when it "fixed" the problem automatically?
Steps were taken to prevent data loss caused by a failing battery in the PERC controller.  But full props to andyalder -- I'd say 95%+ that you need a firmware update on the PERC controller to prevent server reboots caused by this problem.
After you fix that, see how your performance is.  If it is acceptable, you might be able to limp along without replacing the battery.  (But if you have a write-heavy workload, and/or performance is unacceptable, investing in a replacement battery might be justified.  It's also not impossible that the existing firmware is preventing the battery from recharging correctly... answer stays the same though: fix the firmware first.)
 

Author Comment

by:dr_fred
ID: 41874206
How come this did not happen in the past?
Is it that the outdated firmware can't cope with a bad battery?
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 83 total points
ID: 41874254
I don't have the source code, but normally such bugs are a combination of things, for example: if the battery state transitions, and it's Friday, and IOPS are high, then crash. dlethe may be able to isolate the exact combination for it to crash as he's worked on the firmware.
 

Author Comment

by:dr_fred
ID: 41874283
Wow. Doesn't the fact that it first happened when the server was five years old speak against a firmware update solving it?
 
LVL 29

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41874337
As andyalder indicated, some of these bugs only manifest when specific sets of conditions are present.  The fact that you haven't experienced this particular bug in the past five years does not decrease the probability that the underlying cause is a firmware bug.  It isn't impossible that it might not manifest again for five years.  (But it's also not impossible that you'll experience it four times in the next three days...)
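
As a rough illustration of why five quiet years prove little, here is a sketch that assumes the trigger conditions coincide independently on any given day with a small fixed probability. The value `1 / 2000` is invented purely for the example; nobody in this thread has measured the real rate, and real triggers are unlikely to be independent from day to day.

```python
def prob_no_crash(p_per_day, days):
    """Probability of zero occurrences over `days` independent days."""
    return (1 - p_per_day) ** days

def prob_at_least_one(p_per_day, days):
    """Probability of one or more occurrences over `days` independent days."""
    return 1 - prob_no_crash(p_per_day, days)

P = 1 / 2000          # hypothetical daily chance that the trigger conditions line up
FIVE_YEARS = 5 * 365  # days

print(f"no crash in the first five years:     {prob_no_crash(P, FIVE_YEARS):.2f}")
print(f"at least one crash in the next five:  {prob_at_least_one(P, FIVE_YEARS):.2f}")
print(f"at least one crash in the next 3 days: {prob_at_least_one(P, 3):.4f}")
```

Under that assumed rate, a clean five-year history and a crash next week are both entirely plausible outcomes of the same bug.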
 
LVL 29

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41874344
The upshot is -- the firmware flaw is the cause of the reboot.  The message about the low battery is 'normal' after a reboot, because the battery condition is checked immediately after a reboot.  It is possible that the low battery condition factors into the conditions which are causing the firmware to hit the bug and crash, but the battery condition itself shouldn't be the problem.
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 83 total points
ID: 41874615
Battery may have been bad for years, as you say it will always log that condition on bootup whether related to why it crashed or not.
 
LVL 16

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 83 total points
ID: 41875436
Hey Rich, going from cached mode to pass-through mode nearly always causes a big hit on performance, even if your read/write ratio is low, because then even reads have to go to disk rather than be satisfied out of cache, so slow!
 
LVL 29

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41875822
WT (Write Thru) continues to use cache for read operations.  To avoid data loss from dirty cache buffers which aren't protected by a battery, write operations go to disk.  (I was working from memory... what I should have said was that the cache write policy is determined separately from the read policy.  In this case, the changes made as the battery is detected as degraded or failed are only to the write policy.  The read policy is unchanged.  More information about the cache policies for Dell's RAID controllers is here.)
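
The difference can be sketched with a toy model. This is not Dell's firmware logic, only an illustration of the general write-back versus write-through idea; the class `ToyController` and its method names are made up for the example.

```python
class ToyController:
    """Toy model of a RAID controller cache (illustrative, not real PERC behaviour)."""

    def __init__(self, write_policy="WB"):
        self.write_policy = write_policy  # "WB" (write-back) or "WT" (write-through)
        self.cache = {}      # block -> data held in controller RAM
        self.dirty = set()   # cached blocks not yet written to disk
        self.disk = {}       # block -> data safely on disk

    def write(self, block, data):
        self.cache[block] = data
        if self.write_policy == "WT":
            self.disk[block] = data   # acknowledged only once it is on disk
        else:
            self.dirty.add(block)     # acknowledged from cache, flushed later

    def read(self, block):
        if block in self.cache:
            return self.cache[block], "cache hit"
        data = self.disk[block]
        self.cache[block] = data      # read caching works under either policy
        return data, "disk read"

    def power_loss_without_battery(self):
        """Cache contents vanish; return the blocks whose writes were lost."""
        lost = sorted(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost


wb = ToyController("WB")
wb.write(1, "payroll")
print("WB lost blocks:", wb.power_loss_without_battery())

wt = ToyController("WT")
wt.write(1, "payroll")
print("WT read:", wt.read(1))
print("WT lost blocks:", wt.power_loss_without_battery())
```

In the model, the write-through controller still answers the read from cache, and a power loss with no battery costs it nothing because every write already reached disk. The write-back controller is faster to acknowledge writes but loses whatever was still dirty.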
 

Author Closing Comment

by:dr_fred
ID: 41883382
Thx guys, the www is amazing :)