Solved

Dell R610 - "Current capacity of the battery is below threshold" and unexpected reboot

Posted on 2016-11-03
Last Modified: 2016-11-22
What should I do?
Should I change the battery?
Which battery: the one for the RAID controller?

event logs
Question by:dr_fred
16 Comments
 
LVL 11

Assisted Solution

by:Scott Silva
Scott Silva earned 332 total points
ID: 41872606
It does point to the battery on the PERC controller, but I have never seen that force a reboot...

You can either change battery or turn off caching on the controller...
2
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41873858
(Agree with Scott Silva.) If you decide to run the PERC controller without a good battery, you would want to disable WRITE cache (or Writeback cache) on the controller.  You don't need the battery if the cache is only storing reads from the drive(s).

Looking at the timing:
16:07:24 -- system recovered from a crash.
16:08:50 -- Critical firmware error identified in PERC Controller
16:08:54 -- Battery identified low
16:08:55 -- The controller switched from WB (Write Back) to WT (Write Through)

The controller has already made the necessary cache changes, based on the condition of the battery.  The battery could be a symptom of, or unrelated to, the underlying problem, which appears to be a firmware error.  There is a CHANCE that the underlying problem is a known issue which Dell has already fixed in a firmware upgrade, but there is also a CHANCE that you have a hardware problem, in which case attempting to flash the firmware could fail.
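
To make the timing concrete, here is a minimal Python sketch that computes the gap between the consecutive log entries listed above. The timestamps and descriptions are copied from that list; the names `events` and `seconds_between` are just illustrative, and it assumes all four entries are from the same day.

```python
from datetime import datetime

# Timestamps copied from the event log timeline above (same day assumed).
events = [
    ("16:07:24", "system recovered from a crash"),
    ("16:08:50", "critical firmware error identified in PERC controller"),
    ("16:08:54", "battery identified low"),
    ("16:08:55", "controller switched from WB to WT"),
]

def seconds_between(events):
    """Return (description, seconds since previous event) for each event after the first."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in events]
    return [
        (events[i][1], int((times[i] - times[i - 1]).total_seconds()))
        for i in range(1, len(events))
    ]

gaps = seconds_between(events)
for description, gap in gaps:
    print(f"+{gap:>3d}s  {description}")
```

The firmware error, the low-battery message and the WB-to-WT switch all land within a few seconds of each other, shortly after the system came back up, which fits the reading that they were logged as part of the same startup sequence rather than as separate incidents.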

What version firmware do you currently have on that PERC controller?  (Has SUU been run on there somewhat regularly to keep it up to date?)
2
 

Author Comment

by:dr_fred
ID: 41873957
Will this happen again on the next battery check? (Every 90 days, I heard.)
0

 
LVL 17

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 332 total points
ID: 41873997
I think it checks more often than that, and of course you will get this error if you do not replace the battery.

With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.
2
 

Author Comment

by:dr_fred
ID: 41874009
> With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.

Ok, but will the server reboot now when it "fixed" the problem automatically?

Decommission is not too far away for this server.
0
 
LVL 56

Assisted Solution

by:andyalder
andyalder earned 332 total points
ID: 41874018
Dell haven't documented it but VMware have: it's the PERC firmware.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011987

Note that "line 156" is the same in your error and in the VMware document.  Although the OSs are completely different, the line number is the same because the OS error log is just reporting what the firmware on the controller had in its internal log, and the RAID controller error log is the same whatever the OS.

It may well be the battery state change that's causing the error.  It's not meant to crash the server, but unexpected things can happen if the firmware is badly written.
2
 
LVL 30

Accepted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41874096
> "[...]big hit in performance[...]"
The value of 'big' in this case depends on how the system is used.  It absolutely will degrade performance, but there are use cases in which the degradation will be slight enough not to be noticed.  That said, I wouldn't focus on the battery in this case at the moment.

> Ok, but will the server reboot now when it "fixed" the problem automatically?
Steps were taken to prevent data loss caused by a failing battery in the PERC controller.  But full props to andyalder -- I'd say 95%+ that you need a firmware update on the PERC controller to prevent server reboots caused by this problem.
After you fix that, see how your performance is.  If it is acceptable, you might be able to limp along without replacing the battery.  (But if you have a write-heavy workload, and/or performance is unacceptable, investing in a replacement battery might be justified.  It's also not impossible that the existing firmware is preventing the battery from recharging correctly... answer stays the same though: fix the firmware first.)
2
 

Author Comment

by:dr_fred
ID: 41874206
How come this did not happen in the past?
Is it that the outdated firmware can't cope with a bad battery?
0
 
LVL 56

Assisted Solution

by:andyalder
andyalder earned 332 total points
ID: 41874254
I don't have the source code, but normally such bugs are a combination of things: for example, if the battery state transitions and it's Friday and IOPS are high, then crash. dlethe may be able to isolate the exact combination that makes it crash, as he's worked on the firmware.
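
The made-up example above can be written out as a tiny Python sketch. This is purely illustrative and is not the actual PERC firmware logic; the function name `crashes` and its three conditions are hypothetical, taken only from the joke example in this comment.

```python
from itertools import product

def crashes(battery_transition: bool, is_friday: bool, high_iops: bool) -> bool:
    """Hypothetical trigger: the bug fires only when all three conditions coincide."""
    return battery_transition and is_friday and high_iops

# Enumerate every combination of the three conditions.
combos = list(product([False, True], repeat=3))
triggering = [combo for combo in combos if crashes(*combo)]

print(f"{len(triggering)} of {len(combos)} combinations trigger the crash")
```

Only one combination out of eight hits the bug, which is why a fault like this can stay hidden for a long time and then appear without anything obvious having changed.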
1
 

Author Comment

by:dr_fred
ID: 41874283
Wow. But doesn't the fact that it first happened when the server was five years old speak against a firmware update solving it?
0
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41874337
As andyalder indicated, some of these bugs only manifest when specific sets of conditions are present.  The fact that you haven't experienced this particular bug in the past five years does not decrease the probability that the underlying cause is a firmware bug.  It isn't impossible that it might not manifest again for five years.  (But it's also not impossible that you'll experience it four times in the next three days...)
1
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41874344
The upshot is -- the firmware flaw is the cause of the reboot.  The message about the low battery is 'normal' after a reboot, because the battery condition is checked immediately after a reboot.  It is possible that the low battery condition factors into the conditions which are causing the firmware to hit the bug and crash, but the battery condition itself shouldn't be the problem.
1
 
LVL 56

Assisted Solution

by:andyalder
andyalder earned 332 total points
ID: 41874615
Battery may have been bad for years; as you say, it will always log that condition on bootup, whether or not it is related to why it crashed.
2
 
LVL 17

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 332 total points
ID: 41875436
Hey Rich, going from cached mode to pass-through mode nearly always causes a big hit on performance.  Even if your read/write ratio is low, even reads then have to go to disk rather than being satisfied out of cache, so it's slow!
1
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41875822
WT (Write Thru) continues to use cache for read operations.  To avoid data loss from dirty cache buffers which aren't protected by a battery, write operations go to disk.  (I was working from memory... what I should have said was that the cache write policy is determined separately from the read policy.  In this case, the changes made as the battery is detected as degraded or failed are only to the write policy.  The read policy is unchanged.  More information about the cache policies for Dell's RAID controllers is here.)
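
As a rough illustration of the point that the write policy is tracked separately from the read policy, here is a small Python model. It is only a sketch of the behaviour described in this comment, not Dell's implementation; `CachePolicy`, `on_battery_degraded` and the read-policy label are hypothetical names.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CachePolicy:
    read: str   # read policy label (placeholder value used below)
    write: str  # "WB" (Write Back) or "WT" (Write Through)

def on_battery_degraded(policy: CachePolicy) -> CachePolicy:
    """Model the controller's reaction to a degraded or failed battery.

    Only the write policy is forced to Write Through; the read policy
    is carried over unchanged.
    """
    return replace(policy, write="WT")

before = CachePolicy(read="read-cache-enabled", write="WB")
after = on_battery_degraded(before)
print(before, "->", after)
```

In this model, reads can still be served from cache after the switch; only writes lose the benefit of being acknowledged from cache.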
1
 

Author Closing Comment

by:dr_fred
ID: 41883382
Thx guys, the www is amazing :)
0
