Solved

Dell R610 - "Current capacity of the battery is below threshold" and unexpected reboot

Posted on 2016-11-03
Last Modified: 2016-11-22
What should I do?
Change the battery?
Which battery: the one for the RAID controller?

event logs
Question by:dr_fred
16 Comments
 
LVL 11

Assisted Solution

by:Scott Silva
Scott Silva earned 332 total points
ID: 41872606
It does point to the battery on the PERC controller, but I have never seen that force a reboot...

You can either change battery or turn off caching on the controller...
2
 
LVL 31

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41873858
(Agree with Scott Silva.)  If you decide to run the PERC controller without a good battery, you would want to disable WRITE cache (or Writeback cache) on the controller.  You don't need the battery if the cache is only storing reads from the drive(s).

Looking at the timing:
16:07:24 -- system recovered from a crash.
16:08:50 -- Critical firmware error identified in PERC Controller
16:08:54 -- Battery identified low
16:08:55 -- The controller switched from WB (Write Back) to WT (Write Through)

The controller has already made the necessary cache changes, based on the condition of the battery.  The battery could be a symptom of, or unrelated to, the underlying problem, which appears to be a firmware error.  There is a CHANCE that the underlying problem is a known issue which DELL has already fixed in a firmware upgrade, but there is also a CHANCE that you have a hardware problem, and attempting to flash the firmware could fail.
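To make the ordering concrete, here is a minimal Python sketch that computes how long after the crash recovery each later event was logged. The timestamps are the ones listed above; the labels are shortened paraphrases, and the function name `seconds_since_first` is just an illustrative helper, not part of any Dell tool.

```python
from datetime import datetime

# Events copied from the timeline above; labels are shortened paraphrases.
EVENTS = [
    ("16:07:24", "system recovered from a crash"),
    ("16:08:50", "critical firmware error in PERC controller"),
    ("16:08:54", "battery identified low"),
    ("16:08:55", "cache switched from WB to WT"),
]

def seconds_since_first(events):
    """Return (offset_seconds, label) pairs relative to the first event."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(events[0][0], fmt)
    return [
        (int((datetime.strptime(ts, fmt) - start).total_seconds()), label)
        for ts, label in events
    ]

offsets = seconds_since_first(EVENTS)
for offset, label in offsets:
    print(f"+{offset:3d}s  {label}")
```

The battery warning and the switch to WT are logged about a minute and a half after the system came back up, so the crash came first and those messages followed it.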

What version firmware do you currently have on that PERC controller?  (Has SUU been run on there somewhat regularly to keep it up to date?)
2
 

Author Comment

by:dr_fred
ID: 41873957
Will this happen again on the next battery check?
(Every 90 days, I heard.)
0
 
LVL 17

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 332 total points
ID: 41873997
I think it checks more often than that, and of course you will get this error if you do not replace the battery.

With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.
2
 

Author Comment

by:dr_fred
ID: 41874009
> With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.

Ok, but will the server reboot now when it "fixed" the problem automatically?

Decommission is not too far away for this server.
0
 
LVL 56

Assisted Solution

by:Handy Holder
Handy Holder earned 332 total points
ID: 41874018
Dell haven't documented it, but VMware have: it's the PERC firmware.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011987

Note that "line 156" is the same in your error and the VMware document. Although the OSs are completely different, the line number is the same because the OS error log is just reporting what the firmware on the controller had in its internal log, and the RAID controller error log is the same whatever the OS.

It may well be the battery state change that's causing the error. It's not meant to crash the server, but unexpected things can happen if the firmware is written incorrectly.
2
 
LVL 31

Accepted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41874096
> "[...]big hit in performance[...]"
The value of 'big' in this case depends on how the system is used.  It absolutely will degrade performance, but there are use cases in which the degradation will be slight enough not to be noticed.  That said, I wouldn't focus on the battery in this case at the moment.

> Ok, but will the server reboot now when it "fixed" the problem automatically?
Steps were taken to prevent data loss caused by a failing battery in the PERC controller.  But full props to andyalder -- I'd say 95%+ that you need a firmware update on the PERC controller to prevent server reboots caused by this problem.
After you fix that, see how your performance is.  If it is acceptable, you might be able to limp along without replacing the battery.  (But if you have a write-heavy workload, and/or performance is unacceptable, investing in a replacement battery might be justified.  It's also not impossible that the existing firmware is preventing the battery from recharging correctly... answer stays the same though: fix the firmware first.)
2
 

Author Comment

by:dr_fred
ID: 41874206
How come this did not happen in the past?
Is it that the outdated firmware can't cope with a bad battery?
0
 
LVL 56

Assisted Solution

by:Handy Holder
Handy Holder earned 332 total points
ID: 41874254
I don't have the source code, but such bugs normally need a combination of things: for example, if the battery state transitions and it's Friday and IOPS are high, then crash. dlethe may be able to isolate the exact combination that makes it crash, as he's worked on the firmware.
1
 

Author Comment

by:dr_fred
ID: 41874283
Wow. Doesn't the fact that it first happened when the server was five years old speak against a firmware update solving it?
0
 
LVL 31

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41874337
As andyalder indicated, some of these bugs only manifest when specific sets of conditions are present.  The fact that you haven't experienced this particular bug in the past five years does not decrease the probability that the underlying cause is a firmware bug.  It isn't impossible that it might not manifest again for five years.  (But it's also not impossible that you'll experience it four times in the next three days...)
1
 
LVL 31

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41874344
The upshot is -- the firmware flaw is the cause of the reboot.  The message about the low battery is 'normal' after a reboot, because the battery condition is checked immediately after a reboot.  It is possible that the low battery condition factors into the conditions which are causing the firmware to hit the bug and crash, but the battery condition itself shouldn't be the problem.
1
 
LVL 56

Assisted Solution

by:Handy Holder
Handy Holder earned 332 total points
ID: 41874615
The battery may have been bad for years; as you say, it will always log that condition on bootup, whether related to why it crashed or not.
2
 
LVL 17

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 332 total points
ID: 41875436
Hey Rich, going from cached mode to pass-through mode nearly always causes a big hit on performance, even if your read/write ratio is low. Then even reads have to go to disk rather than be satisfied out of cache, so slow!
1
 
LVL 31

Assisted Solution

by:Rich Weissler
Rich Weissler earned 1004 total points
ID: 41875822
WT (Write Thru) continues to use cache for read operations.  To avoid data loss from dirty cache buffers which aren't protected by a battery, write operations go to disk.  (I was working from memory... what I should have said was that the cache write policy is determined separately from the read policy.  In this case, the changes made as the battery is detected as degraded or failed are only to the write policy.  The read policy is unchanged.  More information about the cache policies for Dell's RAID controllers is here.)
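As a rough illustration of that separation, the sketch below models the two policies as independent settings and shows a degraded battery changing only the write side. This is a toy model, not PERC firmware behavior or any Dell API: the class `CachePolicy`, the function `on_battery_degraded`, and the "Read Ahead" label are all assumptions chosen for the example.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CachePolicy:
    write_policy: str  # "WB" (Write Back) or "WT" (Write Through)
    read_policy: str   # illustrative label, e.g. "Read Ahead"

def on_battery_degraded(policy: CachePolicy) -> CachePolicy:
    """Hypothetical model: only the write policy falls back to WT."""
    return replace(policy, write_policy="WT")

before = CachePolicy(write_policy="WB", read_policy="Read Ahead")
after = on_battery_degraded(before)
print(before)
print(after)
```

In this model the read policy is carried over untouched, which matches the point above: reads can still be served from cache after the switch to WT.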
1
 

Author Closing Comment

by:dr_fred
ID: 41883382
Thx guys, the www is amazing :)
0
