Solved

Dell R610 - "Current capacity of the battery is below threshold" and unexpected reboot

Posted on 2016-11-03
Last Modified: 2016-11-22
What should I do?
Should I change the battery?
Which battery, the one for the RAID controller?

event logs
Question by:dr_fred
16 Comments
 
LVL 10

Assisted Solution

by:Scott Silva
Scott Silva earned 83 total points
ID: 41872606
It does point to the battery on the PERC controller, but I have never seen that force a reboot...

You can either change battery or turn off caching on the controller...
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41873858
Agree with Scott Silva. If you decide to run the PERC controller without a good battery, you would want to disable the write cache (write-back cache) on the controller.  You don't need the battery if the cache is only storing reads from the drive(s).

Looking at the timing:
16:07:24 -- system recovered from a crash.
16:08:50 -- Critical firmware error identified in PERC Controller
16:08:54 -- Battery identified low
16:08:55 -- The controller switched from WB (Write Back) to WT (Write Through)
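
To see how tightly those entries cluster, here is a minimal sketch that computes each event's offset from the first one. The timestamps and descriptions are copied from the list above; the helper name `seconds_since_first` is just made up for illustration, and it assumes all entries fall on the same day.

```python
from datetime import datetime

# Timestamps and descriptions copied from the timeline above.
EVENTS = [
    ("16:07:24", "system recovered from a crash"),
    ("16:08:50", "critical firmware error identified in PERC controller"),
    ("16:08:54", "battery identified low"),
    ("16:08:55", "controller switched from WB to WT"),
]


def seconds_since_first(events):
    """Return (offset_seconds, description) pairs relative to the first event."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in events]
    start = times[0]
    return [
        (int((t - start).total_seconds()), desc)
        for t, (_, desc) in zip(times, events)
    ]


for offset, desc in seconds_since_first(EVENTS):
    print(f"+{offset:>3}s  {desc}")
```

The firmware error, the low-battery message and the switch to WT all land within five seconds of each other, roughly a minute and a half after the crash recovery.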

The controller has already made the necessary cache changes, based on the condition of the battery.  The battery could be a symptom of the underlying problem, or unrelated to it; the underlying problem appears to be a firmware error.  There is a CHANCE that it is a known issue which Dell has already fixed in a firmware upgrade, but there is also a CHANCE that you have a hardware problem, in which case attempting to flash the firmware could fail.

What firmware version do you currently have on that PERC controller?  (Has SUU, Dell's Server Update Utility, been run on there somewhat regularly to keep it up to date?)
 

Author Comment

by:dr_fred
ID: 41873957
Will this happen again on the next battery check?
(Every 90 days, I heard.)

 
LVL 17

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 83 total points
ID: 41873997
I think it checks more often than that, and of course you will get this error if you do not replace the battery.

With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.
 

Author Comment

by:dr_fred
ID: 41874009
> With the RAID controller running in pass through mode, you should be seeing a big hit in performance, so you should be replacing the battery sooner rather than later.

Ok, but will the server reboot now when it "fixed" the problem automatically?

Decommission is not too far away for this server.
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 83 total points
ID: 41874018
Dell haven't documented it but VMware have: it's the PERC firmware.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011987

Note that "line 156" is the same in your error and the VMware document. Although the OSs are completely different, the line number is the same because the OS error log is just reporting what the firmware on the controller had in its internal log, and the RAID controller error log is the same whatever the OS.

It may well be the battery state change that's causing the error. It's not meant to crash the server, but unexpected things can happen if the firmware is badly written.
 
LVL 30

Accepted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41874096
> "[...]big hit in performance[...]"
The value of 'big' in this case depends on how the system is used.  It absolutely will degrade performance, but there are use cases in which the degradation will be slight enough not to be noticed.  That said, I wouldn't focus on the battery in this case at the moment.

> Ok, but will the server reboot now when it "fixed" the problem automatically?
Steps were taken to prevent data loss caused by a failing battery in the PERC controller.  But full props to andyalder -- I'd say 95%+ that you need a firmware update on the PERC controller to prevent server reboots caused by this problem.
After you fix that, see how your performance is.  If it is acceptable, you might be able to limp along without replacing the battery.  (But if you have a write-heavy workload, and/or performance is unacceptable, investing in a replacement battery might be justified.  It's also not impossible that the existing firmware is preventing the battery from recharging correctly... answer stays the same though: fix the firmware first.)
 

Author Comment

by:dr_fred
ID: 41874206
How come this did not happen in the past?
Is it that the outdated firmware can't cope with a bad battery?
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 83 total points
ID: 41874254
I don't have the source code, but normally such bugs come from a combination of things: for example, if the battery state transitions and it's Friday and IOPS are high, then crash. dlethe may be able to isolate the exact combination that makes it crash, as he's worked on the firmware.
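
To make that concrete, here is a purely illustrative sketch of a bug that only fires when several unrelated conditions coincide. It is NOT the PERC firmware logic: the function name `would_crash` and the `IOPS_THRESHOLD` value are invented for the example, and the three conditions are just the tongue-in-cheek ones from the comment above.

```python
from datetime import date

# Hypothetical threshold, chosen only for illustration.
IOPS_THRESHOLD = 5000


def would_crash(battery_state_changed: bool, today: date, iops: int) -> bool:
    """Illustrative only: the bug fires only when all three conditions coincide."""
    is_friday = today.weekday() == 4  # Monday is 0, so Friday is 4
    return battery_state_changed and is_friday and iops > IOPS_THRESHOLD


# A battery state change under heavy load on a Friday hits the bug...
print(would_crash(True, date(2016, 11, 4), 9000))
# ...but the same load and state change on a Thursday does not.
print(would_crash(True, date(2016, 11, 3), 9000))
```

Because each condition is individually common but the conjunction is rare, a server can run for years without ever hitting the combination.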
 

Author Comment

by:dr_fred
ID: 41874283
Wow. I mean, doesn't the fact that it first happened when the server was five years old speak against a firmware update solving it?
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41874337
As andyalder indicated, some of these bugs only manifest when specific sets of conditions are present.  The fact that you haven't experienced this particular bug in the past five years does not decrease the probability that the underlying cause is a firmware bug.  It isn't impossible that it might not manifest again for five years.  (But it's also not impossible that you'll experience it four times in the next three days...)
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41874344
The upshot is -- the firmware flaw is the cause of the reboot.  The message about the low battery is 'normal' after a reboot, because the battery condition is checked immediately after a reboot.  It is possible that the low battery condition factors into the conditions which are causing the firmware to hit the bug and crash, but the battery condition itself shouldn't be the problem.
 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 83 total points
ID: 41874615
The battery may have been bad for years; as you say, it will always log that condition on bootup, whether or not it's related to why it crashed.
 
LVL 17

Assisted Solution

by:Gerald Connolly
Gerald Connolly earned 83 total points
ID: 41875436
Hey Rich, going from cached mode to pass-through mode nearly always causes a big hit on performance, even if your read/write ratio is low: even reads then have to go to disk rather than being satisfied out of cache, so slow!
 
LVL 30

Assisted Solution

by:Rich Weissler
Rich Weissler earned 251 total points
ID: 41875822
WT (Write Thru) continues to use cache for read operations.  To avoid data loss from dirty cache buffers which aren't protected by a battery, write operations go to disk.  (I was working from memory... what I should have said was that the cache write policy is determined separately from the read policy.  In this case, the changes made as the battery is detected as degraded or failed are only to the write policy.  The read policy is unchanged.  More information about the cache policies for Dell's RAID controllers is here.)
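
A toy model of that separation, for anyone who finds it easier to read as code. This is only a sketch of the behaviour described above, not Dell's actual firmware logic; the `CachePolicy` class, the `apply_battery_state` function and the "read-ahead" value are all names assumed for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachePolicy:
    read_policy: str   # e.g. "read-ahead"; not touched by battery events
    write_policy: str  # "WB" (write-back) or "WT" (write-through)


def apply_battery_state(policy: CachePolicy, battery_ok: bool) -> CachePolicy:
    """Toy model: a degraded battery forces write-through; reads are untouched."""
    if battery_ok:
        return policy
    # Without battery protection, dirty write buffers could be lost on power
    # failure, so writes go straight to disk. The read policy is carried over.
    return CachePolicy(read_policy=policy.read_policy, write_policy="WT")


before = CachePolicy(read_policy="read-ahead", write_policy="WB")
after = apply_battery_state(before, battery_ok=False)
print(before)
print(after)
```

Only `write_policy` differs between the two printed policies, which is why reads can still be served from cache after the switch to WT.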
 

Author Closing Comment

by:dr_fred
ID: 41883382
Thx guys, the www is amazing :)