grexx


APC 700 Smart UPS causes crash every other week?

Three of our servers were connected to an APC 700 Smart UPS power supply. Those three servers crash every other Friday around 17.30. (During summertime, the crashes occurred at 18.30.) We have several other servers that are not powered via this UPS, and they don't crash. Until last Friday we never realized it could be the power supply, but this is too much of a coincidence. I've removed those three servers from the UPS to see if the crashes stop. Furthermore, I've moved one other server onto the UPS. If that server crashes next Friday while the other three don't, then it's clear what the cause is.

Still, I'm confused as to why this happens. The UPS is about 4 years old and I don't believe the battery has ever been replaced. But if the battery is dead, then why doesn't it shut down completely, and why does it stop supplying power every other Friday and then start again? Furthermore, the three servers don't crash at exactly the same time. The event logs (Windows 2000) show different times, sometimes more than 3 minutes apart. And there is no fixed order in which they crash: every server has crashed first at least once, according to the time in the event log.

For more info about these crashes see:  https://www.experts-exchange.com/questions/21181132/CMD-processes-crash-at-Friday-afternoon-each-other-week.html
ASKER CERTIFIED SOLUTION
Dr-IP

I would agree with Dr-IP.  Three servers on a lowly 700 is, I'm sure, overloading it pretty well.  4 years on a battery is probably stretching it too.  Those 700s only take a little RBC4 battery and that's pretty tiny.
Dr-IP

One thing I have found out about smaller batteries is that they don’t last anywhere near as long as the bigger ones do. I deal with a lot of UPSs, and have found that on the bigger ones the batteries can easily go over five years, while smaller ones in the same building are completely shot in just three years.
The problem has to do with time and there are no warning indicators...

APC 700 Smart UPS uses (can use) a program called POWERCHUTE which provides some advanced UPS features through software.

If this thing was previously in a different configuration it's possible functions/settings were put in place through software INSIDE the UPS that weren't "turned off" before you started using it.

~ That would explain why it happens on a schedule...

Why it crashes could be either a fault in the UPS that is irrelevant until whatever function occurs, or it could BE the function to Reboot periodically.

PCBONEZ
"APC 700 Smart UPS uses (can use) a program called POWERCHUTE ...could BE the function to Reboot periodically."

That could be what is happening, but the minimum settable off time in the PowerChute software is 6 minutes, so the logs would make it pretty obvious that the servers were down for longer than a reboot takes. Also, it’s highly unlikely to have happened by itself, and the chances that someone set it up intentionally and then forgot about it are pretty low.

I agree with Dr-IP.  I have many configured with PowerChute and I don't believe it's caused by it or previous settings.
grexx

ASKER

I mailed APC Support about this issue, and they replied the following:

"It looks like the unit is trying to do the self test, it does it normally each 14 days and the unit is failing to do it."
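A fixed 14-day timer would also fit the one-hour shift mentioned in the question (17.30 in winter, 18.30 in summer), since a timer that counts elapsed hours does not follow daylight saving time. A minimal sketch, assuming the UPS counts exactly 336 hours between tests and that the site is on UTC+1 in winter and UTC+2 in summer; the start date and offsets are illustrative and not taken from the thread:

```python
from datetime import datetime, timedelta, timezone

# Assumptions (not from the thread): local UTC offsets and the first test time.
WINTER = timezone(timedelta(hours=1))    # e.g. Central European Time
SUMMER = timezone(timedelta(hours=2))    # e.g. Central European Summer Time
SELF_TEST_INTERVAL = timedelta(days=14)  # 336 hours, per APC Support's reply


def self_test_times(first_test_utc, count):
    """Return `count` self-test instants spaced exactly 14 days apart (UTC)."""
    return [first_test_utc + i * SELF_TEST_INTERVAL for i in range(count)]


# Hypothetical first test: Friday 9 January 2004 at 16:30 UTC.
first = datetime(2004, 1, 9, 16, 30, tzinfo=timezone.utc)
tests = self_test_times(first, 3)

# A 14-day step always lands on the same weekday (4 == Friday).
all_fridays = all(t.weekday() == 4 for t in tests)

# The same UTC time of day reads one hour later on a summer wall clock.
winter_clock = tests[0].astimezone(WINTER).strftime("%H:%M")
summer_clock = tests[0].astimezone(SUMMER).strftime("%H:%M")
print(all_fridays, winter_clock, summer_clock)
```

With these assumed values the test always falls on a Friday, and the wall-clock time reads 17:30 under the winter offset and 18:30 under the summer offset.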

I removed the UPS from the server rack and tested it with a desktop computer. After repowering the unit, I see the following LEDs (from left to right):
  - left side 5 LEDs for computer load (outgoing): 2 LEDs lit with one PC and monitor on it
  - left side 3 LEDs: only the middle green LED is lit (I suppose this is for 220v power)
  - right side 3 LEDs: none is lit
  - right side 5 LEDs: all 5 are lit
If I pull the power plug, all systems go down. The orange LED flickers for a moment and then everything is dead.

So it seems clear that the battery is dead and should be replaced.
Dead Battery, so even a minor power sag could drop everything.  Most of them use the RBC4 which is about $46 from CDW.  I just replaced 5 of them in 650 and 700 units this week.
It’s probably a dead battery, but it could also be a failed inverter, so unless you pull out the battery and check it, or replace it, you won’t know for sure. By the way, you can get the battery for it really cheap on eBay; that's where I picked up the last one for my 650. But regardless of what you do with the 700, I think you should seriously consider upgrading to a bigger UPS for those servers.
grexx

ASKER

Thanks for all your answers! I'll discuss whether to replace the UPS or buy a new battery. I'm not sure which is the best option at the moment. The former network administrator told me his only intention with it was to keep the network running in case of a short power failure, like several seconds or a minute or so.  Apparently it did work as intended, as the two-weekly test never caused a crash for several years.
It’s probably big enough to give you a couple of minutes of run time or so, but that is not enough even to shut down the servers in an orderly manner, which is one of the reasons for having a UPS. Also, such fast discharge rates tend to kill batteries real quick, especially if you get a lot of momentary power disruptions like I do during the rainy season. I have seen more than a few overloaded UPSs where the battery became completely discharged from the power flashing on and off during a thunderstorm. This is one of the reasons I advise sizing for a minimum of 15 minutes: most of the time after a bad thunderstorm there will still be enough juice left in the battery to do a proper shutdown before the battery dies.
A 700 is what I usually put on a single workstation.  With 3 servers on it, I'm sure it has been near capacity from the start, and that little RBC4 battery just doesn't have the longevity to keep them up.  I agree, I would look for a minimum of 15 minutes in a UPS.

Again, use KISS.  If you just want this one to work, the cheapest alternative is about $46 for a battery.  They are much more likely to fail than the inverter, but I have seen those fail too in a few instances.  The best thing you can do is get the battery and put the UPS on a workstation or just your networking equipment, Switches, Router, Firewall.  Get a 1500 or 2200 for the 3 Servers you want to protect.
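The sizing advice above can be turned into a quick back-of-the-envelope check. This is only a rough sketch: the server wattages, the UPS output rating, the battery energy and the inverter efficiency are all illustrative assumptions, not figures from this thread or from APC's specifications.

```python
def load_fraction(loads_watts, ups_capacity_watts):
    """Fraction of the UPS's rated output used by the connected loads."""
    return sum(loads_watts) / ups_capacity_watts


def estimated_runtime_minutes(battery_wh, load_watts, inverter_efficiency=0.85):
    """Very rough runtime: usable battery energy divided by the load.

    Ignores battery age and the capacity that lead-acid batteries lose at
    high discharge rates, so real runtimes will be shorter.
    """
    return battery_wh * inverter_efficiency / load_watts * 60


# All figures below are illustrative assumptions, not measurements.
servers = [140, 140, 140]   # watts drawn by each of the three servers
ups_watts = 450             # assumed rated output of the UPS
battery_wh = 100            # assumed nominal energy of a fresh battery

fraction = load_fraction(servers, ups_watts)
runtime = estimated_runtime_minutes(battery_wh, sum(servers))
meets_target = runtime >= 15   # the 15-minute minimum suggested above
print(f"load {fraction:.0%}, runtime about {runtime:.1f} min, ok={meets_target}")
```

With these assumed numbers the UPS runs above 90% of its rating and falls short of the 15-minute target even with a fresh battery, which is the situation the posts above warn about.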
grexx

ASKER

Last Friday the servers didn't crash. It wasn't a surprise, but still nice to have a final confirmation.

Furthermore, I had one more little problem which I didn't understand: the time of the crash in the Windows event log. These times differed by up to 4 minutes per server, and this was the time of the crash, not the time of the restart. It turns out that Windows makes a log entry every five minutes, so the recorded crash time is the last such entry before the power was lost. See:

https://www.experts-exchange.com/questions/21251118/How-does-the-Windows-Event-Log-know-the-time-of-a-crash.html
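To illustrate why a five-minute logging interval produces different "crash times" for a single power loss, here is a small sketch. It assumes each server starts its five-minute cycle at its own boot time and that the recorded crash time is the last entry written before the power dropped; the server names and timestamps are made up.

```python
from datetime import datetime, timedelta

HEARTBEAT = timedelta(minutes=5)  # logging interval mentioned in the post


def last_logged_time(crash_time, first_heartbeat):
    """Return the last periodic timestamp written at or before the crash."""
    elapsed = crash_time - first_heartbeat
    completed = elapsed // HEARTBEAT   # whole intervals finished before the crash
    return first_heartbeat + completed * HEARTBEAT


# Hypothetical data: one power loss, three servers booted at different times.
crash = datetime(2004, 12, 3, 17, 30, 0)
boots = {
    "server-a": datetime(2004, 12, 3, 8, 1, 0),
    "server-b": datetime(2004, 12, 3, 8, 3, 30),
    "server-c": datetime(2004, 12, 3, 8, 4, 45),
}
logged = {name: last_logged_time(crash, boot) for name, boot in boots.items()}
spread = max(logged.values()) - min(logged.values())
for name, stamp in logged.items():
    print(name, stamp.time())
print("spread:", spread)
```

All three servers lose power at the same instant, yet their last recorded timestamps differ by several minutes, always by less than the five-minute interval.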