netfriendsinc

Intermittent server rebooting

I have a new server, an ML570 G4 ProLiant, that has been chugging along fine for the last few months. However, just recently - probably within the last several weeks - I've been noticing in the event log, at seemingly random times, "The previous system shutdown at <time> on <date> was unexpected" (Event ID 6008). The only thing I can think of that has changed is that Retrospect Enterprise Server 7.6 was installed to back up files to a network share (one folder in the backup queue holds 720k files). However, Retrospect will say in its internal log "backup script <name> ran successfully on <time>", so I don't think it's due to the backup software. Yet then it will say "execution terminated unexpectedly possibly due to power failure or system crash". I've included two screenshots of the event viewer errors.
xcelera-shutdown2.JPG
xcelera-nic-problem.JPG
CPAsAdmin

I would check the UPS for problems: weak batteries or excessive load. Move the server to a different UPS or straight to the wall outlet. Does the problem follow the server?
netfriendsinc

ASKER

It's connected to an older UPS, but the batteries are only 2 months old, as I had to replace them. And no, the server did not have this issue when it was installed, only in the last couple of weeks. I'll try changing power sources though, it's worth a shot ;)
I'd suspect the backup.  If you can, suspend the schedule to see if it goes away.
Is this daily, or weekly?
It's nightly at 10pm - I'll stop the service and see if it makes a difference. Any thoughts on the SNMP Trap error in the event viewer logs?
I'll report back tomorrow on this topic and let you know if stopping the Retrospect services did the trick. Thanks guys!
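For anyone who wants to test the backup theory against the log rather than by waiting: a minimal sketch, assuming the Event ID 6008 times have been copied out of Event Viewer by hand. The timestamps below are made up for illustration, and the 3-hour window after the 10pm start is an arbitrary guess, not something Retrospect reports.

```python
from datetime import datetime, time, timedelta

def near_backup(shutdowns, backup_start=time(22, 0), window=timedelta(hours=3)):
    """Return the shutdown times that fall within `window` after the nightly backup start."""
    hits = []
    for ts in shutdowns:
        start = datetime.combine(ts.date(), backup_start)
        # A shutdown before today's start time is measured against the previous night's run.
        if ts < start:
            start -= timedelta(days=1)
        if ts - start <= window:
            hits.append(ts)
    return hits

# Hypothetical timestamps, standing in for the real Event ID 6008 entries.
shutdowns = [
    datetime(2009, 3, 2, 22, 40),
    datetime(2009, 3, 4, 9, 15),
    datetime(2009, 3, 6, 0, 30),
]

for ts in near_backup(shutdowns):
    print("close to backup window:", ts)
```

If most of the unexpected shutdowns land inside the window, the backup job is worth a closer look; if they are scattered across the day, it probably is not the trigger.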
I noticed this informational entry in the event viewer. The file it says to look at is over 2GB and will not open in Notepad, as there is not enough memory to view it (8GB of RAM installed currently). Not sure if this is important or not.
xcelera-bug-dump.JPG
Sorry, I've not been brave (or foolish) enough to try to open a memory dump in Notepad.  :)
There are tools to parse sections and read them out, or break them into smaller files.
But, never having done that, I couldn't report on an effective means of using it to troubleshoot.
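As a rough illustration of reading out a section instead of opening the whole file: a minimal sketch that reads only the first few bytes of a large file and checks them against a signature. It assumes that Windows crash dumps begin with the ASCII text "PAGEDUMP" (32-bit) or "PAGEDU64" (64-bit); treat that as an assumption to verify, not a full parser. Actually interpreting a dump is normally done in a debugger such as WinDbg (`!analyze -v`).

```python
def peek(path, size=64):
    """Read only the first `size` bytes, so the total file size does not matter."""
    with open(path, "rb") as f:
        return f.read(size)

def describe_header(head):
    """Classify a file by its leading bytes.

    Assumption: crash dumps start with b"PAGEDUMP" (32-bit)
    or b"PAGEDU64" (64-bit). Anything else is reported as unrecognized.
    """
    if head.startswith(b"PAGEDU64"):
        return "64-bit crash dump"
    if head.startswith(b"PAGEDUMP"):
        return "32-bit crash dump"
    return "unrecognized header"
```

Usage would be something like `describe_header(peek(dump_path))`, where `dump_path` is whatever file name the event log entry points at.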
After shutting down the Retrospect services yesterday afternoon, the server has yet to reboot. It looks like it may have been that folder containing 720k files that is choking it. Does that sound like a possibility? That Retrospect could 'choke' on a large number of files?
Server just rebooted again with all retrospect services stopped :(
I decided to disable SNMP as it sucks in 2000; will report back tomorrow with an update
Disabling SNMP did not work - the server rebooted again this morning. I switched to a different UPS as suggested earlier. I'll keep you updated.
So changing power sources did nothing to help; any ideas? Am I talking to myself here?
ASKER CERTIFIED SOLUTION
aleghart
This solution is only available to members.
We have 5 ML 570 servers that randomly reboot with event ID 6008. I am trying AGAIN to get to the bottom of this and came across your post.

One of these servers was running Windows 2000 Server. We got Microsoft to look at the issue, and it needed a hotfix that we could not get from them because we had not taken out a Microsoft extended hotfix agreement in 2005. As Windows 2000 is no longer supported, we cannot purchase one of these agreements or get the necessary hotfix.

Apparently there is a problem with Windows 2000 running on this server. This can happen on a computer that is using an Intel Dual-Core Xeon 7100 series CPU. However, the problem could also occur with other dual-core processors that use an L3 cache.

We installed Server 2003 on it, and that server has rebooted only once since (in about 6 months). It is just the other 4 servers running Server 2003 that we now have to stop from rebooting randomly.

If it helps, our servers are in an air-conditioned server room with no temperature issues. Three of the servers are SAN-attached, but the two that are not also reboot randomly. We have noticed that they reboot at a similar time, so it is possible that a backup job is involved. It does not happen on our ML 570 G4 that is running ESX 3.5, so I believe that the issue might be a Microsoft issue.
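One way to check the "similar time" observation across several servers: a minimal sketch that counts unexpected shutdowns per hour of day. The server names and timestamps are invented placeholders; real values would come from each server's Event ID 6008 entries.

```python
from collections import Counter
from datetime import datetime

def reboots_by_hour(events):
    """Count unexpected shutdowns per hour of day across all servers."""
    return Counter(ts.hour for _server, ts in events)

# Hypothetical (server, timestamp) pairs for illustration only.
events = [
    ("srv1", datetime(2010, 1, 4, 2, 10)),
    ("srv2", datetime(2010, 1, 9, 2, 45)),
    ("srv3", datetime(2010, 1, 15, 14, 5)),
    ("srv1", datetime(2010, 1, 20, 2, 30)),
]

# Busiest hours first; a single dominant hour hints at a scheduled job.
for hour, count in reboots_by_hour(events).most_common():
    print(f"{hour:02d}:00  {count} reboot(s)")
```

A strong peak at one hour would support the scheduled-job theory; an even spread would point back toward hardware or the OS.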
