Server restarting, nothing in logs, ideas?

Hi,

I have a server that is restarting for no apparent reason with nothing in the logs to explain it.  The server is brand new with Windows SBS 2003 on it; I did a Swing migration to it a couple of weeks back.

The server is an HP DL380 G5.  I swapped the server out with a brand new one last week (complete new server except the processor and hard drives) but the problem is still happening: random restarts once or twice a day.  It happens at different times, sometimes the middle of the night, sometimes in the morning, sometimes in the evening etc.  The server is connected to an APC UPS (which is a couple of years old), but the APC management software shows no "power events".

I see nothing in the logs, just the usual event saying the server restarted unexpectedly.  No memory dump files, so I'm guessing it's not blue screening.  Been searching the drives for files modified around the time of the restart, nothing obvious.
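For anyone wanting to repeat the "files modified around the time of the restart" search without clicking through Explorer, here is a minimal sketch in Python. The function name `files_modified_near`, the five-minute window and the example path are all assumptions for illustration, not something from the original post.

```python
import os
from datetime import datetime, timedelta


def files_modified_near(root, restart_time, window_minutes=5):
    """Return (path, mtime) pairs for files under `root` whose last-modified
    time falls within `window_minutes` either side of `restart_time`."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = datetime.fromtimestamp(os.path.getmtime(path))
            except OSError:
                continue  # file is locked or disappeared; skip it
            if abs(mtime - restart_time) <= window:
                hits.append((path, mtime))
    # Oldest first, so the last writes before the restart are easy to spot
    return sorted(hits, key=lambda hit: hit[1])
```

You would call it with the restart time taken from the "unexpected shutdown" event, for example `files_modified_near("C:\\", datetime(2008, 1, 1, 3, 15))` (a made-up timestamp). It only shows what was written around then; it cannot tell you why the box went down.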

The fact that it's restarting so "silently" definitely suggests a hardware problem to me, but the only original parts are the UPS, the CPU and the drives (5 x 146 GB SAS drives, RAID 5).  It's a dual CPU server and the CPUs are brand new so I'm not too inclined to suspect the CPU, but let me know if this is a bad assumption.  I don't think the drives would cause a restart like this.  That leaves the UPS or Windows itself.  Any thoughts?  I can get somebody to plug the server directly into the mains instead of into the UPS but I'd like to have other plans prepared.

So I'm just looking for feedback on what you'd try to identify the cause of the restarts, first impressions etc.  All thoughts appreciated!  Thanks!
Zenith63Asked:
Netman66Commented:
Did you complete the server installation to make it a DC (the root DC in its own forest)?

SBS must hold all the FSMO roles and be a GC, AND it needs to be activated.

Zenith63Author Commented:
Interesting point, never thought of that!  Would there not be something in the logs telling me about this?

I certainly went the whole way through the migration, the server holds all the FSMO roles and is a GC.  It was and is the only DC in the forest.  Maybe it didn't complete fully, any place I look for events or anything to prove this?
Pete LongTechnical ConsultantCommented:
It's an HP server, so check Insight Manager. The ASR (Automatic Server Recovery) will reboot the server in the event of an error :) That's on by default if you built the server from SmartStart.

open


https://localhost:2381/
or
https://127.0.0.1:2381/

log in

domainname\administrator
password

look at the logs :)
Zenith63Author Commented:
Hi Pete,

Yeah I came across that ASR setting earlier and disabled it so hopefully now when something goes wrong again the server will stay up and show what it was.  There's nothing in the Insight logs though, I'd have thought if the ASR was recovering the server it would log something somewhere?  Have you ever seen it in action?

Thanks!
Zenith63Author Commented:
Just read through the sbsmigration instructions again; apparently the server would restart every 100 minutes and log events 1001, 1013 and 1014 if it was restarting because it didn't like being in control.  Any other thoughts NetMan?
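A quick way to rule the 100-minute pattern in or out is to look at the gaps between restart times. This is only a sketch: the function names, the timestamp format and the five-minute tolerance are assumptions, and the timestamps in the usage note are made up.

```python
from datetime import datetime


def restart_intervals_minutes(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Return the gaps, in minutes, between consecutive restart times."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]


def looks_periodic(intervals, period=100, tolerance=5):
    """True only if every gap is within `tolerance` minutes of `period`."""
    return bool(intervals) and all(abs(gap - period) <= tolerance
                                   for gap in intervals)
```

Feeding it restart times like `["2008-01-01 03:10", "2008-01-01 19:45", "2008-01-02 08:20"]` gives gaps of many hours, so `looks_periodic` returns `False`. Restarts once or twice a day at random times, as described in the question, do not fit a fixed 100-minute cycle.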
Jeffrey Kane - TechSoEasyPrincipal ConsultantCommented:
What version of APC PowerChute are you running on the Server?  There's a bug in older versions... make sure it's 7.x and not 6.x.

http://nam-en.apc.com/cgi-bin/nam_en.cfg/php/enduser/std_adp.php?p_faqid=7202

Jeff
TechSoEasy

Zenith63Author Commented:
I'm using 7.0.4.
Pete LongTechnical ConsultantCommented:
>>Have you ever seen it in action?

No, but if it was the ASR, Insight would be logging it :)
Zenith63Author Commented:
I was thinking it would, but googling there I came across a few people having similar issues where ASR was triggered but didn't log anything so I'm gonna hope it's the same in my case!

Any more ideas?

ASR is disabled and I'm getting somebody out there in the morning to get the server off that UPS in case it has issues.
Netman66Commented:
Just ensure it's activated and we can eliminate that.  Is there another SBS server on this network?  If so, is it in the same domain?  If that's true then turn it off and see what happens.

Zenith63Author Commented:
Yep it's activated, has FSMO roles and is GC.  The only other server is a Windows 2003 Server member server.
Netman66Commented:
Well... I tend to agree with those who have mentioned hardware, since intermittent hardware issues don't necessarily create logs where OS-related issues generally do.

You may want to shut off the server, reseat the RAM and all the procs then restart it to see if the reboot interval changes or stops completely.
Zenith63Author Commented:
This problem has been going on for a couple of weeks since I did the swing to the new HP server.  I went in late last week and replaced the HP with another brand new one we had in stock.  I just moved the drives and extra CPU across.  So I reckon I've eliminated all hardware issues like RAM; all that's left is power (e.g. UPS), that second processor and the drives.  I could be wrong but I don't suspect the drives, as HP Smart Array controllers are usually excellent at reporting any errors in that department.  CPU?  Maybe, but I feel it probably isn't; with two in there I'd have thought iLO would catch any unusual activity from them, but I could be very wrong?  We could pull out the processor I didn't replace and leave it on one for a few days (any problems doing this?)
Netman66Commented:
How many procs were in the old server?
Zenith63Author Commented:
In the old Dell I swung from there was 1.  I swung to a DL380 with 2 processors.  I then went out last week with a server which only had one processor in it from our suppliers, so I took one processor from the server I was replacing and stuck it in this one I'd just brought out.  Hope that makes sense, not easy to explain :).  Basically the new install of SBS was installed on 2 processors and is still on 2 now.
Netman66Commented:
Did you update the HAL in Device Manager for 2 procs?

Expand Computer.
Right click whatever is being shown there and select Update Driver.
If prompted to go out to the internet select No.
Hit next.
Select Install from Specific location>Next.
Select Don't Search...>Next
Select the matching HAL for Multiprocessor (ie, if it was ACPI Uniprocessor then select ACPI Multiprocessor)
Zenith63Author Commented:
I was using the Swing migration, so the new server had SBS installed from scratch on it.  It had the two processors in it the whole time.  Maybe I should just clarify what has happened so far -


1. Old Dell server in place
2. Did SBSMigration's Swing migration to a brand new dual-proc HP DL380 three weeks ago, Dell server now out of the picture (thank god!)
3.  This HP was restarting randomly (I don't think the old Dell was).
4. Went out with another brand new HP DL380 which came from the suppliers with one processor.  Plugged out live DL380.  Took the drives from the live DL380 and stuck them in this second DL380.  Also took one of the processors from the live DL380 and put it in this second DL380.
5. Plugged in this second DL380, plugged it into the network and it booted straight into SBS.
6. Left the site with the original DL380 which I was hoping had a dodgy main board or something in it.
7. Today I see the new DL380 I put in is restarting in the exact same way.

So everything was replaced in stage 4/5 except one processor, the UPS and the drives.
Netman66Commented:
It sounds like it may be the UPS and/or cable.  Try running without the signal cable to see if it stops.

Jeffrey Kane - TechSoEasyPrincipal ConsultantCommented:
Can you run a DCDIAG /v and post the results here?  Also, it'd be helpful to see the output from a SYSTEMINFO command as well.
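If it helps to grab both outputs in one go for posting, something along these lines would do it. This is a rough sketch that assumes Python is available on the box and that `dcdiag` and `systeminfo` are on the PATH; the `collect` helper and the `COMMANDS` table are names made up for this example.

```python
import shutil
import subprocess

# Commands requested in the thread; the dictionary keys are just labels.
COMMANDS = {
    "dcdiag": ["dcdiag", "/v"],
    "systeminfo": ["systeminfo"],
}


def collect(commands=COMMANDS):
    """Run each command and return {label: captured stdout}.
    A label maps to None when the tool cannot be found on the PATH."""
    results = {}
    for label, argv in commands.items():
        if shutil.which(argv[0]) is None:
            results[label] = None  # tool not installed or not on PATH
            continue
        proc = subprocess.run(argv, capture_output=True, text=True)
        results[label] = proc.stdout
    return results
```

Writing each captured string to a text file afterwards makes it easy to paste into a reply. Running the two commands by hand and redirecting with `>` gets you the same thing.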

Thanks

Jeff
TechSoEasy
snazyCommented:
Can't you monitor the temperature of your CPUs with something like SpeedFan to see if they are overheating, or at least what temperatures they reach most often?  Check your BIOS settings to see at what processor temperature the computer will shut down.  If the power supply (UPS) is not producing enough power for the system, it will make it restart.  Does it have enough wattage or VA?  Two processors are very demanding on power.  Do you have your monitor connected to the UPS?  What are the specs of your UPS?
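On the wattage/VA question, the usual back-of-envelope check is that a UPS's usable watts are roughly its VA rating multiplied by its power factor, and the connected load should sit comfortably below that. A small sketch of the arithmetic, where the function name and every figure in the usage note are hypothetical rather than the specs of the UPS in this thread:

```python
def ups_headroom(ups_va, power_factor, load_watts):
    """Return (capacity_watts, load_fraction) for a UPS.

    capacity_watts = VA rating * power factor
    load_fraction  = connected load / capacity (1.0 means fully loaded)
    """
    capacity_watts = ups_va * power_factor
    return capacity_watts, load_watts / capacity_watts
```

For example, `ups_headroom(1500, 0.6, 450)` describes a 1500 VA unit with an assumed power factor of 0.6 carrying a 450 W load: about 900 W of capacity, half used. Check the real VA rating and power factor on the UPS data sheet, and add up everything plugged into it (monitor included) for the load.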
Zenith63Author Commented:
Turned out to be the UPS!  I'm kind of surprised to be honest, it wasn't that old and was showing nothing in the management software, kind of negates the point of having management software if it doesn't report errors!

Thanks for the help guys!