Zenith63

Server restarting, nothing in logs, ideas?

Hi,

I have a server that is restarting for no apparent reason with nothing in the logs to explain it.  The server is brand new with Windows SBS 2003 on it, I did a SwingMigration to it a couple of weeks back.

The server is an HP DL380 G5.  I swapped the server out with a brand new one last week (a completely new server except the processor and hard drives) but the problem is still happening: random restarts once or twice a day.  It happens at different times, sometimes in the middle of the night, sometimes in the morning, sometimes in the evening etc.  The server is connected to an APC UPS (which is a couple of years old), but the APC management software shows no "power events".

I see nothing in the logs, just the usual event saying the server restarted unexpectedly.  No memory dump files, so I'm guessing it's not blue screening.  Been searching the drives for files modified around the time of the restart, nothing obvious.

The fact that it's restarting so "silently" definitely says to me a likely hardware problem, but the only original parts are the UPS, the CPU and the drives (5 x 146 GB SAS drives, RAID 5).  It's a dual CPU server and the CPUs are brand new so I'm not too inclined to suspect the CPU, but let me know if this is a bad assumption.  I don't think the drives would cause a restart like this.  That leaves the UPS or Windows itself.  Any thoughts?  I can get somebody to plug the server directly into the mains instead of into the UPS but I'd like to have other plans prepared.

So I'm just looking for feedback on what you'd try to identify the cause of the restarts, first impressions etc.  All thoughts appreciated!  Thanks!
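The search for files modified around the time of a restart can be scripted. Below is a minimal sketch in Python; the function name `files_modified_near`, the 10-minute window and the example path are assumptions for illustration, not anything from the thread.

```python
import os
from datetime import datetime, timedelta


def files_modified_near(root, when, window_minutes=10):
    """Return (path, mtime) pairs for files under `root` whose last-modified
    time falls within `window_minutes` either side of `when` (local time)."""
    lo = (when - timedelta(minutes=window_minutes)).timestamp()
    hi = (when + timedelta(minutes=window_minutes)).timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file locked or removed while scanning; skip it
            if lo <= mtime <= hi:
                hits.append((path, datetime.fromtimestamp(mtime)))
    # Oldest first, so the last writes before the restart are easy to spot.
    return sorted(hits, key=lambda hit: hit[1])
```

Called as, say, `files_modified_near(r"C:\WINDOWS", datetime(2008, 1, 15, 3, 42))` with a hypothetical restart time, it lists whatever was written just before and after the restart.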
Netman66

Did you complete the server installation to make it a DC (the root DC in its own forest)?

SBS must hold all the FSMO roles and be a GC, AND it needs to be activated.

Zenith63

ASKER

Interesting point, never thought of that!  Would there not be something in the logs telling me about this?

I certainly went the whole way through the migration, the server holds all the FSMO roles and is a GC.  It was and is the only DC in the forest.  Maybe it didn't complete fully, any place I look for events or anything to prove this?
On an HP server, check Insight Manager. ASR (Automatic Server Recovery) will reboot the server in the event of an error :) It's on by default if you built the server from SmartStart.

open


https://localhost:2381/
or
https://127.0.0.1:2381/

log in

domainname\administrator
password

look at the logs :)
Hi Pete,

Yeah I came across that ASR setting earlier and disabled it so hopefully now when something goes wrong again the server will stay up and show what it was.  There's nothing in the Insight logs though, I'd have thought if the ASR was recovering the server it would log something somewhere?  Have you ever seen it in action?

Thanks!
Just read through the SBSMigration instructions again. Apparently the server would restart every 100 minutes and log events 1001, 1013 and 1014 if it was restarting because it wasn't happy with its role in the domain.  Any other thoughts NetMan?
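One quick way to rule the 100-minute pattern in or out is to look at the gaps between restart times. A rough sketch in Python, assuming the restart times have been copied out of the event log by hand; the function names and the 10-minute tolerance (to allow for boot time) are made up for illustration.

```python
from datetime import datetime


def restart_gaps_minutes(timestamps):
    """Gaps in minutes between consecutive restarts, oldest first."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, period_minutes=100, tolerance_minutes=10):
    """True only if every gap is within the tolerance of the expected period.
    With fewer than two restarts there is no gap to judge, so return False."""
    gaps = restart_gaps_minutes(timestamps)
    return bool(gaps) and all(
        abs(gap - period_minutes) <= tolerance_minutes for gap in gaps
    )
```

Restarts "once or twice a day" at scattered times would fail this check, which points away from the 100-minute enforcement and back towards hardware or power.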
ASKER CERTIFIED SOLUTION
Jeffrey Kane - TechSoEasy
I'm using 7.0.4.
>>Have you ever seen it in action?

No. If it was the ASR, Insight would be logging it :)
I was thinking it would, but googling there I came across a few people having similar issues where ASR was triggered but didn't log anything so I'm gonna hope it's the same in my case!

Any more ideas?

ASR is disabled and I'm getting somebody out there in the morning to get the server off that UPS in case it has issues.
Just ensure it's activated and we can eliminate that.  Is there another SBS server on this network?  If so, is it in the same domain?  If that's true then turn it off and see what happens.

Yep it's activated, has FSMO roles and is GC.  The only other server is a Windows 2003 Server member server.
Well... I tend to agree with those who have mentioned hardware, since intermittent hardware issues don't necessarily create logs where OS-related issues generally do.

You may want to shut off the server, reseat the RAM and all the procs then restart it to see if the reboot interval changes or stops completely.
This problem has been going on for a couple of weeks since I did the swing to the new HP server.  I went in late last week and replaced the HP with another brand new one we had in stock.  I just moved the drives and extra CPU across.  So I reckon I've eliminated all hardware issues like RAM; all that's left is power (e.g. the UPS), that second processor and the drives.  I could be wrong but I don't suspect the drives, as HP Smart Array Controllers are usually excellent at reporting any errors in that department.  CPU?  Maybe. I feel it probably isn't though; with two in there I'd have thought iLO would catch any unusual activity from them, but I could be very wrong?  We could pull out the processor I didn't replace and leave it on one for a few days (any problems doing this?)
How many procs were in the old server?
In the old Dell I swung from there was 1.  I swung to a DL380 with 2 processors.  I then went out last week with a server which only had one processor in it from our suppliers, so I took one processor from the server I was replacing and stuck it in this one I'd just brought out.  Hope that makes sense, not easy to explain :).  Basically the new install of SBS was installed on 2 processors and is still on 2 now.
Did you update the HAL in Device Manager for 2 procs?

Expand Computer.
Right click whatever is being shown there and select Update Driver.
If prompted to go out to the internet select No.
Hit next.
Select Install from Specific location>Next.
Select Don't Search...>Next
Select the matching HAL for Multiprocessor (ie, if it was ACPI Uniprocessor then select ACPI Multiprocessor)
I was using the Swing migration, so the new server had SBS installed from scratch on it.  It had the two processors in it the whole time.  Maybe I should just clarify what has happened so far -


1. Old Dell server in place
2. Did SBSMigration's Swing migration to a brand new dual-proc HP DL380 three weeks ago, Dell server now out of the picture (thank god!)
3.  This HP was restarting randomly (I don't think the old Dell was).
4. Went out with another brand new HP DL380 which came from the suppliers with one processor.  Plugged out live DL380.  Took the drives from the live DL380 and stuck them in this second DL380.  Also took one of the processors from the live DL380 and put it in this second DL380.
5. Plugged in this second DL380, plugged it into the network and it booted straight into SBS.
6. Left the site with the original DL380 which I was hoping had a dodgy main board or something in it.
7. Today I see the new DL380 I put in is restarting in the exact same way.

So everything was replaced in stage 4/5 except one processor, the UPS and the drives.
Can you run a DCDIAG /v and post the results here?  Also, it'd be helpful to see the output from a SYSTEMINFO command as well.

Thanks

Jeff
TechSoEasy
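To make the output easy to post, the two commands can be captured to text files. A minimal sketch in Python, assuming Python is available on a machine where `dcdiag` and `systeminfo` are on the PATH; the helper name `capture` and the output file names are hypothetical.

```python
import subprocess


def capture(command, out_path):
    """Run `command` (a list of arguments), write its output to `out_path`,
    and return the command's exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
        if result.stderr:
            # Keep error output too, clearly separated from the normal output.
            fh.write("\n--- stderr ---\n" + result.stderr)
    return result.returncode
```

Usage would be along the lines of `capture(["dcdiag", "/v"], "dcdiag.txt")` followed by `capture(["systeminfo"], "systeminfo.txt")`, run on the server itself.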
Can't you monitor the temperature of your CPUs with something like SpeedFan to see if they are overheating, or at least what temperatures they reach most often? Check your BIOS settings to see at what processor temperature the computer will shut down. If the power supply (UPS) is not producing enough power for the system, it will make it restart. Does it have enough wattage/VA? Two processors are very demanding on power. Do you have your monitor connected to the UPS? What are the specs of your UPS?
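The wattage/VA question comes down to simple arithmetic: usable watts are roughly the VA rating times the UPS power factor. A small sketch in Python; the function names are invented here, and the figures in the usage note are placeholders, so take the real numbers from the UPS and server spec sheets.

```python
def ups_capacity_watts(ups_va, power_factor):
    """Real power the UPS can deliver: VA rating times its power factor."""
    return ups_va * power_factor


def ups_load_percent(ups_va, power_factor, load_watts):
    """Connected load as a percentage of the UPS's real-power capacity."""
    return 100.0 * load_watts / ups_capacity_watts(ups_va, power_factor)


def ups_headroom_watts(ups_va, power_factor, load_watts):
    """Watts left over after the connected load; negative means overloaded."""
    return ups_capacity_watts(ups_va, power_factor) - load_watts
```

For instance, a hypothetical 1000 VA unit with a 0.6 power factor gives about 600 W, so a 450 W load (server plus monitor) would sit at roughly 75% of capacity. An ageing battery can still drop the load well inside that figure, so a passing result does not clear the UPS.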
Turned out to be the UPS!  I'm kind of surprised to be honest, it wasn't that old and was showing nothing in the management software, kind of negates the point of having management software if it doesn't report errors!

Thanks for the help guys!