systemssupply

Server 2003 Event ID 6008 Unexpected Reboot

Hi Everyone,
 
I have a SQL server which has just started a series of random reboots. The problem started one week ago. This is a production server and has not had any modifications (updates etc) in the recent past.

What I see is an event 6008 in Event Viewer:
The previous system shutdown at 11:42:04 AM on 3/15/2008 was unexpected.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

I have also noticed that for an hour or two before the reboot, no new entries are written to any of the Event Viewer logs.

With no new events logged before the reboot, this is difficult to troubleshoot, so any input would be great.

Also due to the lack of logs my line of thinking is that this is probably a software issue not a hardware issue.

The server does complete the reboot successfully without any user input and then will run for several hours before the issue starts again.
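To put a number on the "silent" period before each reboot, the shutdown time can be pulled out of the 6008 message text and compared with the timestamp of the last event that was logged. This is only a minimal sketch: it assumes the message uses the US date format shown above, and the last-event time in the example is a made-up value, not one taken from this server.

```python
import re
from datetime import datetime

# Matches the Event ID 6008 message text quoted in the question.
SHUTDOWN_RE = re.compile(
    r"The previous system shutdown at (\d{1,2}:\d{2}:\d{2} [AP]M) "
    r"on (\d{1,2}/\d{1,2}/\d{4}) was unexpected"
)

def parse_unexpected_shutdown(message):
    """Return the shutdown time from a 6008 message, or None if it does not match."""
    match = SHUTDOWN_RE.search(message)
    if match is None:
        return None
    time_part, date_part = match.groups()
    # Assumption: month/day/year order, as in the message above.
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %I:%M:%S %p")

def silent_gap(last_event_time, shutdown_time):
    """Time between the last logged event and the recorded shutdown."""
    return shutdown_time - last_event_time

msg = "The previous system shutdown at 11:42:04 AM on 3/15/2008 was unexpected."
shutdown = parse_unexpected_shutdown(msg)
# Hypothetical timestamp of the last event seen before the logs went quiet.
last_event = datetime(2008, 3, 15, 9, 50, 0)
print(shutdown, silent_gap(last_event, shutdown))
```

A gap that is consistently an hour or more would suggest the box stops logging well before it actually goes down.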
Richard_de_groot

Hi,

Is SQL Server consuming all of the physical memory?

Do you have a swap file?

Can you give us some more detail on the event log errors?

Cheers

systemssupply

ASKER

Thanks for a reply on a Saturday:

Server has 2.0 GB of RAM / current usage is 365 MB. We are closed today, so there is really no load at all, and it is still rebooting.

Paging file is set to max 4.0 GB on C drive

On Events: First sign of a problem is that you see no new events for a couple of hours then
1) The previous system shutdown at 11:42:04 AM on 3/15/2008 was unexpected (Event ID 6008)
2) Then a normal startup sequence:
 a: Microsoft (R) Windows (R) 5.02. 3790 Service Pack 2 Multiprocessor Free.
 b: The Event log service was started.
so on and so forth

from the reboot when you enter a reason code:
The reason supplied by user domain\Administrator for the last unexpected shutdown of this computer is: Other (Unplanned)
 Reason Code: 0xa000000

I am at the console now, all looks healthy, good video, no crazy disk access, motherboard reports normal voltage and temp.
Is there a dump file? c:\windows\memory.dmp?

Maybe you can uncheck the option that it will reboot and you can see the BSOD. It will give you more info.

Can you recall the last thing you did on this server? It could also be a device driver.
No memory.dmp and no minidump files either. That would be just too easy :)  I believe I did uncheck the box to auto reboot. Can you verify: under Startup and Recovery, I unchecked Automatically Restart. Then I rebooted just to make sure it holds. At this point, I will head home and watch for the pings to the server to stop as my clue that it's gone down.

The server is in production, so we have made no changes to the config at all in the past 6 months. The only access is by clients via their apps.

I am hoping for a lock-up with a more verbose error code.

In the interim I have done a cold boot and re-seated all devices: RAM, disks, CPU, RAID, etc.
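Watching for the pings to stop can be scripted so that the exact time of each outage gets recorded. The sketch below is an assumption-laden stand-in for ping: it uses a TCP connection attempt instead of ICMP, and port 1433 (the default SQL Server port) is assumed rather than confirmed for this server. The function names are made up for illustration.

```python
import socket
import time
from datetime import datetime

def tcp_probe(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (not a real ICMP ping)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(probe, checks, interval=60.0, clock=datetime.now, sleep=time.sleep):
    """Call probe() `checks` times and return the (timestamp, state) transitions.

    The server is assumed to be up when the watch starts, so the first
    recorded transition is always a DOWN.
    """
    transitions = []
    was_up = True
    for _ in range(checks):
        is_up = probe()
        if is_up != was_up:
            transitions.append((clock(), "UP" if is_up else "DOWN"))
            was_up = is_up
        sleep(interval)
    return transitions

# Example (hypothetical host name), checking once a minute for a day:
# log = watch(lambda: tcp_probe("sqlserver01"), checks=24 * 60)
```

The DOWN timestamps can then be lined up against the shutdown times reported by event 6008.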


Under Startup and Recovery it is :-) Are you sure nothing's changed??

You can also try (if it is a software problem):

1. Deep file virus/spyware scan. Maybe also with a different virus scanner than the one you already have.
2. Download CCleaner http://www.ccleaner.com/download and run it to clean up a lot :-)

Do you run McAfee antivirus? I saw a lot of articles on the Net describing problems similar to yours and blaming them on McAfee.

Just to make sure... if the server goes belly-up again tomorrow, make sure you're on time at the client's site on Monday ;-)

Good luck.

Richard
Thanks for getting back to me. I am sure I changed nothing; I have not been there in weeks. If one of our other consultants did change something, they are not telling.

I run Symantec AV 10.2. I did notice that the install was strange (the client was not reporting properly), so for the moment it has been removed.

I am checking the server every hour or so, waiting for it to fail. With my luck it will run for 3 weeks now without issue.
Have you checked this article out?
http://support.microsoft.com/kb/326564

When logging off 2003 server, it asks the reason. If none is provided, then you might get the 6008 error. So this error is a result of the reboot, not the cause. I know this doesn't help resolve the issue as to why the computer is shutting down. So, I am going to take a couple guesses.

I have seen where Windows updates will shut down automatically after downloading the latest updates. It can do this for servers and clients. To fix this problem you might want to look at this post and create a domain policy that prevents Windows Updates, or WSUS servers, from rebooting the machine upon installing updates:
https://www.experts-exchange.com/questions/23199407/Disabling-WSUS-auto-restart.html
Hi Again,

The server has rebooted on its own 3 times in the last hour.  No user was logged on during the time between the reboots.

Windows update has been disabled via group policy previously.

Question: I changed the server not to reboot on failure but it is, any ideas as to why?
Update: I found an event entry that seems to appear with most of the reboots:

id 10026
The COM sub system is suppressing duplicate event log entries for a duration of 86400 seconds.  The suppression timeout can be controlled by a REG_DWORD value named SuppressDuplicateDuration under the following registry key: HKLM\Software\Microsoft\Ole\EventLog.
Antivirus applications may not pick up on certain spyware. Spyware might reboot the server without errors because the server sees it as an application that is finishing an install and needs to reboot. If you run a scan, you may wish to run it in Safe Mode. That will prevent the spyware's services from blocking its removal.

I have seen some spyware reboot machines. I know you are running Symantec AntiVirus. But you still might want to take Richard's advice and run an antispyware program on this machine.

The two I found that remove the "rebooting problem" are HouseCall and Spybot S&D. They don't conflict with each other because Spybot is an executable and HouseCall is a full program. Another thing you might think about is running HijackThis and posting your results here.

Please credit Richard, if you determine spyware was the issue. I am only agreeing that this is most likely the cause.



 



I will run the spyware check this morning. I did run SAV last night; it was clean.
Ran Spybot; the only things found were 6 tracking cookies. I have removed them, but I doubt that was the answer. I think it is safe to say that something is forcing the computer to reboot and that it is not a typical stop error, as the machine is rebooting, not hanging as I had set it to do.

Any other ideas how I can find the offending application ?
OK,

Let's try something else.

I'm beginning to run out of software solutions, so I think we can try to investigate hardware problems :-s

1. Could temperatures inside the server case be the problem? Download Everest to check the temperature: http://www.majorgeeks.com/download4181.html
2. Could dust be short-circuiting the server? Is it possible to clean the inside thoroughly?
3. Could the PSU be slowly losing power? Maybe you need to try another PSU. Check this at: http://www.journeysystems.com/?powercalc
4. Check your memory: http://oca.microsoft.com/en/windiag.asp or http://www.memtest86.com/

Can you give us the type of server you use?

Richard
I just noticed that the Help and Support service was not running (probably a result of when SP2 was loaded; maybe this has an impact on error reporting?). I have just reinstalled the service, and it is now running properly.

The machine is a SuperMicro X6DH8.

I have Super Doctor running and it is reporting no errors. CPU temp is currently 152 F (AC in the room is off on the weekends; nothing I can do about it). I have an identical server in the rack above and below; their CPU temps are in the 130 F range.

I ripped the machine apart yesterday and pulled all memory, chips, drives, RAID cards, etc. to make sure all connections are good.

The server is not dusty at all: clean server room with good venting, just no AC on weekends.

The box has triple power supplies; all appear to be operating fine.

As I have an identical server, I could swap out the RAM to test.
Let's try the memory.

Is it possible that more services (that are supposed to be on Automatic) are stopped?
Everything set to Automatic is started, with the exception of Perf Logs. As to whether something is missing, I will begin to cross-compare now.
here are all services:
Name      Status        Startup Type
Adaptec Storage Manager Agent      Started      Automatic
Alerter      Started      Automatic
Application Experience Lookup Service      Started      Automatic
Automatic Updates      Started      Automatic
Backup Exec Remote Agent for Windows Systems      Started      Automatic
COM+ Event System      Started      Automatic
Computer Browser      Started      Automatic
Cryptographic Services      Started      Automatic
DCOM Server Process Launcher      Started      Automatic
DHCP Client      Started      Automatic
Distributed Link Tracking Client      Started      Automatic
Distributed Transaction Coordinator      Started      Automatic
DNS Client      Started      Automatic
Error Reporting Service      Started      Automatic
Event Log      Started      Automatic
Help and Support      Started      Automatic
Interwoven WorkSite Server      Started      Automatic
IPSEC Services      Started      Automatic
Logical Disk Manager      Started      Automatic
Microsoft Search      Started      Automatic
MSSQLSERVER      Started      Automatic
Net Logon      Started      Automatic
Network Connections      Started      Manual
Network Location Awareness (NLA)      Started      Manual
NT LM Security Support Provider      Started      Manual
Plug and Play      Started      Automatic
PowerChute Network Shutdown      Started      Automatic
Print Spooler      Started      Automatic
Protected Storage      Started      Automatic
Remote Access Connection Manager      Started      Manual
Remote Procedure Call (RPC)      Started      Automatic
Remote Registry      Started      Automatic
SavRoam      Started      Automatic
Secondary Logon      Started      Automatic
Security Accounts Manager      Started      Automatic
Server      Started      Automatic
Shell Hardware Detection      Started      Automatic
SQLSERVERAGENT      Started      Automatic
SuperMicro Health Assistant      Started      Automatic
Symantec AntiVirus      Started      Automatic
Symantec AntiVirus Definition Watcher      Started      Automatic
Symantec Event Manager      Started      Automatic
Symantec Settings Manager      Started      Automatic
Symantec SPBBCSvc      Started      Automatic
System Event Notification      Started      Automatic
Task Scheduler      Started      Automatic
TCP/IP NetBIOS Helper      Started      Automatic
Telephony      Started      Manual
Terminal Services      Started      Manual
Windows Audio      Started      Automatic
Windows Management Instrumentation      Started      Automatic
Windows Time            Automatic      Local Service
Wireless Configuration      Started      Automatic
Workstation      Started      Automatic
Application Layer Gateway Service            Manual
Application Management            Manual
Background Intelligent Transfer Service            Manual
ClipBook            Disabled
COM+ System Application            Manual
Distributed File System            Manual
Distributed Link Tracking Server            Disabled
File Replication            Manual
HTTP SSL            Manual
Human Interface Device Access            Disabled
IMAPI CD-Burning COM Service            Disabled
Indexing Service            Disabled
InstallDriver Table Manager            Manual
Intersite Messaging            Disabled
Kerberos Key Distribution Center            Disabled
License Logging            Disabled
LiveUpdate            Manual
Logical Disk Manager Administrative Service            Manual
Messenger            Disabled
Microsoft Software Shadow Copy Provider            Manual
MSSQLServerADHelper            Manual
NetMeeting Remote Desktop Sharing            Disabled
Network DDE            Disabled
Network DDE DSDM            Disabled
Network Provisioning Service            Manual
Performance Logs and Alerts            Automatic
Portable Media Serial Number Service            Manual
Remote Access Auto Connection Manager            Manual
Remote Desktop Help Session Manager            Manual
Remote Procedure Call (RPC) Locator            Manual
Removable Storage            Manual
Resultant Set of Policy Provider            Manual
Routing and Remote Access            Disabled
Smart Card            Manual
Special Administration Console Helper            Manual
Terminal Services Session Directory            Disabled
Themes            Disabled
Uninterruptible Power Supply            Manual
Upload Manager            Manual
Virtual Disk Service            Manual
Volume Shadow Copy            Manual
WebClient            Disabled
Windows Firewall/Internet Connection Sharing (ICS)            Disabled
Windows Image Acquisition (WIA)            Disabled
Windows Installer            Manual
Windows Management Instrumentation Driver Extensions            Manual
Windows User Mode Driver Framework            Manual
WinHTTP Web Proxy Auto-Discovery Service            Manual
WMI Performance Adapter            Manual
Perf Logs is indeed always set to Automatic and stopped.
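Cross-checking a listing like the one above by eye is error-prone, so here is a minimal sketch that flags services set to Automatic that are not started. It assumes the listing is pasted as plain text with columns separated by tabs or by two or more spaces; the sample rows are copied from the listing above, and the function name is made up.

```python
import re

def stopped_automatic(listing):
    """Return names of services whose startup type is Automatic but which are not Started."""
    flagged = []
    for line in listing.splitlines():
        # Columns are assumed to be separated by tabs or runs of 2+ spaces.
        fields = re.split(r"\t+|\s{2,}", line.strip())
        if len(fields) < 2 or fields[0] == "Name":
            continue  # skip blank lines and the header row
        name, rest = fields[0], fields[1:]
        if "Automatic" in rest and "Started" not in rest:
            flagged.append(name)
    return flagged

sample = """\
Name      Status        Startup Type
Event Log      Started      Automatic
Network Connections      Started      Manual
Performance Logs and Alerts            Automatic
Themes            Disabled
"""
print(stopped_automatic(sample))  # ['Performance Logs and Alerts']
```

Running the same check against the listing from the identical server would show whether the two machines differ.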
"Question: I changed the server not to reboot on failure but it is, any ideas as to why?"

I believe stop errors require that a reboot take place regardless of this setting.

Software can also reboot the computer without there being a real failure. I think we covered software that will reboot the computer without a stop error, like Windows updates, viruses, and spyware. Is there anything else you can think of that might be in the install and reboot mode?
____________________________________________________________________________
All stop errors that I know of supply a system event and/or a BSOD.

What is perplexing me is that there are no real event errors in the SYSTEM event logs, no BSODs, and there doesn't appear to be an application hung in the reboot-after-install mode. Do you have multiple entries in the system event logs that say install complete? Or is the installer service running in Task Manager?

You do have a COM event saying that duplicate events are being suppressed for 24 hours. I don't think a flood of suppressed COM events will reboot the computer. But it might be hiding something that is.

I have to admit, this is a challenging troubleshoot. I haven't yet been able to decipher whether this is hardware or software.

Since this happened spontaneously, it might be a driver. Just to cover the bases, have you flashed the BIOS? It might be a good idea to do so. Then I think I would update the chipset and NIC drivers. They are the most common drivers that cause problems. I have seen cases where drivers don't log errors but still have problems.

Second that!!  ChiefIT has a point. A driver update through Windows Update could have caused the problem.
How can we verify which hardware update was done last?

Services seem all right. Is there some Supermicro software that you can use to diagnose all the hardware inside this server?

Do you have new info?
Yep, I think we might be on a good troubleshooting track:

Does Windows update history provide driver update information?
I think Windows update download history might point us in the right direction. You might be able to see what drivers were downloaded about a week ago when the system performed an update. In your case, third party drivers might have worked correctly and windows updates wiped your third party driver and updated to a driver that messed you up. (Especially if this is a 64 bit machine)

Are you working with WSUS or Windows updates? If Windows updates, do you schedule routine driver updates?

Guys:
Sorry about the redundancy on my last. I walked away from the machine, ate breakfast and posted my input. Then, I found out that you guys were already a couple steps ahead of me. I guess great minds think alike.
Guys, I am back on-site. My current thought is that this is a hardware issue. I have 4 of these in the same rack and this one is running the hottest of all, with the CPU at 150 F. Also, when I check the fan speed, this server is much slower: 1600 rpm vs. 4000 on the others (again, the server room is hot today).

That said, I have just flashed the BIOS and the fan speed is still slow.

On the others, if I hit reset I at least get full-speed fans until POST is complete. On the server in question they just spin at 1600 rpm constantly.

I am comparing BIOS settings now to try to find a cause.

Thinking it may be a bad MB.

On Windows Update, it's totally disabled via GP, so nothing has been loaded. Its log reports that Windows Update has been suppressed by GP.

thanks for working on this one with me.
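The comparison against the identical servers in the rack can be written down as a simple check: flag any machine whose fan speed is well below the median of the group. This is only a sketch. The server names are placeholders, the readings are the 1600 rpm and 4000 rpm figures quoted above, and the 50% threshold is an arbitrary assumption rather than a Supermicro specification.

```python
from statistics import median

def slow_fans(rpm_by_server, fraction=0.5):
    """Return servers whose fan speed is below `fraction` of the group median."""
    typical = median(rpm_by_server.values())
    return sorted(
        name for name, rpm in rpm_by_server.items() if rpm < fraction * typical
    )

# Placeholder names; readings as reported in the post above.
readings = {"suspect": 1600, "peer1": 4000, "peer2": 4000, "peer3": 4000}
print(slow_fans(readings))  # ['suspect']
```

The same function works for any reading where lower is worse; for CPU temperature the comparison would need to be flipped.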
I feel this is a temperature issue. Can you make the fans speed up? Can you take the hood off the case and ventilate the MB area?

How much is 150 F in Celsius? LoL

Are the servers still on warranty?
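For the Fahrenheit question, the conversion is C = (F - 32) * 5 / 9. A quick sketch applied to the readings mentioned so far in the thread (150, 152 and 130 F):

```python
def f_to_c(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9

for reading in (150, 152, 130):
    print(f"{reading} F = {f_to_c(reading):.1f} C")
# 150 F = 65.6 C
# 152 F = 66.7 C
# 130 F = 54.4 C
```

So the suspect server is running roughly 11 to 12 C hotter than its neighbours.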
I set the "fan speed control" to "full speed", saved and rebooted, but no such luck: still slooooow.  Tested another server and I get full-speed fans on boot.  That said, I think that I may need a MB.  I doubt it's still under warranty, as they are nearly 3 years old, I believe. Will be checking on that next.

I can leave this one open in the rack for the time being; trying to find a fan now. That will at least tell me if I have found the issue.

Of course this one is near the bottom so I am off to play rack games now.

ASKER CERTIFIED SOLUTION
ChiefIT
With my homemade CPU fan we are at 44 C, much better than before.
Given that this server was running hot and that the fans are acting up, I think we may be onto something.  That said, I think I may head out and continue the death watch from home.

I should know more in 6 hours or so; we'll see if it stabilizes.

Fingers crossed :-)
How are things?

Richard
So far so good; I have a call in to Supermicro for a new board.  Server has been up since 3:17 yesterday afternoon. I want to see the magic 24 hour mark. Then I will order the new board.

So far so good indeed.

It's hard to work over here though.....keeping my fingers crossed for the remaining time....LoL
Well, after about 35 hours of running, the server rebooted on its own again. It was around 2 am. Unfortunately I cannot see what the CPU temp was then, but this morning the temp is in line at 100 or so.

Question is, do we still think it is hardware?  Given that I cannot get the fans to spin correctly, I say I should replace the MB anyway. Agree?
Hello again. It appears that the MB was the answer; at least, it has yet to reboot since the replacement on Wednesday evening. I would say we have success.  Thanks for all the help.
This is a known problem in Windows 2003.

A hotfix is available for this from Microsoft.

http://support.microsoft.com/kb/950323
Hi Team, we have a legacy server that we converted from physical to virtual, and it is showing exactly the same symptoms. Could that still be considered a CPU temperature issue?
We are running Hyper-V 2012.
Please advise.
Thanks