ryansinn
asked on
NTBackup on Windows SBS 2003r2 Server Hangs intermittently
I've recently installed a Windows Server 2003r2 SBS Edition on an HP ML150 Server with an E200 add-on RAID card. I patched the server until everything was up to date (as of February 7th, 2009.)
The following weekend I updated to the latest tape and raid drivers without issue.
Two weeks ago (on Tuesday, Feb 9th) I received a call in the early morning that the users could not log in... or get email... or access the internet (DNS.)
I RDP'd into the Terminal Server and could ping the server, but it was not responding to RDP requests. I went on site and saw that the console for the server was stuck on the gray background of the login screen (the server had been logged in and locked.)
I restarted the server and it came up slowly, but after letting it sit for 2 hrs it was still at "Applying Computer Settings" -- I restarted the server into Safe Mode and uninstalled the updated RAID and tape drive drivers... thinking that maybe a bad driver had caused the system to crash hard.
The thing I noted was that the server froze at 11:22pm (backups start at 10pm and take 3hrs.)
So I thought maybe the system kicked off the backup and ran into an error.
Anyway, I restarted into Safe Mode and disabled all services and was able to then restart and log into the system. I started enabling services one by one until everything that could start would. DHCP and IIS were still having issues starting. I spent about 3hrs trying to get the services started... reregistering the ocx file (with regsvr32, I can't remember the specific DLL, but it's documented in the DHCP recovery documentation from Microsoft.)
Anyway -- with seemingly no intervention on my part (I know it sounds absurd) at about 3pm after spending about 7hrs on the issue the services finally started and the system started performing like normal... I hadn't restarted it since 12:30p -- so I'm not sure why it started working suddenly.
Anyway -- I've restarted the server a few times since then and have been fixing issues related to updating the system to WSS 3.0 since the first hang up.
Last night for the first time since the last system hang... the server froze again. This time at 7:30pm with the backup starting at 7pm. The backup logs are empty (0 bytes) for both the 9th and last night (the 17th.)
I logged into the terminal server again and pinged the server, but could not RDP. The server console was once again frozen on the grey screen.
I powered off the server and restarted it -- it came up fine with no logs after 7:30p until 8:17am when I restarted it. No errors right before it stopped.
Upon restarting the server the following information was the first new entry stored in the System Event Log:
"The previous system shutdown at 9:06:00 PM on 2/18/2009 was unexpected."
The logs stopped at 7:30pm, but apparently the system didn't register it had hung until 9:06.
The backups, when started at 7pm, typically finish the backup pass at 9:30pm and verification between 10:40pm and 11:00pm...
Tonight's backup went off without a hitch, so the behavior is not consistent.
At 7:30pm tonight the following Informational Alerts were entered into the Event Log:
"The Removable Storage service was successfully sent a start control."
"The Removable Storage service entered the running state."
"The Volume Shadow Copy service was successfully sent a start control."
"The Volume Shadow Copy service entered the running state."
"The Microsoft Software Shadow Copy Provider service was successfully sent a start control."
"The Microsoft Software Shadow Copy Provider service entered the running state."
And at 9:30pm:
"The Volume Shadow Copy service entered the stopped state."
"The Microsoft Software Shadow Copy Provider service entered the stopped state."
Then at 11:08pm tonight:
"RSM was stopped."
"The Removable Storage service entered the stopped state."
And the Backup Logs are complete and SBS says it's successful.
I've seen some VSS issues that people are mentioning applying a hotfix for, but most of them are a few years old and relate to Servers running 2003r2 SP1, not SP2.... with people mentioning the problems have been fixed in SP2.
I'm still willing to try applying redundant hotfixes if that solves the problem.
Any thoughts on this... I hope I was thorough enough and yet still to the point. :)
-- UPDATE --
I've now run:
regsvr32 msxml.dll
regsvr32 msxml3.dll
regsvr32 msxml4.dll
per:
I've run:
vssadmin list writers
per:
http://www.petri.co.il/forums/showthread.php?t=25841
^^ that seems to be my issue identically
This also seems to partially be my issue (minus the actual error log / message:)
http://www.eggheadcafe.com/software/aspnet/33545710/ntbackup-failing.aspx
Tonight I'm trying:
http://support.microsoft.com/kb/940349
To see if that fixes the issue... of course we won't know for another week or two...
C:\Documents and Settings\Administrator>vssadmin list writers
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001 Microsoft Corp.
Writer name: 'System Writer'
Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
Writer Instance Id: {8b70819a-81f3-4bcd-8fa8-b90385b29523}
State: [5] Waiting for completion
Last error: No error
Writer name: 'MSDEWriter'
Writer Id: {f8544ac1-0611-4fa5-b04b-f7ee00b03277}
Writer Instance Id: {eb9cd8d4-55f7-49f9-9d8e-2896e49cfd84}
State: [1] Stable
Last error: No error
Writer name: 'SqlServerWriter'
Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
Writer Instance Id: {de6d3ee3-d6a4-4e0f-97e8-3209ed3703e4}
State: [5] Waiting for completion
Last error: No error
Writer name: 'Event Log Writer'
Writer Id: {eee8c692-67ed-4250-8d86-390603070d00}
Writer Instance Id: {9eee7ecb-0eb5-46f2-80fe-56f59e934001}
State: [1] Stable
Last error: No error
Writer name: 'WINS Jet Writer'
Writer Id: {f08c1483-8407-4a26-8c26-6c267a629741}
Writer Instance Id: {ee60961c-3792-4f2e-8115-e38d16b08330}
State: [5] Waiting for completion
Last error: No error
Writer name: 'IIS Metabase Writer'
Writer Id: {59b1f0cf-90ef-465f-9609-6ca8b2938366}
Writer Instance Id: {017748bf-5f6f-4a0f-a5ee-20b2c4e176fd}
State: [5] Waiting for completion
Last error: No error
Writer name: 'COM+ REGDB Writer'
Writer Id: {542da469-d3e1-473c-9f4f-7847f01fc64f}
Writer Instance Id: {7c716e84-e6ff-47d7-8101-0ee56488a3ab}
State: [1] Stable
Last error: No error
Writer name: 'Dhcp Jet Writer'
Writer Id: {be9ac81e-3619-421f-920f-4c6fea9e93ad}
Writer Instance Id: {1c65cd90-37aa-40ed-9263-64f88e50ec60}
State: [5] Waiting for completion
Last error: No error
Writer name: 'Registry Writer'
Writer Id: {afbab4a2-367d-4d15-a586-71dbb18f8485}
Writer Instance Id: {6d254596-2668-4a92-969b-bf3cd62917be}
State: [1] Stable
Last error: No error
Writer name: 'NTDS'
Writer Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}
Writer Instance Id: {1e2419bd-27c5-4a6e-82ec-41893940338d}
State: [1] Stable
Last error: No error
Writer name: 'SPSearch VSS Writer'
Writer Id: {57af97e4-4a76-4ace-a756-d11e8f0294c7}
Writer Instance Id: {dbae9288-e9e5-4aaf-8712-62ae719be159}
State: [5] Waiting for completion
Last error: No error
Writer name: 'FRS Writer'
Writer Id: {d76f5a28-3092-4589-ba48-2958fb88ce29}
Writer Instance Id: {002866ec-eb21-42c9-aef5-07badb04843e}
State: [5] Waiting for completion
Last error: No error
Writer name: 'BITS Writer'
Writer Id: {4969d978-be47-48b0-b100-f328f07ac1e0}
Writer Instance Id: {540c2bbe-3042-4c90-848d-86f35f20a78a}
State: [5] Waiting for completion
Last error: No error
Writer name: 'WMI Writer'
Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
Writer Instance Id: {7c9f2413-6390-4900-88ed-bb26868e01d3}
State: [5] Waiting for completion
Last error: No error
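For anyone comparing writer states between a good night and a bad night, here is a rough Python sketch that groups the writers in a saved copy of the `vssadmin list writers` output by state and lists any that report a last error. It only assumes the line layout shown in the output above; the function names and the `SAMPLE` text are made up for illustration, and a writer sitting in "Waiting for completion" is not necessarily a fault by itself.

```python
import re
from collections import defaultdict

# Patterns for the three lines we care about in each writer block.
NAME_RE = re.compile(r"^Writer name: '(.*)'$")
STATE_RE = re.compile(r"^State: \[(\d+)\] (.*)$")
ERROR_RE = re.compile(r"^Last error: (.*)$")


def parse_vss_writers(text):
    """Parse saved `vssadmin list writers` output into a list of dicts."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = NAME_RE.match(line)
        if m:
            # A new writer block starts at every "Writer name:" line.
            current = {"name": m.group(1), "state_code": None,
                       "state": None, "last_error": None}
            writers.append(current)
            continue
        if current is None:
            continue  # banner lines before the first writer
        m = STATE_RE.match(line)
        if m:
            current["state_code"] = int(m.group(1))
            current["state"] = m.group(2)
            continue
        m = ERROR_RE.match(line)
        if m:
            current["last_error"] = m.group(1)
    return writers


def group_by_state(writers):
    """Map each state string to the names of the writers in that state."""
    groups = defaultdict(list)
    for w in writers:
        groups[w["state"]].append(w["name"])
    return dict(groups)


def writers_with_errors(writers):
    """Names of writers whose last error is missing or not 'No error'."""
    return [w["name"] for w in writers if w["last_error"] != "No error"]


# Two writer blocks copied from the output above, as a small test input.
SAMPLE = """\
Writer name: 'System Writer'
   Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Instance Id: {8b70819a-81f3-4bcd-8fa8-b90385b29523}
   State: [5] Waiting for completion
   Last error: No error

Writer name: 'MSDEWriter'
   Writer Id: {f8544ac1-0611-4fa5-b04b-f7ee00b03277}
   Writer Instance Id: {eb9cd8d4-55f7-49f9-9d8e-2896e49cfd84}
   State: [1] Stable
   Last error: No error
"""

parsed = parse_vss_writers(SAMPLE)
for state, names in group_by_state(parsed).items():
    print(state, "->", ", ".join(names))
print("Writers reporting an error:", writers_with_errors(parsed))
```

To use it on the real server, redirect the command to a file (`vssadmin list writers > writers.txt`) and pass the file contents to `parse_vss_writers`.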
Since this is SBS, check what other scheduled tasks are running, and also turn on the alerting option.
While you are at it, run the SBS BPA (Best Practices Analyzer).
I hope this helps !
ASKER
Install the recovery console and attempt to remove the virus from there:
http://support.microsoft.com/kb/216417
ASKER
sorry -- wrong question :)
ASKER
Best Practices only has two issues, which I'm ok with:
The Network Driver is more than a Year Old
The Update for Daylight Saving Time (DST) is not installed... but it is; I've tried to rerun it and it says it's already installed.
The Scheduled Tasks look fine as well.
Which "Alerting" option are you talking about?
scheduledtasks.png
The scheduled tasks do not look fine. You have something set to run every hour. It may start at the same time as your backup and cause a memory issue. What is that task (95%)? You should check Task Manager when things start freezing to see which process is taking the CPU and memory.
K
ASKER
looks fine now. I think that 95% was the SBS Monitoring Service. I just looked at Scheduled Tasks now... 95% is gone.
scheduledtasks.png
ASKER
Not sure why it grabbed the wrong screenshot... but here's the updated Scheduled Tasks... no 95%
ASKER
attachment
scheduledtasks.png
You got my point. You need to look at your scheduled tasks and reorganize them. You have a lot of overlapping tasks, such as volume shadow copy and performance data collection... these can start at the same time as the backup and cause the insufficient-memory issue. Adding more memory to the system, or rescheduling the tasks so that nothing else runs while NTBackup is running, will free up memory for the backup task.
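As a quick way to spot the overlaps, here is a small Python sketch that lists when a repeating task would fire inside the backup window. The 7pm to 11pm window comes from the times mentioned earlier in this thread; the task names and start times are placeholders I made up, so substitute whatever your Scheduled Tasks folder actually shows.

```python
from datetime import datetime, timedelta


def runs_in_window(first_run, interval, window_start, window_end):
    """Return the run times of a task that fall inside [window_start, window_end).

    interval is a timedelta for repeating tasks, or None for a one-off run.
    """
    if interval is None:
        return [first_run] if window_start <= first_run < window_end else []
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    t = first_run
    # Step forward to the first run at or after the window start.
    while t < window_start:
        t += interval
    hits = []
    while t < window_end:
        hits.append(t)
        t += interval
    return hits


# Backup window taken from the thread: starts 7pm, verification done by ~11pm.
BACKUP_START = datetime(2009, 2, 18, 19, 0)
BACKUP_END = datetime(2009, 2, 18, 23, 0)

# Hypothetical task list: (name, first run of the day, repeat interval).
TASKS = [
    ("hourly task (placeholder)", datetime(2009, 2, 18, 0, 30), timedelta(hours=1)),
    ("morning report (placeholder)", datetime(2009, 2, 18, 6, 0), None),
]

for name, first_run, interval in TASKS:
    hits = runs_in_window(first_run, interval, BACKUP_START, BACKUP_END)
    times = ", ".join(h.strftime("%H:%M") for h in hits) or "none"
    print(f"{name}: runs during backup at {times}")
```

Anything that shows run times inside the window is a candidate to move to a quieter slot.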
K