Solved

NTBackup on Windows SBS 2003r2 Server Hangs intermittently

Posted on 2009-02-19
Last Modified: 2012-05-06
I've recently installed a Windows Server 2003r2 SBS Edition on an HP ML150 Server with an E200 add-on RAID card.  I patched the server until everything was up to date (as of February 7th, 2009.)

The following weekend I updated to the latest tape and raid drivers without issue.

Two weeks ago (on Tuesday, Feb 9th) I received a call in the early morning that the users could not log in... or get email... or access the internet (DNS.)

I RDP'd into the Terminal Server and could ping the server, but it was not responding to RDP requests.  I went on site and saw that the console for the server was stuck on the gray background of the login screen (the server had been logged in and locked.)

I restarted the server and it came up slowly, but after letting it sit for 2 hrs it was still "Applying Computer Settings" -- I restarted the server into Safe Mode and uninstalled the updated RAID and tape drive drivers... thinking that maybe a bad driver had caused the system to crash hard.

The thing I noted was that the server froze at 11:22pm (backups start at 10pm and take 3hrs.)

So I thought maybe the system kicked off the backup and ran into an error.

Anyway, I restarted into Safe Mode and disabled all services and was able to then restart and log into the system.  I started enabling services one by one until everything that could start would.  DHCP and IIS were still having issues starting.  I spent about 3hrs playing around with trying to get the services started... reregistering the ocx file (with regsvr32, I can't remember the specific DLL, but it's documented in the DHCP recovery documentation from Microsoft.)

Anyway -- with seemingly no intervention on my part (I know it sounds absurd) at about 3pm after spending about 7hrs on the issue the services finally started and the system started performing like normal... I hadn't restarted it since 12:30p -- so I'm not sure why it started working suddenly.

Anyway -- I've restarted the server a few times since then and have been fixing issues related to updating the system to WSS 3.0 since the first hang up.

Last night for the first time since the last system hang... the server froze again.  This time at 7:30pm with the backup starting at 7pm.  The backup logs are empty (0 bytes) for both the 9th and last night (the 17th.)
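Since the empty (0 byte) backup logs line up with the nights the server hung, a quick check for zero-byte log files may help spot other affected nights. This is only a rough sketch: the folder and the `*.log` pattern are assumptions (point it at wherever your backup logs actually live), and the demo below runs against a throwaway temp folder with made-up file names.

```python
from pathlib import Path
import tempfile

def empty_logs(log_dir, pattern="*.log"):
    """Return the sorted names of zero-byte files in log_dir matching pattern."""
    return sorted(
        p.name
        for p in Path(log_dir).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )

if __name__ == "__main__":
    # Demo against a temporary folder; the file names are made up.
    with tempfile.TemporaryDirectory() as demo_dir:
        (Path(demo_dir) / "backup01.log").write_text("")          # empty log
        (Path(demo_dir) / "backup02.log").write_text("completed")  # normal log
        print(empty_logs(demo_dir))
```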

I logged into the terminal server again and pinged the server, but could not RDP.  The server console was once again frozen on the grey screen.

I powered off the server and restarted it -- it came up fine with no logs after 7:30p until 8:17am when I restarted it.  No errors right before it stopped.

Upon restarting the server the following information was the first new entry stored in the System Event Log:

"The previous system shutdown at 9:06:00 PM on 2/18/2009 was unexpected."

The logs stopped at 7:30pm, but apparently the system didn't register it had hung until 9:06.

When started at 7pm, the backups typically finish the backup phase at 9:30pm and verification between 10:40pm and 11:00pm...

Tonight's backup went off without a hitch, so the behavior is not consistent.

At 7:30pm tonight the following Informational Alerts were entered into the Event Log:

"The Removable Storage service was successfully sent a start control."
"The Removable Storage service entered the running state."
"The Volume Shadow Copy service was successfully sent a start control."
"The Volume Shadow Copy service entered the running state."
"The Microsoft Software Shadow Copy Provider service was successfully sent a start control."
"The Microsoft Software Shadow Copy Provider service entered the running state."

And at 9:30pm:
"The Volume Shadow Copy service entered the stopped state."
"The Microsoft Software Shadow Copy Provider service entered the stopped state."

Then at 11:08pm tonight:
"RSM was stopped."
"The Removable Storage service entered the stopped state."

And the Backup Logs are complete and SBS says it's successful.

I've seen people mention applying a hotfix for some VSS issues, but most of those reports are a few years old and relate to servers running 2003r2 SP1, not SP2.... with people mentioning the problems have been fixed in SP2.

I'm still willing to try applying redundant hotfixes if that solves the problem.

Any thoughts on this... I hope I was thorough enough and yet still to the point.  :)

-- UPDATE --

I've now run:
regsvr32 msxml.dll
regsvr32 msxml3.dll
regsvr32 msxml4.dll

per:


I've run:
vssadmin list writers

per:
http://www.petri.co.il/forums/showthread.php?t=25841

^^ that seems to be my issue identically

This also seems to partially be my issue (minus the actual error log / message:)
http://www.eggheadcafe.com/software/aspnet/33545710/ntbackup-failing.aspx

Tonight I'm trying:
http://support.microsoft.com/kb/940349

To see if that fixes the issue... of course we won't know for another week or two...

C:\Documents and Settings\Administrator>vssadmin list writers
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001 Microsoft Corp.
 
Writer name: 'System Writer'
   Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Instance Id: {8b70819a-81f3-4bcd-8fa8-b90385b29523}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'MSDEWriter'
   Writer Id: {f8544ac1-0611-4fa5-b04b-f7ee00b03277}
   Writer Instance Id: {eb9cd8d4-55f7-49f9-9d8e-2896e49cfd84}
   State: [1] Stable
   Last error: No error
 
Writer name: 'SqlServerWriter'
   Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
   Writer Instance Id: {de6d3ee3-d6a4-4e0f-97e8-3209ed3703e4}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'Event Log Writer'
   Writer Id: {eee8c692-67ed-4250-8d86-390603070d00}
   Writer Instance Id: {9eee7ecb-0eb5-46f2-80fe-56f59e934001}
   State: [1] Stable
   Last error: No error
 
Writer name: 'WINS Jet Writer'
   Writer Id: {f08c1483-8407-4a26-8c26-6c267a629741}
   Writer Instance Id: {ee60961c-3792-4f2e-8115-e38d16b08330}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'IIS Metabase Writer'
   Writer Id: {59b1f0cf-90ef-465f-9609-6ca8b2938366}
   Writer Instance Id: {017748bf-5f6f-4a0f-a5ee-20b2c4e176fd}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'COM+ REGDB Writer'
   Writer Id: {542da469-d3e1-473c-9f4f-7847f01fc64f}
   Writer Instance Id: {7c716e84-e6ff-47d7-8101-0ee56488a3ab}
   State: [1] Stable
   Last error: No error
 
Writer name: 'Dhcp Jet Writer'
   Writer Id: {be9ac81e-3619-421f-920f-4c6fea9e93ad}
   Writer Instance Id: {1c65cd90-37aa-40ed-9263-64f88e50ec60}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'Registry Writer'
   Writer Id: {afbab4a2-367d-4d15-a586-71dbb18f8485}
   Writer Instance Id: {6d254596-2668-4a92-969b-bf3cd62917be}
   State: [1] Stable
   Last error: No error
 
Writer name: 'NTDS'
   Writer Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}
   Writer Instance Id: {1e2419bd-27c5-4a6e-82ec-41893940338d}
   State: [1] Stable
   Last error: No error
 
Writer name: 'SPSearch VSS Writer'
   Writer Id: {57af97e4-4a76-4ace-a756-d11e8f0294c7}
   Writer Instance Id: {dbae9288-e9e5-4aaf-8712-62ae719be159}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'FRS Writer'
   Writer Id: {d76f5a28-3092-4589-ba48-2958fb88ce29}
   Writer Instance Id: {002866ec-eb21-42c9-aef5-07badb04843e}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'BITS Writer'
   Writer Id: {4969d978-be47-48b0-b100-f328f07ac1e0}
   Writer Instance Id: {540c2bbe-3042-4c90-848d-86f35f20a78a}
   State: [5] Waiting for completion
   Last error: No error
 
Writer name: 'WMI Writer'
   Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
   Writer Instance Id: {7c9f2413-6390-4900-88ed-bb26868e01d3}
   State: [5] Waiting for completion
   Last error: No error
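
With this many writers, it can be easier to let a script pick out the ones that are not in "[1] Stable" or that report an error. The following is a minimal sketch that parses text in the format shown above; the function names are my own, and the sample is just the first two writers from the output. Whether a non-Stable state is actually a problem is a separate question (it may be expected while a backup is still in progress), so treat the result as a list of things to look at, not a diagnosis.

```python
import re

def parse_vss_writers(text):
    """Parse `vssadmin list writers` output into a list of dicts."""
    writers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Writer name: '(.+)'", line)
        if m:
            current = {"name": m.group(1), "state_code": None,
                       "state": None, "last_error": None}
            writers.append(current)
            continue
        if current is None:
            continue  # skip the banner lines before the first writer
        m = re.match(r"State: \[(\d+)\] (.+)", line)
        if m:
            current["state_code"] = int(m.group(1))
            current["state"] = m.group(2)
            continue
        m = re.match(r"Last error: (.+)", line)
        if m:
            current["last_error"] = m.group(1)
    return writers

def writers_needing_attention(writers):
    """Return writers that are not '[1] Stable' or whose last error is set."""
    return [w for w in writers
            if w["state_code"] != 1 or w["last_error"] != "No error"]

SAMPLE = """\
Writer name: 'System Writer'
   Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Instance Id: {8b70819a-81f3-4bcd-8fa8-b90385b29523}
   State: [5] Waiting for completion
   Last error: No error

Writer name: 'MSDEWriter'
   Writer Id: {f8544ac1-0611-4fa5-b04b-f7ee00b03277}
   Writer Instance Id: {eb9cd8d4-55f7-49f9-9d8e-2896e49cfd84}
   State: [1] Stable
   Last error: No error
"""

if __name__ == "__main__":
    for w in writers_needing_attention(parse_vss_writers(SAMPLE)):
        print(f"{w['name']}: [{w['state_code']}] {w['state']} "
              f"(last error: {w['last_error']})")
```

To use it on the real thing, redirect the command to a file (`vssadmin list writers > writers.txt`) and pass the file contents to `parse_vss_writers`.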

Question by:ryansinn
11 Comments
 
LVL 26

Expert Comment

by:lnkevin
ID: 23691853
Most of the time, when NTBackup kicks in, some other activity on the system overlaps with it and uses up resources. I would suggest moving the backup schedule to a few hours after 12:00 and monitoring to see whether the issue is still there. Also, let me know which objects you have selected for backup. A common problem is that people back up C: while some system files are in active use, and NTBackup fails to back them up. If you can, take a screenshot of the selection with all items expanded and post it here.

K
 
LVL 63

Expert Comment

by:SysExpert
ID: 23695065
Since this is SBS, check what other scheduled tasks are running, and also turn on the alerting option.

While you are at it, run the SBS BPA (Best Practices Analyzer).


I hope this helps !
 

Author Comment

by:ryansinn
ID: 23695091
Install the recovery console and attempt to remove the virus from there:
http://support.microsoft.com/kb/216417
 

Author Comment

by:ryansinn
ID: 23695098
sorry -- wrong question :)
 

Author Comment

by:ryansinn
ID: 23695153
Best Practices only has two issues, which I'm ok with:

The Network Driver is more than a Year Old

The Update for Daylight Savings Time (DST) is not installed... it is, I've tried to rerun it and it says it's already installed.

The Scheduled Tasks look fine as well.

Which "Alerting" option are you talking about?
scheduledtasks.png
 
LVL 26

Expert Comment

by:lnkevin
ID: 23695714
The scheduled tasks do not look fine. You have something set to run every hour. It may occasionally start at the same time as your backup and cause a memory problem. What is that task (95%)? You should check Task Manager when things start freezing to see which process is using the CPU and memory.

K
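
If nobody is at the console when the freeze starts, one option is to capture `tasklist /fo csv /nh` output to a file every few minutes and look at the last capture before the hang. A rough sketch for ranking one capture by memory use is below. The sample rows are made up for illustration, `tasklist` reports memory but not CPU, and the "Mem Usage" text format can vary by locale, so the parser just keeps the digits.

```python
import csv
import io

def parse_tasklist_csv(text):
    """Parse `tasklist /fo csv /nh` output into (image_name, pid, mem_kb) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        image, pid, mem = fields[0], fields[1], fields[4]
        # "412,336 K" -> 412336; keep digits only to sidestep separators
        digits = "".join(ch for ch in mem if ch.isdigit())
        rows.append((image, int(pid), int(digits) if digits else 0))
    return rows

def top_by_memory(rows, n=5):
    """Return the n rows with the highest memory usage, largest first."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]

# Made-up sample rows in the column order tasklist uses:
# Image Name, PID, Session Name, Session#, Mem Usage
SAMPLE = '''"System Idle Process","0","Console","0","16 K"
"store.exe","2140","Console","0","412,336 K"
"sqlservr.exe","1888","Console","0","96,508 K"
'''

if __name__ == "__main__":
    for image, pid, mem_kb in top_by_memory(parse_tasklist_csv(SAMPLE), n=3):
        print(f"{image} (PID {pid}): {mem_kb} K")
```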
 

Author Comment

by:ryansinn
ID: 23697858
looks fine now.  I think that 95% was the SBS Monitoring Service.  I just looked at Scheduled Tasks now... 95% is gone.
scheduledtasks.png
 

Author Comment

by:ryansinn
ID: 23697862
not sure why it grabbed the wrong screenshot... but here's the updated Scheduled Tasks... no 95%
 

Author Comment

by:ryansinn
ID: 23697864
attachment
scheduledtasks.png
 
LVL 26

Expert Comment

by:lnkevin
ID: 23705060
You get my statement properly. You need to look at your scheduled tasks and reorganize them. You have a lot of overlapping tasks set in Scheduled Tasks, such as volume shadow copy and performance data collection. These tasks can start at the same time as the backup and cause the insufficient-memory issue. Adding more memory to your system, or arranging your tasks so that nothing else runs while NTBackup is running, will free up memory for the backup task.

K
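
To make the overlap check concrete, here is a small sketch that tests each task's run window against the backup window. The backup window uses the times from the question (7:00pm start, verification done by about 11:00pm); the task names and times are hypothetical placeholders, so substitute the real entries from Scheduled Tasks. It does not handle windows that wrap past midnight.

```python
from datetime import datetime, timedelta

def window(start_hhmm, minutes):
    """Build a (start, end) pair on an arbitrary reference day."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    return start, start + timedelta(minutes=minutes)

def overlaps(a, b):
    """True if the half-open intervals a and b share any time."""
    return a[0] < b[1] and b[0] < a[1]

# Backup window from the question: 7:00pm start, done by about 11:00pm.
backup = window("19:00", 240)

# Hypothetical tasks for illustration only; names and times are made up.
tasks = {
    "hourly task (hypothetical)": window("20:00", 10),
    "early-morning task (hypothetical)": window("03:00", 30),
}

conflicts = [name for name, w in tasks.items() if overlaps(backup, w)]

if __name__ == "__main__":
    for name in conflicts:
        print("overlaps the backup window:", name)
```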
 

Accepted Solution

by:
ryansinn earned 0 total points
ID: 24761295
Thank you HP for a horrible server - The problem was finally fixed Mid-March... no lockups since replacing the following:

We replaced each of these components separately and retested the server:

Physically Damaged SCSI Cable (damaged clip)
-- didn't fix lockups

Bad System Motherboard
-- Reseating all RAM determined that Slot 3 was flakey / bad.
-- found potential bad memory DIMM(s)

Replaced 2 RAM DIMMs
-- failed to boot the server when mounted in any slot

Bad E200 RAID Controller  (these cards run *HOT* intentionally) due to parts falling over (I think due to heat + poor soldering)
-- New RAID Controller came with out of date firmware (1.66)
--- after it was replaced, upgraded firmware to 1.80

...

Replacing the system board, memory and RAID controller, then upgrading the firmware on the RAID controller, fixed the issue.

The server has run for 3 months without crash since replacing all of this hardware... the OS has not been reinstalled since January, so multiple hardware failures created some really inconsistent problems.

Backups have been fine since March as well.