
Windows Server 2003 Standard: NTBackup.exe utility freezes computer(s)

Posted on 2004-10-12
Last Modified: 2007-12-19
I discovered an issue that is somewhat reproducible (on 2 different
motherboards) concerning Windows Server 2003 Standard when a backup is
running.
Basically, anywhere from 1 to 5 minutes into the backup, the entire
system freezes.
I have reproduced it on 3 servers. Two of them are Tyan 2882 motherboards
with dual Opteron 242 CPUs, 2GB of Corsair PC3200 memory and 2 RAID 1 sets
(SATA RAID on the on-board Silicon Image SiI3114 controller). The third is a
Tyan 2850 motherboard with a single Opteron 142 CPU, 1GB of Corsair PC2700
memory and the same RAID controller and configuration. All machines run
Windows Server 2003 Standard and Symantec A/V Corporate Edition 9; 2 are set
up as web servers (IIS HTTP, FTP and mail services running) and 1 is a SQL
Server 2000 box (no other services running).

I've ruled out the network cards as the problem (each board has both a
Broadcom and an Intel Pro/100 on-board NIC), as I have tried bypassing the
on-board NICs with an off-board NIC.
There are no errors in any of the event logs; in fact, no errors of any kind
are reported and, unfortunately, there is no "blue screen of death" either.
Replacing the motherboard, memory, drives and CPUs has had no effect, either.

The last event before the complete lockup (frozen screen, frozen keyboard,
unpingable interfaces, etc.) is the following:

Event Type: Information
Event Source: Service Control Manager
Event Category: None
Event ID: 7036
Date:  10/13/2004
Time:  12:13:43 AM
User:  N/A
Computer: KPSWS1
Description:
The Microsoft Software Shadow Copy Provider service entered the running
state.

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.


From past experience, I feel this is most likely due to some sort of
hardware/driver conflict, but without a memory dump or some sort of error
log, I am at a loss as to how to fix this problem. It seems to be related
to a Shadow Copy bug, but that is hard to confirm because the machines freeze
at different points during the backup (it usually crashes somewhere between
500MB and 4GB into the backup).
The weird thing is that sometimes it doesn't crash. It does not crash while
the RAID sets are rebuilding, which is what happens after I have to
reset/power-cycle the box; I confirmed this twice. However, because I ran
those backups right after rebooting, the real reason they worked could also
be the reboot itself, which clears memory. The backups are normally done
after the machine has been idling for a day, so a memory leak could be
causing the backups to fail.

I think it may somehow be related to the Silicon Image SATA RAID controller:
their drivers (for the 3114) were not "logo" approved, and their management
software runs on Java (which could be another problem, as the JVM has a
history of being problematic in my experience). So I went ahead and ordered
a RocketRAID card to see if that is the problem.

I also stress-tested the system with some off-the-shelf tools: very large
file transfers via Ethernet (10 to 100GB in both directions simultaneously),
extreme memory and paging transfers, a CPU burn-in test (cooking at 100%)
and an Ethernet stress test, all at once for 6 hours straight with no errors
or lockups.
More and more, it seems to point to a possible bug or conflict between
"Shadow Copy" and Silicon Image controllers (or something else these machines
have in common, Opteron CPUs perhaps). Since commercial backup products also
use Shadow Copy, I am afraid to blow $1000 or so on another package and end
up with the same results.

Has anyone else had a problem with this?

TIA

Dave

Question by: simplyamazing

Accepted Solution

by: briancassin
Volume Shadow Copies do have a problem that can cause freezes and interrupt backups. Try turning off Volume Shadow Copies and see if this resolves your problem.

If any service, application, etc. is accessing that server while you are running the backup with Volume Shadow Copies enabled, the shadow copies will not complete successfully and will get stuck in a loop-like state.

Expert Comment

by: briancassin
What is most likely happening is that while you are running the backup, Volume Shadow Copies is accessing various files you are trying to back up. Both of them intermittently end up hitting the same file; then the battle begins, and either whoever gets there first locks the other one out or they both lose. Either way, a lockup results.

Author Comment

by: simplyamazing
I turned off Shadow Copy and did another backup, but it locked up after about 12GB (i.e., a complete system-wide freeze), which blows away the Shadow Copy theory.

I did a "C drive" to "D drive" backup with Shadow Copy on and it did not lock up, but this could have just been luck.
If it was not luck, then maybe the network redirector is the culprit.
Drive letter Z: was mapped to a network share on another standalone server for all the test backups done so far (I tried changing the share to another machine running WinXP to rule out the destination machine).

I've set the system to back up to drive D every hour for the next 8 hours. If that succeeds, then the network redirector is most likely the culprit and not Shadow Copy.

Author Comment

by: simplyamazing
I did 10 more drive-to-drive backups with NTBackup.exe, using Shadow Copy, without a single lockup.

So the problem consistently occurs with NTBackup only when it is backing up to a network share, regardless of whether Shadow Copy is on or off, ruling out Shadow Copy as the culprit.

Changing network cards (to different brands) has no effect, ruling out NICs/drivers as the problem.
Because local drive backups work in all cases, the SATA RAID is off the hook as the culprit.

Copying/moving enormous files (10-100GB) manually does not cause any lockups, ruling out any throughput/network issues.

Vigorous hardware testing reveals nothing out of the ordinary (in fact, the Tyan 2882 is probably the most impressive motherboard I have seen! I fully expected a system crash with these tests but did not get one, unlike what usually happens with other brands).

In short: NTBackup.exe on Windows Server 2003 Standard has a compatibility problem with the network redirector (or vice versa), so I'd better buy a third-party backup software package.
However, the southbridge motherboard drivers could still be an issue, since the on-board NICs share the PCI bus connected to the southbridge.

Author Comment

by: simplyamazing
Forgot to close this one out.

The problem turned out to be the on-board Broadcom Gigabit Ethernet NICs; apparently they have a problem with Windows Server 2003 (driver? interrupts? who knows!).

Disabling the on-board Broadcoms and replacing them with a dual Intel Pro/1000 Server gigabit adapter fixed the problems.
It may be that the Broadcom is not completely compatible with the Tyan chipset, or that a badly written driver is the cause. I exchanged emails with them over the course of several weeks, trying every possible combination of things that could cause system freezes. After over 60 separate trial-and-error tests, the Broadcom chip was isolated as the sole cause of all my brain-racking problems with all the motherboards involved.

What a friggin' nightmare! I'm afraid to buy any more system boards using Broadcom chips for fear this will happen again (which, unfortunately, rules out just about every Opteron board maker out there)!

