Solved

Windows 2003 Server Standard NTBackup.exe utility freezes computer(s)

Posted on 2004-10-12
Last Modified: 2007-12-19
I discovered an issue that is somewhat reproducible (on 2 different
motherboards) concerning Windows 2003 Server Standard when a backup is
running.
Basically, anywhere from 1 to 5 minutes into the backup, the entire
system freezes.
I have reproduced it on 3 servers: 2 are running dual Opteron 242
CPUs with Corsair PC3200 memory (2GB) and 2 RAID 1 sets (SATA RAID on a
SI3114 on-board controller) on a Tyan 2882 motherboard, and 1 is running a
single Opteron 142 CPU with Corsair PC2700 memory (1GB) and the same RAID
controller and configuration on a Tyan 2850 motherboard.  All machines are
running Windows 2003 Server Standard and Symantec A/V Corporate Edition 9;
2 are set up as web servers (IIS HTTP, FTP and mail services running) and 1
as a SQL 2000 box (no other services running).

I've ruled out the network cards as a problem (each board sports both a
Broadcom and an Intel Pro/100 on-board NIC), as I have tried swapping the
on-board NICs for an off-board NIC.
No errors are in any of the event logs; in fact, there are no errors reported
at all of any kind and, unfortunately, no "blue screen of death".
Replacing the motherboard, memory, drives and CPUs has no effect, either.

The last event before complete lock up (screen frozen, keyboard frozen,
unpingable interfaces, etc) is the following:

Event Type: Information
Event Source: Service Control Manager
Event Category: None
Event ID: 7036
Date:  10/13/2004
Time:  12:13:43 AM
User:  N/A
Computer: KPSWS1
Description:
The Microsoft Software Shadow Copy Provider service entered the running
state.

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.


From past experience, I feel this is most likely due to some sort of
hardware/driver conflict, but without a memory dump or some sort of error
log, I am at a loss as to how to fix this problem. It seems to be related
to a Shadow Copy bug, but I can't pin it down because the machines freeze at
different points during the backup (usually somewhere between 500MB and 4GB).
The weird thing is that sometimes it doesn't crash. It does not crash if the
RAID sets are rebuilding, which is what happens when I have to
reset/power-cycle the box. I confirmed this twice, but because I tried the
backup after rebooting, the real reason for it working could also be
that the reboot cleared the memory. The backups are normally done after the
machine has been idling for a day, which means it could be a memory leak
causing the backups to fail.
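
Editor's sketch: when a hard freeze leaves nothing in the event logs, one possible workaround is a small "heartbeat" script that appends a timestamp to a file and forces it to disk every few seconds, so the last line written narrows down when the machine stopped. The Python below is only an illustration under that assumption; the function names, the log path and the 5-second interval are hypothetical and not part of the original setup.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk before returning."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {note}\n")
        fh.flush()
        # fsync asks the OS to push the data to the disk, so the line
        # should survive a freeze that happens right afterwards.
        os.fsync(fh.fileno())
    return stamp


def last_heartbeat(path):
    """Return the last line written, or None if nothing was logged yet."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    return lines[-1] if lines else None


def run(path, interval_seconds=5.0, beats=None):
    """Write heartbeats forever, or `beats` times when a count is given."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, f"beat={count}")
        count += 1
        time.sleep(interval_seconds)
```

Started with something like `run(r"C:\heartbeat.log")` before the backup, the file can be read with `last_heartbeat` after the power cycle and compared against the backup start time.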

I think it may be somehow related to the SATA RAID controller by Silicon
Image as their drivers (for the 3114) were not "logo" approved and their
management software runs on Java (which could be another problem - the JVM
has a history of being problematic in my experience) so I went ahead and
ordered a RocketRAID card to see if that is the problem.

I also stress-tested the system with some off-the-shelf tools, with very
large file transfers via Ethernet (10 to 100GB in both directions
simultaneously), extreme memory & paging transfers and a CPU burn-in test
(cooking at 100%) as well as an Ethernet stress test - all at once for 6
hours straight with no errors nor lockups.
More and more it seems to point to a possible bug or conflict with "Shadow
Copy" and Silicon Image controllers (or something else these machines have
in common, Opteron CPUs perhaps).  Since commercial backup products also
utilize shadow copy, I am afraid to blow $1000 or so on another package and
end up with the same results.

Has anyone else had a problem with this?

TIA

Dave
Question by:simplyamazing

Accepted Solution

by:
briancassin earned 500 total points
ID: 12295265
Volume Shadow Copy does have a problem with causing freezes and interrupting backups. Try turning off Volume Shadow Copy and see if this resolves your problem.

If any service, application, etc. is accessing that server while you are trying to do the backup with Volume Shadow Copy enabled, the shadow copy will not complete successfully and will get stuck in a loop-like state.

Expert Comment

by:briancassin
ID: 12295273
What is most likely happening is that while you are trying to run the backup, Volume Shadow Copy is accessing different files that you are trying to back up. Both of them intermittently end up hitting the same file. Then the battle begins: whoever gets there first locks the other one out, or they both lose. Either way, a lockup results.

Author Comment

by:simplyamazing
ID: 12300689
Turned off shadow copy and did another backup, but it locked up after about 12GB (i.e. a complete system-wide freeze). This blows away the shadow copy theory.

I did a "C drive" to "D drive" backup with shadow copy 'on' and it did not lock up, but then this could have just been luck.
If not luck, then maybe the network redirector is the culprit.
A drive letter z: was mapped to a network share on another standalone server for all the test backups done so far (I tried changing the share to another machine running WinXP to rule out the destination machine).

I've set the system to back up to drive D every hour for the next 8 hours. If that is a success, then the network redirector is most likely the culprit and not shadow copy.
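
Editor's sketch: the two variables being tested here (local vs. network destination, shadow copy on vs. off) give four combinations, which could be generated as NTBackup command lines like this. The `/J`, `/F` and `/SNAP:{on|off}` switches are given as I recall them from the NTBackup command-line reference, so check them against `ntbackup /?` before relying on this; the `.bkf` file names, job names and the `C:\` source are hypothetical.

```python
from itertools import product

# Hypothetical targets: D: is the local drive, Z: the mapped network share.
DESTINATIONS = {
    "local": r"D:\test-backup.bkf",
    "network": r"Z:\test-backup.bkf",
}


def ntbackup_command(source, dest_file, job_name, shadow_copy=True):
    """Build the argument list for one NTBackup run (nothing is executed)."""
    snap = "on" if shadow_copy else "off"
    return [
        "ntbackup", "backup", source,
        "/J", job_name,       # job name shown in the backup report
        "/F", dest_file,      # back up to a file rather than to tape
        f"/SNAP:{snap}",      # volume shadow copy on or off
    ]


def build_matrix(source="C:\\"):
    """Return (job name, command) pairs for every destination/snapshot mix."""
    matrix = []
    for (label, dest), shadow_copy in product(DESTINATIONS.items(), (True, False)):
        job = f"{label}-snap-{'on' if shadow_copy else 'off'}"
        matrix.append((job, ntbackup_command(source, dest, job, shadow_copy)))
    return matrix
```

Each command list could be passed to `subprocess.run` from an hourly scheduled task; the sketch only builds the commands so that the test plan can be reviewed before anything is run.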




Author Comment

by:simplyamazing
ID: 12310168
Did 10 more backups from drive to drive without a single lockup utilizing shadow copy on NTbackup.exe.

So the problem is consistently with NTbackup only when it is backing up to a network share, regardless of whether shadow copy is on or off, ruling out shadow copy as the culprit.

Changing network cards (to different brands) has no effect, ruling out NICs/drivers as the problem.
Because local drive backups work in all cases, the SATA RAID is off the hook as the culprit.

Copying/Moving enormous files (10-100GB) manually does not cause any lockups, ruling out any throughput/network issues.

Rigorous hardware testing reveals nothing out of the ordinary (in fact, the Tyan 2882 is probably the most impressive motherboard I have seen! I fully expected a system crash with these tests, but did not get one as I usually do with other brands).

In short: NTbackup.exe on Windows 2003 Server Standard has a compatibility problem with the network redirector (or vice-versa), so I'd better buy a 3rd party backup software package.
There still, however, remains the possibility that the Southbridge motherboard drivers could be an issue as the on-board NICs are sharing the PCI bus (connected to SB).
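
Editor's sketch: the elimination reasoning in this comment can be written down as a small table of runs and a search for the smallest combination of settings that is present in every freezing run and in no clean run. The run list below paraphrases the results reported in this thread; the factor names and the function are hypothetical, and the search can only implicate factors that were actually recorded.

```python
from itertools import combinations

# (settings, froze?) for the runs described in the thread.
RUNS = [
    ({"tool": "ntbackup", "destination": "network", "shadow_copy": "on"}, True),
    ({"tool": "ntbackup", "destination": "network", "shadow_copy": "off"}, True),
    ({"tool": "ntbackup", "destination": "local", "shadow_copy": "on"}, False),
    ({"tool": "manual copy", "destination": "network", "shadow_copy": "n/a"}, False),
]


def separating_conditions(runs, max_size=2):
    """Find the smallest setting combinations seen in all freezes and no clean run."""
    frozen = [settings for settings, froze in runs if froze]
    clean = [settings for settings, froze in runs if not froze]
    if not frozen:
        return []
    # Only settings shared by every freezing run can explain all freezes.
    shared = {
        key: value
        for key, value in frozen[0].items()
        if all(run.get(key) == value for run in frozen)
    }
    for size in range(1, max_size + 1):
        results = []
        for keys in combinations(sorted(shared), size):
            condition = {key: shared[key] for key in keys}
            matches_clean = any(
                all(run.get(k) == v for k, v in condition.items())
                for run in clean
            )
            if not matches_clean:
                results.append(condition)
        if results:
            return results
    return []
```

With the runs above, `separating_conditions(RUNS)` returns `[{'destination': 'network', 'tool': 'ntbackup'}]`: no single setting separates the freezes from the clean runs, but the pair "NTBackup writing to a network share" does, which matches the conclusion drawn in this comment.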

Author Comment

by:simplyamazing
ID: 12511816
Forgot to close this one out.

The problem turned out to be the onboard Broadcom Gigabit Ethernet NICs. Apparently, they have a problem with Windows 2003 Server (driver? interrupts? who knows!).

Disabling the onboard Broadcom NICs and replacing them with a dual Intel Pro/1000 Server Gigabit adapter fixed the problems.
It may be that the Broadcom chip is not completely compatible with the Tyan chipset, or a badly written driver is the cause.  I exchanged emails with them over the course of several weeks, trying every possible combination of things that could cause system freezes.  After over 60 separate trial-and-error tests, the Broadcom chip was isolated as the sole cause of all my brain-racking problems with all the motherboards involved.

What a friggin' nightmare!  I'm afraid to buy any more system boards utilizing Broadcom chips (which, unfortunately, covers just about every Opteron board maker out there) for fear this will happen again!


Question has a verified solution.
