I discovered an issue that is somewhat reproducible (on 2 different
motherboards) concerning Windows 2003 Server Standard when a backup is
Basically, after anywhere from 1 to 5 minutes into the backup, the entire
I have reproduced it on 3 servers. 2 of them are Tyan 2882 motherboards with
dual Opteron 242 CPUs and 2GB of Corsair PC3200 memory, each running 2 RAID 1
sets (SATA RAID on the on-board SI3114 controller). The third is a Tyan 2850
motherboard with a single Opteron 142 CPU and 1GB of Corsair PC2700 memory,
with the same RAID controller and configuration. All machines are running
Windows 2003 Server Standard and Symantec A/V Corporate Edition 9; 2 are set
up as web servers (IIS HTTP, FTP and mail services running) and 1 as a SQL
2000 box (no other
I've ruled out the network cards as a problem (each board sports both a
Broadcom and an Intel Pro/100 on-board NIC), as I have tried bypassing the
on-board NICs by using an add-in NIC instead.
No errors appear in any of the event logs. In fact, there are no errors
reported of any kind and, unfortunately, no "blue screen of death".
Replacing the motherboard, memory, drives and CPUs has had no effect, either.
The last event before complete lock up (screen frozen, keyboard frozen,
unpingable interfaces, etc) is the following:
Event Type: Information
Event Source: Service Control Manager
Event Category: None
Event ID: 7036
Time: 12:13:43 AM
The Microsoft Software Shadow Copy Provider service entered the running
For more information, see Help and Support Center at
From past experience, I feel this is most likely due to some sort of
hardware/driver conflict, but without a memory dump or some sort of error
log, I am at a loss as to how to fix this problem. It seems to be related
to a Shadow Copy bug, although the machines freeze at different points
during the backup (usually somewhere between 500MB and 4GB).
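Since the freeze leaves nothing in the event logs, one way to pin down when
each lockup happens would be a small heartbeat logger that appends a
timestamp to a file every second and forces it to disk. After a reset, the
last line gives the freeze time to within the interval, which can then be
compared against the backup's progress. This is only a minimal sketch: the
file name, the interval and the function names are my own assumptions, not
part of any backup or Windows tool.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk before returning."""
    stamp = datetime.now().isoformat(timespec="seconds")
    line = (stamp + " " + note).rstrip() + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush its cache for this file
    return line


def last_heartbeat(path):
    """Return the last non-empty line of the log, or None if it is empty."""
    with open(path, encoding="utf-8") as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    return lines[-1] if lines else None


def run(path, interval_s=1.0, max_beats=None):
    """Write heartbeats until max_beats is reached (forever if None)."""
    count = 0
    while max_beats is None or count < max_beats:
        write_heartbeat(path, "beat=%d" % count)
        count += 1
        time.sleep(interval_s)
    return count
```

Calling run("heartbeat.log") just before the backup starts and reading
last_heartbeat("heartbeat.log") after the reset would show how far into the
backup the machine got. If the disk controller itself is what hangs, the
final write or two may never reach the disk, so the last entry is a lower
bound rather than an exact time.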
The weird thing is that sometimes it doesn't crash. It does not crash while
the RAID sets are rebuilding, which is what happens after I have to
reset/powercycle the box. I confirmed this twice. However, because I tried
those backups right after rebooting, the real reason they worked could simply
be that the reboot cleared the memory. The backups are normally run after the
machine has been idling for a day, which means a memory leak could be what
causes them to fail.
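To check the memory-leak theory, memory usage could be sampled while the
machine idles and tested for a steady upward trend. The sketch below fits a
least-squares line through (elapsed seconds, bytes in use) samples and
reports the slope in MB per hour. How the samples get collected is left open
(a Performance Monitor counter log would do), and the function name and
sample format are assumptions made for illustration.

```python
def growth_mb_per_hour(samples):
    """Least-squares slope of memory use over time, in MB per hour.

    samples: list of (elapsed_seconds, used_bytes) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    slope_bytes_per_second = cov / var_t
    return slope_bytes_per_second * 3600 / (1024 * 1024)
```

A slope near zero over a full idle day would argue against a leak, while a
steady climb would point at whichever process is growing.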
I think it may be somehow related to the SATA RAID controller by Silicon
Image, as their drivers (for the 3114) were not "logo" approved and their
management software runs on Java (which could be another problem, since the
JVM has a history of being problematic in my experience). I went ahead and
ordered a RocketRAID card to see if the controller is the problem.
I also stress-tested the system with some off-the-shelf tools, with very
large file transfers via Ethernet (10 to 100GB in both directions
simultaneously), extreme memory & paging transfers and a CPU burn-in test
(cooking at 100%) as well as an Ethernet stress test - all at once for 6
hours straight with no errors or lockups.
More and more it seems to point to a possible bug or conflict with "Shadow
Copy" and Silicon Image controllers (or something else these machines have
in common, Opteron CPUs perhaps). Since commercial backup products also
utilize shadow copy, I am afraid to blow $1000 or so on another package and
end up with the same results.
Has anyone else had a problem with this?