
DPM server loses connection to Drobo FS during backup

I have a virtualized environment with a Windows Server 2008 R2 Enterprise (Core) physical server running Hyper-V, containing a Windows Server 2008 R2 Standard virtual machine on which I have installed Microsoft Data Protection Manager 2010. It backs up other VMs to a physical RAID-1 array that is implemented using hardware RAID and made available to the DPM server via Hyper-V's pass-through-disks feature. "Tape" backups are done via Cristalink Firestreamer, which provides a virtual tape library to DPM. Data gets backed up to virtual tape files that reside on the Drobo FS.

I use two sets of two WD RE4 enterprise drives in the Drobo, which manages each set as its own proprietary form of RAID-1. Every week, the Drobo is shut down, and the current disk set is moved to offsite storage. The previous week's disk set is retrieved from offsite storage and used for the next full backup (weekly, alternating between disk sets).

This setup worked fine for almost 2 months before backups started failing every week because the DPM server lost the connection to the Drobo some hours into the backup. The next day, I would have to retry the backup (good thing DPM lets you resume) multiple times to get it to complete. If I try to back up to virtual tapes using another VM host as the target instead of the Drobo, the backups always succeed. The messages I get from FsHelperSvc (Firestreamer, event 30012) in the Windows Application event log when it fails are:

10/17/2011 21:58:39 | Error | E002 | L1T002 | The media drive reported the following error: The specified network name is no longer available [C000020C] | file://\\<path_to_drobo>\<tape_name>.fsrm*12288 | <tape_barcode>
10/17/2011 21:58:39 | Error | E012 | L1T002 | Unable to write data to the medium. | file://\\<path_to_drobo>\<tape_name>.fsrm*12288 | <tape_barcode>

And in the System log:

Source: clfs3mtp
Event ID: 7
Date and Time: 10/17/2011 9:58:39 PM
Description: The device, \Device\TapeDrive1, has a bad block.

"The specified network name is no longer available" is an error returned to Firestreamer by Windows when the computer is unable to access the network share, so Firestreamer is likely not the problem.
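One way to check this independently of Firestreamer and DPM would be to run a sustained write test against the share and record when writes start failing. Below is a minimal sketch in Python, not something from my actual setup: the function name `probe_share`, the probe file name, and the chunk sizes are all made up for illustration, and the share path is taken from the command line rather than hard-coded.

```python
import os
import sys
import time
from datetime import datetime


def probe_share(target_dir, chunk_mb=1, total_chunks=10, pause_s=0.0, log=print):
    """Append chunks to a probe file in target_dir and collect write failures.

    Returns a list of (timestamp, error_text) tuples, one per failed write.
    All names and defaults here are hypothetical; tune them to the share.
    """
    failures = []
    probe_path = os.path.join(target_dir, "share_probe.tmp")
    chunk = b"\0" * (chunk_mb * 1024 * 1024)

    for i in range(total_chunks):
        try:
            # Reopen on every pass so a dropped connection shows up as an
            # error on this write instead of being masked by an old handle.
            with open(probe_path, "ab") as f:
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())  # push the data out to the share
        except OSError as exc:
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append((stamp, str(exc)))
            log(f"{stamp} write {i} failed: {exc}")
        time.sleep(pause_s)

    # Best-effort cleanup; ignore errors if the share is gone.
    try:
        os.remove(probe_path)
    except OSError:
        pass
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python probe.py <UNC path of the Drobo share>
    # Roughly a few hours of writes with these (assumed) settings.
    results = probe_share(sys.argv[1], chunk_mb=64, total_chunks=2000, pause_s=5)
    print(f"{len(results)} failed writes")
```

If this fails with the same "network name is no longer available" error at around the same point in a long run, that would point at the Drobo or the network path rather than at Firestreamer.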

Drobo Support has not been able to resolve the problem. They've had me update Drobo Dashboard, update the Drobo firmware, switch Ethernet cables, copy files to the Drobo's file share using Windows Explorer, check for a static IP, uninstall and reinstall Drobo Dashboard, and downgrade the Drobo firmware. None of this has helped. Drobo diagnostic files are encrypted, so there's no way for me to see what the Drobo experiences. Support has told me that its network connection is up at the time of the failure, but I still suspect that it's dropping the connection or that maybe there is file system corruption.

I'll stress that I can access the Drobo from Windows Explorer on the server, so it's not like the server can't contact the Drobo at all under normal circumstances.

The most recent thing I tried was turning off disk spindown in the Drobo's settings. I also installed this hotfix from Microsoft.

Gerwin Jansen

Hello joshvaquez, nice setup you have. I like the part where you move the disk set offsite.

The event log messages you show include a tape-drive-related error as well. Do you know when these tape errors started? What happens when streaming to the tape drive is running and you get this 'bad block' message? Does the backup process stop at that moment? The timestamps of all the messages are the same, so it's hard to tell what happens first.