Exchange 2010 DAG member and failed direct-attached storage

I have a situation with one Exchange 2010 Enterprise SP1 server that is a member of a four-server DAG. All of the DAG members replicate the databases between each other, so a copy of each database is on each server. One of the direct-attached storage (DAS) towers on one of the DAG members failed (two of the DAG members store their databases on SAN storage, and the third member is also using a DAS tower).

The OS of the server with the failed DAS is intact as the storage tower only stores database copies. The problem is the DAS towers, due to cost when we went to Ex 2010, were set up in a RAID 0 JBOD array. The array failed, so I had to rebuild the disk array configuration and wipe the drives. In the EMC the downed server and databases show "ServiceDown".

So, the next step is to bring up the Exchange server DAG member that is attached and set up the mount points and paths for all the databases as they were before. Again, the OS of the server is fine, so I don't see that I need to reinstall Exchange. I figured I'd do this with the Exchange server services disabled and the server off the network for the time being.

Once I have the disk storage and database folders set up again, I was going to bring the DAG member online. I figured the EMC will then show those databases as "Failed", and I can reseed them from the other servers. However, I ran the idea by an MS support tech (who did not seem sure what to do, which is why I'm asking here), and he suggested restoring the databases to the folders on the fixed DAS storage, then enabling the Exchange services on the DAG member. The databases would then show as out of sync, and I would reseed them that way. I'm not convinced I need to restore the database files from tape. As far as bandwidth goes, I can reseed the databases without issue and without a tape restore, but the MS guy seemed to feel it would not work unless some version of the database files was restored first.

For comparison, when we've had a drive failure on this DAS array before, because of the way the drives are set up, we would see one or two failed databases and simply reseed them, WITHOUT restoring database files from tape. Even with a complete failure of the DAS, once the original file paths are set up again I would think this would work the same way.

Thoughts?

The MS tech was even suggesting I run setup.exe with the switches to restore the Exchange server, even though I've not removed anything from the AD configuration, nor have I had to rebuild the server OS or reinstall Exchange on the server. So I'm not sure the MS guy understood my scenario, which is why I'm checking here.

ASKER CERTIFIED SOLUTION

Member_2_4940386


ckratsch

I would also avoid the restoring part. If necessary, just remove the failed DAG member as a passive copy holder for each of its databases, then re-add it when the storage is online.

Side note, MS says that with three or more DB copies in a DAG, you can stop doing backups and turn on circular logging, so recovering the DB to the failed DAG member before being able to reseed can't be required.

gsmartin

I agree with ckratsch.

Just to add: even though Microsoft says that with three or more database copies you don't need to back up, I am not 100% comfortable relying on no backups. In the event that one of your primary active databases gets corrupted, the corruption replicates to all of the passive DAG copies. Having no backup is very risky irrespective of multiple DAG copies.

In my environment about 4 months ago, we experienced a corruption of one of our four Exchange databases; fortunately, we were able to recover from the issue. Still, given that scenario, it's not worth the risk to skip backups. Therefore, I recommend backing up at least one of the passive DAG servers as a safeguard.

ckratsch

DAG replication happens with transaction log shipping, not by replicating database files, so the corruption scenario you're describing isn't possible.

I'll find a link here about it in a bit.

ckratsch

http://www.windowsitpro.com/article/backup-recovery/more-on-going-backup-less-with-exchange-2010

This describes how Exchange 2010 DAG uses the same CCR/LCR/SCR replication technology that Exchange 2007 did:

http://technet.microsoft.com/en-us/library/dd638137.aspx#DA

And this describes all those in Exchange 2007 as "log shipping and replay":

http://technet.microsoft.com/en-us/library/bb676502(v=exchg.80).aspx

---

If a database on a primary becomes corrupted, not only will that not affect other database copies, but the health checks will notice that something has gone awry, and handle database failover and repair automatically. (Exchange 2010 does attempt some database and mailbox repairs without admin intervention.)

gsmartin

So, I assume your position is correct given your readings and assumptions, without your actually having experienced the problem, versus the outage my company experienced as a result of a CORRUPTION that did in fact affect other copies. Not to mention the hours spent on the phone with Microsoft to resolve the issue, and the resolution certainly wasn't automatic.

Darthyw

ASKER

Thanks all.
I have the DAS nearly ready with all original drives formatted, etc. Right now the Exchange services on the "down" DAG member are disabled, and the DAG server is disconnected from the network. Since the storage failure the DAG server shows "ServiceDown" for the database copies it is holding (In EMC go to Organization, Mailbox, Database Management, and then the Copies tab.) Keep in mind that once the storage failed to come online the Exchange server DAG member was down until I fixed the storage. I'm thinking I just need to reenable the Exchange services on the DAG member, reconnect the NIC, reboot, then the copies will show "Failed" and I can reseed them. Does that sound reasonable? Am I missing a step?
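For anyone who would rather script the reseed than click through EMC, here is a rough Python sketch that only prints Exchange Management Shell commands for review; it does not touch Exchange itself. The server name MBX3 and the database names are made-up placeholders. It assumes the copies are reseeded with Update-MailboxDatabaseCopy and its -DeleteExistingFiles switch, and that each copy has to be suspended before it can be seeded; adjust if your copies are already suspended.

```python
from typing import List


def reseed_commands(server: str, databases: List[str]) -> List[str]:
    """Build (but do not run) shell commands to reseed every copy on one server."""
    # Start with a status check so the copy states can be reviewed first.
    commands = [f"Get-MailboxDatabaseCopyStatus -Server {server}"]
    for db in databases:
        # A database copy is identified as Database\Server.
        identity = f"{db}\\{server}"
        # Assumption: the copy must be suspended before it can be reseeded.
        commands.append(
            f'Suspend-MailboxDatabaseCopy -Identity "{identity}" -Confirm:$false'
        )
        # Reseed from another copy, discarding whatever is in the target folders.
        commands.append(
            f'Update-MailboxDatabaseCopy -Identity "{identity}" -DeleteExistingFiles'
        )
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for line in reseed_commands("MBX3", ["DB01", "DB02"]):
        print(line)
```

The output can be pasted into the Exchange Management Shell one line at a time, which makes it easy to stop if a copy reports a state you did not expect.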

And we do tape backups of the databases. Even though our DAG spans two sites, I'd also like a copy on tape offsite in a vault. And I've read of copy corruption being a possibility in my studies - thankfully not seen it myself. We have considered even running a lagged passive copy for that reason.

Member_2_4940386

I believe that should work, although I am not sure since there will be no data present.

You could also just remove the existing ServiceDown copies and then add them back once their services are up. That is the way I would do it, personally, and I know it would work.

Darthyw

ASKER

Well, I had tried to remove the ServiceDown copies, and I get a message that there are other copies on other servers as well and that I need to remove those first. I'm thinking that if I bring the Exchange server up, even with failed databases, I can delete the copies if they won't seed.

Darthyw

ASKER

Here is what I did. As I expected, restoring the databases from tape was unnecessary.

1)      On the direct storage on which the databases resided I wiped the RAID0 configuration on the JBOD, and set it back up again – initializing all the disks.
2)      I booted the DAG member (this is the original install; I did not have to rebuild the Exchange server) while the network was unplugged. I stopped and disabled all the Microsoft Exchange services that were set to automatic.
3)      In Windows I set up all the mount points and folders as they were before I wiped the DAS configuration, so the paths to the databases on the problem DAG member were the same as on the other DAG members. I DID NOT do a restore of the database files from tape or other media. I simply left the folders empty.
4)      I reenabled the Exchange services, plugged in the NIC, then rebooted the DAG server.
5)      When the DAG server came online again the databases showed as “failed and suspended” whereas when this DAG member was offline the databases showed “ServiceDown”. Then, I was able to delete the database copies from the DAG member in EMC, then Add new copies.
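
In case it helps someone later, the delete and re-add in step 5 can also be done from the Exchange Management Shell instead of EMC. Below is a small Python sketch that just generates the commands as text so they can be reviewed before running them; nothing here talks to Exchange. The server and database names are hypothetical, and I am assuming that adding a copy starts seeding on its own, which is what I would expect by default.

```python
from typing import List


def readd_commands(server: str, databases: List[str]) -> List[str]:
    """Build (but do not run) shell commands to drop and re-add copies on one server."""
    commands = []
    for db in databases:
        # Remove the stale copy; the copy identity is Database\Server.
        commands.append(
            f'Remove-MailboxDatabaseCopy -Identity "{db}\\{server}" -Confirm:$false'
        )
        # Add a fresh copy of the same database on the same server.
        commands.append(
            f'Add-MailboxDatabaseCopy -Identity "{db}" -MailboxServer {server}'
        )
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for line in readd_commands("MBX3", ["DB01", "DB02"]):
        print(line)
```

Doing one database at a time keeps the seeding traffic manageable and makes it obvious which copy a failure belongs to.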