A DR failback plan

I have a hypothetical question that will hopefully never need to be exercised.

I have set up a DR site away from our production datacenter and use Dell AppAssure to replicate backups and keep standby images of the critical machines there, ready to turn on should they be needed. Due to shortcomings in the Dell software, we must rely on log shipping to keep the SQL standby server up to date, but we are able to deal with that.

The question is about the most efficient method of failback once the crisis is over. To get the SQL data back, I plan to just back up the current data and restore it onto the production server. The other servers might not be so straightforward. Since AppAssure takes far too long to get replication going, and the initial export to a standby VM can take days, I don't think the advertised method (fail over the AppAssure backups, then export back into the production datacenter) can be relied upon.

I am looking into alternatives.

Please look for problems in the following idea, and/or let me know what is working in the rest of the world.

There are 5 critical servers now running in DR. They are only file servers. At this point I am assuming the production datacenter is back alive, and the servers there are either restored from backup or repaired, and just have out-of-date information. Initially the link between the production and DR datacenters is turned off. I would propose changing the names of the servers that are running in DR (say server1 becomes server1DR), then turning on the link between the two and using something as simple as TeraCopy or robocopy to copy only newer files back into production. Production would have to be scheduled for downtime for as much as a weekend, but if the period running from DR is short, the changed data will not be that much, and it will probably be ready to go back up within a day.

Possible issue: Will Active Directory have a fit when two machines that are clones of each other (except for the machine name) appear on the network?
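
The "copy only newer files" step can be sketched in Python to make the rule explicit. This is a minimal illustration, not a replacement for robocopy: the function name `copy_newer` and the 2-second timestamp tolerance are assumptions made for the example. It does not carry NTFS permissions, does not propagate deletions, and does not detect files that were changed on both sides.

```python
import os
import shutil
from typing import List


def copy_newer(src_root: str, dst_root: str, tolerance_s: float = 2.0) -> List[str]:
    """Copy files from src_root (DR) to dst_root (production) when the
    destination copy is missing or older than the source copy.

    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # tolerance_s (hypothetical default) absorbs coarse timestamp
            # granularity between file systems.
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst) + tolerance_s):
                shutil.copy2(src, dst)  # copy2 keeps the modification time
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Running it a second time should copy nothing, which is a cheap sanity check before bringing production back up. Files that are newer in production are left alone, so anything written on the production side after the restore would need to be reviewed separately.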

Any other thoughts or better ideas?
Bill Herde (Owner) asked:

If you re-join one of the two server "clones" to the domain that should be okay.

How much data are you talking about?

It sounds like your systems don't have enough storage performance to handle things like exporting to VM. It shouldn't take that long. What does Dell say? I only trialed AppAssure when 5.0 came out, and I didn't find it stable enough to use. They have awesome deduplication, but I can see how that would really tax a system over time, as the data for a system would be 100% fragmented on the storage and would be a real killer to try to read quickly from spinning disk. An investment in SSD and/or 10K drives could be warranted.

One thing I am thinking of for failback is that you could take a fresh backup of the DR systems and then replicate those back. You wouldn't have a crazy long backup chain, so that might be faster to get replicating.

Bill Herde (Owner, Author) commented:
Thanks for the response.

Rejoining the domain with the rebuilt machines was something I was considering as part of the plan, but I was not sure whether it was warranted.

The amount of data to sync back other than SQL would be around 100GB of changed data every day. I have a 100MB link between the two sites, which brings it into the realm of doable over a day. The SQL would have to be done using compressed backups, which, judging by the daily SQL backups, will be just under 400GB. The SQL restore will take over 6 hours, but that still keeps it in a one-day failback plan.
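
As a rough check on the one-day estimate, the transfer times can be worked out directly. The sketch below assumes the "100MB link" means 100 megabits per second and that about 80% of the line rate is usable; both are assumptions, not figures from the thread. If the link really moves 100 megabytes per second, the times are 8 times shorter.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float,
                   efficiency: float = 0.8) -> float:
    """Hours needed to move size_gb (decimal gigabytes) over a link rated
    at link_mbit_per_s megabits per second.

    efficiency is an assumed fraction of the line rate that is usable.
    """
    bits = size_gb * 1e9 * 8
    usable_bits_per_s = link_mbit_per_s * 1e6 * efficiency
    return bits / usable_bits_per_s / 3600


for label, size_gb in [("one day of file changes", 100),
                       ("compressed SQL backup", 400)]:
    print(f"{label}: {transfer_hours(size_gb, 100):.1f} h")
# one day of file changes: 2.8 h
# compressed SQL backup: 11.1 h
```

Under those assumptions the two copies take about 14 hours if run back to back, and the 6-hour SQL restore brings the total to roughly 20 hours. That fits in a day, but with little margin unless the file copy and the SQL transfer can overlap.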

The performance of AppAssure has been a real disappointment. We did all the checks with the Dell guy on site (performance, DPACK, sizing and hardware), and ended up with a system WAY underspecced. The issues did not show up while testing with the trial on the same hardware because only 4 servers were put in the test. When we scaled up to all 26 servers, well, you can probably guess the rest. The package can still do the DR thing well enough as long as the remote standby images are updated with every hourly snapshot. That keeps my DR just about 1.5 hours behind production.

The AppAssure failover/failback routine would in theory be able to do a failback without having to start over, but when I tested it on a small server, it first had to send a full base image back to the original source. So, that didn't work right.

SQL on AppAssure is a joke. It takes 11TB to keep a week's worth of hourly snapshots of a system that has less than 1TB total files in all the databases. Of that week, only two days keep every hour. Add to this that we had to drop back from a SQL cluster to a single server (on VMware HA) because AppAssure cannot export an RDM disk.... anywhere!

Backup is great, and snapshots complete quickly, but if you need that data, it will take a long time to retrieve it. Recovery output, whether single file, bare metal, export to VM or replication, all runs at about the same speed. The max I have ever seen is 14MB/s, with the average hanging around 4MB/s.

What does Dell have to say? It has been a struggle to get everything going properly even after spending an additional 80 grand for bigger servers and new SANs. I have such a string of tickets with them that I now know half the support staff by first name. We made a big mistake not only in buying the product, but in trying too long to get it right. Dell would not refund after 6 months even though the product was nowhere near working properly.
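
To put those rates in perspective, here is a small calculation of how long a recovery would take at the observed speeds. The 1000GB size is a hypothetical round number chosen for illustration, not a measurement from this environment.

```python
def restore_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to read size_gb (decimal gigabytes) at a sustained
    rate of rate_mb_per_s megabytes per second."""
    return size_gb * 1000 / rate_mb_per_s / 3600


size_gb = 1000  # hypothetical server size
for label, rate in [("average (4MB/s)", 4), ("best seen (14MB/s)", 14)]:
    hours = restore_hours(size_gb, rate)
    print(f"{label}: {hours:.1f} h ({hours / 24:.1f} days)")
# average (4MB/s): 69.4 h (2.9 days)
# best seen (14MB/s): 19.8 h (0.8 days)
```

At the average rate, a server of that size takes close to three days to come out of the repository, which matches the experience of initial exports running for days.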

And some day I will tell you how I really feel.
That's why I am a happy Unitrends customer, though I don't have my backups replicating yet.

Have you looked at DFS-R for file replication?

Bill Herde (Owner, Author) commented:
I thought DFS required Windows Server Enterprise?
DFS-R requires Windows 2003 R2 Standard or better. If either end is Windows Server Enterprise or 2012, then it uses a more efficient algorithm. Your spend is high enough that it sounds to me like you should either already be using Windows Server Datacenter for VM licensing, or you can upgrade/migrate a file server to 2012 R2.
Bill Herde (Owner, Author) commented:
2 of the servers are 2012, not R2. The rest are various flavors of 2008, with a few 2003 servers that are going away by the end of the year. I had DFS set up between two Enterprise servers a few years ago, but it was not very efficient across the WAN. We have a much larger pipe now, so I will look into that again for one server in particular.
Bill Herde (Owner, Author) commented:
Thanks for the feedback.
On a somewhat related note, I am interested in what performance you get recovering out of Unitrends.