SQL 2008 HA and DR stategy


my company are currently discussing our HA and DR strategy for one of our core systems that's going to run on a MS SQL2008 server. The RPO is 0% lost transactions and the RTO has to be at an absolute minimum.

We have two servers that have solid state drives in them at our main datacenter. We've created a cluster for the servers using VSA and HP technologies. We have SQL running on the cluster. We're happy with this as our HA solution and with fail-over time. It's been proposed that we couple this SQL cluster with SQL mirroring to our DR site.

We're currently in the processing of extending our network to our DR site. Everything will run on the same subnet. I'm in discussions with our networking team and trying to figure out if we should create a third node of our cluster at the DR site and use that instead of mirroring.

They're reluctant to do this stating the following:

"if the VSa gets isolated the site will go down"
"there is no transparent failover"
"clustering doesn't accomodate for database corruption"

The corruption isn't a concern to me as we'll be doing daily DBCC checks on a staging database. The other two points I'm not sure if they're correct.

Can someone give me some advice on the pros and cons of these two solutions please?

Thanks in advance for the assistance.
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Don't use mirroring, that feature is depreciated in the new versions of sql so if you upgrade you'll have to redo. Its a poor method of resilience and the RTO is poor.

Your network team are correct there is a risk that if the sites become isolated then you will be stuck with two clusters and it will be hard to sort put

Fail over with a cluster is reasonably transparent but it does depend what the latency is between sites

If you are using HP CA replication the that won't account for corruption

Can you give details of the links as that will be a big influence on the best solution

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
telliot79Author Commented:
We will have a 2GB link between our main datacenter and our DR site. There's no redundancy on the link which is where our concerns lay. I'd like to have a third node at the DR but realize if we lose the connection the node will be useless.
a couple more questions
whats the latency on the line?
whats the transactional size of the DB - does it process much in terms of quantity either in number of transactions or physical size i.e. KB,MB or GB

you could think about setting up DB restore and TLOG shipping as a method - the more often you do the TLOGS will cut down on RPO and depending on the backs up you have then you could keep you RTO fairly low by DB restore and TLOG replay

if your application connected to the server by a DNS name thats not the SQL cluster name then a simple DNS change could point at the new SQL server
Has Powershell sent you back into the Stone Age?

If managing Active Directory using Windows Powershell® is making you feel like you stepped back in time, you are not alone.  For nearly 20 years, AD admins around the world have used one tool for day-to-day AD management: Hyena. Discover why.

telliot79Author Commented:
i'm unsure of the latency of the line (it's not in yet) but i suspect it will be down to nano seconds as our DR site is literally half a mile away.

transactional size we'll be talking about KB for 90% of the time and during the heavy load in the evening maybe up to a MB or two.

tran log shipping with the DNS modification is our current solution using Sybase Adaptive Server. We were hoping to move away from this and make the process as seemless as possible. I know SQL 2012 offers more HA solutions that may work for us but unfortunately our vendor doesn't support this...

it sounds like you agree that clustering to the DR site isn't a possibilty then?
it technically possible, with 2Gb and if the latency is less that 10ms then most storage replication will support it - am i right in assuming you would be using Continuous Access for Storage replication.

but already having a two node cluster i wouldn suggest something that doesn't introduce risk into the environment as it would cause more issues in a DR failover and failback situation
telliot79Author Commented:
We're not using CA for storage replication. we have two physical hosts that are virtualized with each having a database server and a VSA. The VSA is the shared storage. I'm not entirely sure of how this all ties together just in case you can't tell.

if we went with the third node at the DR site, the fail over manager would reside at the main site and if we lost that site how would the DR site know it was the active node? there wouldn't be a quorum.
by VSA you do mean Vmware Storage Appliance?
thats a software level storage replication

if the Vsphere server is going to be at the central site then until you have that covered i would keep to a simple solution because as you say if you lost that site you wouldn't be able to fail over or fail back
telliot79Author Commented:
thanks irweazelwallis. much appreciated.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2008

From novice to tech pro — start learning today.