Link to home
Start Free TrialLog in
Avatar of Shawn Shine
Shawn Shine

asked on

SQL Server 2014 Alwayson failover

Hi everyone,

I have 3 SQL servers 2014 Enterprise (Windows Servers 2012 R2) in a cluster AlwaysOn on 1 availability group and that are configured as follows:
 - Availability Group 1
     Replicas:
    - DB1 : primary server (Availability mode : synchronous commit, failover mode : automatic) located in DC1
    - DB2 : secondary server ( Availability mode : asynchronous commit, failover mode : manual) located in DC2
    - DB3 : secondary server (Availability mode : synchronous commit, failover mode : automatic) located in DC1

 - Availability databases
    - databaseNumber1, databaseNumber2, databaseNumber3

I have to shutdown my main server DB1 because I need to replace 1 memory module which is in a critical state.

In this architecture, there is a problem with the failover of the databaseNumber2.
If DB1 goes down, all databases except databaseNumber2 (certainly a problem of structure of the database) are moved to DB3.

So I will put DB3 in availibity mode : asynchronous, failover mode : manual and when I stop DB1, all databases will stay on this server and won't move to DB3.

My problem :
If DB1 crashes before the change of the memory module, all the databases except databaseNumber2 will move to DB3 (it is currently in synchronous automatic).
If it is impossible to restart DB1, what should I exactly need to do to make the databaseNumber2 working on DB3 ?

Thank you in advance for your help.
Avatar of Patrick Bogers
Patrick Bogers
Flag of Netherlands image

Hi

What if you move through the cluster management? It should tell you what can be wrong.

Cheers
Avatar of Shawn Shine
Shawn Shine

ASKER

Hi Patrick,

Our previous DBA who install the solution SQL Server AlwaysOn spent a lot of time for fixing this problem and looked at every logs:
 - in test environment (similar to production environment) with the same databases, failover for all databases was working.
 - in production environment, failover for the databaseNumber2 always failed.

So in this post, I don't try to fix this issue. As I explained, I need to shutdown my main SQL server DB1 in 3 days because there is a critical problem with 1 memory module.

The failover is enabled between DB1 and DB3 (synchronous, automatic) so If DB1 crashes, all databases will move to DB3 but the databaseNumber2 will failed.

I want to know how to make the databaseNumber2 working on DB3 if the scenario occurs ?

I know that it is possible but I didn't find anything on internet yet.
I don't understand, how can you have the a different Availability Mode on the databases themselves? That is not possible in AG as the Availability Mode is set at the group level between the nodes and NOT between the databases.
Ignore my previous comment. I was confused, those are servers, not databases. So if I understand correctly your Primary node is DB1 set up in synchronous availability mode with DB3 server and asynchronous mode with DB2.

In that case it is safe to do this:

1. Manually failover to DB3. Ater this  DB3 will become Primary, DB1 secondary, both in synchronous av mode, while DC2 will keep its Secondary async mode.
2. Change the Availability mode between DB3 and DB1 to asynchronous and the Failover mode to Manual.

At this point you can do whatever you want on DC1: stop it, increase the memory, restart it, apply updates, etc. The AG group will persist having DC3 as Primary and DC2 as Secondary but will report DC1 as being down, no worries on that. Leave the Av and Fail mode between DC3 and DC2 as is: async and manual. Immediately as the server is back online and the SQL server has restarted fine the AG will pick it up and start bring the data back in sync with the Primary node, DC3 but this will happen only if the down time was not too long! Wait until all the databases and nodes show green and Synchronizing in the Availability Group Dashboard after which you can switch back the Av mode to Synchronous and Fail mode to Automated. After this last change you can safely fail over manually to DC1 as it was in the beginning.
Thank you very much for your explanations Zberteoc.

I forgot to mention that all SQL Servers are running SSRS too and reporting is accessible from another availability group but it is not critical.

When I do the failover from DB1 to DB3, the databaseNumber2 fails to move to DB3 (if i am not wrong the status for the databaseNumber2 becomes "Resolving").  We don't know for the moment why this is happening.
In this case, I have to reboot the server DB3 and all the databases move to the server DB1 again in a normal state.

How long DB1 can stay offline in order to be able to sync with the primary server?

I am going to do a failover from DB1 to DB2 in order to see if all databases can be moved without any problems.
I will put DB2 in Av mode to synchronous and fail mode to automatic and I will put DB3 in Av mode asynchronous, and Fail mode to manual.
But if the databaseNumber2 fails to move on DB2 :
 - can I stop DB1 and do the maintenance as planned and then bring it back ? When databases show green on DB1, I will reboot DB2 in order to move the databaseNumber1 and databaseNumber3 to DB1
 - if DB1 failed and can't be restarted, how can I make the databaseNumber2 working on DB2 ?
Does the resolving state persist?

I did Windows upgrade on both nodes we have and every time the sync came back with no issue. It took about half an hour each upgrade.
Databases don't "move" when you fail over but if their status was fine prior to failing over then they should be fine after as well. The only thing that changes is the node attribute, from Primary to Secondary and vice versa for the target node.

One thing you can do is to bring the DB2 as Sync with Automatic modes before you fail over. If you already tried the fail over and the status of database 2 is still resolving you will have to manually sync it by removing it from AG, restoring with no recovery from the Primary, whichever is at that moment, from a FULL backup followed by a transaction log backup restore with no recovery, the log backup should be taken right after the FULL backup.
Sorry for the late response Zberteoc, I was trying to assimilate all the information that you kindly gave me.  

So, yes the resolving state persist after the failover.

I checked with an other colleague who did a failover in the past with the same servers, he said that if the failover fails, after restarting SQL server service or the server, the failover works.

If DB1 crashes (can't be restarted) and failover fails only for databaseNumber2 (resolving state) on DB2, if I want to make it available on DB2:
 - Remove the databaseNumber from AG
 - In that case, just a Restore Database databaseNumber2 with Recovery is needed to make databaseNumber2 available ?
ASKER CERTIFIED SOLUTION
Avatar of Zberteoc
Zberteoc
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Well noted. I am going to do the failover tomorrow. I will let you know how it went.
Thank you for your help.
I did the failover from DB1 to DB2 and it worked. DB2 became primary on the AG and all databases were online.
Then, at the end of the maintenance, i did the failover from DB2 to DB1. No problems.

If DB1 is primary replica on AG and DB2 and DB3 are in manual failover mode, what will happen if DB1 crashes and it is not possible to turn it on ?  For me, DB2 and DB3 will be secondary replica but in this scenario is there a way to force a replica to become primary replica without DB1?
Of course there is. But from those secondary you have to choose one that will be the preferred node to take the role pf Primary, let's say DB2. It should be the "closest" node, in the same location, if you have nodes across different sites. You will set between DB1 and DB2 notes Synchronous mode and Automatic fail over so in case DB1 fails the Primary role will be taken instantly by DB2, you won't even feel it. Later you can see what was the issue with DB1, fix it and resume the role. Between DB1 and DB3 you can leave asynchronous mode and manual fail over. If you want you can set it so Synchronous as well but only if there si no impact on performance. More than 2 nodes in Synchronous mode could cause delays in response.
Well noted, it is already configured like that in our infrastructure.
Thank you for you help.