Link to home
Start Free TrialLog in
Avatar of marrowyung
marrowyung

asked on

setup SQL server AOG for HA and DR

hi,

For SQL server DR and HA, since SQL 2012 we starting have AOG, so now is it say that we only has to configure the AOG then we handle both HA and DR across different far away location ?

how to setup AOG for different site and subnet for auto failover  ?
the write/primary nodes can failover to other long distance site ?

usually if other office has a an application and that application when failover, has to write to a long distance primary miles away and it will bring application latence, how we solve this problem? we need to setup a separate AOG for each office and setup merge replication between them so that each office update their own in their own AOG and then merge the result together?
Avatar of lcohan
lcohan
Flag of Canada image

First of all we'll need to understand what actual scenario you are talking about and need as there's no "SQL server AOG" so is hard to give an answer to an unclear question.

As noted here: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/always-on-availability-groups-sql-server?view=sql-server-ver15
Always On availability groups is the full, formal name for this availability feature. The abbreviation is AG, not AOAG or AAG.

If AG is what you are inquiring about and in particular how to set up "Always On Availability Group" for High Availability and Disaster Recovery then I suggest you start reading here:
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-ver15
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/always-on-availability-groups-sql-server?view=sql-server-ver15
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15

and decide the actual scenario you need and re-post that question so we can help.

This is another great resource https://www.brentozar.com/sql/sql-server-alwayson-availability-groups/ that can help you understand the different scenarios, potential issues/caveats and how to set up the servers in various scenarios/locations depending on your actual needs.
Avatar of marrowyung
marrowyung

ASKER

"First of all we'll need to understand what actual scenario you are talking about and need as there's no "SQL server AOG" so is hard to give an answer to an unclear question.
"

always on

"Always On availability groups is the full, formal name for this availability feature. The abbreviation is AG, not AOAG or AAG.
"
exactly that.

so now all HA and DR handled by Always on except we need to handle cross site latent between application server and DB server, like application server is in location A but DB is in location B, far away from each other.

how you guys handle this kind of situation ? for me, we setup another DB in location A and application in location A access that DB, and by this, new DB can't be in the SAME AOAG as the original one. so replication between them is needed.
I am reading this:

"https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/always-on-availability-groups-sql-server?view=sql-server-ver15"

and it say:

SQL Server 2019 (15.x) increases the maximum number of synchronous replicas to 5, up from 3 in SQL Server 2017 (14.x). You can configure this group of[b] five replicas to have automatic failover [/b]within the group. There is one primary replica, plus four synchronous secondary replicas.

Open in new window


but it also say :

Each availability group supports one primary replica and up to eight secondary replicas.

I though all replicas in the SAME site is ALL synchronous replicas? it is not ? why only some of that ? so BEFORE SQL 2019, we can only have 1 x primary replica and 3 x synchronous repilcas and the rest of 5 secondary replicas MUST BE set to Async ?

and this link:

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/active-secondaries-backup-on-secondary-replicas-always-on-availability-groups?view=sql-server-ver15

In a distributed availability group, backups can be performed on secondary replicas in the same availability group as the active primary replica, or on the primary replica of any secondary availability groups. Backups cannot be performed on a secondary replica in a secondary availability group because secondary replicas only communicate with the primary replica in their own availability group. Only replicas that communicate directly with the global primary replica can perform backup operations

Open in new window


" secondary replicas in the same availability group as the active primary replica"

I do not understand this. secondary replicas refer to active primary replica ?

and it say:

"RESTORE statements are not allowed on either the primary or secondary databases of an availability group."

Then how can I restore ? is not allow on secondary database (secondary replicas) makes sense as that one is read only ! we can't restore over it.

"BACKUP DATABASE supports only copy-only full backups of databases, files, or filegroups when it is executed on secondary replicas. Note that copy-only backups do not impact the log chain or clear the differential bitmap."

so full backup and differential backup can only execute on primary replica ?

or the copy backup + log backup on secondary replicas can already allow us to do complete restore  ?

also here:

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15#SyncCommitWithManual

I am interested on these:

Synchronous-Commit Mode with Only Manual Failover

Synchronous-Commit Mode with Automatic Failover

Open in new window


as usually for synchronous commit mode I expect primary AUTOMATICALLY failover to any secondary replica, why there has ONLY manual failover for us ?
Is not clear still if you have an existing implemented AG in production and there are some related issues OR you are looking for knowledge about how to set one and what the beast fit scenario is in your case - right?

So from what toy are saying above and quoted below
so now all HA and DR handled by Always on except we need to handle cross site latent between application server and DB server, like application server is in location A but DB is in location B, far away from each other.

how you guys handle this kind of situation ? for me, we setup another DB in location A and application in location A access that DB, and by this, new DB can't be in the SAME AOAG as the original one. so replication between them is needed.

in my opinion you need to forget:
I am interested on these:

Synchronous-Commit Mode with Only Manual Failover
Synchronous-Commit Mode with Automatic Failover

if you want to have the replicas in different physical location than the primary SQL as the "Asynchronous-commit mode" would be more appropriate for this architecture.
Asynchronous-commit mode. This availability mode is a disaster-recovery solution that works well when the availability replicas are distributed over considerable distances.

Synchronous-commit mode. This availability mode emphasizes high availability and data protection over performance, at the cost of increased transaction latency. A given availability group can support up to three synchronous-commit availability replicas, including the current primary replica.
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15
Again, you will need to choose first what you want at high level then choose the AG solution/setup that fits and not vice-versa.
" you are looking for knowledge about how to set one and what the beast fit scenario is in your case - right?"

this one
OK so in that case we should start with details about your infrastructure except we don't want to put here any sensitive info just high level therefore considering all the above....

My guess is that you will have the Primary SQL Server in your main/primary DC(data center) A - Location A and tis is the location where all your apps/systems are running right? I assume that you have (or will have) a few Stand by Replicas in a DC - B, Location B which is different than A right?


In that case you will need to figure out together with networking what is the minimum speed that you need between the two DC's in order to be able to use synchronous-commit mode which supports manual failover and, optionally, automatic failover. After you determined the minimum speed to achieve that and confirm that it is possible to implement this "synchronous-commit mode that supports manual failover and, optionally, automatic failover." then you move to next step and decide how many "B" type locations you want and how many synchronous-commit mode replicas you want - depending if you need to open one or few replicas for readonly reporting for instance.

Also think about the fact that you could also add a Location C as a disaster recovery site where lets say Node04 is an asynchronous-commit replica deployed to this disaster recovery DC site. The fact that the other nodes stay as asynchronous-commit mode after failing over to Node 04 helps prevent potential performance degradation in your availability group due to high network latency between the two sites.

Have a look here for more details about possible scenarios https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15
"https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15"

I know that but

"a few Stand by Replicas in a DC - B, Location B which is different than A right?"

yes for this. I am referring to what is application server in location A and DB, after failover, in location B, as the AOAG can across different far away location.

"two DC's in order to be able to use synchronous-commit mode which supports manual failover and

this is another good point from the link, sync mode with manual failover! under what situation should/must we use MANUAL failover on synchronous-commit mode secondary replicas, why don't let it auto failover ! Asynchronous-commit mode need should be also able to auto failover as the primary site is dead, no one want to wake up at the middle of the night and just let DR site works.

The fall back has to be manual as we don't know what's wrong with primary and we prefer to stay with DR server until primary proof to be ready again.

"In that case you will need to figure out together with networking what is the minimum speed that you need between the two DC's"

a lot of company do not have fast network between 2 x location and we have a good point before but application server to DB server in 2 x diff location STILL makes trouble ! user click a button, like save, still wait for a long time (waiting for remote DB to reply/commit). all user will report problem.

" The fact that the other nodes stay as asynchronous-commit mode after failing over to Node 04 "

what other nodes here is about. other nodes in secondary DC which has good connection with primary one ?

and why we need the 3rd datacenter for it from your case/knowledge ?
from here:

https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-ver15


"Performing backup operations on secondary replicas

The secondary replicas support performing log backups and copy-only backups of a full database, file, or filegroup. You can configure the availability group to specify a preference for where backups should be performed."

so on active secondary replicas, we can only do copy backup and log back ? so copy back with log backup can already allow us to fully restore the DB ?

the different between copy backup and full backup is copy backup do not break the log sequence ?
1. Per your above note about the speed between different locations:
Synchronous-commit mode emphasizes high availability over performance, at the cost of increased transaction latency.
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15 and please read more before deciding the best AG solution that fits your environment.

2. Related to automatic failovers - AG versus SQL Cluster running on WFCS failovers are very different and there will be guaranteed occasions when the Automatic AG Failover will not happen even if you configured everything you can think of close to perfection and the reasons why this may not occur are all listed here: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-flexible-automatic-failover-policy?view=sql-server-ver15 
and here: https://learnsqlserverhadr.com/5-questions-sqlag-not-automatically-failover/
As inconvenient the manual AG failover may seem in my opinion this is not an event that can be guaranteed to automatically happen with all systems being able to switch and run against the new SQL and without any manual intervention however...you can obviously configure the automatic failover to a location B knowing that there is latency between primary and replica.
Think that IF the replica DATABASE that is set to take the load when primary DATABASE goes down is NOT in a SYNCHRONIZED state...then automatic failover does NOT happen until the replica DB is in that state + many more reasons as per links above.

3. FULL Database backups will break the chain indeed versus COPY only so after the AG was enabled only COPY ONLY backups can be taken from any SECONDARY database indeed however you can create any type of backup of a PRIMARY database.

4. "what other nodes here is about. other nodes in secondary DC which has good connection with primary one ?" - see the screen shot below:
User generated image
just interest on a topic, is Sync-commit mode replicate in transaction level like mirroring and Async commit mode use log shipping method to ship log to replica?
Your full answers are here: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15#AsyncCommitAvMode
under: "Asynchronous-Commit Availability Mode"
Under asynchronous-commit mode, the secondary replica never becomes synchronized with the primary replica. Though a given secondary database might catch up to the corresponding primary database, any secondary database could lag behind at any point. Asynchronous-commit mode can be useful in a disaster-recovery scenario in which the primary replica and the secondary replica are separated by a significant distance and where you do not want small errors to impact the primary replica or in or situations where performance is more important than synchronized data protection. Furthermore, since the primary replica does not wait for acknowledgements from the secondary replica, problems on the secondary replica never impact the primary replica.

An asynchronous-commit secondary replica attempts to keep up with the log records received from the primary replica. But asynchronous-commit secondary databases always remain unsynchronized and are likely to lag somewhat behind the corresponding primary databases. Typically the gap between an asynchronous-commit secondary database and the corresponding primary database is small. But the gap can become substantial if the server hosting the secondary replica is over loaded or the network is slow.

The only form of failover supported by asynchronous-commit mode is forced failover (with possible data loss). Forcing failover is a last resort intended only for situations in which the current primary replica will remain unavailable for an extended period and immediate availability of primary databases is more critical than the risk of possible data loss.The failover target must be a replica whose role is in the SECONDARY or RESOLVING state. The failover target transitions to the primary role, and its copies of the databases become the primary database. Any remaining secondary databases, along with the former primary databases, once they become available, are suspended until you manually resume them individually. Under asynchronous-commit mode, any transaction logs that the original primary replica had not yet sent to the former secondary replica are lost. This means that some or all of the new primary databases might be lacking recently committed transactions.

"Synchronous-Commit Availability Mode" and "How Synchronization Works on a Secondary Replica"
Under the synchronous-commit mode, after a secondary replica joins the availability group and establishes a session with the primary replica, the secondary replica writes incoming log records to disk (hardens the log) and sends a confirmation message to the primary replica. Once the hardened log on the secondary database has caught up the end of log on the primary database, the state of the secondary database is set to SYNCHRONIZED. The time required for synchronization depends essentially on how far the secondary database was behind the primary database at the start of the session (measured by the number of log records initially received from the primary replica), the work load on the primary database, and the speed of the computer of the server instance that hosts the secondary replica.

Synchronous operation is maintained in the following manner:

On receiving a transaction from a client, the primary replica writes the log for the transaction to the transaction log and concurrently sends the log record to the secondary replicas.

Once a log record is written to the transaction log of the primary database, the transaction can be undone only if there is a failover at this point to a secondary that did not receive the log. The primary replica waits for confirmation from the synchronous-commit secondary replica.

The secondary replica hardens the log and returns an acknowledgement to the primary replica.

On receiving the confirmation from the secondary replica, the primary replica finishes the commit processing and sends a confirmation message to the client.

 Note

If a synchronous-commit secondary replica times out without confirming that it has hardened the log, the primary marks that secondary replica as failed. The connected state of the secondary replica changes to DISCONNECTED, and the primary replica stops waiting for confirmation from the secondary replica. This behavior ensures that a failed synchronous-commit secondary replica does not prevent hardening of the transaction log on the primary replica.

Synchronous-commit mode protects your data by requiring the data to be synchronized between two places, at the cost of somewhat increasing the latency of the transaction.
"https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver15 and please read more before deciding the best AG solution that fits your environment."

you seems saying for fast network us sync commit mode and slow network use Async... ?

"the reasons why this may not occur are all listed here: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-flexible-automatic-failover-policy?view=sql-server-ver15 "

you mean the failover policy has control that ?

"and here: https://learnsqlserverhadr.com/5-questions-sqlag-not-automatically-failover"

and this just meant auto failover must set the node to Sync commit mode and failover policy is set for these sync commit nodes.

but my concern is more on why some sync commit nodes can't autofailover and some can ! probably because any SQL server version before SQL 2019 ONLY allow 3 x nodes auto failover? in which one is primary and the other 2x sync commit secondary replica?

therefore the 3 rd secondary replica can't do auto failover at all?

"3. FULL Database backups will break the chain indeed versus COPY only so after the AG was enabled only COPY ONLY backups can be taken from any SECONDARY database indeed however you can create any type of backup of a PRIMARY database."

I think it only breaks if both primary and secondary replicas do FULL BACKUP  will it breaks the log chain, so if ONLY secondary replicas do full backup should be fine ?

after reading your diagram I think you misunderstanding me, what i said is:

"what other nodes here is about. other nodes in secondary DC which has good connection with primary one ?"

I am not sure why once DB server 1,2,3 dead and it failover to nodes 4 in remote DC which has high latence network from primary site, nodes 1,2,3 automatically become async mode ? because the new primary is node 4 now and node 1,2,3 is in remote DC and it is all now secondary replica now, so SQL server will make them all in async mode?

"under: "Asynchronous-Commit Availability Mode""

you are talking about this?

"Under asynchronous-commit mode, any transaction logs that the original primary replica had not yet sent to the former secondary replica are lost. This means that some or all of the new primary databases might be lacking recently committed transactions. For more information on how forced failover works and on best practices for using it, see Failover and Failover Modes (Always On Availability Groups)."

actually I just want to know if async mode is based on log shipping and sync is based on mirror technology.

"On receiving a transaction from a client, the primary replica writes the log for the transaction to the transaction log and concurrently sends the log record to the secondary replicas.
"

so this seems sync commit mode also log shipping ?

so mirroring technology is not used in AOAG  ?
any updates for me ?
This question needs an answer!
Become an EE member today
7 DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform.
View membership options
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.