SQL 2012 AOG and SQL2014 AOG

marrowyung asked
Last Modified: 2018-03-21
hi,

Right now we see that when the primary replica of our SQL 2012 AOG dies, the secondary replica goes down too, which makes the AOG useless, so we need a better multi-site failover solution.

If any of you have used a SQL 2012 AOG before, please share what difficulties you experienced.

I understood that it is by design that SQL 2012 brings down the secondary replica too if the primary replica goes down, right? If so, I expect SQL 2014/2016 has already solved that.

If so, what is the suggested upgrade procedure from SQL 2012 to SQL 2014/2016? Any idea what is even better in the SQL 2016 AOG?

Matt BowlerDatabase Reliability Engineer
CERTIFIED EXPERT

Commented:
Can you provide a bit more detail? Are you seeing errors? What state are the databases in? Are the SQL instances up and running?
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

Does that mean you are using SQL 2012 and that, when the primary replica dies, the secondary replica can fail over without any problem?

"Right now we see that when the primary replica of our SQL 2012 AOG dies, the secondary replica goes down too"

Actually, from time to time we see news on the internet that the SQL 2012 AOG has failover problems, and MySQL's failover replication has this problem as well.

So when the primary site goes down, the secondary replica goes down too. This is what I don't want to see.

Any idea?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
I understood that it is by design that SQL 2012 brings down the secondary replica too if the primary replica goes down, right?
Who gave you this information? It is totally wrong. If it were true, what would you need AlwaysOn for?

I second Matt's questions:
Can you provide a bit more detail? Are you seeing errors? What state are the databases in? Are the SQL instances up and running?
marrowyungSenior Technical architecture (Data)

Author

Commented:
"From whom did you get this information? This is totally wrong. If that's true then what for you'll need AlwaysOn?
"

I just read news everyday and this is the feedback from the field.  that's why SQL2014 comes with much better replica, right?

"Can you provide a bit more detail? Are you seeing errors? What state are the databases in? Are the SQL instances up and running?"

What we hear from the customer is that both servers in the AOG are seen as one logical unit, so when the primary replica fails the secondary replica goes down too.

so he said the AOG built by previous DBA is useless. so when primary replica down secondary replica down too, so finally all DB down.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
I just read news everyday and this is the feedback from the field.
Reading and interpreting are different things. You should always provide the articles that you're reading so we can be sure that you've interpreted them correctly.

that's why SQL2014 comes with much better replica, right?
Not because of that, since it's a false statement. MSSQL 2014 provides more replicas and some bug fixes, yes. But that's normal with new versions. SQL Server 2016 also has improvements.

so he said the AOG built by previous DBA is useless. so when primary replica down secondary replica down too, so finally all DB down.
It would be good if you could see it with your own eyes. The customer is passing the information to you and you are passing it to us without having confirmed anything yet. This can only lead to a useless discussion. Give us facts (error messages, log entries, ...) and then we can try to provide the necessary help.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Reading and interpreting are different things. You should always provide the articles that you're reading so we can be sure that you've interpreted them correctly."

You are right about that, but I can't find the articles anymore, which is why I can't show you.

"MSSQL 2014 provides more replicas and some bug fixes, yes"

also good to hear !

" Give us facts (error messages, log entries, ...) and then we can try to provide the necessary help."

For an AOG, where can I usually see the log entries related to a failover problem?

Is there any information you want to see from a configuration point of view? I agree the configuration could be wrong.

Sorry, this is my first time working with AOG, and this one is on SQL 2012, so I need more information on how to check for errors. I want to see why the previous DBA left us with something like this.

But the documentation from the previous DBA covers only the failover procedure, not the design. This suggests he did a failover before and it worked.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
Here is something for you to read: AlwaysOn Troubleshooting
marrowyungSenior Technical architecture (Data)

Author

Commented:
one thing, I will do a test on TEST AOG group, so in order to trigger the failover, what should I do ?

just stop the SQL service on the primary replica and see if secondary replica still on and receive request?

how to check the AOG logical connection name? which should be the same from application point of view.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
so in order to trigger the failover, what should I do ?
Simulate a disaster (service stop, power off, network down, ...)

just stop the SQL service on the primary replica and see if secondary replica still on and receive request?
You can stop the service but the expected behavior is that the Secondary Replica doesn't receive any request because it will become the Primary Replica.

how to check the AOG logical connection name? which should be the same from application point of view.
When using an AG solution, applications should work only with the Listener.
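
To illustrate the Listener point, here is a minimal sketch of how an application might build its connection string against the Listener rather than against an individual replica. The listener name `ag-listener.example.local` and the database name `SalesDb` are hypothetical, and the driver name is an assumption about what is installed on the client. `MultiSubnetFailover=Yes` is the ODBC keyword intended for listeners that span more than one subnet, which seems to match the two-data-center setup discussed here.

```python
def listener_connection_string(listener: str, database: str,
                               port: int = 1433,
                               multi_subnet: bool = True) -> str:
    """Build an ODBC connection string that targets the AG Listener.

    The application never names a replica directly, so after a failover
    it reconnects to the same listener name and reaches the new primary.
    """
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",  # assumed installed driver
        f"Server=tcp:{listener},{port}",
        f"Database={database}",
        "Trusted_Connection=yes",
    ]
    if multi_subnet:
        # Ask the driver to try the listener's IP addresses in parallel,
        # which shortens reconnect time when replicas sit in different subnets.
        parts.append("MultiSubnetFailover=Yes")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(listener_connection_string("ag-listener.example.local", "SalesDb"))
```

The string can then be handed to whatever ODBC-based client library the application already uses.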
marrowyungSenior Technical architecture (Data)

Author

Commented:
"but the expected behavior is that the Secondary Replica doesn't receive any request because it will become the Primary Replica.

sure...

AOG should also like mirroring, it has high performance model and high available mode?

"When using an AG solution, applications should work only with the Listener."

 the listener has the logical connector name? the Virtual SQL server name? when can I see it ? what to double check thing. it should have DR listener, right?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
AOG should also like mirroring, it has high performance model and high available mode?
The AlwaysOn solution is the replacement for Mirroring, so it's Mirroring on steroids :)

the listener has the logical connector name? the Virtual SQL server name?
Yes.

when can I see it ?
When? Every time you want to. Where? In SSMS, under AlwaysOn High Availability folder.

it should have DR listener, right?
It requires only one listener, but you can add more. You can have one for HA and another for DR, for example.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"It requires only one listener, but you can add more. "

still manage under In SSMS, under AlwaysOn High Availability folder. ?
marrowyungSenior Technical architecture (Data)

Author

Commented:
for auto failover and manual fallback, usually you also suggest your customer to handle failover in this case ,right?

any witness is need in auto failover, like the case in mirroring ?

I don't think you will do auto fallback, right ?

one thing, for DR listener, is the primary and secondary replica both keep a copy of primary listener and DR listener, so that if primary down both listener still there? or primary replica keep the primary listener and secondary replica keep DR listener?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
still manage under In SSMS, under AlwaysOn High Availability folder. ?
Yes, everything related to AO (Replicas, databases, listeners, ...) is managed under that folder.

for auto failover and manual fallback, usually you also suggest your customer to handle failover in this case ,right?
Sorry but didn't understand the question.

any witness is need in auto failover, like the case in mirroring ?
Forget about the witness in AO. That was one of the things improved with AO.

I don't think you will do auto fallback, right ?
There's no need to. The databases are up and running in the other node. Applications are still working because they're using the Listener connection. Why bother to fail back?
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Sorry but didn't understand the question"

Usually we use automatic failover to switch from the primary to the secondary replica, and a manual failback from the secondary to the primary instead of an automatic one: since we don't know why the primary failed, we prefer not to fail back automatically until we have found out what happened on the primary.


" That was one of the things improved with AO.

no need fallback any more..

"The databases are up and running in the other node"

by this you mean in AOG, both primary and secondary replica work as single logical unit, right? so who care about fallback or not, virtual SQL server still here but point to secondary replica ?

then I am not sure why my client's case is once primary replica is down secondary replica don't comes up and serve.

so for the AOG log, it also under  SSMS, under AlwaysOn High Availability folder ? I want to see during fail over, why secondary replica don't show up.
marrowyungSenior Technical architecture (Data)

Author

Commented:
actually for primary replica and secondary replica to switch back and forward successfully, what component of AOG must exist before doing this ? I just want to make sure everything is ok as I am new to AOG.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
no need fallback any more..
Correct. There's not even the fallback term in AlwaysOn.

by this you mean in AOG, both primary and secondary replica work as single logical unit, right? so who care about fallback or not, virtual SQL server still here but point to secondary replica ?
Yes but there's no virtual SQL Server either. You have a virtual IP that's used by the Listener.

then I am not sure why my client's case is once primary replica is down secondary replica don't comes up and serve.
That's what I said in my first comment, and that's why I suggested you investigate before asking here in EE: what he said doesn't make any sense.

so for the AOG log, it also under  SSMS, under AlwaysOn High Availability folder ?
No. There's no AOG log. You'll need to check the SQL Server instance log instead. I hope you read the article I posted earlier, which explains how to troubleshoot AlwaysOn.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
actually for primary replica and secondary replica to switch back and forward successfully, what component of AOG must exist before doing this ?
Windows cluster, SQL Server service in both nodes, databases online, Listener online. You can also use the AlwaysOn Dashboard to check the health of your AG:
[Screenshot: AG_ShowDashboard.PNG]
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Windows cluster, SQL Server service in both nodes, databases online"

it is sure that it all exists as if windows cluster and SQL server service is not online, AOG can't be setup, right?

let me check tomorrow.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
it is sure that it all exists as if windows cluster and SQL server service is not online, AOG can't be setup, right?
What do you think? Tell me, how would you set up something in SQL Server if the service is offline?
marrowyungSenior Technical architecture (Data)

Author

Commented:
"What do you think? "
yes!

"if the service is offline?"

if other service: Windows cluster, SQL server service is offline, right?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
Right
marrowyungSenior Technical architecture (Data)

Author

Commented:
Thanks. I will do an intensive check tomorrow and Friday and will update you on what I find.

the goal is to make sure that if primary replica failed, secondary replica can serve too.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
the goal is to make sure that if primary replica failed, secondary replica can serve too.
That's the solution provided by  AlwaysOn.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"That's the solution provided by  AlwaysOn."

But I'd like to check now why this configuration doesn't work.

so one thing, can BPA check out any configuration problem that make AUTO failover doesn't work ? or any other tools can do a quick scan instead of manual?

In AOG, how can I set the AOG partner so that it fails over automatically to the secondary replica?
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

The situation right now is that there are only 2 SQL Servers in the AOG, and the problem is quite funny.

1) on last Friday night team failover SQL server in the AOG  from primary to secondary replica and they do security patching on primary replica. that time the DR replica before the primary replica.
2) they left all operation on DR replica, the new primary replica on friday night as it is too late, let this new primary replica do all operation all over the weekend.
3) During the weekend, the DR has been working well until E:\ full, and DR SQL server/ the new primary replica stop working too.
4) then at that moment we found out that AOG doesn't work by failover back to original primary replica, so all SQL service down.
5) We found that we can turn it back to the primary replica until we break the AOG relationship.

Amazing. We are not going to let this happen again. Have you experienced anything like this? Based on this information, which part of the configuration is wrong?

but by the design of AOG, primary and secondary replica can failover/work together if they are using diff network subnet, right? they are in diff data center.

Could it be that AD does not exist in the DR site, so that when the primary site's AD also fails, the DR replica cannot find a domain controller in the DR site (on a different subnet) and the DR server can't fail back to the primary site?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
so one thing, can BPA check out any configuration problem that make AUTO failover doesn't work ? or any other tools can do a quick scan instead of manual?
I haven't used BPA for years, so I don't know what kind of tests it has for AOG. I just use the AOG Dashboard I talked about in my comment #42068199.

1) on last Friday night team failover SQL server in the AOG  from primary to secondary replica and they do security patching on primary replica. that time the DR replica before the primary replica.
 2) they left all operation on DR replica, the new primary replica on friday night as it is too late, let this new primary replica do all operation all over the weekend.
Those are the correct steps for applying patches.

3) During the weekend, the DR has been working well until E:\ full, and DR SQL server/ the new primary replica stop working too.
Why disk E: filled up? I think there was a bad capacity management so E: didn't have the necessary free space to let database grow. This is the main issue and not AOG.

4) then at that moment we found out that AOG doesn't work by failover back to original primary replica, so all SQL service down.
 5) We found that we can turn it back to the primary replica until we break the AOG relationship.
I didn't understand these 2 steps. The failback worked or not? If not, did you also check if the disk from the Primary Replica didn't also fill up?

but by the design of AOG, primary and secondary replica can failover/work together if they are using diff network subnet, right? they are in diff data center.
No, the problem has nothing to do with different networks or data centers. We have the same design and it's running quite well.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Why disk E: filled up? I think there was a bad capacity management s"

We can find that out later, but we are focusing on how to make AOG work with the DR site on a different subnet.

"so E: didn't have the necessary free space to let database grow. This is the main issue and not AOG.
"
so assuming this is true, so if that secondary replica/the new primary has disk space full and make that user database full and can't operate, I expected that the AOG also allow it to failover back to primary ?

"I didn't understand these 2 steps. The failback worked or not?"

It didn't.

"If not, did you also check if the disk from the Primary Replica didn't also fill up?"

so you suspect that the old primary don't work too as there might be some problem on primary as well ?


"We have the same design and it's running quite well."

yeah, DR site should be on diff subnet, right? and your case it is perfectly well? both server on the same AOG logical group? so this means AOG can group SQL server by diff subnet IP ?

so I'd like to ask you any reference guide on this type of design like, how many minimum SQL server should be on each site , primary and DR . any other component else need in DR site? really one more AD ? must be on the same domain ?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
so assuming this is true, so if that secondary replica/the new primary has disk space full and make that user database full and can't operate, I expected that the AOG also allow it to failover back to primary ?
What I meant is that if the disks on both nodes are exactly the same size and hold exactly the same databases, then the disk-full error will occur on both nodes. That would explain why the AOG stopped on both replicas, so the solution is to provide more disk space on both nodes.

so you suspect that the old primary don't work too as there might be some problem on primary as well ?
Right. As I just explained above.
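
As a rough illustration of the capacity-management point, a small check like the following could be scheduled on each node to warn before a data drive fills up. It is only a sketch: the 15% threshold is an arbitrary assumption, and the `E:\` path simply mirrors the drive mentioned in this thread.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # treat an unreadable or empty volume as having no free space
    return usage.free / usage.total


def needs_attention(path: str, min_free_fraction: float = 0.15) -> bool:
    """True when free space has dropped below the chosen threshold."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # "E:\\" mirrors the drive from this thread; use any existing path on your node.
    drive = "E:\\"
    try:
        if needs_attention(drive):
            print(f"{drive} is running low on free space")
    except OSError:
        print(f"{drive} is not available on this machine")
```

Running the same check on every replica matters here, because a replica that is close to full is not a useful failover target.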

yeah, DR site should be on diff subnet, right? and your case it is perfectly well? both server on the same AOG logical group? so this means AOG can group SQL server by diff subnet IP ?
Yes, Yes, Yes and Yes.

any other component else need in DR site? really one more AD ? must be on the same domain ?
No. Irrelevant. No.
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

This is what I saw from the AOG dashboard:

[Screenshot: primary replica AOG dashboard]

Also, is there any listener owner? I am wondering whether, while the primary server was being patched, the listener was still hosted on the primary replica and for some reason did not respond.

From here I checked the quorum information:

[Screenshot: view cluster quorum information]

This is the quorum disk information from the dashboard:

[Screenshot: quorum disk information]

Here is the secondary replica's dashboard:

[Screenshot: secondary replica dashboard]

Is this the only place to see all the AOG error logs?

[Screenshot: view AOG health information]

And this is what we had during the downtime on Sunday:

[Screenshot: AOG health information]

Is there anything I can do to find the error log for the AOG failing to fail back to the primary?
marrowyungSenior Technical architecture (Data)

Author

Commented:
This is the listener:

[Screenshot: number of listeners]
[Screenshot: listener configuration]
[Screenshot: listener]

There should be no problem, right?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
I can see that the AOG is configured to fail over manually. That means that if the Primary Replica fails, the failover needs to be executed manually by someone. If this is the requirement, then it explains why the Secondary Replica didn't come online: it's waiting for somebody to perform the failover.
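
To make the manual-versus-automatic distinction concrete, here is a small model of the conditions under which an availability group can fail over without human intervention. It is a sketch and not how SQL Server implements the decision: the `Replica` class and the replica names are hypothetical. It encodes the documented preconditions that both replicas use synchronous commit with automatic failover mode, that the secondary is synchronized, and that the Windows cluster has quorum.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    failover_mode: str      # "AUTOMATIC" or "MANUAL"
    availability_mode: str  # "SYNCHRONOUS_COMMIT" or "ASYNCHRONOUS_COMMIT"
    synchronized: bool      # True when the replica's databases are SYNCHRONIZED


def can_fail_over_automatically(primary: Replica, secondary: Replica,
                                cluster_has_quorum: bool) -> bool:
    """Return True only if every precondition for automatic failover holds."""
    return (
        cluster_has_quorum
        and primary.failover_mode == "AUTOMATIC"
        and secondary.failover_mode == "AUTOMATIC"
        and primary.availability_mode == "SYNCHRONOUS_COMMIT"
        and secondary.availability_mode == "SYNCHRONOUS_COMMIT"
        and secondary.synchronized
    )


if __name__ == "__main__":
    # Hypothetical replicas resembling the manual-failover setup in this thread.
    prod = Replica("SQLPROD01", "MANUAL", "SYNCHRONOUS_COMMIT", True)
    dr = Replica("SQLDR01", "MANUAL", "SYNCHRONOUS_COMMIT", True)
    print(can_fail_over_automatically(prod, dr, cluster_has_quorum=True))  # False
```

With manual failover mode on either side the function returns False, which matches the behaviour described above: the secondary stays a secondary until somebody performs the failover.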
marrowyungSenior Technical architecture (Data)

Author

Commented:
I just tried to connect to secondary replica from primary replica dashboard and it has error,  any hints by say connection between primary and secondary replica?

[Screenshot: when connecting to the secondary replica]

Error message:

[Screenshot: error message]

And this is the dashboard on the secondary replica:

[Screenshot: secondary dashboard]

When I click this:

[Screenshot: when clicking the secondary dashboard]

it gives:

[Screenshot: when clicking the secondary replica dashboard]

So, any idea?
marrowyungSenior Technical architecture (Data)

Author

Commented:
"I can see that the AOG is configured to fail over manually. That means that if the Primary Replica fails, the failover needs to be executed manually by someone."

Yes, they failed over manually; the manual failover from primary to secondary worked, as they were patching the primary.

" If this is the requirement, then it explains why the Secondary Replica didn't come online: it's waiting for somebody to perform the failover."

but what we experienced is, when E:\ of new primary is full, can't even do manual failover except they broke the AOG group and make old primary online again.

this is the whole point.
marrowyungSenior Technical architecture (Data)

Author

Commented:
Victor anything you can see is related to AOG configuration ?

I am thinking about it is other issue, like DNS or network.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
I just tried to connect to secondary replica from primary replica dashboard and it has error,  any hints by say connection between primary and secondary replica?
When you did that, a link to click for details appeared. Did you click on it?

but what we experienced is, when E:\ of new primary is full, can't even do manual failover except they broke the AOG group and make old primary online again.
Did somebody try to solve the disk full issue or not? If not this will continue to happen.

Victor anything you can see is related to AOG configuration ?

 I am thinking about it is other issue, like DNS or network.
Solve the disk space issue first and then check whether that solves this issue.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"When you did that, a link to click for details appeared. Did you click on it?"

Yes, it just shows an error:

[Screenshot: error message]

A network error? Does the connection from the primary to the secondary have a problem?

"Did somebody try to solve the disk"

yes.

"Solve the disk space issue first and then check whether that solves this issue."

so this means we try to fallback again when we make the E:\ full ? has to simulate one, right?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
OK, that's weird. How can the dashboard show all green when you're getting an error?
As I understood the Secondary Replica is in another data center in a different network zone and domain, right? Does your user has access to it?

so this means we try to fallback again when we make the E:\ full ?
Correct. Secondary Replica can't work as Primary until have more disk space to grow.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"As I understood the Secondary Replica is in another data center in a different network zone and domain, right? "

yes

"Does your user has access to it?"

does it matter, why ?

"Correct. Secondary Replica can't work as Primary until have more disk space to grow."


nono. I mean simulate again by failover to secondary, then try to make e:\ full on secondary and see if it can fail over?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
does it matter, why ?
If you don't have access to it you can manage the Secondary Replica.

I mean simulate again by failover to secondary, then try to make e:\ full on secondary and see if it can fail over?
You can but I think you already know the result, right?
marrowyungSenior Technical architecture (Data)

Author

Commented:
"If you don't have access to it you can manage the Secondary Replica."

manage means what task sorry? it is for DB backup and some report application usage.

"You can but I think you already know the result, right?"

but it is a failure result, I means simulate one more time to see if it happen in the same way ?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
manage means what task sorry?
Any DBA task.

but it is a failure result, I means simulate one more time to see if it happen in the same way ?
Tests are always welcome and never too much, so go for it.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"If you don't have access to it you can manage the Secondary Replica."

I think the statement is if we dont have access to it we can't manage the secondary replica, right?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
Ah, right. It was a typo. Sorry about that.
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

One thing: is it the case that the cluster service on both the primary and the secondary replica has to be running before a failback from the secondary replica can be achieved?

I have been away from Windows clustering for a while, and I am not sure whether anything has changed in Windows clustering and AOG.
marrowyungSenior Technical architecture (Data)

Author

Commented:
Victor,

any AOG configuration scanning tools to make us quickly know if our AOG configuration is right or not ?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
any AOG configuration scanning tools to make us quickly know if our AOG configuration is right or not ?
You're making me point back to the Dashboard. You have everything there. What more do you need?
marrowyungSenior Technical architecture (Data)

Author

Commented:
"You're making me point back to the Dashboard"

ok only this ! so if it is all green, then there are no problem  ?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
Correct.
marrowyungSenior Technical architecture (Data)

Author

Commented:
ahhahh. ok. :::)::):):)

let me check out more.
marrowyungSenior Technical architecture (Data)

Author

Commented:
one thing, what windows/SQL server service on each node  has to start for AOG to run well ?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
AOG is not a service but a feature. The feature needs SQL Server engine and Windows Cluster to be running in all nodes.
marrowyungSenior Technical architecture (Data)

Author

Commented:
I checked Failover Cluster Manager and I am not sure if I am right: why can't I see a quorum disk in Failover Cluster Manager when the AOG dashboard shows the quorum as OK (that means OK, right?)

[Screenshot: cluster service, quorum disk]
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

Do you think this tool can do the job once the cluster is set up:

https://www.microsoft.com/en-us/download/details.aspx?id=8529
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
You're expecting to see a Quorum disk but there are other quorum modes.
This is why you should work together with the Windows team. They can easily provide this kind of answer.
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

These are the AlwaysOn health events; please check whether you can see any problem.
AOG-healthy-alert-from-primary-DR.rtf
production-AOG-healthy-log-on-primar.rtf
marrowyungSenior Technical architecture (Data)

Author

Commented:
Now that I look more closely at the AOG health events, have you seen this before? Today these kinds of messages appeared too:

[Screenshot: message 2]
[Screenshot: message 1]

I found that these 2 messages keep appearing every day. Any idea?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
Where did you take those logs from? I'm used to working only with the SQL Server and Windows Event Logs.
The only thing I've found that might point to some kind of issue is a single line pointing to a possible network or firewall issue:
"(...) GLOBAL' with id [A2A94D88-443C-4544-AF1D-54A432D6097A]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance."

You really need to work with your Windows and Network team. Ask them to verify from their side if the connection between the cluster nodes is OK.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Where did you take those logs from?"

From the AlwaysOn health events I showed you before:

View-alwayson-Health-Events.JPG
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi.

"You really need to work with your Windows and Network team. Ask them to verify from their side if the "

I know .

I have talked to them already and they are checking something else.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
Ah, OK. I had just never seen it in the format you sent :)
Looking forward to the feedback from the other teams.
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
By the way, here you have a list of all the events associated with AOG.
The one you presented is an informational one, so it isn't an error:
"35202 - A connection for availability group '%ls' from availability replica '%ls' with id [%ls] to '%ls' with id [%ls] has been successfully established. This is an informational message only. No user action is required. "
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
I went to investigate the error message I found in the logs you sent, and it looks like it's related to third-party backup software. Check that article to see if it matches your problem.
marrowyungSenior Technical architecture (Data)

Author

Commented:
But 9642 is something else.

Do you know what that means?

This one is also good:

https://www.sqlskills.com/blogs/jonathan/new-alwayson_health-extended-events-session-in-sql-server-2012-rc0/
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
I would guess that's a network issue so wait for the feedback from your network team.
Also, this might be an old issue. Do you have the last date and time that error was logged?
marrowyungSenior Technical architecture (Data)

Author

Commented:
"Also, this might be an old issue. Do you have the last date and time that error was logged?"

5 April 2017, these 2 x error keep repeating.

One thing: I want to compare the AOG configuration of TEST and production. Is there any way for me to export the AOG configuration from QA, even though both AOG dashboards show green, which means healthy?
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,

I am reading this for SQL 2012, see attached: 'Why AlwaysOn is not Always on', and there is something I don't understand:

1) For dynamic weighting:

  In our case, the primary and secondary replicas connect over a VPN, which might be unstable for some reason; the network team might be hiding the truth.

  But does anyone use this feature to select the failover target automatically?

  It says that if the connection between the 2 nodes is down, both nodes stay up and running. Will the AOG still function well?

  Then, as both nodes are up, which one serves? Do both serve as primary because the connection between primary and secondary dropped and this feature tries to bring up nodes on both the primary site and the DR site? Very confusing.

2) For the N+1 configuration, how can we have 2 active SQL Servers in the SAME AOG group? Or does it mean that the primary and the read-only secondary are the N, and the 1 is a never-online DR site that needs manual failover?

3) It says that in the N+1 situation the listener will fail. Any idea how to solve that?
Why-AlwaysOn-is-not-Always-on.pdf
marrowyungSenior Technical architecture (Data)

Author

Commented:
hi,


I just checked with the IT team, as I read a white paper on how to handle a multi-subnet/site SQL 2012 failover case, and it talks about quorum. our team said starting from SQL 2014 Windows cluster don't need Quorum anymore, is that right? Then how can SQL Server 2012 in the AOG see the quorum and vote for which one is the primary?

our IT team only say starting from SQL 2014 AOG don't need quorum, is that right?

that's why from failover cluster manager I can't see the Quorum disk resource, right?

I am thinking one more improvement for a SQL 2012 AOG, the quorum should be on a disk resource in a third datacenter location, not on the same site as the primary and secondary replica, then how can we do it if since SQL 2014 the Windows cluster don't need a Quorum disk any more ?
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
"5 April 2017; these two errors keep repeating."
That was 5 days ago. Any new log entries for this error? If so, when was the last one?

"Our team said that starting from SQL 2014, Windows cluster doesn't need quorum anymore; is that right?"
It is not the SQL Server that doesn't need the quorum but the Windows cluster. By the way, I'm assuming you mean a clustered disk resource that acts as a quorum disk.

"Is that why I can't see the quorum disk resource in Failover Cluster Manager?"
That's because your Windows team didn't create a clustered disk resource for the quorum. You need to ask them why they decided that, but for the AG it doesn't matter whether it's a disk resource or not.

"I am thinking of one more improvement for a SQL 2012 AOG: the quorum should be on a disk resource in a third datacenter location, not on the same site as the primary or secondary replica. How can we do that if, since SQL 2014, the Windows cluster doesn't need a quorum disk any more?"
It's explained in Idera's whitepaper:
"Node Majority: voting is based on the number of active nodes
(...)
Node & File Share Majority: voting is based on the number of active nodes plus one vote for a file share resource
(...)
Node & Disk Majority: voting is based on the number of active nodes plus one vote for a shared disk cluster resource
(...)
Disk-only: no quorum required, active node is determined by the shared disk cluster resource"
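
To make the voting modes above a bit more concrete, here is a rough sketch of the majority arithmetic for a two-site setup. It is a simplified static model that I am assuming for illustration (it ignores dynamic quorum adjustments), and `has_quorum` is a hypothetical helper, not a cluster API.

```python
def has_quorum(votes_reachable, total_votes):
    """A partition keeps quorum only with a strict majority of all configured votes."""
    return votes_reachable > total_votes // 2


# Node Majority with one node per site: 2 votes in total.
# If the link between the sites drops, each side sees only its own vote.
print("Two nodes, link down, either side:", has_quorum(1, 2))

# Node & File Share Majority with the file share in a third location: 3 votes in total.
# If the whole primary site is lost, the DR node still reaches the file share.
print("DR node + file share:", has_quorum(2, 3))

# The side that cannot reach the file share holds only 1 of 3 votes.
print("Isolated node:", has_quorum(1, 3))
```

Under this model only one side can ever hold a majority, so both sites cannot act as primary at the same time.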
marrowyungSenior Technical architecture (Data)

Author

Commented:
"It is not the SQL Server that doesn't need the quorum but the Windows cluster. By the way, I'm assuming you mean a clustered disk resource that acts as a quorum disk."

Hi, yes, I asked the team too, but the point is that I don't see a quorum disk in the two teams' Failover Cluster Manager.

For the incident we had, I now have feedback from the IT team; they agree with the three problems I found:
1) A cluster resource can't start because some account authentication is wrong.
2) The network seems to have a problem.
3) As in 1), some cluster resources failed.

For 3), the IT administrator said the Windows log disk volume was full, so the Windows cluster service failed. Does the Windows cluster depend on the log? If the log disk is full and it can't write the log, does it fail?


"I'm assuming you mean a clustered disk resource that acts as a quorum disk."

It doesn't matter, right?

"Disk-only: no quorum required, active node is determined by the shared disk cluster resource"

Are you saying this is the main reason?

But I am asking whether the Windows cluster doesn't need a quorum anymore, since we can't relocate the quorum disk resource to a third location?
marrowyungSenior Technical architecture (Data)

Author

Commented:
Also, I actually don't understand what this means:

"Node Majority: voting is based on the number of active nodes
(...)
Node & File Share Majority: voting is based on the number of active nodes plus one vote for a file share resource
(...)
Node & Disk Majority: voting is based on the number of active nodes plus one vote for a shared disk cluster resource
(...)
Disk-only: no quorum required, active node is determined by the shared disk cluster resource"
Vitor MontalvãoIT Engineer
CERTIFIED EXPERT
Distinguished Expert 2017

Commented:
"I don't see a quorum disk in the two teams' Failover Cluster Manager."
If they didn't configure one then you won't see it for sure.

"For 3), the IT administrator said the Windows log disk volume was full, so the Windows cluster service failed. Does the Windows cluster depend on the log? If the log disk is full and it can't write the log, does it fail?"
They have a disk only for Windows logs? Anyway, if it's full, why can't they just clean it?

"It doesn't matter, right?"
Yes, it matters. I need to know the context you're talking about.

"But I am asking whether the Windows cluster doesn't need a quorum anymore, since we can't relocate the quorum disk resource to a third location?"
But was there any quorum disk before? Because if there wasn't, then you have nothing to relocate.

"Also, I actually don't understand what this means:"
I took that from the Idera article that you posted. The explanation is there and I can't explain it better. Ask your IT team to see if they can explain it better.
marrowyungSenior Technical architecture (Data)

Author

Commented:
Good day Vitor,

"If they didn't configure one then you won't see it for sure."

I agree, but without a quorum how do they form the cluster? Or, as you said, Windows cluster 2012 doesn't need quorum any more?

"They have a disk only for Windows logs? Anyway, if it's full, why can't they just clean it?"

I found this: https://support.microsoft.com/en-us/help/251284/cluster-server-cannot-start-if-the-quorum-disk-space-is-full

But this link doesn't say which version of Windows cluster it applies to. Based on the technology, I think it is the same across all versions of Windows cluster; do you agree?

As for why they don't clean it up: they store the logs on E:\ because they need a bigger volume to hold all the logs for PCI compliance, but they don't have a purging process for the logs. Also, the DR DB's disk drive is smaller than the primary replica's disk.

 

"Yes, it matters. I need to know the context you're talking about."

  A quorum disk has to be configured/added as a clustered disk resource, right?

That's why I asked why it matters.

"But was there any quorum disk before? Because if there wasn't, then you have nothing to relocate."

I know what you mean, but as I said (my cluster knowledge is still from Windows 2003), I am wondering whether the AOG is configured correctly. Having read Idera's PDF, I think a multi-subnet configuration needs the quorum configured in a third location, so that if the whole primary site is lost, the quorum disk resource is still there and the DR DB knows the cluster is still alive.
marrowyungSenior Technical architecture (Data)

Author

Commented:
"I took that from the Idera's article that you posted. The explanation is there and I can't explain better. Try your IT team to see if they can explain it better."

I just didn't know how the quorum disk in a third datacenter relates to that statement.
marrowyungSenior Technical architecture (Data)

Author

Commented:
Thanks anyway.
