ipsec600 asked:

DAG Failed

We had been having problems taking Exchange backups for the last couple of days, so we logged a case with the backup vendor. As per their suggestion, we had to reboot one of the DAG member servers in the secondary site. We put that server in maintenance mode, rebooted it, and took it out of maintenance mode after the reboot. Every DB copy across the DAG is in a healthy state, but the cluster name DAGBL has failed. After opening Failover Cluster Manager I found that one of the IP addresses is offline, and when I tried to bring it online I received the following error.
Error "The operation has failed". An error occurred while trying to bring the resource "IPv4 Static Address 1 (Cluster Group)" online. When I clicked Details I found the error code: 0x80071397.
The operation failed because either the specified cluster node is not the owner of the resource or the node is not a possible owner of the resource.

I have checked the failover cluster properties, and four servers are selected as possible owners: server01, server02, server03, and server04.
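
The possible-owner list can also be read from PowerShell. This is a minimal sketch, assuming the FailoverClusters module that ships with Windows Server 2008 R2 is available on a DAG member and that the resource is named as shown in the error text (both are assumptions, so adjust as needed):

```powershell
# Load the failover clustering cmdlets (Windows Server 2008 R2 and later).
Import-Module FailoverClusters

# Resource name taken from the error message; change it if yours differs.
$resourceName = "IPv4 Static Address 1 (Cluster Group)"

# Show the resource state, the node that currently owns it, and its group.
Get-ClusterResource -Name $resourceName |
    Format-List Name, State, OwnerNode, OwnerGroup

# List the nodes that are allowed to own this resource (the "possible owners").
Get-ClusterResource -Name $resourceName | Get-ClusterOwnerNode
```

If the node that currently owns the "Cluster Group" does not appear in the owner list for this resource, that would be consistent with error 0x80071397.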

I am using Exchange 2010 SP3 Rollup 2. In the primary site, server01 and server02 are members of the DAG. In the secondary site, server03 is a member of the DAG; server04 holds only the Mailbox role and is not in the DAG.

So far I have checked that the FSW is online and that the cluster network is up and online for server01, server02, server03, and server04, and all DBs are in a healthy state across the DAG. In this situation, can you please advise what can be done to resolve the issue?
Exchange

8/22/2022 - Mon
Mahesh

Please correct me if my understanding is wrong.
Do you mean that server04 was removed from the DAG automatically?
Can you run the shell command below to verify?
Get-DatabaseAvailabilityGroup -Identity DAGNAME | Format-List Servers

If that's the case, I don't think there is any option other than forcibly removing any passive copies on server04 and adding the server again.

Your DAG will still remain online because it retains a majority of voters.

Mahesh
Adam Farage

Verify that all roles are out of maintenance. If they are, or if you cannot get them out of maintenance, try moving the Primary Active Manager to a different node.

You can run the following to find out which node holds the PAM (Primary Active Manager) role, which technically is the node that owns the cluster core resources:

Get-DatabaseAvailabilityGroup -Status | Select Name, PrimaryActiveManager

Once you do that, simply move the PAM role to a different node in your primary datacenter:

cluster.exe /cluster:DAGName group "Cluster Group" /moveto:newnode
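
On Windows Server 2008 R2 the same move can be done with the FailoverClusters PowerShell module. This is a sketch, assuming the core resources live in the default group named "Cluster Group"; "server02" is a placeholder target node, not a recommendation:

```powershell
# Load the failover clustering cmdlets (Windows Server 2008 R2 and later).
Import-Module FailoverClusters

# See which node currently owns the cluster core resources.
Get-ClusterGroup -Name "Cluster Group"

# Move the core resource group (and with it the PAM role) to another node.
# "server02" is a placeholder; substitute a DAG member in the primary datacenter.
Move-ClusterGroup -Name "Cluster Group" -Node "server02"
```

Afterwards, rerunning Get-DatabaseAvailabilityGroup -Status should show the new PrimaryActiveManager.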

Give that a shot and let us know how you make out.
ipsec600

ASKER
Hello Mahesh, thank you for your post. Actually, the issue I have been facing is that my DAG goes offline; server04 itself is OK.

Hello evrydayzawrkday, thank you for your advice. Please find the current status below:

When I opened Failover Cluster Manager at the time I posted this question, DAGBL (the DAG name) was offline. The DAG has two interfaces: one shows online and the other shows offline. When I tried to bring the second (offline) interface online, I received the error below:

Error "The operation has failed". An error occurred while trying to bring the resource "IPv4 Static Address 1 (Cluster Group)" online. When I clicked Details I found the error code: 0x80071397.
The operation failed because either the specified cluster node is not the owner of the resource or the node is not a possible owner of the resource.

I found the following while I was receiving the error:
For DAGBL (the DAG name), the PAM was server01, which is in the primary site.

Then we gracefully rebooted all the Exchange servers and found the following:
DAGBL was offline, but after 2 to 3 hours it came online automatically; the PAM is now server04, which is in the secondary site.

Then I ran "Validate This Cluster" from the Failover Cluster console and found the following warning for a cluster IP address resource, which is the interface that is offline:

This resource is marked with a state of 'Offline'. The functionality that this resource provides is not available while it is in the offline state. The resource may be put in this state by an administrator or program. It may also be a newly created resource which has not been put in the online state or the resource may be dependent on a resource that is not online. Resources can be brought online by choosing the 'Bring this resource online' action in Failover Cluster Manager.

In summary, after rebooting all the Exchange servers, DAGBL (the DAG name) is online but one of the interfaces is still offline, and I am receiving the warning above. I am not sure whether this is by design, because of the two interfaces in the DAG, one is active and the other is offline. At the moment, the database status is healthy on all DAG member servers.
Also note that while the PAM was on server01 (primary site), interface02 was offline in the DAG and interface01 was online. Now that the PAM is on server04 (secondary site), interface02 is online and interface01 is offline, meaning the DAG interfaces switch over automatically and go offline/online depending on the PAM. In this situation, what can be done to bring that interface online? Please advise.
Mahesh

Which network interface is going down: the one on the replication network or the one on the MAPI network?
If you could share a screenshot, that would be great.

If you move the PAM role across servers, does one interface go offline on every server?

You are using two network cards per DAG member, right?
How many network cards do you have per server?
Are all the DAG members virtualized?

Mahesh
ipsec600

ASKER
Hi Mahesh, sorry for the late reply; I was on vacation. Please find the details below:

All the DAG members are virtualized and use one interface, so both the MAPI and replication networks share that single interface.

I will move the PAM role across servers to observe the interface status, and update you. Would you please clarify the following? When I run the failover cluster dependency report, I receive the output below:

Cluster:       DAGBL
Resource:       Name: DAGBL
Started                     12/15/2013 11:18:08 PM
Completed      12/15/2013 11:18:08 PM
'IP Address: 172.22.228.220' has no required dependencies.
'IP Address: 172.25.229.1' has no required dependencies.
'Name: DAGBL' dependencies are 'IP Address: 172.22.228.220' or 'IP Address: 172.25.229.1'.
'Cluster Name' required dependencies are IP Address.

represents 'AND' relationship: all child resources must be on-line

represents 'OR' relationship: at least one child resource must be on-line

I have noticed that an "OR" relationship is defined in the DAG properties. Does that mean only one of the two interfaces will remain online at a time? If so, my DAG is behaving normally; otherwise, I will have to work on bringing both interfaces online.
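
The dependency expression and the state of each address can also be read from PowerShell. This is a sketch, assuming the FailoverClusters module (Windows Server 2008 R2) and the default core resource names "Cluster Name" and "Cluster Group"; these names are assumptions and may differ in a given cluster:

```powershell
# Load the failover clustering cmdlets (Windows Server 2008 R2 and later).
Import-Module FailoverClusters

# Show the dependency expression of the cluster name resource.
# With two DAG IP addresses it is expected to contain an "or" between them.
Get-ClusterResourceDependency -Resource "Cluster Name"

# List the IP address resources in the core group with their state and owner.
Get-ClusterGroup -Name "Cluster Group" |
    Get-ClusterResource |
    Where-Object { $_.ResourceType -like "IP Address" } |
    Format-Table Name, State, OwnerNode -AutoSize
```

With an "OR" dependency the name resource needs only one of the two addresses online. In a two-subnet DAG, the address that is online is typically the one for the subnet of the node that currently owns the group, which would match the switching described above.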
ASKER CERTIFIED SOLUTION
Mahesh

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
ipsec600

ASKER
Hi Mahesh, thanks a lot for your excellent clarification. According to the blog ("You can set "OR" relationship there in dependencies": http://www.shudnow.net/2010/09/27/exchange-2010-site-resilience-multiple-dag-ips-and-cluster-resources/ ), my DAG environment is behaving normally, which is what I was looking for. As for the recommendation, sure, I will proceed to implement two network cards for the DAG for redundancy.