
Exchange 2010 DAG environment issues

Posted on 2013-01-05
Last Modified: 2013-07-07
Dear Experts,

I am having serious problems with my Exchange 2010 DAG environment, which runs on 5 Exchange mailbox servers with 14 databases. Three mailbox servers hold database copies, and the other two mailbox servers are currently used only for heartbeat.
 

Two Exchange mailbox servers (ex01 and ex03) are on the 10.78.133.0 subnet (10.78.133.143, 10.78.133.150).
One Exchange mailbox server (ex02) is on the 10.65.65.0 subnet (10.65.65.100). All three of these mailbox servers are in one site.
hw01 - mailbox server (separate site)
nw01 - mailbox server (separate site)

The cluster is in Node Majority mode.

DAG IPs: 10.78.133.50 and 10.65.65.50
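As a rough illustration of how the two DAG IPs relate to the servers: in a multi-subnet DAG the cluster holds one IP address per subnet, and only the address on the subnet of the node that currently owns the cluster core resources can come online. The Python sketch below maps each mailbox server listed above to the DAG IP it would bring online. It assumes /24 subnet masks, which the post does not state, covers only the three servers whose addresses are given, and the function name `dag_ip_for` is hypothetical.

```python
import ipaddress

# DAG IPs and server addresses as given in the question.
DAG_IPS = [ipaddress.ip_address("10.78.133.50"), ipaddress.ip_address("10.65.65.50")]
SERVERS = {
    "ex01": "10.78.133.143",
    "ex03": "10.78.133.150",
    "ex02": "10.65.65.100",
}


def dag_ip_for(server_ip, prefix=24):
    """Return the DAG IP on the same subnet as server_ip, or None.

    Assumption: every subnet uses the given prefix length (default /24).
    """
    net = ipaddress.ip_network(f"{server_ip}/{prefix}", strict=False)
    for dag_ip in DAG_IPS:
        if dag_ip in net:
            return dag_ip
    return None


if __name__ == "__main__":
    for name, ip in SERVERS.items():
        print(f"{name} ({ip}) -> DAG IP {dag_ip_for(ip)}")
```

Under that assumption ex01 and ex03 map to 10.78.133.50 and ex02 maps to 10.65.65.50, which is consistent with 10.78.133.50 coming online once ex02 dropped out of the cluster.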

Earlier this morning, ex02's IP went offline and event 1135 was logged. The 10.78.133.50 DAG IP came online and the 10.65.65.0 network went offline. As a result, all of the databases on 10.65.65.100 went into a failed state, and the 10.65.65.100 IP became unavailable, with errors such as "the network manager could not be initialized".

After some time and a reboot, ex02 came back online, but ex03's IP (10.78.133.150) became unavailable and its database went into a failed state. I am not sure why ex03 dropped out, or how I can bring its IP back to an available state.

I would appreciate your help with this.

Thank you

mshaikh22
Question by:mshaikh22
 

Expert Comment

by:ArneLovius
ID: 38748354
I would guess that the cluster IP has moved to the other subnet; you should be able to see this in Failover Cluster Manager.

To move it back, you would need to update the cluster from an elevated command prompt:

cluster.exe <DAG F.Q.D.N.> group "cluster group" /moveto:<server name>
cluster.exe <DAG F.Q.D.N.> group "available storage" /moveto:<server name>



If the domain name was domain.internal, the DAG name was DAG-01 and the server was server-1, the commands would look like this:

cluster.exe DAG-01.domain.internal group "cluster group" /moveto:server-1
cluster.exe DAG-01.domain.internal group "available storage" /moveto:server-1



However, this does not explain the databases being in a failed state. I would guess that something else happened as well, such as losing the witness share at the same time.
 

Author Comment

by:mshaikh22
ID: 38748364
Thank you, ArneLovius.

I ran the following command:

cluster.exe DAG-01.domain.internal group "cluster group"

I am getting the following error:

System error 1331 has occurred (0x00000533).
Logon failure: account currently disabled.

How can I find out which account this relates to?
 

Author Comment

by:mshaikh22
ID: 38748369
I had locked out my account; it's fine now.
 

Author Comment

by:mshaikh22
ID: 38753837
Sorry about that. The issue with the cluster group has not been resolved; the symptoms are the same.


The ex03 node is still unavailable in the cluster. It has not moved to another subnet.

In Failover Cluster Manager, Cluster Network 1 shows:

 10.78.133.143 - online
10.78.133.150  - unavailable

In the daggroupavailabilitynetwork section, ex03's IP is also shown as unavailable.

I don't see much in the event logs; the Cluster service keeps stopping.


How can we resolve this issue?

Expert Comment

by:ArneLovius
ID: 38754270
Can you post screenshots from Failover Cluster Manager?
 

Author Comment

by:mshaikh22
ID: 38754319
Please find the Failover Cluster Manager screenshot attached.

Cluster Network 1 shows:

 10.78.133.143 - online
10.78.133.150  - unavailable
fc.png
 

Author Comment

by:mshaikh22
ID: 38754382
I also followed the steps in the TechNet post below, but they didn't bring the cluster resource back online, even after unchecking the client option and rechecking it.


http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx
fc2.png

Expert Comment

by:ArneLovius
ID: 38754434
Are you using separate MAPI and replication networks?

I'm not sure what you meant by "2 mailbox server are being used for only heartbeat for now". If they are not active mailbox servers, then remove them from the DAG.

Where is your file share witness?
 

Author Comment

by:mshaikh22
ID: 38754543
We are using a teamed NIC that carries both MAPI and replication traffic.

There is no file share witness; the cluster is configured in the Node Majority model (which works on an n+1 model).


bg ad site - ex01 ex03 same subnet
bg ad site - different subnet - l ex01
h ad site h ex01
n ad site n ex01
 

Author Comment

by:mshaikh22
ID: 38758697
I tried changing ex03's IP to a different subnet. Nothing changed in the cluster, and the new cluster network does not appear.

I would really appreciate your help with this.

Regards,

Mansoor

Expert Comment

by:ArneLovius
ID: 38759492
When you have 5 live servers, the witness is not used, but as soon as a server went down and you had an even number of live servers, the witness was required. This lack of a witness is the probable cause of your failure.

I would suggest that you configure the witness.
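The quorum arithmetic behind this can be checked with a short Python sketch. It assumes static votes, one per configured node plus one for a file share witness when the quorum model includes it, as in Windows Server 2008 R2 failover clustering (dynamic quorum only arrived in Windows Server 2012). The function names are hypothetical.

```python
def quorum_needed(total_votes):
    """Votes needed for quorum: a strict majority of all configured votes."""
    return total_votes // 2 + 1


def has_quorum(nodes_configured, nodes_up, witness_configured=False, witness_up=False):
    """Static-vote check: one vote per configured node, plus one vote for a
    file share witness when the quorum model includes it."""
    total_votes = nodes_configured + (1 if witness_configured else 0)
    votes_up = nodes_up + (1 if witness_configured and witness_up else 0)
    return votes_up >= quorum_needed(total_votes)


if __name__ == "__main__":
    # Node Majority with the five DAG members described in the question.
    for up in range(5, 0, -1):
        print(f"{up} of 5 nodes up -> quorum: {has_quorum(5, up)}")
```

For a five-member DAG under Node Majority, quorum_needed(5) is 3, so has_quorum(5, 4) and has_quorum(5, 3) are True while has_quorum(5, 2) is False.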

Author Comment

by:mshaikh22
ID: 38759523
I keep getting this error:


Node 'EX03' failed to establish a communication session while joining the cluster. This was due to an authentication failure. Please verify that the nodes are running compatible versions of the cluster service software.

Expert Comment

by:ArneLovius
ID: 38759616
I would check time sync between the servers.

Have you added the witness?

The witness can be on any member server, but not a domain controller or a DFS share.
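If time sync is in doubt, the check can be scripted. The error above is an authentication failure, and one common cause is clock skew: Kerberos in Active Directory rejects authentication when clocks differ by more than the maximum tolerated skew, 5 minutes by default. The Python sketch below flags servers whose clocks differ from a reference by more than that. Collecting the timestamps (for example from `w32tm /stripchart /computer:<server>` output) is left out, and the reference name `dc01` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

# Default maximum clock skew Kerberos tolerates in an Active Directory domain.
MAX_SKEW = timedelta(minutes=5)


def skewed_servers(times, reference):
    """Return {server: skew} for every server whose clock differs from the
    reference server's clock by more than MAX_SKEW."""
    ref_time = times[reference]
    result = {}
    for name, t in times.items():
        if name == reference:
            continue
        skew = abs(t - ref_time)
        if skew > MAX_SKEW:
            result[name] = skew
    return result


if __name__ == "__main__":
    # Hypothetical readings taken at the same moment; replace with real values.
    readings = {
        "dc01": datetime(2013, 1, 8, 12, 0, 0),
        "ex01": datetime(2013, 1, 8, 12, 0, 40),
        "ex03": datetime(2013, 1, 8, 12, 7, 30),
    }
    for server, skew in skewed_servers(readings, "dc01").items():
        print(f"{server} is off by {skew}, which exceeds {MAX_SKEW}")
```

With the sample readings only ex03 is reported, because its 7 minutes 30 seconds of skew exceeds the 5-minute limit while ex01's 40 seconds does not.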
 

Author Comment

by:mshaikh22
ID: 38759641
The file share witness is configured on cas01 and cas02, but Failover Cluster Manager shows the cluster using the Node Majority model.
 

Author Comment

by:mshaikh22
ID: 38759647
Time is synced between all servers.

Expert Comment

by:ArneLovius
ID: 38759921
Have you configured the witness yet?
 

Author Comment

by:mshaikh22
ID: 38759969
The cluster does not use it; it uses the Node Majority model.
The witness was configured before the model was changed.

How can I resolve the event 1570 error?
 

Author Comment

by:mshaikh22
ID: 38770277
Hi experts,

We removed ex03 from the DAG and left it for a day, but now we can't re-add it to the DAG. We are getting the following error message and would appreciate your help with it:

A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed

An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed..
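The hex values in these errors are Win32 error codes. Converting them to decimal lets you look them up with `net helpmsg <code>` on a Windows server: 0x533 is 1331, the disabled-account logon failure seen earlier in this thread, and 0x5b4 is 1460, the timeout reported by AddClusterNode(). A minimal Python sketch of the conversion follows; its lookup table holds only the two messages quoted in this thread, and the function name is hypothetical.

```python
# Messages for the two Win32 error codes that appear in this thread.
KNOWN_MESSAGES = {
    1331: "Logon failure: account currently disabled.",
    1460: "This operation returned because the timeout period expired.",
}


def decode_win32_error(hex_code):
    """Convert a hex Win32 error code such as '0x5b4' to (decimal, message)."""
    value = int(hex_code, 16)  # with base 16, int() accepts an optional 0x prefix
    message = KNOWN_MESSAGES.get(value, f"not in table; try: net helpmsg {value}")
    return value, message


if __name__ == "__main__":
    for code in ("0x00000533", "0x5b4"):
        value, message = decode_win32_error(code)
        print(f"{code} = {value}: {message}")
```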

Expert Comment

by:ArneLovius
ID: 38770347
I am going to guess that the cluster IP address does not match the active cluster host.
 

Author Comment

by:mshaikh22
ID: 38770840
The DAG IP is the same as the server IP, as it was failed over.

DAG IPs:
10.78.133.50 online
10.65.65.50 offline
 

Accepted Solution

by:mshaikh22
ID: 39295889
The issue was resolved by removing the server from the DAG and re-adding it under a different server name. Exchange had to be reinstalled, and the server had to be removed from the domain and added back.
 

Author Closing Comment

by:mshaikh22
ID: 39305126
Couldn't get a solution for the issue