Solved

Exchange 2010 DAG environment issues

Posted on 2013-01-05
Last Modified: 2013-07-07
Dear Experts,

I am currently having some serious problems with my Exchange 2010 DAG environment, which runs on 5 Exchange mailbox servers with 14 databases. 3 mailbox servers hold database copies, and 2 mailbox servers are used only for heartbeat for now.
 

2 Exchange mailbox servers (ex01 and ex03) are on the 10.78.133.0 subnet (10.78.133.143, 10.78.133.150).
1 Exchange mailbox server (ex02) is on the 10.65.65.0 subnet (10.65.65.100). (All 3 mailbox servers are in one site.)
hw01 - mailbox server (separate site)
nw01 - mailbox server (separate site)

The cluster is in Node Majority mode.

DAG IPs: 10.78.133.50 and 10.65.65.50
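
As a quick sanity check on the layout above, the following is a minimal sketch that maps each listed address to its subnet. It assumes both subnets use a /24 mask, which the question does not state; the labels "DAG IP 1" and "DAG IP 2" are only illustrative names.

```python
import ipaddress

# Assumption: both subnets are /24; the thread only gives the network addresses.
SUBNETS = {
    "10.78.133.0": ipaddress.ip_network("10.78.133.0/24"),
    "10.65.65.0": ipaddress.ip_network("10.65.65.0/24"),
}

# Addresses as listed in the question (labels are illustrative).
ADDRESSES = {
    "ex01": "10.78.133.143",
    "ex03": "10.78.133.150",
    "ex02": "10.65.65.100",
    "DAG IP 1": "10.78.133.50",
    "DAG IP 2": "10.65.65.50",
}


def subnet_of(address):
    """Return the name of the listed subnet that contains address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SUBNETS.items():
        if ip in network:
            return name
    return None


if __name__ == "__main__":
    for label, address in ADDRESSES.items():
        print(f"{label:9} {address:15} -> {subnet_of(address)}")
```

An address that returns None here belongs to neither subnet, which is worth ruling out before looking at the cluster itself.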

Earlier this morning, the ex02 IP went offline with event ID 1135 in the event log. The 10.78.133.50 DAG IP came online and 10.65.65.0 went offline. As a result, all of the databases on 10.65.65.100 went into a failed state, and the 10.65.65.100 IP became unavailable, giving errors such as "the network manager could not be initialized".

After some time and a reboot, ex02 came back online, but the ex03 IP (10.78.133.150) became unavailable and its database went into a failed state. I am not sure why ex03 went down or how I can bring its IP back to an available state.

I would appreciate your help with this.

Thank you

mshaikh22
Question by:mshaikh22
22 Comments
 
LVL 37

Expert Comment

by:ArneLovius
ID: 38748354
I would guess that the cluster IP has moved to the other subnet. You should be able to see this in cluster manager.

To move it back, you would need to update the cluster from an elevated command prompt:

cluster.exe <DAG F.Q.D.N.> group "cluster group" /moveto:<server name>
cluster.exe <DAG F.Q.D.N.> group "available storage" /moveto:<server name>



If the domain name was domain.internal, the DAG name was DAG-01, and the server was server-1, they would look like:

cluster.exe DAG-01.domain.internal group "cluster group" /moveto:server-1
cluster.exe DAG-01.domain.internal group "available storage" /moveto:server-1
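
The substitution above can be sketched as a small helper that fills in the same command pattern. This is only an illustration: the function name and parameters are hypothetical, it just formats strings and does not run cluster.exe, and the two group names are the ones used in the commands above.

```python
def move_group_commands(dag_name, domain, server,
                        groups=("cluster group", "available storage")):
    """Build cluster.exe command lines following the pattern shown above.

    Hypothetical helper: it only formats text and never executes anything.
    """
    fqdn = f"{dag_name}.{domain}"
    return [
        f'cluster.exe {fqdn} group "{group}" /moveto:{server}'
        for group in groups
    ]


if __name__ == "__main__":
    for line in move_group_commands("DAG-01", "domain.internal", "server-1"):
        print(line)
```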



However, this does not cover the databases being in a failed state. I would guess that "something else" has happened as well, such as losing the witness share at the same time.
 

Author Comment

by:mshaikh22
ID: 38748364
Thank you, ArneLovius.

I ran the following command:

cluster.exe DAG-01.domain.internal group "cluster group"

I am getting the following:

System error 1331 has occurred (0x00000533).
Logon failure: account currently disabled.

How can I find out which account this relates to?
 

Author Comment

by:mshaikh22
ID: 38748369
I had locked out my account. It is fine now.

 

Author Comment

by:mshaikh22
ID: 38753837
Sorry about that. The issue regarding the cluster group has not been resolved. The symptoms are the same.

The ex03 node is still unavailable in the cluster. It has not moved to another subnet.

In Failover Cluster Manager, cluster network 1 says:

10.78.133.143 - online
10.78.133.150 - unavailable

In the daggroupavailabilitynetwork section, the ex03 IP is also showing as unavailable.

I don't see much in the event logs, but the cluster service keeps stopping.


How can we resolve this issue?
 
LVL 37

Expert Comment

by:ArneLovius
ID: 38754270
Can you post screengrabs from the cluster manager?
 

Author Comment

by:mshaikh22
ID: 38754319
Please find the Failover Cluster Manager screenshot attached.

cluster network 1 says

 10.78.133.143 - online
10.78.133.150  - unavailable
fc.png
 

Author Comment

by:mshaikh22
ID: 38754382
I even followed the steps laid out in the TechNet post below, but that didn't bring the cluster resource back online, even after unchecking the client option and re-checking it.


http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx
fc2.png
 
LVL 37

Expert Comment

by:ArneLovius
ID: 38754434
Are you using different MAPI and replication networks?

I'm not sure what you meant by "2 mailbox server are being used for only heartbeat for now". If they are not active mailbox servers, then remove them from the DAG.

Where is your file share witness?
 

Author Comment

by:mshaikh22
ID: 38754543
We are using a teamed NIC that carries MAPI and replication together.

There is no file share witness. It is configured as a Node Majority model (which works on an n+1 model).


bg ad site - ex01 ex03 same subnet
bg ad site - different subnet - l ex01
h ad site h ex01
n ad site n ex01
 

Author Comment

by:mshaikh22
ID: 38758697
I tried changing the IP of ex03 to a different subnet. Nothing changed on the cluster, and the new cluster network is not showing.

Would really appreciate your help with this.

Regards,

Mansoor
 
LVL 37

Expert Comment

by:ArneLovius
ID: 38759492
When you have 5 live servers, the witness is not used, but as soon as a server goes down and you have an even number of live servers, the witness is required. This lack of a witness is the probable cause of your failure.

I would suggest that you configure the witness.
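
For reference, the basic majority arithmetic behind quorum can be sketched as follows. This is a simplified model and not a description of how Exchange or the cluster service picks a quorum model: it assumes every node holds exactly one vote, that a witness (if configured) adds one more vote, and that the vote total is fixed. The function names are made up for illustration.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes, votes_online):
    """True if the votes currently online form a strict majority."""
    return votes_online >= votes_needed(total_votes)


def tolerated_failures(total_votes):
    """How many votes can be lost while a strict majority remains."""
    return total_votes - votes_needed(total_votes)


if __name__ == "__main__":
    # 5 nodes, no witness (the layout described in this thread).
    print(votes_needed(5), tolerated_failures(5))
    # 4 nodes, no witness, compared with 4 nodes plus a witness vote.
    print(votes_needed(4), tolerated_failures(4))
    print(votes_needed(4 + 1), tolerated_failures(4 + 1))
```

Under these assumptions an even vote count tolerates one fewer failure than the next odd count, which is the usual reason for adding a witness to an even number of nodes.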
 

Author Comment

by:mshaikh22
ID: 38759523
I keep getting this error:


Node 'EX03' failed to establish a communication session while joining the cluster. This was due to an authentication failure. Please verify that the nodes are running compatible versions of the cluster service software.
 
LVL 37

Expert Comment

by:ArneLovius
ID: 38759616
I would check time sync between the servers.

Have you added the witness?

The witness can be on any member server, but not a domain controller or a DFS share.
 

Author Comment

by:mshaikh22
ID: 38759641
The file share witness is configured to be on cas01 and cas02,

but the cluster shown in Failover Cluster Manager is based on the Node Majority model.
 

Author Comment

by:mshaikh22
ID: 38759647
Time is synced between all servers.
 
LVL 37

Expert Comment

by:ArneLovius
ID: 38759921
Have you configured the witness yet?
 

Author Comment

by:mshaikh22
ID: 38759969
The cluster does not use it; it uses the Node Majority model.
The witness was configured prior to changing the model.

How can I solve the event 1570 error?
 

Author Comment

by:mshaikh22
ID: 38770277
Hi experts,

We removed ex03 from the DAG and left it for a day, but we can't re-add it to the DAG. We are getting the following error message and would appreciate your help on this.

A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed

An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed..
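
The hexadecimal codes in these messages are Win32 error numbers, the same way the earlier "System error 1331" was shown alongside 0x00000533. A minimal sketch for converting between the two forms, which can make the codes easier to search for, is below. The lookup table contains only the two code and message pairs quoted in this thread, and the function name is hypothetical.

```python
# Code/message pairs copied from the error text quoted in this thread.
REPORTED = {
    0x533: "Logon failure: account currently disabled.",
    0x5B4: "This operation returned because the timeout period expired",
}


def describe(code_text):
    """Convert a hex string such as '0x5b4' to decimal and attach the quoted message."""
    code = int(code_text, 16)  # int() with base 16 accepts an optional 0x prefix
    message = REPORTED.get(code, "not quoted in this thread")
    return f"0x{code:08X} = {code}: {message}"


if __name__ == "__main__":
    print(describe("0x533"))
    print(describe("0x5b4"))
```

So 0x5b4 is decimal 1460, the generic Win32 timeout error (ERROR_TIMEOUT), which by itself says only that AddClusterNode() did not complete in time, not why.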
 
LVL 37

Expert Comment

by:ArneLovius
ID: 38770347
I am going to guess that the cluster IP address does not match the active cluster host.
 

Author Comment

by:mshaikh22
ID: 38770840
The DAG IP is the same as the server IP, as it was failed over.

DAG IPs:
10.78.133.50 - online
10.65.65.50 - offline
 

Accepted Solution

by:
mshaikh22 earned 0 total points
ID: 39295889
The issue was resolved by removing the server from the DAG and re-adding it under a different server name. Exchange had to be reinstalled, and the server had to be removed from the domain and added back.
 

Author Closing Comment

by:mshaikh22
ID: 39305126
Couldn't get a solution for the issue


Question has a verified solution.
