Exchange 2010 DAG environment issues

Last Modified: 2013-07-07
Dear Experts,

I am currently having some serious problems with my Exchange 2010 DAG environment, which runs on 5 Exchange mailbox servers with 14 databases. 3 mailbox servers hold database copies, and 2 mailbox servers are being used only for heartbeat for now.
 

2 Exchange mailbox servers (ex01 and ex03) are on the 10.78.133.0 subnet (10.78.133.143, 10.78.133.150).
1 Exchange mailbox server (ex02) is on the 10.65.65.0 subnet (10.65.65.100). All 3 of these mailbox servers are in one site.
hw01 - mailbox server (separate site)
nw01 - mailbox server (separate site)

The cluster is in Node Majority mode.

DAG IPs: 10.78.133.50 and 10.65.65.50
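
As a quick sanity check of the layout described above, the addresses can be grouped by subnet. This is only a minimal sketch: the /24 prefix length is an assumption (the post gives the network addresses but no masks), and the labels "DAG IP 1" and "DAG IP 2" are made up for illustration.

```python
import ipaddress

# Networks named in the post; the /24 prefix length is an ASSUMPTION.
SUBNETS = [
    ipaddress.ip_network("10.78.133.0/24"),
    ipaddress.ip_network("10.65.65.0/24"),
]

# Addresses quoted in the post. "DAG IP 1/2" are illustrative labels only.
ADDRESSES = {
    "ex01": "10.78.133.143",
    "ex03": "10.78.133.150",
    "ex02": "10.65.65.100",
    "DAG IP 1": "10.78.133.50",
    "DAG IP 2": "10.65.65.50",
}


def group_by_subnet(addresses, subnets):
    """Return ({subnet: [names]}, [names that match no listed subnet])."""
    grouped = {str(net): [] for net in subnets}
    unmatched = []
    for name, addr in addresses.items():
        ip = ipaddress.ip_address(addr)
        for net in subnets:
            if ip in net:
                grouped[str(net)].append(name)
                break
        else:
            # No listed subnet contains this address.
            unmatched.append(name)
    return grouped, unmatched


grouped, unmatched = group_by_subnet(ADDRESSES, SUBNETS)
for net, names in grouped.items():
    print(net, "->", ", ".join(names))
print("unmatched:", unmatched)
```

Any address that ends up in the unmatched list is worth double-checking against the real network configuration.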

Earlier this morning, ex02's IP went offline with event log 1135; the 10.78.133.50 DAG IP came online and 10.65.65.0 went offline. As a result, all of the databases on 10.65.65.100 went into a failed state, and the 10.65.65.100 IP became unavailable and was giving errors such as "the network manager could not be initialized".

After some time and a reboot, ex02 came back online, but ex03's IP (10.78.133.150) became unavailable and its databases went into a failed state. I am not sure why ex03 dropped out or how I can bring its IP back to an available state.

I would appreciate your help with this.

Thank you,

mshaikh22

CERTIFIED EXPERT

Commented:
I would guess that the cluster IP has moved to the other subnet; you should be able to see this in Failover Cluster Manager.

To move it back, you would need to update the cluster from an elevated command prompt:

cluster.exe <DAG F.Q.D.N.> group "cluster group" /moveto:<server name>
cluster.exe <DAG F.Q.D.N.> group "available storage" /moveto:<server name>


If the domain name was domain.internal, the DAG name was DAG-01 and the server was server-1, they would look like:

cluster.exe DAG-01.domain.internal group "cluster group" /moveto:server-1
cluster.exe DAG-01.domain.internal group "available storage" /moveto:server-1


However, this does not cover the databases being in a failed state. I would guess that "something else" has happened as well, such as losing the witness share at the same time.

Author

Commented:
Thank you, ArneLovius.

I ran the following command:

cluster.exe DAG-01.domain.internal group "cluster group"

I am getting the following:

System error 1331 has occurred (0x00000533).
Logon failure: account currently disabled.

How can I find out which account this relates to?

Author

Commented:
I had locked out my account. It's fine now.

Author

Commented:
Sorry about that. The issue regarding the cluster group has not been resolved; the symptoms are the same.

The ex03 node is still unavailable in the cluster. It has not moved to another subnet.

In Failover Cluster Manager,

cluster network 1 says

 10.78.133.143 - online
10.78.133.150  - unavailable

The daggroupavailabilitynetwork section also shows the ex03 IP as unavailable.

I don't see much in the event logs, and the cluster service keeps stopping.

How can we resolve this issue?
CERTIFIED EXPERT

Commented:
Can you post screen grabs from the cluster manager?

Author

Commented:
Please find the Failover Cluster Manager screenshot attached.

cluster network 1 says

 10.78.133.143 - online
10.78.133.150  - unavailable
fc.png

Author

Commented:
I even followed the steps laid out in the TechNet post below, but they didn't bring the cluster resource back online, even after unchecking the client option and rechecking it.


http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx
fc2.png
CERTIFIED EXPERT

Commented:
Are you using different MAPI and replication networks?

I'm not sure what you meant by "2 mailbox server are being used for only heartbeat for now". If they are not active mailbox servers, then remove them from the DAG.

Where is your file share witness?

Author

Commented:
We are using a teamed NIC that carries MAPI and replication traffic together.

There is no file share witness; the cluster is configured with the Node Majority model (which works on an n+1 model).


bg ad site - ex01 ex03 same subnet
bg ad site - different subnet - l ex01
h ad site h ex01
n ad site n ex01

Author

Commented:
I tried changing the IP of ex03 to a different subnet. Nothing changed on the cluster, and the new cluster network is not showing.

I would really appreciate your help with this.

Regards,

Mansoor
CERTIFIED EXPERT

Commented:
When you have 5 live servers, the witness is not used, but as soon as a server goes down and you have an even number of live servers, the witness is required, and this lack of a witness is the probable cause of your failure.

I would suggest that you configure the witness.
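
The majority arithmetic behind this advice can be sketched roughly as follows. This is an illustration only, under stated assumptions: each node holds one vote, a witness (if configured) adds one vote, and the set of voters is fixed. The function names are made up for this example and are not part of any cluster tool.

```python
def votes_needed(total_votes: int) -> int:
    """Strict majority of the configured votes."""
    return total_votes // 2 + 1


def has_quorum(votes_online: int, total_votes: int) -> bool:
    """True if the votes still online form a strict majority."""
    return votes_online >= votes_needed(total_votes)


def tolerated_failures(total_votes: int) -> int:
    """How many votes can be lost before quorum is lost."""
    return total_votes - votes_needed(total_votes)


# Compare 4 and 5 nodes, without and with one witness vote.
for nodes in (4, 5):
    for witness in (0, 1):
        total = nodes + witness
        print(
            f"nodes={nodes} witness={witness}: "
            f"need {votes_needed(total)} of {total} votes, "
            f"can lose {tolerated_failures(total)}"
        )
```

Under these assumptions, an even number of voters tolerates no more failures than the next lower odd number, which is why a witness vote is normally paired with an even node count.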

Author

Commented:
I keep getting this error:


Node 'EX03' failed to establish a communication session while joining the cluster. This was due to an authentication failure. Please verify that the nodes are running compatible versions of the cluster service software.
CERTIFIED EXPERT

Commented:
I would check time sync between the servers.

Have you added the witness?

The witness can be on any member server, but not a domain controller or a DFS share.

Author

Commented:
The file share witness is configured to be on cas01 and cas02,

but Failover Cluster Manager shows the cluster using the Node Majority model.

Author

Commented:
Time is synced between all servers.
CERTIFIED EXPERT

Commented:
Have you set up the witness yet?

Author

Commented:
The cluster does not use it; it uses the Node Majority model.
The witness was configured prior to changing the model.

How can I resolve the event 1570 error?

Author

Commented:
Hi experts,

We removed ex03 from the DAG and left it for a day, but we can't re-add it to the DAG; we are getting the following error message. I would appreciate your help with this.

A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed

An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed..
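
The hexadecimal codes quoted in this thread (0x5b4 here and 0x00000533 earlier) are Win32 error numbers. A minimal sketch for converting them to decimal so they can be looked up, for example with the built-in `net helpmsg` command on Windows; the helper name `win32_code` is made up for this example.

```python
def win32_code(hex_text: str) -> int:
    """Convert a hex error string such as '0x5b4' to its decimal value."""
    return int(hex_text, 16)


# Codes copied from the error messages in this thread.
codes = ["0x5b4", "0x00000533"]

for code in codes:
    number = win32_code(code)
    # The printed hint can be pasted into a Windows command prompt.
    print(f"{code} = {number}  (look up with: net helpmsg {number})")
```

The second value matches the "System error 1331" text shown earlier, and 1460 is the timeout error that the AddClusterNode() message spells out.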
CERTIFIED EXPERT

Commented:
I am going to guess that the cluster IP address does not match the active cluster host.

Author

Commented:
The DAG IP is the same as the server IP, as it was failed over.

DAG IPs:
10.78.133.50 - online
10.65.65.50 - offline

Author

Commented:
I couldn't get a solution for the issue.
