  • Status: Solved
  • Priority: Medium
  • Security: Public

Windows 2008 R2 cluster + Exchange DAG node issues after reboot

Hi guys,

We are currently running Exchange 2010 in a DAG setup using 2 servers (Windows 2008 R2 Enterprise) in a cluster, and have been for over a year now.
We now have a problem: we rebooted one of the 2 nodes in our Windows 2008 R2 SP1 / Exchange 2010 DAG setup, and since the reboot the node is failing to join the cluster!

The event logs reported that the node could not see the file share witness server, but in Failover Cluster Manager, under Cluster Core Resources, both the cluster name (DAG) and the witness server show as online! However, under Nodes, the rebooted node shows as unavailable.

If someone could please help, we would be grateful; this is a production server, so we are keen to get it working again.

We have not attempted to reboot the other node, in case we end up in a worse situation than we are in now.

Many thanks

Jim
Asked by: macleandata

Larry Larmeu, Managing Director, commented:
Have you tried removing the node from the DAG and adding it back?

Also make sure you can ping/browse the file share witness server from the affected node.

macleandata, Author, commented:
Hi,

Thanks for your prompt response.

We can ping/browse the file share, but we have not yet removed the node from the DAG. To be honest, our knowledge of clustering/DAGs is limited, and we did not want to make the situation any worse! So, to clarify: from the EMC, we remove each failed copy from each database (7 in total), and then, in Manage Database Availability Group Membership, we remove the node?

Thanks

Jim

Larry Larmeu, Managing Director, commented:
Yes, that would be the process to remove the node from the cluster. It should not affect your other node, which is running the active copies. Before you do that, you may want to try flipping your primary and secondary witness servers, or specifying a new witness server, to see if that helps.
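For anyone who prefers the shell to the EMC, the two-step removal described above (remove the failed server's database copies, then remove the server from the DAG) can be sketched roughly as follows. This is a minimal sketch, not a tested procedure: the function name `Remove-FailedDagMember` is hypothetical, it assumes it is run in the Exchange Management Shell, and it assumes (as in this thread) that every copy on the failed server is passive, with the active copies on the other node. The optional `-ConfigurationOnly` pass-through mirrors the switch used later in the thread when the normal removal fails.

```powershell
# Hypothetical helper: remove a failed member from an Exchange 2010 DAG.
# Assumes the Exchange Management Shell and that all copies on $Server are
# passive (active copies live on the surviving node). Try it with -WhatIf first.
function Remove-FailedDagMember {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [string]$Server = 'MHMEXCH20',
        [string]$Dag = 'DAG1',
        # Passed through to Remove-DatabaseAvailabilityGroupServer
        [switch]$ConfigurationOnly
    )

    # Step 1: every database that has a copy on $Server.
    # Database copy identities take the form <Database>\<Server>.
    foreach ($db in Get-MailboxDatabase -Server $Server) {
        $copy = "$($db.Name)\$Server"
        if ($PSCmdlet.ShouldProcess($copy, 'Remove mailbox database copy')) {
            Remove-MailboxDatabaseCopy -Identity $copy -Confirm:$false
        }
    }

    # Step 2: drop the server from the DAG's membership
    if ($PSCmdlet.ShouldProcess($Server, "Remove from DAG $Dag")) {
        Remove-DatabaseAvailabilityGroupServer -Identity $Dag -MailboxServer $Server `
            -ConfigurationOnly:$ConfigurationOnly -Confirm:$false
    }
}
```

Usage would be something like `Remove-FailedDagMember -WhatIf` to preview the copies it would touch, then the same command without `-WhatIf`. As I understand it, `-ConfigurationOnly` only updates the DAG object in Active Directory, so the node may also need evicting from the Windows cluster afterwards; check the TechNet documentation for your build before relying on this.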

macleandata, Author, commented:
OK, thanks. We do not currently have a secondary witness server set up; it is something we were looking to do. I'll try this first, then remove the node and add it back in.
Thanks
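Specifying a new (or alternate) witness server, as Larry suggested, can also be done from the shell with `Set-DatabaseAvailabilityGroup`. The sketch below is illustrative only: the wrapper `Set-DagWitness` is hypothetical, the server and directory values you pass are placeholders for your own environment, and it assumes the Exchange Management Shell. It only sends the alternate witness settings when they are supplied, then prints what the DAG is configured to use.

```powershell
# Hypothetical helper: point a DAG at a different file share witness and show
# the resulting configuration. Assumes the Exchange Management Shell.
function Set-DagWitness {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [string]$Dag = 'DAG1',
        [Parameter(Mandatory = $true)][string]$WitnessServer,
        [Parameter(Mandatory = $true)][string]$WitnessDirectory,
        [string]$AlternateWitnessServer,
        [string]$AlternateWitnessDirectory
    )

    # Build the parameter set; alternate witness settings are optional
    $settings = @{
        Identity         = $Dag
        WitnessServer    = $WitnessServer
        WitnessDirectory = $WitnessDirectory
    }
    if ($AlternateWitnessServer) {
        $settings.AlternateWitnessServer    = $AlternateWitnessServer
        $settings.AlternateWitnessDirectory = $AlternateWitnessDirectory
    }

    if ($PSCmdlet.ShouldProcess($Dag, "Set file share witness to $WitnessServer")) {
        Set-DatabaseAvailabilityGroup @settings
    }

    # Show what the DAG is now configured to use
    Get-DatabaseAvailabilityGroup -Identity $Dag |
        Format-List Name, WitnessServer, WitnessDirectory, AlternateWitnessServer, AlternateWitnessDirectory
}
```

For example, `Set-DagWitness -WitnessServer 'FSW02' -WitnessDirectory 'C:\DAG1_FSW' -WhatIf`, where `FSW02` and `C:\DAG1_FSW` are made-up placeholder values.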

macleandata, Author, commented:
Hi,

OK, neither option worked. When removing the node, we got this error:

Summary: 1 item(s). 0 succeeded, 1 failed.
Elapsed time: 00:00:08


MHMEXCH20
Failed

Error:
There was a problem changing the quorum model for database availability group DAG1. Error: An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"SetClusterQuorumResource() failed with 0x1725. Error: A quorum of cluster nodes was not present to form a cluster"' failed..
Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.140).aspx?v=14.1.285.0&t=exchgf1&e=ms.exch.err.Ex7B51A5

Warning:
The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2012-09-05_16-05-42.023_remove-databaseavailabiltygroupserver.log".


Exchange Management Shell command attempted:
Remove-DatabaseAvailabilityGroupServer -MailboxServer 'MHMEXCH20' -Identity 'DAG1'

Elapsed Time: 00:00:09


Thanks

Jim

macleandata, Author, commented:
Just to update you:

I just looked into this error and found this site: http://exchangeserverpro.com/unable-remove-failed-server-dag-exchange-server-2010

We have now managed to remove the node from the DAG successfully; we are just adding it back in.

Larry Larmeu, Managing Director, commented:
Try this from the shell:

Remove-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MHMEXCH20 -ConfigurationOnly

Larry Larmeu, Managing Director, commented:
Ah - beat me to it.

macleandata, Author, commented:
Good old Google ;-) OK, thanks.

OK, adding the node back failed :-(

Here is the error. I have also attached the log file mentioned in the error; could you look through it to see if anything stands out as an issue?


MHMEXCH20
Failed

Error:
A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed. [Server: MHMEXCH21.themovefactory.com]

An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed..

This operation returned because the timeout period expired
Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.140).aspx?v=14.1.285.0&t=exchgf1&e=ms.exch.err.ExC9C315

Warning:
The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2012-09-05_16-14-16.997_add-databaseavailabiltygroupserver.log".


Exchange Management Shell command attempted:
Add-DatabaseAvailabilityGroupServer -MailboxServer 'MHMEXCH20' -Identity 'DAG1'



Many thanks

Jim
logfile.log

Larry Larmeu, Managing Director, commented:
Looking at the log and doing some research, it looks like most people's recommendation for this error is to dissolve the DAG and create a new DAG with a different name. It seems the cluster configuration has some kind of corruption.

macleandata, Author, commented:
;-( OK, that sounds quite drastic. Do you have any suggestions for a clean dissolve? We also have a problem at the moment where we can't back up at all, as Backup Exec 2012 only sees the DAG.

If you're able to assist in any way, that would be appreciated.

Thanks for your help so far, it really is appreciated!

Jim

Larry Larmeu, Managing Director, commented:
Why are you not able to back up the working node?

Simon Butler (Sembee), Consultant, commented:
What is the situation with the DAG at the moment? Does Exchange still see the DAG? Does it see the members?
You need to get the members out, which means you need to remove the database copies.
Once you have removed the DAG, you can recreate it.

A lot of these problems are due to DNS issues, where the DAG name doesn't resolve correctly.

Simon.
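Simon's questions (does Exchange still see the DAG, and does it see the members?) can be answered from the shell by comparing Exchange's view of the DAG with the Windows cluster's view. The read-only sketch below is hypothetical (`Get-DagHealthSummary` is an illustrative name); it assumes it runs in the Exchange Management Shell on a DAG member where the FailoverClusters module (Windows Server 2008 R2 and later) is available.

```powershell
# Hypothetical read-only helper: compare Exchange's view of a DAG with the
# underlying Windows failover cluster's view.
function Get-DagHealthSummary {
    param([string]$Dag = 'DAG1')

    # Configured members versus members Active Manager sees as operational
    Get-DatabaseAvailabilityGroup -Identity $Dag -Status |
        Format-List Name, Servers, OperationalServers, PrimaryActiveManager, WitnessServer |
        Out-Host

    # Health of every mailbox database copy
    Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus |
        Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize |
        Out-Host

    # The cluster's view of each node (for example Up, Down, Paused, Joining)
    Import-Module FailoverClusters
    Get-ClusterNode -Cluster $Dag | Format-Table Name, State -AutoSize | Out-Host
}
```

If `Servers` lists both nodes but `OperationalServers` or `Get-ClusterNode` shows only one, that is consistent with the symptom described at the top of this thread.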

macleandata, Author, commented:
Backup Exec fails when backing up the DAG selection, and if you drill into the physical server you only see 3 folders: Address, MessageManager and Replay. However, I have just restarted the BE agent on the active DB server, and I'm now seeing the database locations and log files. Not the Information Store, though, which I can now see on the node that is out of the DAG. So I'll back up the DB and logs tonight; at least we will have a DB copy of some sort.

If we were to reboot the active node, how would this affect the DBs and the DAG after the reboot? I know a DAG can withstand a single node failure (with the witness and the second node live), but I'm not sure whether a reboot of the remaining node would cause problems.

Wading through info on dissolving the DAG and rebuilding it!!

Jim

Larry Larmeu, Managing Director, commented:
Are you sure your DAG networks are set up correctly?  Can you ping the DAG cluster IP addresses and ping DAG1?
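Since Simon also pointed at DNS, a quick check that a name such as DAG1 resolves (and, optionally, answers ping) can be scripted. The sketch below uses the .NET `System.Net.Dns.GetHostAddresses` method rather than `Resolve-DnsName`, because the latter is not available on Windows Server 2008 R2. The function name `Test-DagNameResolution` is hypothetical, and the check says nothing about whether the addresses returned are the *correct* DAG IP addresses; compare them against your DAG configuration yourself.

```powershell
# Hypothetical helper: check that a name (for example the DAG name DAG1)
# resolves in DNS and, optionally, answers ping.
function Test-DagNameResolution {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [string]$Name,
        [switch]$Ping
    )

    try {
        # GetHostAddresses throws if the name cannot be resolved
        $addresses = [System.Net.Dns]::GetHostAddresses($Name)
    }
    catch {
        Write-Warning "Could not resolve '$Name': $($_.Exception.Message)"
        return $false
    }

    foreach ($address in $addresses) {
        Write-Verbose "$Name resolves to $($address.IPAddressToString)"
    }

    if ($Ping) {
        # -Quiet returns $true if any echo request received a reply
        return [bool](Test-Connection -ComputerName $Name -Count 2 -Quiet)
    }
    return $true
}
```

For example, `Test-DagNameResolution -Name DAG1 -Ping -Verbose` would list the resolved addresses and report whether the name answers ping.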

macleandata, Author, commented:
Hi both,

Yes, DAG1 resolves to an IP and pings; looking at the ARP table, it currently points at the active server, as expected.

Simon - the current status of the DAG is that Exchange can see the DAG and the only remaining node. We removed the failing node, after first deleting its DB copies.

With regard to the backup, I would have expected the DAG to continue backing up the remaining node with the active DBs on it, which points to an issue with the DAG.

Thanks

Jim

Larry Larmeu, Managing Director, commented:
I hate to tell you to proceed without a backup.  Do you have a way of doing a snapshot or something like that before you proceed?

macleandata, Author, commented:
Still looking at another way of using BE2012 to back up the live node, but for tonight the only option I can see is to back up the .edb and log files, so that we at least have something if anything should go wrong!

macleandata, Author, commented:
Small update: I can now see the Information Store in the DAG in BE2012 :-) and it is looking at the active node, phew. So at least we can now get a clean backup, and the logs will be flushed.

I will spend time today diagnosing the current DAG to see if we can re-introduce the failed node, although it's not as if we are adding a cleanly rebuilt node; it has been a part of the DAG.

Thanks so far for everyone's help. If anyone has any other ideas, they are welcome; as a last resort, we will rebuild the DAG, I guess.

macleandata, Author, commented:
We eventually managed to break the DAG and rebuild it.