Wicked-Vengence
asked on
Issue with Exchange 2010 DAG
We are currently having an issue with our Exchange environment, which was previously working. We have two Exchange 2010 SP1 (no rollup) servers with the Mailbox role running on ESX 4.1. They are set up in a DAG with a File Share Witness. Currently, one of the nodes will not come online (event errors 1564, 1069, 1573 and 1570 from the failover cluster service) because the cluster networks for that node are showing as unavailable.
The servers have two separate NICs on different VLANs for the DAG (both NICs are showing as unavailable for the cluster node). I can ping the IPs on both and have confirmed that they communicate properly across the VLANs. I also ran a cluster validation test, and the network portion passes. No events are logged for the NICs.
Can you share the error messages with their descriptions?
Check the following:
Heartbeat network IPs and ping are OK
LAN network IPs and ping are OK
Did you use multicast NLB on the DAG nodes? Check for any firewall blocking on the switch side
Run the Exchange BPA to find the detailed error and post it here, please
Set up an affinity rule so the Exchange DAG servers stick to two specific hosts, not any host
Check the ESXi network firewall policy, with the "do not notify switch" option unticked in vSwitch security
Check whether Symantec Endpoint Protection or another antivirus program is blocking communication
Untick IPv6 on the DAG servers
I would install the rollups first and, as araberuni suggested, untick IPv6.
ASKER
It starts with...
Error 1564: File share witness resource 'File Share Witness (\\Servername\DAG File Share)' failed to arbitrate for the file share '\\Servername\DAG File Share'. Please ensure that file share '\\Servername\DAG File Share' exists and is accessible by the cluster.
Then...
Error 1069: Cluster resource 'File Share Witness (\\Servername\DAG File Share)' in clustered service or application 'Cluster Group' failed.
Then...
Error 1573: Node 'ServerName' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.
Last...
Error 1570: Node 'ServerName' failed to establish a communication session while joining the cluster. This was due to an authentication failure. Please verify that the nodes are running compatible versions of the cluster service software.
These errors recur almost constantly...
Heartbeat network IPs and ping are OK: I can ping perfectly fine and can also reach the gateway.
LAN network IPs and ping are OK: I can ping perfectly fine and can also reach the gateway.
Did you use multicast NLB on the DAG nodes? Check for any firewall blocking on the switch side: There are no blocking rules on the switch side (please note that I migrated this node to the same ESX server as the working node). There is no NLB that I am aware of (I'm assuming that you are referring to Windows NLB here, which isn't an installed feature on the server).
Run the Exchange BPA to find the detailed error and post it here: Running the tool, I got warnings on driver dates (running the appropriate VMTools), and the only error was under networking, saying 'Performance Data cannot be accessed' Error: Unknown Error (0xc0000bb8)
Set up an affinity rule so the Exchange DAG servers stick to two specific hosts: Done.
Check the ESXi network firewall policy, with the "do not notify switch" option unticked in vSwitch security: Done.
Check whether Symantec Endpoint Protection or another antivirus program is blocking communication: We use MS Forefront, and it is not blocking anything.
Untick IPv6 on the DAG servers: Already done.
I believe I have the solution for you: http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx
Once this is sorted, use these guides to upgrade Exchange to SP2:
http://microsoftguru.com.au/2012/04/01/error-message-when-you-try-to-install-exchange-server-2010-sp2-authorizationmanager-check-failed/
http://microsoftguru.com.au/2011/12/12/exchange-2010-sp2-is-available-for-download/
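Looking at the errors, the DAG lost connectivity to the FSW and then failed.
Run this command to set the FSW again (the witness settings belong to the DAG itself, so the cmdlet is Set-DatabaseAvailabilityGroup):
Set-DatabaseAvailabilityGroup -Identity 'DAG name' -WitnessServer 'servername' -WitnessDirectory "directory name"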
You need to check three things here:
1) Go to the file share and check that it is still shared and that Exchange Trusted Subsystem has full permission on the directory. Also add the same account to the local Administrators group on the file share server.
2) Disable IPv6 completely by using the Fix it below.
http://support.microsoft.com/kb/929852
3) Also check that all network cards and IPs are configured properly and that Windows Firewall is disabled.
Ranjan
MCITP:Exch2007 & 2010
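As a rough sketch of the "set the FSW again" step: the witness server and directory are properties of the DAG object, so they are set with Set-DatabaseAvailabilityGroup and read back with Get-DatabaseAvailabilityGroup. The DAG name, witness server name and directory path below are placeholders that do not come from this thread, so substitute your own values.

```powershell
# Run in the Exchange Management Shell on a DAG member.
# All three values are hypothetical placeholders, not names from this thread.
$dag       = 'DAG1'
$witness   = 'servername'
$directory = 'C:\DAGFileShareWitnesses\DAG1'

# Re-apply the witness settings on the DAG object.
Set-DatabaseAvailabilityGroup -Identity $dag -WitnessServer $witness -WitnessDirectory $directory

# Read the settings back. -Status also queries the cluster for live values
# such as which members are currently operational.
Get-DatabaseAvailabilityGroup -Identity $dag -Status |
    Format-List Name, Servers, OperationalServers, WitnessServer, WitnessDirectory

# Built-in replication and cluster health checks for the local Mailbox server.
Test-ReplicationHealth | Format-Table Server, Check, Result -AutoSize
```

If the node still cannot arbitrate for the share after this, the witness settings themselves are probably not the problem.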
ASKER
I confirmed that Exchange Trusted Subsystem has full access to the share.
I performed the steps listed to disable IPv6.
The firewall is not enabled on the NICs, and the IPs are correct. As noted before, I can ping and get out on both NICs. I am also able to browse to the FSW share from either NIC/network perfectly fine.
I also ran the command to set the FSW again, but it is still not working.
I have stopped and restarted the cluster as well.
I will note (speculation here) that although the NICs are up and functioning fine, Failover Cluster Manager shows both NICs/networks as unavailable under Cluster Networks. I'm guessing that the cluster sees the networks as offline and therefore cannot reach the FSW. I'm just not sure why the cluster networks are showing as offline.
The only other thing that I have noticed: when I went into Cluster Networks to disable 'Allow Clients to connect to this network', I would uncheck the box and get the prompt saying 'This network is no longer available...', but when I went back into the property page, the box would show as checked again.
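For anyone comparing notes, the state that Failover Cluster Manager shows can also be dumped from PowerShell, which makes it easier to post here. This is only a sketch: it assumes Windows Server 2008 R2 (where the FailoverClusters module is available), and if the cluster service is not running on the failing node, the cmdlets would need -Cluster pointed at the working node.

```powershell
# Load the failover clustering cmdlets (Windows Server 2008 R2 and later).
Import-Module FailoverClusters

# Node membership as the cluster sees it.
Get-ClusterNode | Format-Table Name, State -AutoSize

# One row per cluster network, with its state, role and subnet.
Get-ClusterNetwork | Format-Table Name, State, Role, Address, AddressMask -AutoSize

# One row per NIC per node. A NIC that answers ping but is listed here
# as anything other than Up is what the cluster considers unavailable.
Get-ClusterNetworkInterface | Format-Table Node, Network, Name, State, Address -AutoSize
```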
I would suggest configuring it with one network card, as configuring two network cards is a little tricky. If you keep separate replication and MAPI networks, also change the binding order in the advanced network properties and disable the DNS registration option on the replication network.
Anyhow, I suspect there is a problem with the cluster network configuration.
Assuming you have full rights to configure the cluster, remove the existing DAG and cluster resources. When you configure the DAG again, it will automatically set up the cluster resources.
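A rough outline of what removing and recreating the DAG could look like in the Exchange Management Shell is below. Every name in it (DAG1, MBX1, MBX2, DB1, the witness server and directory) is a hypothetical placeholder, and the sequence is a sketch rather than a tested procedure: check which databases have copies and where the active copies are before removing anything.

```powershell
# Exchange Management Shell. All names are hypothetical placeholders.
$dag = 'DAG1'; $goodNode = 'MBX1'; $badNode = 'MBX2'; $db = 'DB1'

# 1. Remove the passive copy of each replicated database (repeat per database).
Remove-MailboxDatabaseCopy -Identity "$db\$badNode"

# 2. Remove both servers from the DAG. If the failing node cannot be
#    contacted, -ConfigurationOnly removes it from Active Directory only.
Remove-DatabaseAvailabilityGroupServer -Identity $dag -MailboxServer $badNode
Remove-DatabaseAvailabilityGroupServer -Identity $dag -MailboxServer $goodNode

# 3. Remove the now-empty DAG.
Remove-DatabaseAvailabilityGroup -Identity $dag

# 4. Recreate the DAG and add the members back. Adding a member sets up
#    the cluster resources automatically.
New-DatabaseAvailabilityGroup -Name $dag -WitnessServer 'servername' -WitnessDirectory 'C:\DAGFileShareWitnesses\DAG1'
Add-DatabaseAvailabilityGroupServer -Identity $dag -MailboxServer $goodNode
Add-DatabaseAvailabilityGroupServer -Identity $dag -MailboxServer $badNode

# 5. Re-seed the database copy on the second node.
Add-MailboxDatabaseCopy -Identity $db -MailboxServer $badNode
```

The cmdlets prompt for confirmation by default, which is worth leaving on for a change like this.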
Here is a link that I suggest you read through if you haven't yet.
It's a good step-by-step and fairly complete guide on how to configure the NICs and related settings for a good setup.
It could help uncover your issue...
http://www.msexchange.org/articles_tutorials/exchange-server-2010/high-availability-recovery/uncovering-exchange-2010-database-availability-groups-dags-part2.html
ASKER
I never got the networking fixed. I wound up removing the DAG and recreating it.
For your information, we had the same problem and it was caused by the Network Threat Protection of Symantec EndPoint Protection recently installed on one of the nodes.
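Since ping only proves ICMP reachability, a filter such as endpoint protection can leave ping working while blocking the ports the cluster needs. A small check along these lines may help narrow that down. It is a sketch under assumptions: the function name Test-TcpPort and the host name 'MBX1' are made up for illustration, and the ports (TCP 3343 for the cluster service, TCP 445 for the SMB file share witness) are my assumption about what to probe. Cluster heartbeats also use UDP 3343, which this TCP test cannot verify.

```powershell
# Hypothetical helper: returns $true if a TCP connection to the given
# host and port succeeds within the timeout, otherwise $false.
# Uses only .NET classes, so it also works on PowerShell 2.0.
function Test-TcpPort {
    param(
        [string]$ComputerName,
        [int]$Port,
        [int]$TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $async = $client.BeginConnect($ComputerName, $Port, $null, $null)
        # Give up if the connection attempt does not finish in time.
        if (-not $async.AsyncWaitHandle.WaitOne($TimeoutMs, $false)) {
            return $false
        }
        # Throws if the connection was refused or reset.
        $client.EndConnect($async)
        return $true
    }
    catch {
        return $false
    }
    finally {
        $client.Close()
    }
}

# 'MBX1' and 'servername' are placeholders for the other DAG node and the witness server.
Test-TcpPort -ComputerName 'MBX1' -Port 3343
Test-TcpPort -ComputerName 'servername' -Port 445
```

Running it from each node toward the other, over both networks, would show whether something between them is dropping the traffic even though ping succeeds.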