Solved

Exchange 2010 DAG Node Down And Cannot Get It Back Up

Posted on 2013-06-19
Last Modified: 2013-07-13
Hello,

I am an extremely unhappy camper right now. Last week I set up a DAG for Exchange 2010 SP3 running on Windows Server 2012, and it worked fine. It is set up like so:

ex01 - 172.16.1.141 - mapi network
            10.10.10.141 - replication network
MBX, CAS, HUB

ex02 - 172.16.1.66 - mapi network
            10.10.10.66 - replication network
MBX, CAS, HUB

srvr01 - 172.16.1.217 - File Witness Server
No Exchange loaded

ex01 mbx database replicated to ex02
ex02 mbx database replicated to ex01

It was working fine. I tested a switchover, which worked fine. I rebooted ex01 and it failed over to ex02 with no problem. Then I put everything back to normal.

Today I replaced a network switch to which ex01's replication NIC (10.10.10.141) was connected.
I just powered the old switch off, replaced it, and plugged the LAN cable back in.

The ex01 mbx database successfully failed over to ex02 and is running fine. HOWEVER, in EMC there are now no network interfaces listed, where before they were there. The status of both the MAPI Network and Replication Network subnets is Unknown. In Failover Cluster Manager, node ex01 is DOWN. When I validate the cluster, it says the connections exist between the nodes, but it cannot ping ex01 to ex02 or ex02 to ex01 on the replication (10.0.0.0/8) subnet.

I have done everything I can think of to get this network connection working: rebooting, powering off, uninstalling the NIC and reconfiguring it. Both ex servers show node ex01 as down and no NIC info for the interfaces. I need some help. Please advise. Thanks.
Question by:rstuemke
13 Comments
 
LVL 4

Expert Comment

by:Alexander Kireev
ID: 39261526
Hello,

Could you post the output of the cmdlet "Get-DatabaseAvailabilityGroupNetwork | fl"?

Did you follow the instructions about network configuration? See Table 1 in this article:
https://www.simple-talk.com/sysadmin/exchange/exchange-2010-dag-creation-and-configuration-part-1/

The replication network adapter must have the "Register this connection’s addresses in DNS" check box cleared.
 
LVL 12

Expert Comment

by:SreRaj
ID: 39261538
Hi,

From ex01, try to ping the gateway IP address for subnet 10.0.0.0/8 and verify that connectivity is there. Also verify that, after replacing the switch, the gateway for 10.0.0.0/8 is still connected to the switch.

Also, could you please confirm that all the IP addresses you mentioned earlier are statically configured and that the switch is not using a DHCP server for IP allocation?
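
The checks suggested above can be sketched in PowerShell roughly as follows. This is only a sketch: it assumes Windows Server 2012, where `Get-NetIPAddress` and `Get-NetIPInterface` are available, and it uses the replication addresses quoted in the question (10.10.10.141 for ex01, 10.10.10.66 for ex02).

```powershell
# Run on ex01. The addresses are the ones quoted in the question.

# List IPv4 addresses and how each was assigned.
# PrefixOrigin = Manual means a static address; Dhcp means a DHCP lease.
Get-NetIPAddress -AddressFamily IPv4 |
    Format-Table InterfaceAlias, IPAddress, PrefixLength, PrefixOrigin -AutoSize

# Show whether DHCP is enabled on each interface.
Get-NetIPInterface -AddressFamily IPv4 |
    Format-Table InterfaceAlias, Dhcp, ConnectionState -AutoSize

# Ping ex02's replication address, forcing the replication NIC as the source.
ping -S 10.10.10.141 10.10.10.66
```

If the ping times out, `arp -a -N 10.10.10.141` shows whether the other node's MAC address was resolved on that interface at all, which helps separate a layer 2 problem (cabling, switch port) from a filtering problem.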
 
LVL 8

Expert Comment

by:I Qasmi
ID: 39261675
You need to check the preferred network connection order on each server.
Chances are that an old or disconnected connection is set as the preferred LAN connection for network access, which can cause the failure.

Cross-check and verify that the NIC you have installed has been set to the topmost priority.

Also open Failover Cluster Manager, go to Cluster Core Resources, and check
whether all the cluster core resources under the network are up.

If not, try bringing the resource online (right-click > Bring Online) and check again.

Check these also:

http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx

http://workinghardinit.wordpress.com/2010/06/18/exchange-2010-dag-issue-cluster-ip-address-resource-cluster-ip-address-cannot-be-brought-online/
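
The cluster core resource check described above can also be done from PowerShell. A minimal sketch, assuming the FailoverClusters module is installed (it comes with the Failover Clustering feature) and that the core resources live in the default group named "Cluster Group". The resource name in the commented-out line is a placeholder; replace it with whatever the first command lists as offline.

```powershell
Import-Module FailoverClusters

# List the cluster core resources (network name, IP address, witness) and their state.
Get-ClusterGroup -Name "Cluster Group" |
    Get-ClusterResource |
    Format-Table Name, State, ResourceType -AutoSize

# Bring an offline resource back online. "Cluster IP Address" is a placeholder name.
# Start-ClusterResource -Name "Cluster IP Address"
```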
 

Author Comment

by:rstuemke
ID: 39262543
Thanks for all the responses.    I will answer each one in a separate post.

chestor02 -

showing DAG first  1730W436 is the bad boy.

[PS] C:\>get-databaseavailabilitygroup | fl


RunspaceId                             : e44896bc-ca9d-4895-b901-12d593239662
Name                                   : CTCCDAG
Servers                                : {1730W436QPS1, 1730W50RXBX1}
WitnessServer                          : 1730wc4xmth1.calvaryspringfield.org
WitnessDirectory                       : C:\CTCCDAG Witness Directory
AlternateWitnessServer                 : 1730wcl2hyh1.calvaryspringfield.org
AlternateWitnessDirectory              : C:\CTCCDAG Witness Directory
NetworkCompression                     : InterSubnetOnly
NetworkEncryption                      : InterSubnetOnly
DatacenterActivationMode               : Off
StoppedMailboxServers                  : {}
StartedMailboxServers                  : {}
DatabaseAvailabilityGroupIpv4Addresses : {172.16.1.159}
DatabaseAvailabilityGroupIpAddresses   : {172.16.1.159}
AllowCrossSiteRpcClientAccess          : False
OperationalServers                     :
PrimaryActiveManager                   :
ServersInMaintenance                   :
ServersInDeferredRecovery              :
ThirdPartyReplication                  : Disabled
ReplicationPort                        : 0
NetworkNames                           : {}
WitnessShareInUse                      :
AdminDisplayName                       :
ExchangeVersion                        : 0.10 (14.0.100.0)
DistinguishedName                      : CN=CTCCDAG,CN=Database Availability Groups,CN=Exchange Administrative Group (F
                                         YDIBOHF23SPDLT),CN=Administrative Groups,CN=Calvary Temple,CN=Microsoft Exchan
                                         ge,CN=Services,CN=Configuration,DC=calvaryspringfield,DC=org
Identity                               : CTCCDAG
Guid                                   : 9e311866-c047-44c1-bf11-814ead816c9f
ObjectCategory                         : calvaryspringfield.org/Configuration/Schema/ms-Exch-MDB-Availability-Group
ObjectClass                            : {top, msExchMDBAvailabilityGroup}
WhenChanged                            : 6/19/2013 1:34:56 PM
WhenCreated                            : 6/13/2013 9:24:21 AM
WhenChangedUTC                         : 6/19/2013 6:34:56 PM
WhenCreatedUTC                         : 6/13/2013 2:24:21 PM
OrganizationId                         :
OriginatingServer                      : 1730W6FZNQW1.calvaryspringfield.org
IsValid                                : True

Cannot get the network to display on either server.   Both get this error.

[PS] C:\>get-databaseavailabilitygroupnetwork | fl
A server-side administrative operation has failed. 'GetDagNetworkConfig' failed on the server. Error: The NetworkManage
r has not yet been initialized. Check the event logs to determine the cause. [Server: 1730W436QPS1.calvaryspringfield.o
rg]
    + CategoryInfo          : NotSpecified: (0:Int32) [Get-DatabaseAvailabilityGroupNetwork], DagNetworkRpcServerExcep
   tion
    + FullyQualifiedErrorId : C67769A,Microsoft.Exchange.Management.SystemConfigurationTasks.GetDatabaseAvailabilityGr
   oupNetwork
    + PSComputerName        : 1730w50rxbx1.calvaryspringfield.org


Yes, I used that same URL to set up my network. HOWEVER, there is another item that showed up (enabled) in the replication network adapter list called MICROSOFT FAILOVER CLUSTER VIRTUAL ADAPTER PERFORMANCE FILTER. I tried both leaving it enabled and disabling it, but it made no difference.
I went through and checked my network, and it is set up just like the URL indicates.

The DNS registration box was cleared when the replication network was set up, and it remains unchecked.

From a command prompt on each node, I tried to ping the other node's 10.10.10.xxx address. Before I replaced the switch, I could ping with no problem. Now the pings time out.
I put the old switch back and tried this whole thing again, yet it still would not ping.
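
Since the DAG network cmdlet fails here, the underlying cluster can be queried directly to see which node and which interfaces it considers down. A sketch, assuming the FailoverClusters module is available on the DAG members:

```powershell
Import-Module FailoverClusters

# Which nodes does the cluster consider Up or Down?
Get-ClusterNode | Format-Table Name, State -AutoSize

# Cluster networks with their subnet and role.
Get-ClusterNetwork |
    Format-Table Name, State, Address, AddressMask, Role -AutoSize

# Per-node interfaces; a Failed or Unreachable state points at the broken link.
Get-ClusterNetworkInterface |
    Format-Table Node, Network, Name, State -AutoSize
```

From the Exchange Management Shell, `Test-ReplicationHealth` run against each member gives a complementary view, since it reports on the cluster service, quorum, and cluster network from Exchange's side.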
 

Author Comment

by:rstuemke
ID: 39262619
Ok....  SreRaj,

1) There is no gateway for subnet 10.0.0.0/8. There are only two devices with 10.10.10.x addresses, and those are the two nodes on the REPLICATION NETWORK. The 10.10.10.x network shares the same switches as the 172.16.1.x network, for which there is a gateway at 172.16.1.254.

2) Confirmed all IP addresses are static.   No DHCP.
 

Author Comment

by:rstuemke
ID: 39262677
Ok........  iQasmi

1) Network connection order is MAPI, then REPLICATION on both nodes.
    No old LAN connections in the list

2)  Cluster Core Resources - File Share Witness - Online
                                             Name - CTCCDAG - Online
                                             IP Address - 172.16.1.159 - Online


checking provided URLs.

 

Author Comment

by:rstuemke
ID: 39262835
Okay, reviewed the articles.     Checked the Cluster Networks.  
Cluster Network 1 - Allow Clients To Connect Thru This Network - CHECKED
Cluster Network 2 - Allow Clients To Connect Thru This Network - NOT CHECKED

Tried changing the setting on Network 2, but even though I changed it, and applied it, it would not remain set.


Somehow, replacing the switch "disturbed" a perfectly good DAG setup... bummer.
 

Author Comment

by:rstuemke
ID: 39262869
Here is some additional information I did not get earlier:

[PS] C:\Windows\system32>get-databaseavailabilitygroupnetwork | fl


RunspaceId         : 1710c4b6-db83-4bff-8a1d-dae65ac421f0
Name               : MAPI Network
Description        :
Subnets            : {{172.16.0.0/16,Unknown}}
Interfaces         : {}
MapiAccessEnabled  : False
ReplicationEnabled : True
IgnoreNetwork      : False
Identity           : CTCCDAG\MAPI Network
IsValid            : True

RunspaceId         : 1710c4b6-db83-4bff-8a1d-dae65ac421f0
Name               : Replication Network
Description        :
Subnets            : {{10.0.0.0/8,Unknown}}
Interfaces         : {}
MapiAccessEnabled  : False
ReplicationEnabled : True
IgnoreNetwork      : False
Identity           : CTCCDAG\Replication Network
IsValid            : True


Is MapiAccessEnabled : False on the MAPI network a concern? I do not know how to change that.
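
To narrow the output above to the problem networks, the same cmdlet can be filtered. This is a sketch that assumes it is run in the Exchange Management Shell. As far as I know, `MapiAccessEnabled` is reported from the cluster's view of the network rather than set directly; `Set-DatabaseAvailabilityGroupNetwork` exposes parameters such as `-ReplicationEnabled` and `-IgnoreNetwork` instead.

```powershell
# List DAG networks that currently have no interfaces enumerated.
# -not on an empty or missing collection evaluates to $true.
Get-DatabaseAvailabilityGroupNetwork |
    Where-Object { -not $_.Interfaces } |
    Format-List Identity, Subnets, Interfaces, MapiAccessEnabled, ReplicationEnabled
```

With the output shown above, both networks would be returned, which suggests the problem is not specific to one subnet.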
 

Author Comment

by:rstuemke
ID: 39263740
UPDATE

Installed Exchange 2010 MBX server on another file server, running Win 2012 Server.
Set up the DAG replication network on a separate NIC as 10.10.10.135 (MAPI is 172.16.1.135).
Exchange installed successfully and the server is working fine at this point. However, it could not ping any other 10.10.10.x machine; the pings timed out. So I am wondering if there is an underlying issue with our network and the Exchange servers are just presenting the symptoms of the problem. Any help is greatly appreciated.

Tried to join this MBX server to the existing DAG.  It failed.  Timed out trying to connect to the main Exchange Server:


1730W107048721
Failed

Error:
A database availability group administrative operation failed.
Error: The operation failed. CreateCluster errors may result from incorrectly
configured static addresses.
Error: An error occurred while attempting a cluster operation.
Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4.
Error: This operation returned because the timeout period expired"' failed.
[Server: 1730W50RXBX1.calvaryspringfield.org]
 
LVL 12

Expert Comment

by:SreRaj
ID: 39280360
I see there are two interfaces on the server, and each interface has an IP address that belongs to a different subnet. Are you connecting both of these interfaces to the same switch? If both interfaces are connected to the same switch, does the switch have a different VLAN in its configuration for each of these subnets, and have the required port mappings been done for these VLANs?
 

Author Comment

by:rstuemke
ID: 39281888
The different interfaces are connected to the same LAN and set of switches. There is no VLAN configured. The interfaces had previously been working on the original DAG a couple of weeks ago. At that time, I used 10.10.0.0/16 for the replication network and it was plugged into the same switches as the 172.16.1.0/24 network. It was working fine then: all members of the DAG up, all networks and network interfaces up, and my mailbox replicated on EX02. My problems started when I replaced a network switch connected to EX02's replication network. Things have never worked correctly since then. In fact, I have another open question dedicated to that problem, so I won't go into details here.
 

Accepted Solution

by:
rstuemke earned 0 total points
ID: 39308515
Finally, I was able to delete the mailbox database copies and remove the servers from the DAG. I worked through the problem little by little.
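
The thread does not record the exact commands used. For readers in a similar position, the usual sequence looks roughly like this. It is a sketch only: `DB01` is a hypothetical database name, `EX02` stands for the member being removed, and `CTCCDAG` is the DAG name shown earlier in the thread.

```powershell
# 1. See which database copies the server still hosts.
Get-MailboxDatabaseCopyStatus -Server EX02

# 2. Remove each passive copy from that server (repeat per database).
Remove-MailboxDatabaseCopy -Identity "DB01\EX02" -Confirm:$false

# 3. Remove the server from the DAG.
Remove-DatabaseAvailabilityGroupServer -Identity CTCCDAG -MailboxServer EX02

# If the node cannot be contacted, -ConfigurationOnly removes it from the
# DAG object in Active Directory without touching the cluster:
# Remove-DatabaseAvailabilityGroupServer -Identity CTCCDAG -MailboxServer EX02 -ConfigurationOnly
```

When `-ConfigurationOnly` is used, the node still has to be evicted from the cluster separately, for example from Failover Cluster Manager.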
 

Author Closing Comment

by:rstuemke
ID: 39323044
No one was really able to help me on this one.