Solved

Exchange 2010 DAG Node Down And Cannot Get It Back Up

Posted on 2013-06-19
Last Modified: 2013-07-13
Hello,

I am an extremely unhappy camper right now. Last week I set up a DAG for Exchange 2010 SP3 running on Windows Server 2012. It worked fine. It is set up like so:

ex01 - 172.16.1.141 - mapi network
            10.10.10.141 - replication network
MBX, CAS, HUB

ex02 - 172.16.1.66 - mapi network
            10.10.10.66 - replication network
MBX, CAS, HUB

srvr01 - 172.16.1.217 - File Witness Server
No Exchange loaded

ex01 mbx database replicated to ex02
ex02 mbx database replicated to ex01

It was working fine. I tested a switchover, which worked fine. I rebooted ex01 and it failed over to ex02 with no problem. Then I put everything back to normal.

Today I replaced a network switch to which ex01's replication NIC (10.10.10.141) was connected.
I just powered it off, replaced it and plugged the LAN cable back into the new one.

The ex01 mbx database successfully failed over to ex02 and is running fine. HOWEVER, in EMC there are now no network interfaces listed, where before they were there. The status of both the MAPI Network and Replication Network subnets is Unknown. In Failover Cluster Manager, node ex01 is DOWN. When I validate the cluster, it says the connections exist between the nodes, but it cannot ping from ex01 to ex02 or from ex02 to ex01 on the replication (10.0.0.0/8) subnet.

I have done everything I can think of to get this network connection working: rebooting, powering off, uninstalling the NIC and reconfiguring it. Both ex servers show node ex01 as down and no NIC info for the interfaces. I need some help. Please advise. Thanks.
Question by:rstuemke
13 Comments
 
LVL 4

Expert Comment

by:Alexander Kireev
Hello,

Could you post the output of the cmdlet "Get-DatabaseAvailabilityGroupNetwork | fl"?

Did you follow the network configuration instructions in Table 1 of this article?
https://www.simple-talk.com/sysadmin/exchange/exchange-2010-dag-creation-and-configuration-part-1/

The replication network adapter must have the "Register this connection’s addresses in DNS" check box cleared.
 
LVL 12

Expert Comment

by:SreRaj
Hi,

From ex01, try to ping the gateway IP address for subnet 10.0.0.0/8 and verify that connectivity is there. Also verify that, after the switch was replaced, the gateway for 10.0.0.0/8 is still connected to the switch.

Also, could you please confirm that all the IP addresses you mentioned earlier are statically configured and that the switch is not using any DHCP server for IP allocation?
 
LVL 8

Expert Comment

by:I Qasmi
You need to check the preferred network connections on each server.
Chances are that an old or disconnected connection is set as the preferred LAN connection for network access, which can cause the failure.

Cross-check and verify that the NIC you have installed is set to the topmost priority.

Also open Failover Cluster Manager, go to Cluster Core Resources and check
whether all the cluster core resources under the network are up.

If not, try bringing the resource online (right-click > Bring Online) and check again.

Check these as well:

http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx

http://workinghardinit.wordpress.com/2010/06/18/exchange-2010-dag-issue-cluster-ip-address-resource-cluster-ip-address-cannot-be-brought-online/
 

Author Comment

by:rstuemke
Thanks for all the responses.    I will answer each one in a separate post.

chestor02 -

Showing the DAG first. 1730W436 is the bad boy.

[PS] C:\>get-databaseavailabilitygroup | fl


RunspaceId                             : e44896bc-ca9d-4895-b901-12d593239662
Name                                   : CTCCDAG
Servers                                : {1730W436QPS1, 1730W50RXBX1}
WitnessServer                          : 1730wc4xmth1.calvaryspringfield.org
WitnessDirectory                       : C:\CTCCDAG Witness Directory
AlternateWitnessServer                 : 1730wcl2hyh1.calvaryspringfield.org
AlternateWitnessDirectory              : C:\CTCCDAG Witness Directory
NetworkCompression                     : InterSubnetOnly
NetworkEncryption                      : InterSubnetOnly
DatacenterActivationMode               : Off
StoppedMailboxServers                  : {}
StartedMailboxServers                  : {}
DatabaseAvailabilityGroupIpv4Addresses : {172.16.1.159}
DatabaseAvailabilityGroupIpAddresses   : {172.16.1.159}
AllowCrossSiteRpcClientAccess          : False
OperationalServers                     :
PrimaryActiveManager                   :
ServersInMaintenance                   :
ServersInDeferredRecovery              :
ThirdPartyReplication                  : Disabled
ReplicationPort                        : 0
NetworkNames                           : {}
WitnessShareInUse                      :
AdminDisplayName                       :
ExchangeVersion                        : 0.10 (14.0.100.0)
DistinguishedName                      : CN=CTCCDAG,CN=Database Availability Groups,CN=Exchange Administrative Group (F
                                         YDIBOHF23SPDLT),CN=Administrative Groups,CN=Calvary Temple,CN=Microsoft Exchan
                                         ge,CN=Services,CN=Configuration,DC=calvaryspringfield,DC=org
Identity                               : CTCCDAG
Guid                                   : 9e311866-c047-44c1-bf11-814ead816c9f
ObjectCategory                         : calvaryspringfield.org/Configuration/Schema/ms-Exch-MDB-Availability-Group
ObjectClass                            : {top, msExchMDBAvailabilityGroup}
WhenChanged                            : 6/19/2013 1:34:56 PM
WhenCreated                            : 6/13/2013 9:24:21 AM
WhenChangedUTC                         : 6/19/2013 6:34:56 PM
WhenCreatedUTC                         : 6/13/2013 2:24:21 PM
OrganizationId                         :
OriginatingServer                      : 1730W6FZNQW1.calvaryspringfield.org
IsValid                                : True

Cannot get the network to display on either server.   Both get this error.

[PS] C:\>get-databaseavailabilitygroupnetwork | fl
A server-side administrative operation has failed. 'GetDagNetworkConfig' failed on the server. Error: The NetworkManage
r has not yet been initialized. Check the event logs to determine the cause. [Server: 1730W436QPS1.calvaryspringfield.o
rg]
    + CategoryInfo          : NotSpecified: (0:Int32) [Get-DatabaseAvailabilityGroupNetwork], DagNetworkRpcServerExcep
   tion
    + FullyQualifiedErrorId : C67769A,Microsoft.Exchange.Management.SystemConfigurationTasks.GetDatabaseAvailabilityGr
   oupNetwork
    + PSComputerName        : 1730w50rxbx1.calvaryspringfield.org


Yes, I used that same URL to set up my network. HOWEVER, there is another item that showed up (enabled) in the replication network adapter's list, called MICROSOFT FAILOVER CLUSTER VIRTUAL ADAPTER PERFORMANCE FILTER. I tried both leaving it enabled and disabling it, but it made no difference.
I went through and checked my network, and it is set up just as the URL indicates.

The DNS registration box was cleared when the replication network was set up. It remains unchecked.

From a command prompt on each node, I tried to ping the other node's 10.10.10.xxx address. Prior to replacing the switch, I could ping with no problem. Now the pings time out.
I put the old switch back and tried the whole thing again, yet it still would not ping.
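
Because ping only exercises ICMP, which a host firewall can block independently of other traffic, a TCP connection test against the replication port can help tell a firewall problem apart from a cabling or switch problem. The following is a minimal Python sketch, not part of the original troubleshooting. It assumes the DAG uses the Exchange 2010 default replication port of 64327, and the function name `can_connect` is purely illustrative.

```python
import socket

# Assumption: 64327 is the default DAG replication (log shipping) port in
# Exchange 2010. Change it if the DAG was configured with another port.
DEFAULT_REPLICATION_PORT = 64327


def can_connect(host, port=DEFAULT_REPLICATION_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Run from ex01, `can_connect("10.10.10.66")` would test the path to ex02's replication address; run from ex02, `can_connect("10.10.10.141")` would test the reverse direction. A False result in both directions, together with the failed pings, would point below the Exchange layer.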
 

Author Comment

by:rstuemke
Ok....  SreRaj,

1) There is no gateway for subnet 10.0.0.0/8. There are only two devices with 10.10.10.x addresses, and those are the two nodes on the REPLICATION NETWORK. The 10.10.10.x network shares the same switches as the 172.16.1.x network, for which there is a gateway at 172.16.1.254.

2) Confirmed all IP addresses are static.   No DHCP.
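
The "no gateway needed" point above can be checked on paper: two hosts talk directly, without a gateway, only when both addresses fall in the same network for the mask in use. Here is a small sketch using Python's standard `ipaddress` module. The 255.0.0.0 mask is an assumption inferred from the 10.0.0.0/8 subnet mentioned in this thread, and `same_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same network for the mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{netmask}").network
    return net_a == net_b


# Replication addresses from the original post; the mask is an assumption.
print(same_subnet("10.10.10.141", "10.10.10.66", "255.0.0.0"))   # True
# A MAPI address and a replication address are not on-link with each other.
print(same_subnet("172.16.1.141", "10.10.10.66", "255.0.0.0"))   # False
```

If the two replication NICs were configured with different masks, it would be worth running the check once with each mask, since both nodes need to agree that the other is on-link.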
 

Author Comment

by:rstuemke
Ok........  iQasmi

1) Network connection order is MAPI, then REPLICATION on both nodes.
    No old LAN connections in the list

2)  Cluster Core Resources - File Share Witness - Online
                                             Name - CTCCDAG - Online
                                             IP Address - 172.16.1.159 - Online


checking provided URLs.

Author Comment

by:rstuemke
Okay, reviewed the articles.     Checked the Cluster Networks.  
Cluster Network 1 - Allow Clients To Connect Thru This Network - CHECKED
Cluster Network 2 - Allow Clients To Connect Thru This Network - NOT CHECKED

Tried changing the setting on Network 2, but even though I changed it, and applied it, it would not remain set.


Somehow, the replacement of the switch "disturbed" a perfectly good DAG setup... bummer.
 

Author Comment

by:rstuemke
Here is some additional information I did not get earlier:

[PS] C:\Windows\system32>get-databaseavailabilitygroupnetwork | fl


RunspaceId         : 1710c4b6-db83-4bff-8a1d-dae65ac421f0
Name               : MAPI Network
Description        :
Subnets            : {{172.16.0.0/16,Unknown}}
Interfaces         : {}
MapiAccessEnabled  : False
ReplicationEnabled : True
IgnoreNetwork      : False
Identity           : CTCCDAG\MAPI Network
IsValid            : True

RunspaceId         : 1710c4b6-db83-4bff-8a1d-dae65ac421f0
Name               : Replication Network
Description        :
Subnets            : {{10.0.0.0/8,Unknown}}
Interfaces         : {}
MapiAccessEnabled  : False
ReplicationEnabled : True
IgnoreNetwork      : False
Identity           : CTCCDAG\Replication Network
IsValid            : True


Is MapiAccessEnabled : False on the MAPI network a concern? I do not know how to change that.
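
When comparing this kind of output across several runs or servers, it can help to parse the saved text and flag the networks that report no interfaces. The sketch below is a rough illustration only. It assumes the `| fl` output was saved as plain text, the sample is abridged from the output above, and the helper names are hypothetical.

```python
def parse_format_list(text):
    """Parse 'Name : Value' records separated by blank lines into dicts.

    Wrapped continuation lines that contain no colon are ignored.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def networks_without_interfaces(records):
    """Return the names of DAG networks whose Interfaces list is empty."""
    return [r["Name"] for r in records if r.get("Interfaces") == "{}"]


# Abridged from the output posted above.
SAMPLE = """\
Name               : MAPI Network
Subnets            : {{172.16.0.0/16,Unknown}}
Interfaces         : {}

Name               : Replication Network
Subnets            : {{10.0.0.0/8,Unknown}}
Interfaces         : {}
"""

print(networks_without_interfaces(parse_format_list(SAMPLE)))
# ['MAPI Network', 'Replication Network']
```

Both networks are flagged here, which matches the symptom described in the original post: the DAG knows the subnets but has no interface information for either of them.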
 

Author Comment

by:rstuemke
UPDATE

I installed an Exchange 2010 MBX server on another file server running Windows Server 2012.
I set up the DAG replication network on a separate NIC as 10.10.10.135 (MAPI is 172.16.1.135).
Exchange installed successfully and the server is working fine at this point. However, it could not ping any other 10.10.10.x machine; the pings timed out. So I am wondering whether there is an underlying issue with our network and the Exchange servers are just presenting the symptoms of the problem. Any help is greatly appreciated.

I tried to join this MBX server to the existing DAG. It failed, timing out while trying to connect to the main Exchange server:


1730W107048721
Failed

Error:
A database availability group administrative operation failed.
Error: The operation failed. CreateCluster errors may result from incorrectly
configured static addresses.
Error: An error occurred while attempting a cluster operation.
Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4.
Error: This operation returned because the timeout period expired"' failed.
[Server: 1730W50RXBX1.calvaryspringfield.org]
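
The 0x5b4 in the error above is a Win32 error code written in hexadecimal. Converting it to decimal makes it easier to look up in the Windows system error code list. A tiny sketch follows, with an illustrative one-entry lookup table rather than a complete one.

```python
# Illustrative subset of Win32 error codes; not a complete table.
KNOWN_WIN32_ERRORS = {
    1460: "ERROR_TIMEOUT",
}


def win32_code(hex_text):
    """Convert a hexadecimal error string such as '0x5b4' to decimal."""
    return int(hex_text, 16)


code = win32_code("0x5b4")
print(code, KNOWN_WIN32_ERRORS.get(code, "unknown"))  # 1460 ERROR_TIMEOUT
```

Code 1460 is the generic timeout error, consistent with the "timeout period expired" text in the message, so the code by itself does not say which network path timed out.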
 
LVL 12

Expert Comment

by:SreRaj
I see there are two interfaces on the server, and each interface has an IP address that belongs to a different subnet. Are you connecting both of these interfaces to the same switch? If so, does the switch have a separate VLAN configured for each of these subnets, and have the required port mappings been done for these VLANs?
 

Author Comment

by:rstuemke
The different interfaces are connected to the same LAN and set of switches. There is no VLAN configured. The interfaces had previously been working on the original DAG a couple of weeks ago. At that time, I used 10.10.0.0/16 for the replication network and it was plugged into the same switches as the 172.16.1.0/24 network. It was working fine then: all members of the DAG were up, all networks and network interfaces were up, and my mailbox was replicated on EX02. My problems started when I replaced a network switch connected to EX02's replication network. Things have never worked correctly since then. In fact, I have another open question dedicated to that problem, so I won't go into details here.
 

Accepted Solution

by:rstuemke
I was finally able to delete the MBX copies and remove the servers from the DAG. I worked through the problem little by little.
 

Author Closing Comment

by:rstuemke
No one was really able to help me on this one.