Solved

How to set up Exchange 2010 / Outlook 2007 failover to the other DAG member

Posted on 2014-03-31
Last Modified: 2014-07-08
Greetings,

I have a two-member DAG, ex1 and ex2. Both have CAS, MBX, and HUB. I have the AD CAS object pointing to ex1 in DNS (15 minute TTL). I also have a split DNS with autodiscover.domain.com and webmail.domain.com pointing to ex1. No load balancer. Firewall rules all point to ex1 (ports 80, 443, 25, 110, 143). Same exact firewall rules are in place but not enabled for ex2.

VMware host failed. Ex1 moved to other host (vMotion, on EMC SAN). Ex2 was on the failed host's local disks.

Most outlook 2007 clients connected to ex2, which was down (a few in the office, myself included, were connected to ex1). Mobile devices stayed connected since they connect to the webmail server, which is configured on ex1.

Emails sent from mobile devices appear to have been delivered while only ex1 was functional. After I power cycled the failed VMware host, ex2 came back up and Outlook clients reconnected. Once both servers were up, emails were routed through ex2, where they stuck in the queue. This was not noticed until about 8 PM that evening. I rebooted both ex1 and ex2. New emails began to flow, but the earlier emails stayed stuck in the queue until they expired. Most were resent from Outlook the next morning, since they were in the users' Sent Items folders.

I do not understand why these emails never made it off ex2 to our smart host. Email usually routes through this server without issue (I thought that was because Exchange uses a HUB other than itself if one is available, but I am not sure about that, since it appears most users connect to ex2 in Outlook).

I need help setting up / understanding exactly what I need to do to have outlook reconnect to the functional server, whichever it may be, either over https or TCP/IP. I know I'd need to change the CAS DNS setting to the functional server's IP. But what about autodiscover.domain.com and webmail.domain.com? They point to ex1. Should those point to ex2 if ex1 is down?

What about the firewall rules? Should I enable the rules for ex2 that mirror the rules for ex1? Or will that cause conflicts? Should I only enable the ex2 firewall rules if ex1 is down? I was going to NAT the private IP of ex1 and ex2 to the same public IP for outbound. Right now I have the private IP of ex2 going to a different public IP (and that rule is not enabled, which may explain why emails never made it to the smart host). For inbound I was going to NAT the public IP to ex1, then change to ex2 as needed, unless the rules can all be enabled simultaneously without issue (not sure if inbound can be NAT'd to both servers private IP without conflict or other issues).

What would users need to do with outlook 2007? Close and reopen? Reboot? Go into tools and do a repair?

I really appreciate any help. I have asked similar questions before, and have reviewed them, but unfortunately I am having difficulty putting it all together. I feel as though I am missing some configuration settings, as well as a clear method of executing the manual fail over. To note, I do not mind changing DNS, firewall rules, or anything else manually. Just looking to know exactly what to do.

Thanks again.
Question by: rpliner
18 Comments
 
Accepted Solution by: Simon Butler (Sembee) (earned 400 total points)

You really need a load balancer if you want everything automatic.
Otherwise, at a minimum you need an RPC CAS Array, then use generic host names for everything else. Run it with a TTL of ten minutes; then any change you make will take effect within about ten minutes, once clients' cached records expire.

An RPC CAS Array is internal only and exists in DNS only, so it is very easy to change.

In the meantime, a repair of the Outlook profile should bring the clients online, but I would set up the CAS Array address first; then the repair will switch them to using that address instead.

http://semb.ee/casarray
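
As a rough sketch of the CAS Array step, the Exchange Management Shell commands below list any existing array and, if none exists, create one. The name `cas.domain.local` comes from later in this thread; the AD site name `Default-First-Site-Name` is an assumption and should be replaced with the real site.

```powershell
# Exchange Management Shell (Exchange 2010). Names are placeholders.

# List existing CAS arrays and the AD site each one belongs to.
Get-ClientAccessArray | Format-List Name, Fqdn, Site, Members

# Create an array only if none exists yet. The FQDN must match the
# internal DNS A record; the site name here is an assumed example.
New-ClientAccessArray -Name "cas.domain.local" -Fqdn "cas.domain.local" -Site "Default-First-Site-Name"
```

The array object itself does no load balancing; which server answers is decided entirely by where the DNS record points.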

Simon.

Author Comment by: rpliner

Thanks for replying, Simon. I do have a CAS array in place (simply cas.domain.local), and it points to one of the Exchange servers with a TTL of 15 minutes. I'll make it 10 minutes. I will point this to the other DAG member should one go down. What about webmail.domain.com and autodiscover.domain.com? Would those need to be updated as well when I update the CAS array IP in DNS?

thanks again

Assisted Solution by: aa-denver (earned 100 total points)

There are a lot of issues here. To get complete redundancy of the CAS role for Outlook, OWA, and ActiveSync, your two CAS servers should be configured in a load-balanced array. Microsoft used to recommend Windows Network Load Balancing (WNLB), but they have backed off on that. The main problems were with big arrays (many people were load balancing 10 or more servers) and with arrays spanning AD sites and WAN links. For two servers in the same AD site, I would try WNLB. The net result is a third IP address that is shared between the two CAS servers. This is the address clients use internally, and it is also the address you NAT to the public IP address.

Here are some articles that might help.

http://www.msexchange.org/articles-tutorials/exchange-server-2007/planning-architecture/uncovering-new-rpc-client-access-service-exchange-2010-part3.html

http://social.technet.microsoft.com/Forums/exchange/en-US/8b38148b-6d8d-43f2-9432-c35e7e6a912e/cas-nlb-best-practice

http://www.stevieg.org/2010/11/exchange-team-no-longer-recommend-windows-nlb-for-client-access-server-load-balancing/

So an NLB takes care of the CAS role.  Now you have to make the HUB role redundant.   Here is an article on making the hub smtp role redundant.

http://technet.microsoft.com/en-us/library/ff634392(v=exchg.141).aspx

There is also a newer feature called shadow redundancy.  Here is a reference on that.

http://michaelvh.wordpress.com/2012/06/19/enhancing-hub-transport-resiliency-by-enabling-shadow-redundancy-promotion/

Author Comment by: rpliner

thanks aa-denver. I will look over these links as soon as I can. However, I thought I couldn't use WNLB since I have a DAG set up.

I used to see shadow redundancy in EMC queue viewer on ex2. Now I don't. Another weird thing is that I see on ex1 my smarthost connector in queue viewer and it shows SmartHostConnectorDelivery, but now I also see 4 of my databases and they show MapiDelivery. I have never seen that before the failure last week.

Author Comment by: rpliner

I cannot run WNLB due to DAG. I ran get-transportconfig | fl shadow* on both servers and they both showed it was enabled. Wonder why I don't see the shadow folder / icon in the queue viewer any more. I used to. Now it is populated with the names of 4 of my databases.

Expert Comment by: Simon Butler (Sembee)

What you are seeing is what I would expect to see, and that includes not seeing the shadow queue.

Shadow queue is rarely seen - if you were seeing it often then that could have been a sign of a delivery problem.
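
To see what is actually sitting in each queue (including the MapiDelivery and SmartHostConnectorDelivery entries mentioned above), something like the following could be run in the Exchange Management Shell. This is a sketch only; the server name `ex2` is taken from the question.

```powershell
# Overview of every queue on ex2, with delivery type and message count.
Get-Queue -Server ex2 | Format-Table Identity, DeliveryType, Status, MessageCount, NextHopDomain -AutoSize

# For queues stuck in Retry, show the last error reported by the transport service.
Get-Queue -Server ex2 | Where-Object { $_.Status -eq "Retry" } | Format-List Identity, NextHopDomain, LastError

# Force an immediate connection attempt for queues in Retry.
Retry-Queue -Server ex2 -Filter { Status -eq "Retry" }
```

The `LastError` value is usually the quickest pointer to why mail is not leaving a server, for example a refused connection to the smart host.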

Cut the TTL time down on all of your Exchange DNS records, both internally and externally. Then if you have to fail over you can change them and everyone is back running very quickly.
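
After changing a record, a quick check from a client or server shows whether the new address is being returned. The sketch below uses only .NET name resolution, so it should work on the PowerShell 2.0 that ships with this generation of servers; the host names are the ones used in this thread.

```powershell
# Clear the local resolver cache so stale answers are not reused.
ipconfig /flushdns | Out-Null

# Show what each Exchange name currently resolves to.
$names = "cas.domain.local", "webmail.domain.com", "autodiscover.domain.com"
foreach ($name in $names) {
    $addresses = [System.Net.Dns]::GetHostAddresses($name) | ForEach-Object { $_.IPAddressToString }
    "{0} -> {1}" -f $name, ($addresses -join ", ")
}
```

Other clients may keep the old answer until their cached record reaches the end of its TTL.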

With HA, the requirement is not always to have automatic failover, but simply to have a procedure to follow - if this happens, do this.

Simon.

Author Comment by: rpliner

Thx Simon. Now I'm not as concerned about shadow. I did increase the RAM on both servers when I was rebooting them, so maybe that alleviated the need for shadow. Either way, I agree with your last statement, and that is my goal here. I don't mind doing the failover manually; I just don't want to have to think about it. I'm trying to document the steps now to better deal with it in the future.

Thx as always.

Author Comment by: rpliner

To be clear: when the primary server fails, change the CAS, webmail, and autodiscover DNS records to point to the working secondary Exchange server, then have users run a repair from within Outlook to connect to the working server.

So what do I do with the firewall? Should I enable the rules that mirror the current primary exchange server now or enable them for the secondary server as needed? I was going to NAT the private IPs of each server to the same public IP for outgoing. What then for inbound mail? It goes from our smarthost to us.

thanks again

Expert Comment by: Simon Butler (Sembee)

You shouldn't need to repair the Outlook profile. If you have changed the DNS to the second server that is all that you need to do.
The only reason to repair the Outlook profile would be because they haven't picked up the CAS array.

For email, that is easy. You can have email delivered to any server in the platform. Therefore just give each server its own address, and use a single Send Connector to your smart host with both servers listed as source servers. If one server goes down, email will still flow.
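
A minimal sketch of checking and adjusting the connector from the Exchange Management Shell follows. The connector name "To Smart Host" is hypothetical; use whatever name the first command returns.

```powershell
# Show each Send Connector, its smart host, and which servers may use it.
Get-SendConnector | Format-List Name, Enabled, AddressSpaces, SmartHosts, SourceTransportServers

# List both Hub Transport servers as sources on the smart host connector.
# "To Smart Host" is a placeholder name.
Set-SendConnector "To Smart Host" -SourceTransportServers ex1, ex2
```

With both servers listed, either one can hand outbound mail to the smart host, provided the firewall lets each of them reach it.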

Simon.

Author Comment by: rpliner

Thanks Simon. I already have a send connector on each server going to the smart host. I will check it to make certain both servers are listed. However, couldn't I just NAT both exchange servers to the same public IP (which is what is configured on the smart host to accept mail from us) rather than using a different public IP for each server?

I have no idea what they are picking up. Based on the failure, about 30% of the users in the office were, and stayed, connected to the primary server (which was moved through vMotion when the host failed). Everyone else was on the failed exchange server and didn't reconnect until the host, and thus the secondary exchange server, was back up. I didn't change DNS during this though so I can't say for sure that they wouldn't have reconnected if I had. When I manually type in cas.domain.local in outlook account settings, it redirects and auto fills the name of the primary exchange server (the one which the CAS array points to in DNS). I thought users would need to do a repair after I changed DNS to point to the secondary server, or do a repair to switch to the primary if the user was connected to the secondary (which happens to be the one that failed the other day).

Also, still confused about the firewall rules and whether I should enable them now or as needed for the secondary server.

Expert Comment by: Simon Butler (Sembee)

I have seen odd things happen when you try to NAT two servers to the same IP address, so it is never something that I want to do. As SMTP is both inbound and outbound then it can cause problems.
I don't think NAT has any kind of availability mechanism, so how are you going to get inbound email to work? Inbound can only go to one server.

Your best option for a reliable system is two external IP addresses and handle the two servers as two separate systems.

As for the CAS Array issue - I would check the properties of every mailbox database to ensure the correct RPC Server is configured. It should be your CAS Array address. That is the only reason I can think that it would be returning the incorrect information via Autodiscover.
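
One way to check this across all databases at once, sketched for the Exchange Management Shell:

```powershell
# Show where each database is mounted and which RPC endpoint
# Autodiscover will hand out for mailboxes in that database.
Get-MailboxDatabase | Format-Table Name, Server, RpcClientAccessServer -AutoSize
```

Any database that shows a server name rather than the CAS array name in the last column will send its clients directly to that server.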

Simon.

Author Comment by: rpliner

excellent. thanks Simon. All the current, disabled firewall rules for the secondary server are set up with a different public IP, so I will leave it that way instead of NATing one.

One thing with this though - if I use two IPs, one to each server, do I enable the firewall for the secondary server now or upon the primary failing? Not sure if the secondary server (which is in a DAG and holds copies of databases) can relay inbound email to the primary server.

I will also check each database for the RPC server and point it to the CAS array if it isn't already. I may not do anything until this weekend (in case I screw up something). Will post back.

Thanks again.

Author Comment by: rpliner

RPC client access server returns the secondary server for four databases, which is where they are mounted. One database returns the primary.

Should I run Set-MailboxDatabase "database name" -RpcClientAccessServer "CAS array DNS name" on each database? What about using "webmail.domain.com" for -RpcClientAccessServer instead of the CAS array name?

thanks for your help

Expert Comment by: Simon Butler (Sembee)

The RPC CAS array address should not be used for anything else and should not resolve externally at all. Otherwise it confuses Outlook.

You need to run that command on every database for the clients to use the CAS array.
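
Rather than naming each database by hand, the change can be piped across all of them. This sketch assumes the array name `cas.domain.local` mentioned earlier in the thread.

```powershell
# Point every mailbox database at the CAS array name.
Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer "cas.domain.local"

# Confirm the change took effect on every database.
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize
```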

Simon.

Author Comment by: rpliner

Thx Simon. I will do that. Should they then run a repair to update outlook? I'm going to do this over the weekend but would like to let everyone know for when they come back Monday morning. I think I'll need to update the prf file on the remote server as well.

Still need to figure out firewall but will ask another question about that.

Expert Comment by: Simon Butler (Sembee)

Once you have set the CAS Array, a repair will be required to use it.
Clients will continue to work though, so you can set it and do some testing before telling everyone to do the change.

Simon.

Author Comment by: rpliner

Great. Thanks again for all the help.

Author Comment by: rpliner

had a SAN issue that I had to deal with so just getting back to this. I think I figured out the steps.

I have already set internal DNS TTL to 10 minutes. I am going to change the rpcclientaccessserver to the CAS array.

-MX records point to spam soap - leave alone

-External DNS points webmail to public IP - leave alone

-Internal DNS
-- point CAS to operational server
-- point webmail to operational server
-- point autodiscover to operational server

-Firewall
-- change rules (NAT private to public & public to private changed to reflect operational server's private IP)
-- change OWA, which has its own rule, to operational server's private IP
-- leave public IP as is
-- leave services as they are


I believe this should do it. MX stays the same since it points to Spam Soap. External DNS points webmail.domain.com to public IP which is configured on firewall, so leave that alone. Change internal DNS for CAS, webmail, and autodiscover to operational server. Go to firewall and change Exchange server NATs to private IP of operational server, leave public IP as is. Change OWA rule's private IP to that of the operational server, leave public as is.

If I do this, spam soap will recognize the public IP it is receiving email from for us as it is already configured. OWA will have the same public IP in external DNS so once the firewall NAT is reconfigured with the operational server's private IP, mobile devices will connect (internal DNS will have already been changed to the operational server).
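
Once DNS and the firewall rules have been switched, a simple reachability test against the operational server can confirm that the published ports answer. The sketch below uses plain .NET sockets; the server name `ex2` and the port list (taken from the firewall rules in the question) are assumptions to adjust.

```powershell
$server = "ex2"                 # assumed: the operational server
$ports  = 25, 80, 110, 143, 443 # ports named in the original firewall rules

foreach ($port in $ports) {
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        # Connect throws if the port is closed or filtered.
        $client.Connect($server, $port)
        "{0}:{1} open" -f $server, $port
    } catch {
        "{0}:{1} unreachable" -f $server, $port
    } finally {
        $client.Close()
    }
}
```

Running it once from inside the network and once from outside (against the public name) separates internal DNS problems from firewall or NAT problems.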

The only other thing I can think of is correctly configuring OWA on the secondary server. Right now webmail.domain.com is only configured on the internal URL under Server Config>Client Access>OWA (default website)>properties>general tab, not external as is the primary server. So I think I need to enter webmail.domain.com into the external URL field. I see the wildcard cert on the secondary server, but no services have been assigned so I will need to mirror the services configured on the primary server. Once that is done, I believe webmail.domain.com will resolve to the secondary server and no cert issues will arise.
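
The OWA URL and certificate steps described above could be done from the Exchange Management Shell roughly as follows. This is a sketch: the virtual directory identity, the `/owa` URL path, and the service list are assumptions, and the thumbprint placeholder must be filled in from the output of the certificate listing.

```powershell
# Compare the OWA URLs on the secondary server with those on the primary.
Get-OwaVirtualDirectory -Server ex1 | Format-List Identity, InternalUrl, ExternalUrl
Get-OwaVirtualDirectory -Server ex2 | Format-List Identity, InternalUrl, ExternalUrl

# Set the external URL on the secondary server (identity and URL are assumed values).
Set-OwaVirtualDirectory -Identity "ex2\owa (Default Web Site)" -ExternalUrl "https://webmail.domain.com/owa"

# See which services the wildcard certificate is bound to on each server.
Get-ExchangeCertificate -Server ex1 | Format-List Subject, Thumbprint, Services
Get-ExchangeCertificate -Server ex2 | Format-List Subject, Thumbprint, Services

# Bind the certificate on ex2 to the same services ex1 uses
# (IIS and SMTP are shown here only as an example).
Enable-ExchangeCertificate -Server ex2 -Thumbprint "<thumbprint from the listing above>" -Services "IIS,SMTP"
```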

I realize the last paragraphs may be a bit convoluted but please let me know if this looks correct.

Thanks again