I have a two-member DAG, ex1 and ex2. Both hold the CAS, MBX, and HUB roles. The AD CAS object points to ex1 in DNS (15-minute TTL). I also have split DNS, with autodiscover.domain.com and webmail.domain.com pointing to ex1. There is no load balancer. The firewall rules all point to ex1 (ports 80, 443, 25, 110, 143). Identical rules are in place for ex2 but are not enabled.
A VMware host failed. Ex1 moved to the other host (vMotion, on the EMC SAN). Ex2 was on the failed host's local disks.
Most Outlook 2007 clients were connected to ex2, which was down (a few in the office, myself included, were connected to ex1). Mobile devices stayed connected because they connect to the webmail server, which is configured on ex1.
Emails sent from mobile devices appear to have been delivered while only ex1 was up. After I power cycled the failed VMware host, ex2 came back up and the Outlook clients reconnected. Once both servers were back, emails were routed through ex2, where they got stuck in the queue. Nobody noticed until about 8 PM that evening. I rebooted both ex1 and ex2. New emails started flowing, but the earlier ones stayed stuck in the queue until they expired. Most were resent from Outlook the next morning, since they were still in the users' Sent Items folders. I do not understand why these emails never made it off the ex2 server to our smart host. Emails usually route through this server without issue (I thought that was because Exchange uses a HUB other than itself if one is available, but I am not sure, since most users appear to connect to ex2 in Outlook).
I need help understanding exactly what I have to set up so that Outlook reconnects to the functional server, whichever it may be, over either HTTPS or TCP/IP. I know I would need to change the CAS DNS record to the functional server's IP. But what about autodiscover.domain.com and webmail.domain.com? They point to ex1. Should they point to ex2 if ex1 is down?
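To make the "which server is functional" decision concrete, here is a minimal Python sketch of the check I have in mind. It only tests whether the TCP ports from my firewall rules accept connections, which is my own assumption about what "functional" should mean; it does not verify that the Exchange services behind those ports are healthy. The function names are made up for illustration.

```python
import socket

# Ports taken from the firewall rules described above.
PORTS = (80, 443, 25, 110, 143)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def pick_functional(hosts, ports=PORTS, probe=port_open):
    """Return the first host on which every port answers, or None if none do."""
    for host in hosts:
        if all(probe(host, port) for port in ports):
            return host
    return None
```

Calling `pick_functional(["ex1", "ex2"])` would prefer ex1 and fall back to ex2 only when ex1 fails the probe. The result would then tell me which IP the CAS, autodiscover, and webmail records should carry.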
What about the firewall rules? Should I enable the ex2 rules that mirror the ex1 rules, or will that cause conflicts? Should I enable the ex2 rules only if ex1 is down? For outbound, I was going to NAT the private IPs of ex1 and ex2 to the same public IP. Right now the private IP of ex2 maps to a different public IP (and that rule is not enabled, which may explain why the emails never made it to the smart host). For inbound, I was going to NAT the public IP to ex1 and switch it to ex2 as needed, unless all the rules can be enabled simultaneously without issue (I am not sure whether inbound traffic can be NAT'd to both servers' private IPs without conflicts or other problems).
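To spell out the inbound conflict I am worried about, here is a small Python sketch. It assumes, and I have not confirmed this for my firewall, that an inbound port forward maps one public IP and port to exactly one private address, so two enabled rules for the same public IP and port would clash. All addresses are placeholders, not my real ones, and the rule format is invented for illustration.

```python
from collections import defaultdict

# Hypothetical inbound rules: (public_ip, port, private_ip, enabled).
# 203.0.113.10 is a documentation address; 10.0.0.11/12 stand in for ex1/ex2.
RULES = [
    ("203.0.113.10", 443, "10.0.0.11", True),   # ex1
    ("203.0.113.10", 25,  "10.0.0.11", True),   # ex1
    ("203.0.113.10", 443, "10.0.0.12", False),  # ex2, present but disabled
    ("203.0.113.10", 25,  "10.0.0.12", False),  # ex2, present but disabled
]


def inbound_conflicts(rules):
    """Map each (public_ip, port) to its targets when enabled rules disagree."""
    targets = defaultdict(set)
    for public_ip, port, private_ip, enabled in rules:
        if enabled:
            targets[(public_ip, port)].add(private_ip)
    # Keep only the public endpoints forwarded to more than one private host.
    return {key: sorted(ips) for key, ips in targets.items() if len(ips) > 1}
```

With the rules as listed, `inbound_conflicts(RULES)` returns an empty dict. If I flipped the ex2 rules to enabled, it would report both 443 and 25 as pointing at two hosts, which is exactly the situation I am asking about.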
What would users need to do in Outlook 2007? Close and reopen it? Reboot? Go into Tools and run a repair?
I really appreciate any help. I have asked similar questions before and have reviewed the answers, but I am having difficulty putting it all together. I feel as though I am missing some configuration settings, as well as a clear procedure for a manual failover. To be clear, I do not mind changing DNS, firewall rules, or anything else by hand. I am just looking to know exactly what to do.