Lost client connectivity in Exchange 2013

We have updated to Exchange 2013 CU6 w/ coexistance with 2007. We've migrated all resouces and settings to exchange 2013 and have shutdown the 2007 environment (3 weeks ago) with no negative impact.

An issue we've been dealing with for about 2.5 months is client connectivity. Randomly all outlook users will disconnect and email will no longer be delivered to mobile devices. There are no log entries leading up to the failure or during.

To resolve the issue I do the following:
Reboot both CAS servers
Make the active node passive and the passive node active
Client connectivity restored

**Note: After talking with multiple VARs and collegues I now know that NLB clusters are not recommended by Microsoft and we should be using hardware for load balancing. We're in the process of purchasing a hardware load balancer, but we're 2 month from having one in house.

Environment:

Front End- 2 virtual Windows server 2012 R2 on VMware 5.1. Each have 2 cpu and 8gb ram. Running Exchange server 2013 cu6 Standard in A active/passive NLB configuration (unicast).

Back End- 2 virtual Windows server 2012 R2 on VMware 5.1. Each 12 cpu and 32gb ram. Running Exchange server 2013 cu6 Standard in a DAG.
Questions:
1. Has anyone seen this behavior before?
2. Any suggestions on tools/troubleshooting?
3. Recommendations?

We've used the following VMware links to configure and QC server settings:

3) VMware: Articles and Blog

• Microsoft Network Load Balancing Multicast and Unicast operation modes (1006580)
http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1006580

• Sample Configuration - Network Load Balancing (NLB) unicast mode configuration (1006778)
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006778
Comments:
If your Windows Network Load Balancing is configure with unicast communication mode, review the section on RARP packets in this article .
The transmission of RARP packets is prevented on the portgroup / virtual switch as explained in the later part of the article.
Microsoft NLB not working properly in Unicast Mode (1556)
http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1556
Check out this blog:
Back To Basics: Configuring Standard vSwitch (Part Two of Three)
http://blogs.vmware.com/smb/2014/04/back-basics-configuring-standard-vswitch-part-two-three.html
Scroll to the “Notify Switches” section where you will see:
Microsoft NLB software when configured in a unicast mode is incompatible with Notify Switches set to Yes.

• Sample Configuration - Network Load Balancing (NLB) Multicast Mode Configuration (1006558)
http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1006558
MwaddamsAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
1. Has anyone seen this behavior before?

More times, that I've had hot dinners, and regular question on EE!

2. Any suggestions on tools/troubleshooting?

Switch to multicast.

3. Recommendations?

firstly to pick holes, I would not use NLB in unicast mode, switch to Multicast, this is what is recommended in a VMware environment.

and when you have switched to Multicast, it is important to follow the documents you have referenced to statically allocate ARP MAC and IP addresses in your routers, and every physical port and uplink you would expect multicast traffic to appear on needs to be in the config!

OF ALL the NLB configurations we visit on a monthly basis, this is the reason for failure, someone does not configure the physical switches!

If you cannot do the above, abandoned NLB, and wait for your hardware load balancer, or you could try...

http://www.zenloadbalancer.com/

which is a virtual appliance!
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
MwaddamsAuthor Commented:
Andrew,

Thank you for taking the time to respond to my question.
We've decided to abandon NLB and do the following:

Build a new standalone CAS server
Point the firewall to the new server
Shutdown CAS Array
Purchase and install a hardware load balancer
Stand up additional CAS servers as necessary after load balancer installed.

By simply removing NLB from the environment do you believe all of our random client connectivity outages will go away?
0
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Yes, otherwise you've got another issue!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VMware

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.