Outlook stops connecting after putting Exchange server behind hardware load balancer.

Posted on 2013-06-25
Last Modified: 2013-07-09
We are running Exchange 2010 SP2 Rollup 6 with a hardware load balancer from F5.
We have one site containing 4 Exchange servers, each holding the HUB+CAS+MBX roles. The F5 acts as the load balancer for the CAS array and distributes connections to the CAS servers in round-robin fashion.
Yesterday, multiple users (not all) started reporting that Outlook was stuck on "Trying to connect". We checked all 4 CAS servers and found that one server had no active connections. We asked our network team to remove that server from the load-balancing pool, and immediately afterwards everything started working.
That CAS server hosts a few databases, which were mounted and healthy during that time.
After 24 hours of smooth operation we put that server back in the pool, and the same thing started happening again.

Need your expert comments and advice on this.

NOTE: I haven't explained the Edge role, as I think it is not related to this issue.
Question by:pdixit1977

Expert Comment

ID: 39276534

Sorry if the questions or remarks seem a bit basic, but I'm assuming some things since I do not (and cannot) have all the details.

I think the clients are somehow being directed straight to the single server.
Once you put it behind the load balancer, it seems unavailable to them.
Could you tell me whether you are using the load balancer's DNS name in the clients' Outlook profiles?
I.e. cas.yourdomain.local instead of yourserver.yourdomain.local?

Second, is the load balancer passing traffic through to this server when it is behind the load balancer? (Is the server also configured in the load balancer?)

Third, I suppose you have tried rebooting computers when the server is put behind the load balancer. In a smaller environment with 2 Exchange 2010 servers in a DAG behind a Kemp load balancer, I found that when we had issues on one server and switched over, the clients needed reboots. Perhaps this is also the case in your scenario. Moving the servers behind the load balancer might require some reconnecting (by rebooting, or by logging users off and on), but I haven't experienced this often.

Author Comment

ID: 39277886
Yes, we are using the DNS name (cas.yourdomain.local) of our load balancer in the clients' Outlook profiles.
Second, there is a pool created on the load balancer, and the IPs of all 4 of our CAS servers are in that pool.

And we are using a Cisco load balancer, not F5.

Expert Comment

ID: 39278074
If you can rule out the load balancer, then...
it seems the clients somehow keep connecting directly to the specific server instead of the CAS DNS name. I've had issues like this with clients, even though their profiles had been changed to connect to the CAS DNS name / load balancer.
After inspection we noticed direct connections from some clients.
The problem was a registry key still retaining the hostname of one of the servers.
After changing the registry key, those clients no longer had a problem and connected through the load balancer.

The key in question is this one. Could you check the values under it on a client that has had issues?

HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\Windows Messaging Subsystem\Profiles\1\13dbb0c8aa05101a9bb000aa002fc45a
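
For checking many clients, the lookup could be scripted. The following is only a rough sketch, not a tested tool: it reads the values under the key quoted above with Python's standard `winreg` module (Windows only) and flags any value that mentions an individual server's hostname. The profile name `1` is taken from the path above, and the server names passed in are hypothetical placeholders. Because these values are often binary, the sketch tries both UTF-16 and single-byte decoding.

```python
def find_server_references(values, server_names):
    """Return {value_name: [matching server names]} for values that mention a server.

    values: dict mapping registry value name -> str or bytes data
    server_names: hostnames of individual CAS servers (placeholders, not real names)
    """
    hits = {}
    for name, data in values.items():
        if isinstance(data, bytes):
            # Binary values may hold UTF-16 or single-byte strings; try both.
            texts = [data.decode("utf-16-le", errors="ignore"),
                     data.decode("latin-1")]
        else:
            texts = [str(data)]
        haystack = " ".join(texts).lower()
        found = [s for s in server_names if s.lower() in haystack]
        if found:
            hits[name] = found
    return hits


def read_profile_values(profile="1"):
    """Read all values of the profile key quoted above (Windows only)."""
    import winreg  # imported here so the module still loads on other platforms

    path = (r"Software\Microsoft\Windows NT\CurrentVersion"
            r"\Windows Messaging Subsystem\Profiles" + "\\" + profile +
            r"\13dbb0c8aa05101a9bb000aa002fc45a")
    values = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, path) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # (subkeys, values, mtime)
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name] = data
    return values
```

On an affected client, something like `find_server_references(read_profile_values(), ["exch4"])` would list the values still pointing at one server, where `exch4` stands in for the real hostname.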


Author Comment

ID: 39279119
I think we are drifting away from the main issue.

My issue is: 4 CAS servers running behind a load balancer. Thousands of users started reporting that Outlook showed "Disconnected" / "Trying to connect". We found one CAS with only 1-2 active connections; when we removed that server from the load balancer, everything started working fine. We ran this server separately by adding a hosts entry on a few affected users' machines (load balancer DNS name pointing to the IP address of this CAS server), and everything works fine, but the issue recurs once we move it back behind the load balancer.

There are no suspicious logs or events on this server, so what should we check? Something must be wrong with this server only, as the other 3 CAS servers are working fine behind the load balancer, and the load balancer's health and config have already been verified with the vendor.
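
As a complement to the hosts-file test, each pool member can be probed directly, bypassing the load balancer. This is a minimal sketch under assumptions: the host names in the usage note are placeholders, and port 135 (the RPC endpoint mapper) is only a first check. A successful TCP connect shows that the port is reachable; it does not prove that the RPC Client Access service behind it is healthy.

```python
import socket


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_pool(hosts, port=135, timeout=3.0):
    """Probe every pool member directly and return {host: reachable}."""
    return {host: probe(host, port, timeout) for host in hosts}
```

For example, `probe_pool(["exch1", "exch2", "exch3", "exch4"])` run from a client subnet would show whether the suspect server answers on the same port as the others.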

Author Comment

ID: 39279144
As far as changes are concerned, only one change was made to the infrastructure: the Rollup 6 installation, just a day before this issue. However, the other servers have the same rollup update and are running absolutely fine.

Expert Comment

ID: 39279349
Could you look at the number of connections made to that one server, e.g. with TCPView (Sysinternals)? That way you can see whether the issue is caused by the server being reachable without the load balancer, or whether it is some kind of conflict between the 4 CAS servers behind the load balancer and putting the 5th in the same position.
What I'm trying to say is: when the server is in its current place (not behind the load balancer), do the thousands of connections go to the 4 CAS servers, or do they go to that server (not through the load balancer)?

Another question: does the server (not behind the load balancer) have any critical databases mounted?
I suppose you also have a DAG?
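
If TCPView is not convenient, the same count can be approximated from the text output of `netstat -an`, which is a substitute for TCPView and not what was suggested above. The sketch below assumes the usual Windows column layout (Proto, Local Address, Foreign Address, State) and tallies established TCP connections per local port, so the suspect server can be compared with a healthy one.

```python
from collections import Counter


def count_established(netstat_output):
    """Count ESTABLISHED TCP connections per local port from `netstat -an` text."""
    per_port = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State
        if len(fields) < 4:
            continue
        if fields[0].upper() != "TCP" or fields[3].upper() != "ESTABLISHED":
            continue
        # Take the text after the last colon so IPv6 addresses also work.
        local_port = fields[1].rsplit(":", 1)[-1]
        per_port[local_port] += 1
    return per_port
```

Saving the output of `netstat -an` on each CAS server at the same moment and passing the text to `count_established` gives per-port totals that can be compared side by side.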

Author Comment

ID: 39279837
As of now, all connections go to the 4 servers running behind the load balancer, because all clients look up mycas.mydomain.local, which is the DNS name of the load balancer; only the users for whom we made a custom hosts file entry connect to this server.
Yes, this server hosts databases that are running all the time, regardless of whether it is behind or in front of the load balancer.

Expert Comment

ID: 39280922
Has your network team encountered any errors on the network when you put 'the one' behind the load balancer (clients or servers trying and failing to connect)?
Could you give some more information on the load balancer config for the CAS virtual services?

Author Comment

ID: 39291710
No, there are no alerts on the network stack; they have received confirmation from Cisco.

Our conclusion so far is that this CAS is somehow not accepting more than 5-10 RPC connection requests. However, it works fine with its databases, because in that case the RPC connections go to the other CAS servers, and those CAS servers connect to it over SMTP/other protocols.

As far as changes are concerned, only Rollup 6 was installed.

Expert Comment

by:David Johnson, CD, MVP
ID: 39295211
Let me see if I've got things straight.

Load Balancer (cas.yourdomain.local) -> round robin -> Exchange1
                                                    -> Exchange2
                                                    -> Exchange3

This works fine, but adding the fourth server to the pool breaks everything.

Are the databases connected via a shared storage pool that all Exchange servers access?

Is this supposition correct?

Author Comment

ID: 39299698
Your supposition is absolutely correct. I don't know whether it is connected or not, but this has been happening since last week, just after the installation of Rollup 6 on all Exchange servers.

Accepted Solution

zackmccracken earned 250 total points
ID: 39300886
pdixit, I'm at a loss. Sorry for not being able to help you any further.

Author Closing Comment

ID: 39311081
My issue is not resolved, but I appreciate the continued help on this. I raised the same question on some other portals, but no luck... thanks


Question has a verified solution.
