Solved

Outlook stops connecting after putting Exchange server behind a hardware load balancer

Posted on 2013-06-25
Last Modified: 2013-07-09
Hi,
We are running Exchange 2010 SP2 Rollup 6 with a hardware load balancer from F5.
We have one site with 4 Exchange servers, each running HUB+CAS+MBX. The F5 acts as the load balancer for the CAS array and distributes connections to the CAS servers in round-robin fashion.
Yesterday multiple users (not all) started reporting that Outlook was stuck on "Trying to connect". We checked all 4 CAS servers and found that one of them had no active connections. We asked our network team to remove that server from the load-balancing pool, and immediately after that everything started working.
That CAS server hosts a few databases, which stayed mounted and healthy during that time.
After 24 hours of smooth operation we put that server back in the pool, and the same problem started again.

Need your expert comments and advice on this.


NOTE: I haven't described the Edge servers, as I don't think they are related to this issue.
Question by:pdixit1977
15 Comments
 
LVL 3

Expert Comment

by:zackmccracken
Hi,

Sorry if the questions or remarks seem a bit basic, but I'm assuming some things since I do not (and cannot) have all the details.

I think the clients somehow have a direct redirection to the single server.
Once you put it behind the load balancer, it seems unavailable to them.
Could you tell me whether you are using the load balancer's DNS name in the clients' Outlook profiles?
I.e. cas.yourdomain.local instead of yourserver.yourdomain.local?

Second, is the load balancer passing traffic through to this server when it is behind the load balancer? (Is the server also configured in the load balancer?)

Third, I suppose you have tried rebooting the client computers after the server is put behind the load balancer. In a smaller environment, with 2 Exchange 2010 servers in a DAG behind a KEMP load balancer, I found that when we had issues on one server and switched over, the clients needed a reboot. Perhaps this is also the case in your scenario. Moving the servers behind the load balancer might require the clients to reconnect (by rebooting, or by the users logging off and on again), though I haven't experienced this often.
 

Author Comment

by:pdixit1977
zackmccracken:
Yes, we are using the DNS name of our load balancer (cas.yourdomain.local) in the clients' Outlook profiles.
Second, there is a pool created on the load balancer, and the IPs of all 4 of our CAS servers are listed in that pool.

Also, we are using a Cisco load balancer, not F5.
 
LVL 3

Expert Comment

by:zackmccracken
If you can rule out the load balancer, then...
It seems the clients somehow keep connecting directly to the specific server instead of the CAS DNS name. I've had similar issues with clients, even though their profiles had been changed to connect to the CAS DNS name/load balancer.
After inspection we noticed direct connections from some clients.
The problem was a registry key still retaining the hostname of one of the servers.
After changing the registry key, those clients no longer had a problem and connected through the load balancer.

The key in question is this one. Could you check the subkey values on a client that has had issues?

HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\Windows Messaging Subsystem\Profiles\1\13dbb0c8aa05101a9bb000aa002fc45a
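
A minimal sketch of that check in Python, assuming it is run on the affected Windows client under the user's own account. The profile name `1` and the GUID subkey are taken from the path above; `find_hostname_hits`, `read_profile_values`, and the example hostname are hypothetical names used only for illustration. Profile values are often stored as binary, so the helper searches for the hostname in both single-byte and UTF-16LE form.

```python
import sys

# Registry path under HKEY_CURRENT_USER, as quoted in the comment above.
PROFILE_KEY = (
    r"Software\Microsoft\Windows NT\CurrentVersion"
    r"\Windows Messaging Subsystem\Profiles\1"
    r"\13dbb0c8aa05101a9bb000aa002fc45a"
)


def find_hostname_hits(values, hostname):
    """Return the names of values whose data mentions hostname (case-insensitive).

    values: iterable of (name, data) pairs; data may be bytes, str, list, or other.
    """
    needle = hostname.lower()
    needles = [needle.encode("utf-8"), needle.encode("utf-16-le")]
    hits = []
    for name, data in values:
        if isinstance(data, list):      # REG_MULTI_SZ comes back as a list of str
            data = "\n".join(str(item) for item in data)
        if isinstance(data, str):       # REG_SZ comes back as str
            data = data.encode("utf-16-le")
        if not isinstance(data, bytes): # e.g. REG_DWORD integers or None
            continue
        blob = data.lower()             # lowercases ASCII letters only
        if any(n in blob for n in needles):
            hits.append(name)
    return hits


def read_profile_values(key_path=PROFILE_KEY):
    """Read all (name, data) pairs under HKEY_CURRENT_USER\\key_path (Windows only)."""
    import winreg  # imported here so the helper above also works off Windows
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        return [winreg.EnumValue(key, i)[:2] for i in range(value_count)]


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical server name; replace with the real CAS server hostname.
    for name in find_hostname_hits(read_profile_values(), "yourserver.yourdomain.local"):
        print("server name still present in value", name)
```

Any value it reports is only a candidate; whether Outlook actually uses that value to pick its endpoint would still need to be confirmed on the client.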
 

Author Comment

by:pdixit1977
I think we are drifting away from the main issue.

My issue is: 4 CAS servers are running behind the load balancer. Thousands of users started reporting that Outlook shows "Disconnected"/"Trying to connect". We found one CAS with only 1-2 active connections; when we remove that server from the load balancer, everything starts working fine. We ran this server separately by adding a hosts file entry on a few affected users' machines (mapping the load balancer DNS name to the IP address of this CAS server), and everything works fine, but the issue recurs once we move it back behind the load balancer.

There are no suspicious logs or events on this server, so what should we check? Something must be wrong with this server only, as the other 3 CAS servers are working fine behind the load balancer, and the load balancer's health and configuration have already been verified with the vendor.
 

Author Comment

by:pdixit1977
As far as changes are concerned, only one change was made to the infrastructure: the Rollup 6 installation, just a day before this issue. However, the other servers have the same rollup and are running absolutely fine.
 
LVL 3

Expert Comment

by:zackmccracken
Could you look at the number of connections made to that one server, e.g. with TCPView (Sysinternals)? That way you can see whether the issue is caused by the server being reachable without the load balancer, or whether it is some kind of conflict between the CAS servers already behind the load balancer and this one being put in the same position.
What I'm trying to say is: while the server is in its current place (not behind the load balancer), do the thousands of connections go to the CAS servers behind the load balancer, or do they go to that server directly (not via the load balancer)?
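
If TCPView is not handy, a rough alternative is to save the output of `netstat -an` on each CAS server and count established connections per remote address. The sketch below assumes the usual Windows `netstat -an` column layout (Proto, Local Address, Foreign Address, State); the function name and the sample addresses are made up for illustration.

```python
from collections import Counter


def count_established(netstat_text, local_port=None):
    """Count ESTABLISHED TCP connections per remote IP in `netstat -an` output.

    If local_port is given, only connections to that local port are counted.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Expect: Proto, Local Address, Foreign Address, State
        if len(parts) < 4 or parts[0].upper() != "TCP" or parts[3] != "ESTABLISHED":
            continue
        local, remote = parts[1], parts[2]
        if local_port is not None and local.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Invented sample output, only to show the expected input shape.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.14:135          10.0.0.5:50112         ESTABLISHED
  TCP    10.0.0.14:135          10.0.0.5:50113         ESTABLISHED
  TCP    10.0.0.14:443          10.0.1.77:61200        ESTABLISHED
  TCP    10.0.0.14:443          0.0.0.0:0              LISTENING
  UDP    10.0.0.14:123          *:*
"""

for remote_ip, n in count_established(SAMPLE).most_common():
    print(remote_ip, n)
```

Whether the remote address shown is the client or the load balancer depends on whether the load balancer rewrites the source address, so the counts need to be read with the load balancer configuration in mind.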

Another question: does the server (the one not behind the load balancer) have any critical databases mounted?
I suppose you also have a DAG?
 

Author Comment

by:pdixit1977
As of now, all connections go to the servers running behind the load balancer, because all clients point to mycas.mydomain.local, which is the DNS name of the load balancer. Only the users for whom we made a custom hosts file entry connect to this server.
Yes, this server hosts databases that are mounted all the time, whether or not it is behind the load balancer.
 
LVL 3

Expert Comment

by:zackmccracken
Has your network team seen any errors on the network when you put 'the one' behind the load balancer (clients or servers trying and failing to connect)?
Could you give some more information on the load balancer configuration for the CAS virtual services?
 

Author Comment

by:pdixit1977
No, there are no alerts on the network stack; they have confirmation from Cisco.

Our conclusion so far is that this CAS is somehow not accepting more than 5-10 RPC connection requests. However, it works fine with its databases, because in that case the RPC connections go to the other CAS servers, and those CAS servers connect to it over SMTP/other protocols.

As far as changes are concerned, only Rollup 6 was installed.
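
One way to probe the "no more than 5-10 connections" theory is to open a batch of TCP connections straight to the suspect CAS, bypassing the load balancer, and see how many are accepted. The sketch below assumes the limit would show up at the TCP level, which may not be true: a real Outlook session goes through the RPC endpoint mapper on port 135 and then an RPC port, so a clean result here does not clear the server. The function name is hypothetical.

```python
import socket


def probe_connections(host, port, count, timeout=3.0):
    """Try to hold `count` TCP connections open at once; return how many succeeded."""
    open_socks = []
    try:
        for _ in range(count):
            try:
                sock = socket.create_connection((host, port), timeout=timeout)
            except OSError:
                continue  # refused, timed out, or unreachable
            open_socks.append(sock)
        return len(open_socks)
    finally:
        # Always release the sockets, even if something unexpected is raised.
        for sock in open_socks:
            sock.close()
```

For example, `probe_connections("yourserver.yourdomain.local", 135, 50)` could be run against the suspect server and against one healthy CAS for comparison; a result well below 50 on only the suspect server would support the theory.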
 
LVL 78

Expert Comment

by:David Johnson, CD, MVP
Let me see if I've got things straight.

Load Balancer (cas.yourdomain.local) -> round robins -> Exchange1
                                                     -> Exchange2
                                                     -> Exchange3

This works fine, but adding the fourth server:
-> Exchange1
-> Exchange2
-> Exchange3
-> Exchange4
breaks everything.

Are the databases on a shared storage pool that all Exchange servers access?

Is this supposition correct?
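
If that picture is right, plain round robin would explain why only some users are hit: each new connection goes to the next server in the pool, whether or not that server actually services it. A small simulation, assuming the load balancer's health check still sees Exchange4 as up (server names are from the diagram above; the connection count is arbitrary):

```python
from itertools import cycle


def simulate_round_robin(pool, broken, connections):
    """Assign connections round-robin; return how many land on a broken server."""
    servers = cycle(pool)
    return sum(1 for _ in range(connections) if next(servers) in broken)


pool = ["Exchange1", "Exchange2", "Exchange3", "Exchange4"]
failed = simulate_round_robin(pool, {"Exchange4"}, 1000)
print(f"{failed} of 1000 connections hit the bad server")
# prints "250 of 1000 connections hit the bad server"
```

Since each Outlook client typically opens several connections, the share of affected users could be higher than the share of affected connections, and any persistence setting on the load balancer would change the distribution.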
 

Author Comment

by:pdixit1977
ve3ofa:
Your supposition is absolutely correct. I don't know whether it is related, but this has been happening since last week, just after Rollup 6 was installed on all Exchange servers.
 
LVL 3

Accepted Solution

by:
zackmccracken earned 250 total points
pdixit, I'm at a loss. Sorry for not being able to help you any further.
 

Author Closing Comment

by:pdixit1977
My issue is not resolved, but I appreciate the continued help on this. I raised the same question on some other portals, but no luck... thanks.