Citrix server is overloaded for 30 minutes after a BSOD
Posted on 2011-09-19
We have a Citrix 4.5 farm with over a dozen Win 2003 x64 servers, each with 12 GB of RAM and dual quad-core 3 GHz Xeons.
It doesn't happen often, but each time we have a BSOD on one of them during office hours, we seem to face the same problem.
By the time the 50 to 60 users who were on that server realize their session is gone (5 minutes or so), they try to start their applications again. Of course, by that time the crashed server has already booted up and is accepting new connections.
The Citrix load balancer then sends all these users, who don't have a session yet, to that one server because it is empty (or still has fewer resources in use than the others in the farm).
So this one freshly booted Citrix server gets all these new users at once, and it is overloaded for 20 to 30 minutes.
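To illustrate the effect described above, here is a rough Python sketch. It assumes the load balancer simply picks the server with the fewest sessions; the real Citrix load evaluator looks at more than that, and the server names and session counts are made up for the example.

```python
def assign_sessions(loads, new_sessions):
    """Send each new session to the server that currently has the fewest sessions.

    loads: dict of server name -> current session count (hypothetical data).
    Returns a dict of server name -> number of new sessions it received.
    """
    loads = dict(loads)  # work on a copy so the caller's data is untouched
    placed = {name: 0 for name in loads}
    for _ in range(new_sessions):
        target = min(loads, key=loads.get)  # least-loaded server wins
        loads[target] += 1
        placed[target] += 1
    return placed


# Hypothetical farm: 11 healthy servers with 55 sessions each,
# plus one server (ctx12) that has just rebooted and is empty.
farm = {f"ctx{i:02d}": 55 for i in range(1, 12)}
farm["ctx12"] = 0

# The 55 users who lost their sessions all reconnect within a few minutes.
placed = assign_sessions(farm, 55)
print(placed["ctx12"])
```

Under that assumption every reconnecting user lands on the rebooted server, because it stays the least loaded until it has caught up with the rest of the farm.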
End users think something is wrong on their end and click the application again. The first session isn't listed in the sessions list yet, so they get another session, again on that one server, which is already busy enough, making it worse.
It helps to put that one server 'offline' in the farm, but I have to be fast doing that, and most of the time, by the time I realize what happened, the server is already (or nearly) back up to speed.
Any suggestions on how best to deal with this?