jkeagle13


Forefront TMG Drain-Stopped Error

Hello,

I am running two VMs with Forefront TMG 2010 on Windows Server 2008 R2, pointing at a number of redundant web servers.

Users are reporting sporadic HTTP 500 errors, which I have traced to a failure on one of the two load-balanced Forefront servers. The error recurs many times a day, and the alert states:

Forefront TMG stopped forwarding web requests through the web publishing rule <RULE> to the server farm <FARM> because all farm servers are either being drain-stopped or are out of service.

The puzzling part is that the two servers that split the load report the error at different times, which in my mind almost entirely rules out a network or downstream server failure. There is no pattern to the failures. TMG 1 may report the error for ten seconds at 10:35am while TMG 2 reports no issues, only for TMG 2 to report the error at 11:59am while TMG 1 works fine.

The inconsistency makes this especially hard to troubleshoot. One factor that seems to have some influence is the load the servers are under. The issue doesn't go away entirely during off-peak hours, but it clearly occurs less often.

I have tried correlating every log I can get my hands on to see whether some other event coincides with the error, to no avail.
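As a rough illustration of that correlation step, the sketch below checks whether any error window on one TMG server overlaps a window on the other. The two windows use the example times from this post; the calendar date and the ten-second duration for TMG 2 are placeholders, not values taken from the logs, and `overlapping_windows` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def overlapping_windows(a, b):
    """Return pairs of (start, end) windows, one from a and one from b, that overlap."""
    overlaps = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            # Two windows overlap when each one starts before the other ends.
            if a_start < b_end and b_start < a_end:
                overlaps.append(((a_start, a_end), (b_start, b_end)))
    return overlaps


# Placeholder date; only the times of day come from the post.
day = datetime(2000, 1, 1)
start1 = day.replace(hour=10, minute=35)
start2 = day.replace(hour=11, minute=59)

# Ten seconds is stated for TMG 1; for TMG 2 it is an assumed duration.
tmg1 = [(start1, start1 + timedelta(seconds=10))]
tmg2 = [(start2, start2 + timedelta(seconds=10))]

print(overlapping_windows(tmg1, tmg2))  # prints [] because the windows do not overlap
```

An empty result across a full day of alerts would be consistent with the two servers failing independently rather than reacting to a shared outage.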

There is surprisingly little information in web searches either; this is apparently not a widely experienced issue. I did find a patch for ISA 2006 at http://support.microsoft.com/kb/945524 that seems to address a similar problem. However, I can't find a corresponding patch for Forefront TMG 2010.

One post I read online seemed to indicate incompatibilities with the virtual machine environment. I am open to the possibility; however, I would think the error would be far more prevalent in the Internet community if that were the actual cause.

Thanks!
ASKER CERTIFIED SOLUTION
Keith Alabaster

This solution is only available to members.
jkeagle13

ASKER

We use a Cisco load-balancer in front of the TMG servers to balance inbound traffic.

We then use TMG to balance traffic outbound to the web servers.

I don't believe it is an issue with the load balancing, however. Even if traffic patterns were skewed slightly, the logs show that while one server is reporting the error, the other server is still receiving and processing traffic without problems. It isn't as if one TMG server is going offline and forcing an unmanageable amount of traffic onto the remaining server.

The error's claim that the farm servers are drain-stopped or out of service isn't correct either. While one TMG shows this error, we can manually ping the farm servers without problems. Furthermore, the other TMG continues to route traffic successfully, which rules out an issue with the farm servers or network connectivity.
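A ping only exercises ICMP, so a slightly stronger manual check is a timed TCP connection to the port the web servers listen on. The sketch below is one way to do that from a TMG server while the error is showing. It assumes the farm servers accept TCP connections on a known port; `tcp_probe` and `probe_farm` are hypothetical helper names, and the host names and port passed in would be your own. It does not reproduce how TMG itself tests farm members.

```python
import socket
import time


def tcp_probe(host, port, timeout=5.0):
    """Try a TCP connection and return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start


def probe_farm(servers, port, timeout=5.0):
    """Probe each server once and return {host: (succeeded, elapsed_seconds)}."""
    return {host: tcp_probe(host, port, timeout) for host in servers}
```

Running `probe_farm` in a loop and logging the elapsed times under peak load may show whether connections are merely slow rather than failing outright, which would fit the observation that the error is more frequent at higher load.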

Thank you!