TMG Load Balanced Web Farm Issues

Hello Experts,

Currently we have an ISA Server 2006 enterprise array with 2 members which is being used to publish various external sites, some of which are load balanced by ISA.

This has been working perfectly well for some time. However we are currently in the testing phase of our migration to TMG as a short-ish term solution until we identify a suitable replacement solution. We have 2 TMG servers which are members of an enterprise array which is being managed by an EMS. NLB is enabled in multicast mode. All three machines are running the latest build of TMG, 7.0.9193.644.

The firewall policy has been successfully exported from the ISA array and applied to TMG. All the non-load balanced rules are working as expected. However, all of the load balanced rules (using TMG web farms) are not.

After some research it appears that the issue relates to the Web Farm ‘Proxy Requests to Published Servers’ setting in the firewall rule, specifically, where these requests appear to come from. Under ISA, this setting was set to ‘Original client who sent the request’ and it worked as expected. In TMG, using identical settings, if we go to the site we receive the following error from the browser:

“Error Code: 500 Internal Server Error. The remote server has been paused or is in the process of being started. (70)”

However, if this setting is changed to make requests appear to originate from the ‘Forefront TMG computer’ the site is available. The web servers have their default gateway set to the internal NLB address of the array.

Having these requests appear to originate from the firewall does not work for us as our applications rely on knowing the external IP of the client to function correctly. We need to get to the bottom of why this doesn’t work in TMG, but worked perfectly in ISA.

Has anyone encountered this issue before or perhaps could shed some light on a possible solution?

Bembi

You have to put TMG in single affinity mode.
This ensures that a session is always handled by the same node...

To change the setting, go to Networking - Networks tab...
From the task pane select "Enable Network Load Balancing Integration".
Select the interface - Configure NLB Settings...
Enter the IP and mask for the primary VIP.

After you have changed everything, wait for replication and restart the machines...
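
To illustrate what single affinity means here, a minimal Python sketch follows. It is not the real NLB algorithm; the hash function, the node names TMG1/TMG2 and the sample IP addresses are all assumptions made for the example. It only shows the principle that the same client IP is always mapped to the same node.

```python
import hashlib

# Hypothetical node names, used for illustration only.
NODES = ["TMG1", "TMG2"]


def pick_node(client_ip, nodes=NODES):
    """Map a client IP to one node; the same IP always yields the same node.

    Uses a SHA-256 hash as a stand-in for whatever hashing the real
    load balancer performs (assumption, not the NLB algorithm).
    """
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return nodes[digest[0] % len(nodes)]


if __name__ == "__main__":
    # Documentation-range addresses used as sample clients.
    for ip in ("203.0.113.10", "203.0.113.11", "198.51.100.7"):
        print(ip, "->", pick_node(ip))
```

Note that the mapping depends only on the source address of the packet, so traffic arriving on a different interface with a different source address may land on a different node.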

pxuser

ASKER

Hi Bembi,

It is my understanding that single affinity mode is the only mode of operation when using NLB in TMG. NLB has already been configured in the way you described.

The issue does seem to be related to load balancing though.

After disabling the NICs of one array member to ensure that all requests hit the remaining TMG server, the site we are trying to access becomes available. The ‘Proxy Requests to Published Servers’ setting seems to have no bearing on the result, as setting it to either ‘Original client who sent the request’ or ‘Forefront TMG computer’ works.

Bembi

You are talking about a web farm, meaning you have several web servers in the farm?
If yes, do you see the web farm under
Firewall Policy - Toolbox - Network Objects - Server Farm?

If so, you may want to have a look at this issue:
http://robinminto.com/blog/post/2011/10/09/Microsoft-Threat-Management-Gateway-web-farm-publishing-issue-The-remote-server-has-been-paused-or-is-in-the-process-of-being-started

pxuser

ASKER

Hi Bembi,

Yes, we are trying to load balance between two web servers, both of which have been specified in the farm. The web farm is listed under Network Objects - Server Farm.

I did read that article previously when first trying to troubleshoot the issue. The situation we have is slightly different. Our web servers all use their own internal IP address. Also, we are using HTTPS on port 443.

I have attempted the solution given in the article; specifying the server by host name and specifying the host header. However, the error we receive is still the same.

Bembi

I'm out of ideas now, so let me make some more general statements; maybe this gives the right hint...
You may also want to create a new question. There are too many responses here already, so other Experts might not get involved anymore... You could include a picture of the setup to make clear what it looks like...

My understanding is...
internet -- NLB (TMG Ext) -- 2 x TMG -- NLB (TMG int) -- NLB (Farm ext) -- 2 x Webserver.

The key point of making requests appear to originate from the ‘Forefront TMG computer’ is that a device behind it sees the IP of TMG rather than that of the external requester. As a result, the response carries the TMG IP as its destination address, and TMG translates it back to the external address. So a classic reverse proxy.
If that setting is not used, the web server puts the original requester's IP into the response packet, so all devices involved (TMG and all NLB clusters) have to route the response packet over the correct path.

TMG will block any response for which it has not seen the initial packet (i.e. the request came in over TMG1 and the response is sent via TMG2), inbound as well as outbound. Because of this, the TMG that handled the request also has to handle the response.
If you have a test client, you can enable the TMG live logging to see which TMG is involved. If a node has no incoming requests, it should not have any outgoing ones either.
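
The behaviour described above can be sketched as a small Python model. This is a toy model of a stateful firewall and not TMG's actual implementation; the class name, the node names and the IP addresses are assumptions made for the example. Each node only knows the sessions it forwarded itself, so a response arriving at the other node is dropped.

```python
class StatefulNode:
    """Toy model of one firewall node that tracks its own sessions only."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()  # (client_ip, server_ip) pairs seen on this node

    def forward_request(self, client_ip, server_ip):
        """Record the session when the initial request passes this node."""
        self.sessions.add((client_ip, server_ip))

    def allow_response(self, server_ip, client_ip):
        """Allow a response only if this node saw the matching request."""
        return (client_ip, server_ip) in self.sessions


if __name__ == "__main__":
    tmg1 = StatefulNode("TMG1")
    tmg2 = StatefulNode("TMG2")
    client, web = "203.0.113.10", "10.0.0.21"  # hypothetical addresses

    tmg1.forward_request(client, web)  # request enters via TMG1
    print("response via TMG1 allowed:", tmg1.allow_response(web, client))
    print("response via TMG2 allowed:", tmg2.allow_response(web, client))
```

With ‘Forefront TMG computer’ as the apparent source, the web server replies directly to the node that sent the request, so this mismatch would not arise in the model.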

TMG doesn't really act as a true load-balanced array. The array is more a way to distribute the same settings to both TMG servers, but they work largely independently of each other. Also, the Microsoft NLB doesn't work as a real load balancer...

The web servers send out their traffic following their default gateway....

Another option for this configuration may be this construction...

Internet - NLB -- TMG1 -- web 1
Internet - NLB -- TMG2 -- web 2

That means leaving the inner NLB out. The default gateway of each web server now points to its corresponding TMG, so the traffic cannot take another route. It is less reliable, though: if one web server is dead, its TMG may still be alive and, from the NLB perspective, will continue to receive traffic. Windows NLB relies on the ping. As long as the next hop can be pinged, the target is declared alive.
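
The weakness of this layout can be shown with a short Python sketch. It only models the statement above (the balancer judges a path by its next hop) and is not how Windows NLB is actually implemented; the `Path` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One 'NLB -- TMG -- web' chain from the layout above."""
    tmg_up: bool
    web_up: bool


def balancer_considers_alive(path):
    """Assumption: the balancer only checks the next hop (the TMG node)."""
    return path.tmg_up


def actually_serves(path):
    """A request succeeds only if both the TMG and its web server are up."""
    return path.tmg_up and path.web_up


if __name__ == "__main__":
    broken = Path(tmg_up=True, web_up=False)  # web server dead, TMG alive
    print("balancer still sends traffic:", balancer_considers_alive(broken))
    print("requests are served:", actually_serves(broken))
```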

pxuser

ASKER

Thanks for the response Bembi,

After looking at packet captures in our test environment we have been seeing exactly what you described.

"TMG will block any response for which it has not seen the initial packet (i.e. the request came in over TMG1 and the response is sent via TMG2), inbound as well as outbound. Because of this, the TMG that handled the request also has to handle the response."

Currently we have a case open with Microsoft. If we get to the bottom of the issue I will be sure to report back.

ASKER CERTIFIED SOLUTION

pxuser

This solution is only available to members.

Bembi

Hello,

Yes, this would have been my last proposal as well, but it is not a solution, more a workaround...
Besides this, Windows NLB is NOT a real load balancer in the true sense of the word but more of a failover mechanism, so the question in the end is what goal you want to reach with the solution.

If you want real load balancing and not only failover, you need third-party load balancers anyway. If the goal is redundancy, you may decide to live with my last proposal and the proposal of MS: leave the web NLB out and use only an NLB in front of TMG. This way you have half redundancy. One of the TMG servers can fail, but NLB will not recognize the failure of one of the web servers.

Full redundancy in that case is only possible with at least one third-party load balancer in front of the web servers, and full load balancing plus redundancy with two third-party load balancers.

Please close your question now, otherwise others cannot see the results as a kind of solution. You may also ask the EE admins to use your own statement as the solution, as I guess it is worth keeping this thread.

pxuser

ASKER

As good a solution as it is possible to reach.