Veex

Relayd sends to disabled hosts

Hi,

I'm currently using Relayd as a load balancer for a group of web servers. I have it set up to use redirects, because I need the sticky-address feature. However, when I disable one of the hosts, clients with existing connections STILL GO TO THAT HOST even though it's disabled.

I've tried flushing the pf state table, but no luck. I'm guessing it would work if I removed the sticky-address directive, or if I moved from redirects to relays, since the sticky-address directive does not apply to relays.

So, is there a way to prevent existing connections from going to a disabled host using Relayd? Or is this a bug in Relayd or FreeBSD?

Further, if I need to go to relays vs. redirects, is there a performance hit?   With relays, will connections continue to go to the same host automatically unless it's disabled?
SOLUTION
Duncan Roe
Veex

ASKER

I'm using a FreeBSD box as a load balancer, or more simply a reverse proxy, running Relayd. I can set the "sticky-address" option so that further requests from the same client go to the same host. Without that setting, requests are distributed round-robin across only the active hosts. With sticky-address set, successive requests all go to the same host.

The problem I'm having is that even after Relayd marks a host as disabled, clients whose connections were established before the host was disabled are still sent to that host for successive requests.
If new requests are piggy-backed onto existing connections then you will get that. Otherwise Relayd has a problem.
You can verify what is happening by using tcpdump or wireshark on a client system.
Hypothesis: if a host physically goes down (e.g. power loss or cable breaks) then connections with that host will not close. This may confuse Relayd.
Are you sure that Relayd has marked the host as down?
SOLUTION
Well done! Neat getting TCP retries to open a new connection - but how does the client cope?
Veex

ASKER

Client does not cope well!  If a request comes in and then the server is disabled and the state cleared, the states for the request are lost and the client browser will just sit there and wait.  Any subsequent requests from the browser will be successful, which is better than the original problem I was having.

I'm going to keep this question open while I look for a more elegant solution.
Your open connection is the problem. Once Relayd becomes aware that a host is down, it needs to close its connection with the host and the matching connection with the client (they are separate connections). The close to the host will leave that socket in the FIN_WAIT_1 state, which will time out eventually. The close with the client will take effect straight away, with a better result than now.

I.e. instead of placing a new call on receiving [some number of] TCP retries, send a Reset. Simultaneously, close the connection to the offending host.
Veex

ASKER

Thanks Duncan,

I would have expected that's what Relayd should be doing on its own, but I think it's actually just creating temporary rules, adding them to the packet filter, and then passing the traffic off, so that the relayd daemon isn't actually handling the TCP handshaking.

I don't know the inner workings of Relayd and I'm making this assumption based on a few mentions here and there as I've been looking around.  I was hoping to hear something definitive from the community, but I'm not having luck there either.
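
For reference, this guess is consistent with how the relayd documentation describes redirects: relayd runs the health checks and maintains translation rules and tables inside a pf anchor, while pf itself forwards the packets and keeps the connection state. A minimal pf.conf sketch of that hook follows, assuming a FreeBSD-style pf with separate translation anchors. The exact anchor lines can differ between pf versions, so treat them as an assumption to check against the local relayd(8) man page.

```
# pf.conf fragment (sketch only; verify against relayd(8) for your version)

# Translation (rdr) rules for relayd redirects are loaded under this anchor.
rdr-anchor "relayd/*"

# Filter rules added by relayd, if any, are loaded under this anchor.
anchor "relayd/*"
```

If that is the setup in use, a host being marked down would change what pf does for new connections, while states and sticky-address source-tracking entries that already exist in pf would carry on until they are removed or expire, which would fit the behaviour described in this thread.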
ASKER CERTIFIED SOLUTION
Veex

ASKER

Still looking for a better solution, but using this as a temporary workaround.
Veex

ASKER

Looks like the piece I was missing was the pf interval timeout, which controls how often expired states are purged and so limits how long they stay around. I'm guessing the expired states were somehow causing clients to keep going to the same hosts, which were by then disabled. Setting the interval to 0 fixed that:


set timeout interval 0