Veex
asked on
Relayd sends to disabled hosts
Hi,
I'm currently using Relayd as a load balancer to a group of web servers. I have it set up to use redirects, as I need to use the sticky-address feature. However, when I disable one of the hosts, any existing connections STILL GO TO THAT HOST even though it's disabled.
I've tried flushing the pf state table, but no luck. I'm guessing that if I took out the sticky-address directive, or moved to using relays instead of redirects, it would work, since the sticky-address directive does not apply to relays.
So, is there a way to prevent existing connections from going to a disabled host using relayd? Or is this a bug in relayd/FreeBSD?
Further, if I need to go to relays vs. redirects, is there a performance hit? With relays, will connections continue to go to the same host automatically unless it's disabled?
If new requests are piggybacked onto existing connections (e.g. HTTP keep-alive), then that is the behaviour you will see. Otherwise, Relayd has a problem.
You can verify what is happening by using tcpdump or wireshark on a client system.
Hypothesis: if a host physically goes down (e.g. power loss or cable breaks) then connections with that host will not close. This may confuse Relayd.
Are you sure that Relayd has marked the host as down?
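A minimal sketch of the two checks suggested above, for anyone who wants to try them. The address, interface name and port 80 are hypothetical placeholders, and the `run` wrapper and `DRY_RUN` switch are illustration only: by default the script just prints the commands instead of executing them. `relayctl show hosts` reports the state relayd itself has recorded for each host; the tcpdump filter prints only SYN packets, so you can see whether a browser opens a new connection or keeps reusing an old one.

```shell
#!/bin/sh
# Hypothetical values: adjust to your own setup.
VIP="${VIP:-203.0.113.10}"   # address the redirect listens on (assumed)
IFACE="${IFACE:-em0}"        # capture interface on the client (assumed)
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands, do not run them

# Print a command, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# On the load balancer: show the state relayd has recorded for each host.
check_hosts() {
    run relayctl show hosts
}

# On a client: show only SYN packets to the virtual address, i.e. new
# connections. No SYN on a repeat request means an existing connection
# was reused.
watch_new_connections() {
    run tcpdump -n -i "$IFACE" "host $VIP and tcp port 80 and tcp[tcpflags] & tcp-syn != 0"
}
```

Call `check_hosts` on the relayd machine and `watch_new_connections` on a client, with `DRY_RUN=0` (and root privileges) once the printed commands look right.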
Well done! Neat getting TCP retries to open a new connection - but how does the client cope?
ASKER
Client does not cope well! If a request comes in and then the server is disabled and the state cleared, the states for the request are lost and the client browser will just sit there and wait. Any subsequent requests from the browser will be successful, which is better than the original problem I was having.
I'm going to keep this question open while I look for a more elegant solution.
Your open connection is the problem. Once Relayd becomes aware that a host is down, it needs to close its connection with the host and the matching connection with the client (they are separate connections). The close to the host will leave that socket in the FIN_WAIT_1 state, which will time out eventually. The close with the client will work straight away, with a better result than now.
I.e. instead of placing a new call on receiving [some number of] TCP retries, send a Reset. Simultaneously, close the connection to the offending host.
ASKER
Thanks Duncan,
I would have expected that's what Relayd should be doing on its own, but I think it's actually just creating temporary rules, adding them to the packet filter, and then passing the traffic off, so that the relayd daemon isn't actually handling the TCP handshake.
I don't know the inner workings of Relayd and I'm making this assumption based on a few mentions here and there as I've been looking around. I was hoping to hear something definitive from the community, but I'm not having luck there either.
ASKER
Still looking for a better solution, but using this as a temporary workaround.
ASKER
Looks like the piece I was missing was the interval timeout, which controls how often pf purges expired states. I'm guessing the expired states lingering in the table were somehow causing the clients to keep going to the same hosts (which were now disabled). Setting the interval to 0 fixed that:
set timeout interval 0
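A possibly related detail, offered as a sketch and not as a confirmed diagnosis of this setup: pf keeps sticky-address mappings in a source tracking table that is separate from the state table, so flushing states alone does not necessarily drop the mapping right away. The commands below disable a host and then clear both kinds of entries for it. The backend address, the `run` wrapper and `DRY_RUN` are hypothetical illustration; by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical values: adjust to your own setup.
BACKEND="${BACKEND:-10.0.0.11}"  # host being taken out of rotation (assumed)
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands, do not run them

# Print a command, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

drain_backend() {
    # Tell relayd to stop handing out this host.
    run relayctl host disable "$BACKEND"
    # Kill source tracking entries (the sticky-address mappings) that point
    # at the backend.
    run pfctl -K 0.0.0.0/0 -K "$BACKEND"
    # Kill states towards the backend. Assumption: depending on which side of
    # the translation your pf version matches, the redirect's listen address
    # may be needed here instead.
    run pfctl -k 0.0.0.0/0 -k "$BACKEND"
    # List the remaining source tracking entries to confirm the result.
    run pfctl -s Sources
}
```

Running `drain_backend` with `DRY_RUN=0` as root executes the four commands in order. As noted earlier in the thread, requests already in flight to the disabled host are cut off and the browser may hang on them.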
ASKER
The problem I'm having is that even when Relayd marks a host as disabled, clients with any connections that were already established prior to the host becoming disabled will still go to that same host for successive requests.