Solved

Relayd sends to disabled hosts

Posted on 2013-01-04
Last Modified: 2016-02-11
Hi,

I'm currently using Relayd as a load balancer in front of a group of web servers. It is set up to use redirects because I need the sticky-address feature. However, when I disable one of the hosts, clients with existing connections still go to that host even though it is disabled.

I've tried flushing the pf state table, but no luck. I'm guessing it would work if I took out the sticky-address directive or moved to relays instead of redirects, since the sticky-address directive does not apply to relays.

So, is there a way to prevent existing connections from going to a disabled host using relayd? Or is this a bug in relayd/FreeBSD?

Further, if I need to move from redirects to relays, is there a performance hit? With relays, will connections continue to go to the same host automatically unless it's disabled?
Question by:Veex

Assisted Solution

by:Duncan Roe
Duncan Roe earned 500 total points
TCP segments on an existing connection have no choice but to keep trying to go to the same host.
Or do you mean something else?

Author Comment

by:Veex
I'm using a FreeBSD box as a load balancer, or more simply a reverse proxy, by using Relayd. I can set a "sticky-address" option so that further requests from the same client go to the same host. Without that setting, requests are distributed round-robin across only the active hosts. With sticky-address set, successive requests all go to the same host.

The problem I'm having is that even after Relayd marks a host as disabled, clients whose connections were established before the host was disabled still go to that same host for successive requests.

Expert Comment

by:Duncan Roe
If new requests are piggy-backed onto existing connections then you will get that. Otherwise Relayd has a problem.
You can verify what is happening by using tcpdump or wireshark on a client system.
Hypothesis: if a host physically goes down (e.g. power loss or cable breaks) then connections with that host will not close. This may confuse Relayd.
Are you sure that Relayd has marked the host as down?
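
One way to run the checks suggested above is to capture only the SYN, FIN and RST segments, which makes it easy to see whether each request opens a new connection or rides on an existing one. This is a minimal sketch, not a tested recipe: the helper name `handshake_filter` and the interface `em0` are placeholders, and port 80 is an assumption about the web servers.

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump filter that matches only the
# segments that open or close TCP connections (SYN, FIN, RST) on a port.
handshake_filter() {
    port="${1:-80}"   # assumed web server port; pass another as $1
    printf 'tcp port %s and (tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0)\n' "$port"
}

# Usage, as root on the client (em0 is a placeholder interface name):
#   tcpdump -n -i em0 "$(handshake_filter 80)"
#
# On the load balancer, these show what relayd and pf currently believe:
#   relayctl show hosts    # host status as seen by relayd
#   pfctl -s states        # current pf state table
```

If requests keep arriving with no new SYN in the capture, they are being sent over an existing connection, which matches the first case described above.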

Assisted Solution

by:Veex
Veex earned 0 total points
Hi Duncan,

Thanks for your replies.  I was able to overcome this, and I'll explain what happened.


Relayd creates redirect rules for PF, which have the effect of load balancing connections to servers behind the load balancer as designed.  This works well.  

When a host is disabled in relayd, new connections go to the other enabled hosts, but existing connections making additional requests continue to go to the host they originally connected to. I believe this is what you were alluding to.

The problem I'm having is that PF (the FreeBSD packet filter I'm using) checks the state table before the ruleset, so packets on existing connections never re-traverse the rules. This is done for efficiency. I found a way to kill all states for the disabled host, and I've added it to the script that handles enabling and disabling.

That command is:

pfctl -k 0.0.0.0/0 -k <disabled_host_ip>

Once the states are killed, TCP retries create a new connection which goes to one of the enabled servers.
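
The enable/disable script itself is not shown in the thread, so the following is only a sketch of how the two steps might be combined. The function name `disable_backend` and the `RUN` dry-run hook are hypothetical, and it assumes the host is listed in relayd.conf by IP address, so that the same value works for both `relayctl` (which takes a host name or id) and `pfctl`.

```shell
#!/bin/sh
# Hypothetical helper: disable a backend in relayd, then drop its pf states.
# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"

disable_backend() {
    host_ip="$1"
    if [ -z "$host_ip" ]; then
        echo "usage: disable_backend <host_ip>" >&2
        return 1
    fi
    # Stop relayd from handing new connections to this host.
    # Assumes the relayd host name is its IP address.
    $RUN relayctl host disable "$host_ip" || return 1
    # Kill every pf state from any source (0.0.0.0/0) to the disabled
    # host, so the next packet is evaluated against the rules again.
    $RUN pfctl -k 0.0.0.0/0 -k "$host_ip"
}
```

Calling `RUN=echo; disable_backend 192.0.2.10` prints the two commands without touching relayd or pf, which is a cheap way to check the script before running it as root.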

Expert Comment

by:Duncan Roe
Well done! Neat getting TCP retries to open a new connection - but how does the client cope?

Author Comment

by:Veex
Client does not cope well! If a request is in flight when the server is disabled and the states are cleared, the state for that request is lost and the client browser just sits there and waits. Any subsequent requests from the browser succeed, which is better than the original problem I was having.

I'm going to keep this question open while I look for a more elegant solution.

Expert Comment

by:Duncan Roe
Your open connection is the problem. Once Relayd becomes aware that a host is down, it needs to close its connection with the host and the matching connection with the client (they are separate connections). The close to the host will put that socket into a FIN_WAIT state, which will time out eventually. The close with the client will work straight away, with a better result than now.

I.e. instead of placing a new call on receiving [some number of] TCP retries, send a Reset. Simultaneously, close the connection to the offending host.

Author Comment

by:Veex
Thanks Duncan,

I would have expected Relayd to do that on its own, but I think it's actually just creating temporary rules, adding them to the packet filter, and then passing the traffic off, so the relayd daemon isn't actually handling the TCP handshaking.

I don't know the inner workings of Relayd and I'm making this assumption based on a few mentions here and there as I've been looking around.  I was hoping to hear something definitive from the community, but I'm not having luck there either.

Accepted Solution

by:Duncan Roe
Duncan Roe earned 500 total points
I thought you'd done that (getting retries to open a new connection). If that's what Relayd does, you need to file a bug report.

Author Closing Comment

by:Veex
Still looking for a better solution, but using this as a temporary workaround.

Author Comment

by:Veex
Looks like the piece I was missing was the interval timeout, which limits how long expired states stay around. I'm guessing the expired states were somehow causing clients to keep going to the same hosts even after those hosts were disabled. Setting the interval to 0 fixed that:


set timeout interval 0
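
The thread does not say where this line went; it matches pf.conf syntax, so the sketch below assumes it was added to /etc/pf.conf. It reloads the ruleset and prints the timeouts and the source tracking table. As far as pf's documentation describes it, sticky-address keeps its client-to-host mappings in that table for as long as states still refer to them, which would fit the guess above. The function name `reload_and_inspect` and the `RUN` and `PF_CONF` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reload pf.conf after editing a timeout, then
# inspect the result. Set RUN=echo to print commands instead of running them.
RUN="${RUN:-}"
PF_CONF="${PF_CONF:-/etc/pf.conf}"   # assumed location of the ruleset

reload_and_inspect() {
    # Parse only (-n) first, so a syntax error never replaces the live ruleset.
    $RUN pfctl -nf "$PF_CONF" || return 1
    # Load the ruleset for real.
    $RUN pfctl -f "$PF_CONF" || return 1
    # Show the effective timeouts, including "interval".
    $RUN pfctl -s timeouts
    # Show source tracking entries, where sticky-address mappings live.
    $RUN pfctl -s Sources
}
```

If a disabled host still appears as a target in the `pfctl -s Sources` output, the stale mapping is still in place and clients will keep being sent there.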