Forum Discussion

Roger_Wolfson_8
Nimbostratus
Aug 26, 2006

Advice on "persist hash" load balancing

Hi, I've been reading through past posts about load balancing certain requests based on part of the request data using the "persist hash" method, but I haven't been able to implement it successfully. This is on v9.2.3.

I'll use the URL of this forum-post page as an example:

http://devcentral.f5.com/Default.aspx?tabid=28&forumid=5&view=post

I want to pick a member node of the pool based on the forumid value in the query string, to optimize caching on the web servers. A given user will be bounced between servers as they click between forums, but all requests for forum 5 will end up at a given node. For simplicity I'd rather use the "persist hash" method so I don't have to maintain a list of the actual nodes inside the iRule.

Now that I've started testing my iRule, however, I find that I'm getting assigned to the same server (at least for several minutes) regardless of page, as if it were persisting based on my client IP or other data rather than what I want.

The config:

A virtual server with one pool of two nodes, Round Robin, no default persistence profile.

My iRule (I'm a Tcl newbie, so feel free to point out errors):


when HTTP_REQUEST {
    if { [string tolower [HTTP::uri]] starts_with "/default.aspx" } {
        set forumid "none"
        if { [string first forumid= [HTTP::query]] != -1 } {
            set query [HTTP::query]
            regexp -nocase forumid=(\d*) $query forumid
        }
        if { $forumid != "none" } {
            persist hash "$forumid"
        }
    }
}

I've tried to use the log local0. "something" syntax I've seen here to help debug, but I can't find where the output lands; I don't see it in the Logs section of the web UI. (I can probably optimize away the first test for the forumid substring, but I'll take care of that later.) I've also tried the simpler version


when HTTP_REQUEST {
    persist hash [HTTP::query]
}

and this too locks me to a single server, regardless of query string, for a while before it switches. Severing the TCP connection between my browser and the site (the BIG-IP device) seems to break the binding to a particular node, but even then the node I land on isn't deterministic based on the query string parameter.

Any advice on either what I'm doing wrong or how to better debug this?

Thanks!
  • The log function to local0 should be putting log entries in the Local Traffic section of the Logs in the GUI, or in /var/log/ltm if you SSH in.

    Are you using a OneConnect profile? OneConnect attempts to "piggyback" separate requests to the servers onto an already-open TCP socket to minimize connection overhead between the BIG-IP and the servers. That may be producing results similar to what you are seeing.

    I also recall seeing on a thread here somewhere that -nocase wasn't a valid option, but I can't find it at the moment...
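
    To narrow down where each request is actually landing, you can also log the load-balancing decision itself. A quick sketch (untested, and the variable name is just illustrative, but LB_SELECTED and LB::server are standard iRule commands):

    when HTTP_REQUEST {
        # remember the key we intend to persist on
        set pkey [HTTP::query]
    }
    when LB_SELECTED {
        # log which pool member this request was balanced to
        log local0. "key $pkey -> [LB::server addr]:[LB::server port]"
    }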

     

     

    Denny
  • Thanks for the tips. I tweaked some log level settings and got it logging.

    I checked and OneConnect is disabled, so that's not an issue here.

    I also had to tweak the regexp to conform to the Tcl implementation, and got my match variable set correctly. However, the same problem remains: any request that hits the persist command gets routed to the same server regardless of hash value. It comes down to this:

    persist hash "$idmatch"
    log local0. "persisting on key $idmatch"

    I've also tried it without quotes in the first line. The log reads:

    HTTP_REQUEST: persisting on key 1570

    HTTP_REQUEST: persisting on key 1571

    HTTP_REQUEST: persisting on key 1572

    etc., yet all these requests are going to the first node in the pool. I've tried several dozen hash keys, so the odds are minuscule of them all hash-modding to the same one of two servers. Requests that bypass this command seem to load-balance correctly.
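
    For reference, the tweaked rule now looks roughly like this (the pattern is braced so Tcl doesn't swallow the backslash in \d, and regexp needs one variable for the full match plus a second for the parenthesized capture):

    when HTTP_REQUEST {
        # braced pattern: Tcl passes \d through to the regexp engine;
        # "match" gets the whole match, "idmatch" the capture group
        if { [regexp {forumid=(\d+)} [HTTP::query] match idmatch] } {
            persist hash "$idmatch"
            log local0. "persisting on key $idmatch"
        }
    }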
  • Deb_Allen_18
    Historic F5 Account
    Actually, you might NEED to enable OneConnect.

    Without OneConnect enabled, only the first request in a Keep-Alive connection is parsed for persistence data, so if multiple requests are sent on the same Keep-Alive connection, LTM will persist them all to the same destination as the first.

    A OneConnect profile with a mask of 255.255.255.255 will allow parsing of all requests, and serverside connections will only be re-used for the same client.

    HTH

    /deb
  • Thanks Deb, this seems to resolve the problem! However, is this a viable option in a production configuration, or does it measurably undercut the connection-marshalling benefit we had before? If I understand your last point correctly, the total number of connections across all webserver nodes will now equal the total number of client connections to the BIG-IP device, whereas we currently see only a small fraction of the total client connections appear on the webservers.

    While this would probably be an acceptable tradeoff for the benefit we'd get, I want to make sure that's an accurate picture, and that this isn't discouraged behavior.

    Thanks!

    Roger
  • Deb_Allen_18
    Historic F5 Account
    Hi Roger,

    Glad I could help. That's a common trap.

    Without OneConnect enabled, I can't explain whatever connection pooling you saw.

    However, OneConnect with either mask is viable in production, and either will be more efficient than none at all, since handshake overhead for your servers will be reduced.

    With OneConnect configured with the default mask of 0.0.0.0, any idle serverside connection may be re-used for any new clientside request, significantly reducing the number of serverside connections.

    However, re-used serverside connections retain the source IP of the original client, which results in some very misleading server log entries unless you are also SNATing all connections.

    Without SNAT, OneConnect with a host mask (255.255.255.255) keeps the source address info in the server logs consistent with reality.

    If you're already SNATing, the 0.0.0.0 mask will result in more efficient connection pooling.
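
    If you do end up on the 0.0.0.0 mask and aren't SNATing yet, SNAT Automap on the virtual server is the usual route, or it can be applied from a rule; a minimal sketch:

    when CLIENT_ACCEPTED {
        # translate the client source address so re-used serverside
        # connections don't show another client's IP in the server logs
        snat automap
    }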

     

     

    HTH

    /deb
  • unRuleY_95363
    Historic F5 Account
    I just wanted to clarify that each HTTP request is parsed and inspected regardless of whether or not a OneConnect profile is attached. However, without OneConnect, a new LB decision is not made unless the pool changes. So even though you may now have new criteria that should dictate a new pool member, the connection does not detach from the previous one. One of the aspects of OneConnect is that each request is load-balanced to a potentially new member (depending on the LB mode), and persistence is taken into account during the load-balancing decision.

    HTH.
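
    Building on that detach point, one alternative to OneConnect is to drop the serverside connection yourself with LB::detach so the next request gets a fresh decision. A rough sketch (this defeats serverside connection re-use entirely, so the OneConnect profile is usually the better fix):

    when HTTP_REQUEST {
        # detach any existing serverside connection so this request
        # gets a fresh load-balancing (and persistence) decision
        LB::detach
        if { [regexp {forumid=(\d+)} [HTTP::query] match forumid] } {
            persist hash $forumid
        }
    }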