Forum Discussion

U_franco_117956
Nimbostratus
Jan 22, 2014

LTM tries to persist to pool member down

Hi.

We need help troubleshooting some curious behavior on our LTM guest on a VIPRION 2400. We see high CPU utilization on the LTM guest when the web administrators stop the HTTP service on one or more of the web servers. After capturing traffic on the LTM guest with tcpdump, we think we may have found the root cause.

We use cookie persistence, with the cookie inserted by the LTM guest through the following iRule. The iRule encrypts the IP address of the server the end user was load balanced to and passes it to the user's browser as the cookie value. On subsequent HTTP requests the cookie is decrypted and traffic is forced to that server IP address with the node command:

when CLIENT_ACCEPTED {
    set default_pool [LB::server pool]
}

when HTTP_REQUEST {
    # Assume a new cookie is needed until a valid one is found
    set need_cookie 1

    # Derive the cookie domain only when the Host header is a name, not an IP
    if { [string match {*[a-zA-Z]*} [HTTP::host]] } {
        set dominio [domain [HTTP::host] 3]
    } else {
        set dominio ""
    }

    if { [HTTP::cookie exists "FVECIDO"] } {
        # An empty result is treated as a failed decryption
        set decrypted [HTTP::cookie decrypt "FVECIDO" "skyisblue"]
        if { $decrypted ne "" } {
            set persist_node $decrypted
            # Honor the cookie only if the node is an active member of the pool
            foreach member [active_members -list $default_pool] {
                set node [lindex $member 0]
                if { $node eq $persist_node } {
                    set need_cookie 0
                    node $persist_node [lindex $member 1]
                    break
                }
            }
        }
    }
}

when LB_FAILED {
    set need_cookie 1
    LB::reselect
}

when HTTP_RESPONSE {
   if { $need_cookie } {
       HTTP::cookie insert name "FVECIDO" value [IP::remote_addr] path / domain $dominio
       HTTP::cookie encrypt "FVECIDO" "skyisblue"
       HTTP::cookie expires "FVECIDO" 14400
   }
}

In the sniffer traces we see a large number of connection attempts to the server whose HTTP service the web administrators have stopped: roughly 7000-8000 SYN packets per second. These TCP connection attempts are, of course, reset by the server. Some attempts would be understandable during the interval before the configured HTTP monitor notices that the HTTP service is down, but is this many TCP connection attempts per second reasonable? Bear in mind that this figure is per server. If the web administrators stop two or three web servers, the number of connection attempts doubles or triples.

Moreover, we still see these connection attempts after the HTTP monitor has detected that the HTTP server is down. Is this expected behavior?

Perhaps we need to modify or add something in our iRule. We are running 11.3 HF7.

Thanks

10 Replies

  • Hamish
    Cirrocumulus

    I'm pretty sure if you select the node yourself, it'll over-ride the action on service down... (ISTR striking this one myself). It is certainly documented as bypassing any node selection logic.

     

    You may want to verify the pool member status before setting the node to access.

     

    H
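
    A minimal sketch of such a check, assuming the `default_pool` variable and the `FVECIDO` cookie from the original iRule, plus a fixed member port of 80 (`persist_port` is a hypothetical name introduced here for illustration). It asks `LB::status` for the member's state and only pins the connection with `node` when the member is reported up; it is a starting point under those assumptions, not a tested fix:

    ```tcl
    when CLIENT_ACCEPTED {
        set default_pool [LB::server pool]
    }

    when HTTP_REQUEST {
        set need_cookie 1
        if { [HTTP::cookie exists "FVECIDO"] } {
            set persist_node [HTTP::cookie decrypt "FVECIDO" "skyisblue"]
            # Assumption: every pool member listens on port 80
            set persist_port 80
            # LB::status can raise an error for an address that is not a
            # member of the pool (for example a tampered cookie), so guard it
            set status "unknown"
            catch { set status [LB::status pool $default_pool member $persist_node $persist_port] }
            if { $status eq "up" } {
                set need_cookie 0
                node $persist_node $persist_port
            }
            # Otherwise no node is forced, normal load balancing applies,
            # and a fresh cookie is issued in HTTP_RESPONSE
        }
    }
    ```

    The status reported here only changes once the monitor has marked the member down, so a slow monitor still leaves a window in which requests are pinned to a server that resets them.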

     

  • A few questions:

     

    1. Is there anything that your custom cookie persistence provides that cannot be done by the built-in cookie persistence? Aside from setting a persistent cookie?

       

    2. How is your monitor configured, and does it aggressively and accurately mark the member down during an outage?

       

    3. What do you have in the pool's "Action On Service Down" property?

       

  • BinaryCanary_19
    Historic F5 Account

    I'd say ditch the irules and use the encryption functionality provided by the LTM:

     

    http://support.f5.com/kb/en-us/solutions/public/14000/700/sol14784.html?sr=34585117

     

    You can encrypt persistence cookies too.

     

  • BinaryCanary_19
    Historic F5 Account

    If a pool member is down, the only traffic it should be getting should be monitor traffic. So if you are seeing connection attempts at this scale (F5 would typically only retransmit SYN packets 3 times, so this would suggest quite heavy traffic if you're seeing thousands per second), something is forcing traffic to be sent there regardless.

     

    Hence, my proposal to ditch the irule that is trying to do what I think the box can be configured to do naturally.

     

  • Hi.

     

    Thanks a lot for your responses.

     

    We can't ditch the iRule because we need it to maintain persistence from HTTP to HTTPS. Built-in persistence doesn't offer a match across services option here, which is why we use this iRule.

     

    Our monitor is not aggressive. It is an HTTP monitor that checks at 15-second intervals.

     

    Action On Service Down is set to the default value, None. Perhaps we should try Reselect. Am I wrong?

     

    Thanks

     

    • BinaryCanary_19
      Historic F5 Account
      Correct me if I'm wrong. [thought adventure] Cookie persistence encodes the selected pool member into the cookie value. The client presents this cookie to the server each time it makes a request, the Bigip decodes this and sends the request to the pool member found. Browsers send this cookie whenever they are making a request to a site with the same domain name. https only modifies the scheme, not the domain name.

      I would be very surprised if cookie persistence needed any "modification" in order to match across services. I think it should work as long as the domain name remains the same. But I'd have to test to know for sure.
    • U_franco_117956
      Nimbostratus
      Hi again. Thanks for your response.

      "Cookie persistence encodes the selected pool member into the cookie value. The client presents this cookie to the server each time it makes a request, the Bigip decodes this and sends the request to the pool member found. Browsers send this cookie whenever they are making a request to a site with the same domain name."

      Yes, you are right, this is what we were looking for, and the iRule is working fine. Cookie persistence encodes the selected pool member in the cookie value, so the same pool member is selected when we switch from HTTP to HTTPS (the HTTP and HTTPS virtual servers have the same virtual IP and the same nodes). Please disregard the domain name handling; we are thinking of removing it.

      "https only modifies the scheme, not the domain name"

      I'm not sure what you mean by this, but the domain is never modified.

      Perhaps our issue is related to sol10386: http://support.f5.com/kb/en-us/solutions/public/10000/300/sol10386.html We are trying to verify this using the proposed workaround. What do you think?

      B.R.
    • Kevin_Davies_40
      Nacreous
      What aFanen01 is trying to say is that it does not matter whether you call the site using HTTP or HTTPS: it will persist across services naturally, so the iRule is not required. Cookie persistence is independent of protocol in this case. As long as the destination uses the same pool name, the built-in cookie persistence will do what you want.
  • I'd say that it depends on a few factors. The LB_FAILED event, as this article relates:

     

    ...is triggered when the BIG-IP LTM is ready to send the request to a pool member and one has not been chosen (the system failed to select a pool or a pool member), is unreachable (when no route to the chosen pool member exists), or the selected pool member is non-responsive (fails to respond to a connection request).

     

    That suggests an inability to communicate with the pool member at lower levels, like the TCP handshake. If you're only disabling the HTTP service, then the box is still technically alive and on the network, and would be accessible at layer 4. I'm curious if you ever trigger the LB_FAILED event when the HTTP service is down? The "Action on Service Down" option in the pool properties, however, suggests monitoring status. So if a monitor marks a pool member down, the Action on Service Down property would take effect and either do nothing (default), drop/reject, or reselect a new pool member. I would recommend at this point to at least try setting the Reselect option and then doing something more aggressive in your monitor.
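
    If the iRule stays in place, one option alongside these settings is to bound the retries in LB_FAILED rather than calling LB::reselect unconditionally. The sketch below assumes the `default_pool` and `need_cookie` variables from the original iRule; `retries` is a hypothetical counter introduced here. Whether unbounded reselection accounts for the SYN rate reported in this thread has not been established, so treat this as something to test rather than a confirmed fix:

    ```tcl
    when CLIENT_ACCEPTED {
        set default_pool [LB::server pool]
        set retries 0
    }

    when HTTP_REQUEST {
        # Each new request on the connection gets a fresh retry budget
        set retries 0
    }

    when LB_FAILED {
        # The persisted server could not be used, so a new cookie is needed
        set need_cookie 1
        if { $retries < [active_members $default_pool] } {
            incr retries
            # Reselect from the pool explicitly instead of the forced node
            LB::reselect pool $default_pool
        } else {
            # Retry budget exhausted: stop reselecting and record the failure
            log local0. "LB_FAILED: giving up after $retries reselect attempts on $default_pool"
        }
    }
    ```

    Capping the attempts at the number of active members means a single request can trigger at most one connection attempt per healthy member before the rule stops retrying.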