Forum Discussion

Steve_Scott_873
Historic F5 Account
Jan 28, 2010

Unexpected TMOS utilisation hit

I have a BIG-IP 6400 platform running LTM v9.4.7.

 

 

We have an iRule to direct HTTP requests coming into the F5 from the datacentre out to the appropriate external service (for XML messaging). This is a lot cleaner than having 250 VIPs for 250 different services.

 

The rule matches the HTTP host to a pool defined on the server, so if we need to update a host, we can do this via the web interface, without making any changes to the iRule by hand (= lower risk, easier operation).

 

Slap the iRule on the virtual server and a serverssl profile on there and you've got encrypted traffic going out.

 

 

We've also got some error handling to deal with people trying to reach services they shouldn't from this particular VIP, and error messages for when the external service is down.

 

 

Here is the code:

 

   # iRule to direct HTTP requests based on hostname
   when HTTP_REQUEST {
      # Extract hostname - needs to be lower case to match pool name
      set host [string tolower [HTTP::host]]
      # Check that the hostname (and therefore the pool name the request will be sent to)
      # ends with the correct domain - prevents using this VS to hop out to a different pool
      if { ($host ends_with ".bob.com") and !($host contains ".test.") } {
         if { [catch { pool $host }] } {
            # No matching pool name - so move on to error
            HTTP::respond 404 content "Endpoint not defined"
         }
      } else {
         HTTP::respond 403 content "Invalid URL"
      }
   }

   when LB_FAILED {
      HTTP::respond 504 content "Endpoint Unavailable"
   }
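
For anyone wanting to sanity-check the branch logic off-box, here is a minimal Python sketch of the same decision. It is only a model, not something that runs on the BIG-IP: the `route` function and the `pools` set are hypothetical stand-ins, and the `catch { pool $host }` check is approximated as a simple membership test against the set of configured pool names.

```python
def route(host_header, pools):
    """Model the HTTP_REQUEST branches of the iRule.

    pools is a hypothetical set of pool names configured on the box.
    Returns ("pool", name) when the request would be sent to a pool,
    otherwise (status_code, body) for the HTTP::respond cases.
    """
    # Lower-case the Host header so it can match the pool name
    host = host_header.lower()
    # Must end with the correct domain and must not be a test hostname
    if host.endswith(".bob.com") and ".test." not in host:
        if host in pools:
            return ("pool", host)
        # No matching pool name
        return (404, "Endpoint not defined")
    return (403, "Invalid URL")


if __name__ == "__main__":
    pools = {"svca.bob.com"}
    for h in ("SvcA.bob.com", "svcb.bob.com", "svca.test.bob.com", "other.example.com"):
        print(h, "->", route(h, pools))
```

Like the iRule, the sketch does not strip a port from the Host header, so a value such as `svca.bob.com:443` fails the domain check and gets the 403.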

 

 

So far, so good. We've tested, we've run iRule benchmarking, and it's reasonably efficient...

 

 

config # cat /proc/cpuinfo   
   model name      : AMD Opteron(tm) Processor 246   
   stepping        : 1   
   cpu MHz         : 1992.276   
      
   config # bigpipe rule Prod_Spine_Generic_Routing show all   
   RULE Prod_Spine_Generic_Routing   
   +-> HTTP_REQUEST   3123 total   0 fail   0 abort   
   |   |     Cycles (min, avg, max) = (17437, 51642, 92343)   
   +-> LB_FAILED   0 total   0 fail   0 abort   
       |     Cycles (min, avg, max) = (0, 0, 0)

 

 

When that's put through a spreadsheet (CPU clock speed divided by average cycles per request) it comes back at roughly 40,000 TPS, which is perfectly reasonable for the amount of traffic we're expecting.
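
The spreadsheet sum can be reproduced in a few lines. This is a rough sketch of the arithmetic only, assuming the rule runs on a single core at the clock speed reported by /proc/cpuinfo and that nothing else competes for cycles; the function name is made up for illustration.

```python
# Clock speed from /proc/cpuinfo above (1992.276 MHz), in Hz
CPU_HZ = 1992.276e6

# Cycle counts for HTTP_REQUEST from the bigpipe rule stats above
CYCLES = {"min": 17437, "avg": 51642, "max": 92343}


def requests_per_second(cycles_per_request, cpu_hz=CPU_HZ):
    """Theoretical ceiling: how many rule executions fit in one second of CPU."""
    return cpu_hz / cycles_per_request


if __name__ == "__main__":
    for label, cycles in CYCLES.items():
        print(f"{label}: {requests_per_second(cycles):,.0f} requests/sec")
```

The average case works out at roughly 38,600 requests per second, which is where the "about 40,000 TPS" figure comes from.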

 

 

So at that point we said, we've got this sorted, put it into production and carried on to the next piece of work.

 

 

We've now got people using this VIP, and during their first bulk load run they got to the dizzying heights of 12 HTTP requests per second. During this period the TMOS utilisation moved from its usual 1-2% to 25%, and when they backed off to 6 TPS it reduced to 10-12%.
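
To put a number on the gap, the same arithmetic can be run backwards from the observed utilisation. This is a back-of-envelope sketch, and it leans on assumptions that are mine rather than measured: that the TMOS utilisation figure is a fraction of one 1992.276 MHz core, that the idle baseline is about 1.5%, and that all of the extra load is caused by these requests.

```python
# Clock speed from /proc/cpuinfo and average rule cost from the stats above
CPU_HZ = 1992.276e6
AVG_IRULE_CYCLES = 51642


def implied_cycles_per_request(util_busy, util_idle, tps, cpu_hz=CPU_HZ):
    """Cycles per request implied by a rise in utilisation at a given request rate.

    Assumes utilisation is a fraction (0.0 to 1.0) of a single core.
    """
    return (util_busy - util_idle) * cpu_hz / tps


if __name__ == "__main__":
    # 25% utilisation at 12 requests/sec, against an assumed 1.5% baseline
    cost = implied_cycles_per_request(0.25, 0.015, 12)
    print(f"implied cost: {cost:,.0f} cycles per request")
    print(f"ratio to measured iRule cost: {cost / AVG_IRULE_CYCLES:,.0f}x")
```

Under those assumptions each request costs tens of millions of cycles, several hundred times what the rule timings account for, so the bulk of the work is happening outside the iRule.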

 

 

Clearly something's not right here, but the timing stats (produced from production data) show it's all fine. There has to be some sort of hidden cost somewhere, but I can't see any obvious place it's coming from.

 

Our reseller isn't being very helpful, so I'm between a rock and a hard place here.

 

 

Any thoughts would be most gratefully received...

14 Replies

  • hoolio
    Cirrostratus
    I talked with Chris this morning and discussed some testing options that the two of you can try together. If the troubleshooting gets stuck he'll let me know and I can try to help where I can.

     

     

    Aaron
  • Steve_Scott_873
    Historic F5 Account
    Well, it appears the iRule timings are quite correct; the inefficiency is coming from server certificate checking being enabled on the serverssl profile. My understanding was that this was almost entirely dealt with in hardware, so I wasn't expecting problems there.

     

     

    Hopefully I can get that raised as a support case, as the capacity there is somewhat lower than advertised.
  • Hamish
    Cirrocumulus

     

    Is cert checking even supposed to be accelerated? (Do you mean the 'Server Authentication' option?)

     

     

    Handshakes and bulk crypto are accelerated for SOME ciphers, not all (see SOL6739 for a list of fully accelerated ciphers). SOL6808 lists the accelerated (native) and unaccelerated (compat) ciphers for v9.x+.

     

     

    Oh... Have you tried setting OneConnect? Are you getting HTTP keepalives or not?

     

     

    H
  • Steve_Scott_873
    Historic F5 Account
    I do indeed mean the server authentication option.

     

     

    In our case, we have a mix of native and compat ciphers enabled. We were using native ciphers; however, it seems that having compat ciphers enabled meant that the session cache was not working correctly and sessions were not being resumed, which obviously means more work.

     

    Changing to native only (disabling DHE and DH ciphers in our case) seems to have got the session cache working again, and even if the session cache size is set to 0 it's still only at 10-15% with 120 TPS, rather than 50% with compat enabled. (Again, we were using AES256+SHA, which is on the native, fully accelerated list; it seems merely having compat ciphers present in a serverssl profile is enough to cause problems.)

     

     

    I couldn't find anything on this in the knowledge base, either before or after I found the solution.