Forum Discussion
iRule performs 100x slower under load
Hello,
I have an iRule (BIGIP v9.4.8) whose HTTP_REQUEST timing performance drops by 100x when under load. How can I diagnose what's happening? Under what conditions can an iRule slow down (in terms of cycles/request) when there's more significant traffic (200-1000 HTTP requests per second)?
HTTP_REQUEST performance in a development environment (BIG IP 3400 w/ v9.4.8) ran with timing stats of roughly 1,000,000 cycles/request average and 10,000,000 cycles/request max. By commenting out pieces of code, we found that 90% of that time is spent on decrypting cookies. According to the "F5DevCentral_iRulesRuntimeCalculator" (see "Evaluating iRule Performance") the 10^6 cycles only represent 357 microseconds. No big deal. The max of 10^7 is a little worry but is still only 3.5 milliseconds.
"b rule show all" output from developer environment (BIG IP v9.4.8, BIG IP 3400)
+-> HTTP_REQUEST 7252 total 0 fail 0 abort
| | Cycles (min, avg, max) = (46052, 838937, 9343832)
+-> HTTP_RESPONSE 7079 total 0 fail 0 abort
| | Cycles (min, avg, max) = (34872, 849654, 9031528)
When we deploy this rule into production (BIG IP v9.4.8, BIG IP 3400), the performance falls off a cliff:
+-> HTTP_REQUEST 4530 total 2 fail 0 abort
| | Cycles (min, avg, max) = (239321, 103.3M, 200.0M)
+-> HTTP_RESPONSE 4406 total 8 fail 0 abort
| | Cycles (min, avg, max) = (390987, 132.1M, 197.9M)
(Note: the 7252 requests in DEV were from an entire day of testing where the 4530 requests in prod came from only a couple of minutes.)
We've done fairly comprehensive comparisons between dev and production and I don't see any significant configuration differences. The CPU is about 30% busy. The memory (through 'top') is running at 50% (of 2GB). This box, like the dev box, serves a large of virtual servers but under greater load.
I've gone through the "Ten Steps to iRules Optimization" and the "Evaluating iRule Performance" documents linked to from here: SOL11769. However, I haven't yet found info that would explain the timing behavior differences between dev and prod.
From other posts on DEVCENTRAL, I've found the 'tmctl' and 'b memory' commands to see if I have a memory leak.
Commenting out sections of the iRule step by step in the development environment, reseting statistics and re-exercising the rule, I can account for all of the "cycles" seen in the DEV environment. But there is a 100x slow down when the rule is facing heavier traffic in production. What tricks do you know to figure out why?
Thanks for your time,
-- Chris
21 Replies
- The_Bhattman
Nimbostratus
Hi Chris,
Can you post up the iRule that is slowing down under load? A second pair of eyes on this forumn can probably spot something in the code that may have been overlooked.
Bhattman - Chris_Huston_16
Nimbostratus
Hi Bhattman. I can't post the script - makes things difficult I know. But given the situation, what tricks would you use to find the problem? What culprits would you suspect?
-- Chris - The_Bhattman
Nimbostratus
Hi Chris,
I suppose I would do a ramp up test. Specifically determine if the performance degradation is linear or does it degrade when it reaches a specific specific amount.
Other areas that I would check is to do a bug scrub of the version you are using. Another item that I would check would be the ECV. Several years go my operational counterparts noticed in one of our v9 ADCs performance degredation was attributed to aggressive monitoring that was applied overtime. He noticed only when there was load applied. The fix was to adjust the monitoring to default levels and everything worked out.
Other than that I would definitely get F5 support involved which they can take a deeper look into what's happening.
I hope this helps,
Bhattman - Chris_Huston_16
Nimbostratus
Hi Bhattman,
Thanks. I'll run down each of those items. I'm almost done with some JMeter test plans to scale up a test load and will let you know how it goes.
Support: they sent me here.
Appreciate the help. - hoolio
Cirrostratus
Hi Chris,
Can you post the case number you had open? Also, how are you doing the cookie encryption? If you're using HTTP::cookie encrypt|decrypt, you could probably save a lot of cycles by manually encrypting the cookies instead with AES::encrypt|decrypt.
Aaron - Colin_Walker_12Historic F5 AccountWithout knowing more about what's actually happening inside the iRule tracking this down is going to be tough. I realize you can't post the entire rule, but any details you can provide would be extremely helpful. Which commands are being used, which events they're being used in, context (client vs server), etc.
Colin - hoolio
Cirrostratus
Chris,
This piqued my interest so I asked about it internally. A helpful developer pointed out an issue in pre-v11.0 where CPU cycle calculation with the timing command doesn't accurately account for the time not spent processing the iRule when suspend commands (like the session command) are used. He believes the increased CPU cycles in the live environment are a reporting issue--and not actually indicative of a drop in performance. This issue is described in BZ222913.
I've asked the NSE who has your case to contact you to explain this in greater details. As you're not comfortable posting your iRule, I'll send you some suggestions on optimizing the rest of the iRule. Generally, they are:
- Upgrade to 10.1 or higher (ideally 10.2.0HF2) to take advantage of the simpler client SSL cert handling. You could then remove the session add/lookup logic and access the cert directly throughout the lifetime of the SSL session.
http://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/product/relnote_10_1_0_ltm.html
SSL::cert iRule commands (CR116806)
The following iRule commands now apply to the lifetime of the SSL session, and not only for the connection in which the system receives the client certificate:
SSL::cert GET_PEER_CERT
SSL::cert issuer GET_PEERCERTISSUER
SSL::cert count GET_PEER_CERTCOUNT
With this change, the system stores the received peer certificate in the SSL session, so that the certificate is available to the specified iRule commands as long as the SSL session is valid. In previous releases, the CLIENTSSL_CLIENTCERT iRule event retrieved the peer certificate; now the stored certificate can also be retrieved inside the HTTP_REQUEST event.
- Use the HTTP profile option to encrypt|decrypt cookies or manually encrypt and decrypt the cookie values using AES::encrypt|decrypt. Either of these methods is much more efficient than using the HTTP::cookie encrypt|decrypt commands in an iRule. This is due to the different ways encryption keys are generated.
- Avoid using regexes if you can use string or scan commands instead. Regexes can be 10x+ less efficient than similar string/scan commands.
- Once/if you're on a platform and OS version which supports CMP, try to avoid using global variables. In 9.x, you can change them to local variables set in CLIENT_ACCEPTED. In 10.x, you can use the static:: namespace for global variables whose values don't change during a connection. As you need to use the session command in 9.x, you won't be able to get the iRule CMP compatible. But on 10.x this would definitely improve performance (assuming you're on a CMP capable platform) as the iRule could run on all TMM instances instead of one.
http://devcentral.f5.com/wiki/default.aspx/iRules/cmpcompatibility
http://devcentral.f5.com/wiki/default.aspx/iRules/static
Aaron - Colin_Walker_12Historic F5 AccountNice Aaron, I didn't realize there was a suspend bug in timing's accounting. Good to know!
Colin - Chris_Huston_16
Nimbostratus
Aaron:- Case C850935
- Thank you for the heads up on the timing reporting accuracy. I believe the timing reflected real issues in that you could see the delay when exercising the application. The slow performance of the app vanished when the problem iRule was reverted to an earlier version.
- Definitely looking forward to upgrading to version 10+ but it's a big company and things move slowly.
- Lastly, I will modify the rule to use AES encryption instead of HTTP::cookie encrypt and compare performance.
iRule logic may not be the issue:
After doing a bunch of load testing in DEV, I believe the problem must not be with the iRule itself but with something environmental in production.
Production traffic is at a scale of a 100 requests a minute - a trickle really. In dev I loaded up roughly 40,000 requests in 20 minutes (2,000/minute) and this is the performance I saw:+-> CLIENTSSL_CLIENTCERT 2 total 0 fail 0 abort | | Cycles (min, avg, max) = (39284, 74243, 91724) +-> CLIENTSSL_HANDSHAKE 299 total 0 fail 0 abort | | Cycles (min, avg, max) = (6484, 30481, 174556) +-> CLIENT_ACCEPTED 296 total 0 fail 0 abort | | Cycles (min, avg, max) = (21288, 62419, 112668) +-> CLIENT_CLOSED 291 total 0 fail 0 abort | | Cycles (min, avg, max) = (3420, 24984, 58528) +-> HTTP_REQUEST 43262 total 0 fail 0 abort | | Cycles (min, avg, max) = (52624, 238869, 9670788) +-> HTTP_RESPONSE 43176 total 0 fail 0 abort | | Cycles (min, avg, max) = (18668, 1455857, 10122004) +-> RULE_INIT 1 total 0 fail 0 abort | Cycles (min, avg, max) = (182880, 182880, 182880)An second test run containing a different mix of traffic - one that has heavier reliance on cookie encryption produced the following metrics:
+-> CLIENTSSL_CLIENTCERT 0 total 0 fail 0 abort | | Cycles (min, avg, max) = (0, 0, 0) +-> CLIENTSSL_HANDSHAKE 18 total 0 fail 0 abort | | Cycles (min, avg, max) = (21828, 51560, 227392) +-> CLIENT_ACCEPTED 17 total 0 fail 0 abort | | Cycles (min, avg, max) = (48332, 90744, 110596) +-> CLIENT_CLOSED 18 total 0 fail 0 abort | | Cycles (min, avg, max) = (19092, 48822, 52680) +-> HTTP_REQUEST 6878 total 0 fail 0 abort | | Cycles (min, avg, max) = (78288, 1062803, 9434612) +-> HTTP_RESPONSE 6877 total 0 fail 0 abort | | Cycles (min, avg, max) = (31408, 1975720, 9943836) +-> RULE_INIT 0 total 0 fail 0 abort | Cycles (min, avg, max) = (0, 0, 0)Obviously nothing close to the 100 million cycles/request measured in production.
Output from 'b memory' showed no memory usage growth at all over the test:| ssl 7.116M 7.212M 0 | tcl 1.201M 1.647M 0
The back-end java app server was i/o bound to the DB but still had an 15 minute average load (uptime-top-/proc/loadavg) of 1.67 on a 2 processor machine.
JMeter reported an average HTTP request/response time of 22 seconds. That is hellishly slow but must be from the app server as the iRule timing metrics are still snappy.JMeter Samples 3361, Average 22073ms, Median 21883ms, 90% line 30147ms and Max of 55250ms
Session Table: we are writing a value into the session table and doing it a little oddly:session add ssl "s[SSL::sessionid]" $ccSubject $::timeout
Note the prepended 's' to the sessionid. This is vestigial - an earlier iteration of the rule attempted to save 4 different values for each SSL session by prepending a different letter to the ssl sessionid. I did this with Brice Spreitzer's (F5 iRule consultant) collaboration. Currently, the script saves only a single value. I still don't have a solid understanding of the scope of these entries.
Thanks for your time - you don't know how much I appreciate it,
-- Chris - Chris_Huston_16
Nimbostratus
Wow! AES encryption had a huge and positive impact on rule performance:
Before (with HTTP::cookie encrypt)
+-> HTTP_REQUEST 6878 total 0 fail 0 abort | | Cycles (min, avg, max) = (78288, 1062803, 9434612) +-> HTTP_RESPONSE 6877 total 0 fail 0 abort | | Cycles (min, avg, max) = (31408, 1975720, 9943836)After (with AES::encrypt)
+-> HTTP_REQUEST 3201 total 0 fail 0 abort | | Cycles (min, avg, max) = (111540, 280527, 884712) +-> HTTP_RESPONSE 3200 total 0 fail 0 abort | | Cycles (min, avg, max) = (35024, 156626, 594192)A 5x-10x improvement. Great suggestion Aaron. Thank you.
Help guide the future of your DevCentral Community!
What tools do you use to collaborate? (1min - anonymous)Recent Discussions
Related Content
* Getting Started on DevCentral
* Community Guidelines
* Community Terms of Use / EULA
* Community Ranking Explained
* Community Resources
* Contact the DevCentral Team
* Update MFA on account.f5.com