Forum Discussion
Mike_Lowell_108
Sep 11, 2007
Historic F5 Account
Any questions? Post'em
Hi everyone,
If you have any questions or comments about the performance report or its supporting documents, please feel free to post them here.
I'm one of the engineers who helped to create the performance report, and I'll be actively monitoring this forum to answer questions.
Mike Lowell
38 Replies
- ukiran22_113041
Nimbostratus
Thanks Mike, I think I have what I needed. I don't think I'm going to tune it to that extent. Also, I realized my Ixia is set to window scaling, but I could not find an option on 8800 to enable WS. Is there a way to enable WS on 8800??
Jay
- Mike_Lowell_108
Historic F5 Account
That's great! Regarding retries, this can be reduced to almost zero even when you're at capacity if you get the concurrency/rate "just right", but it's tough. :) You should be able to eliminate timeouts entirely by reducing the load just a little bit -- timeouts suggest you're seeing quite a lot more retries than I'd expect, because it means the same flow had to have multiple retransmits. It's tough to eliminate retries entirely when you're near capacity limits (on any type of device), but it's definitely possible to minimize the impact. In a world with fast retransmits, TCP timestamps, and SACK, losing a random packet here and there doesn't have a practical impact on users/servers, and it's to be expected when you're up against the device capacity.
Some ideas to help dial it in:
1) Reduce the simuser constraint by increments that are a multiple of the number of physical ports (i.e. 12)
2) Change the congestion control algorithm on your TCP profile to "highspeed"
3) Disable rfc1323 on the TCP profile.
4) Reduce the send/recv buffer sizes on BIG-IP (or Ixia) in 8KB increments.
... this sort of tuning can take a while, but if you really want to get perfect results at the edge of capacity, it's what you need to do. :)
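The sweep suggested above can be sketched in a few lines: step the simuser constraint down in multiples of the 12 physical ports, and the buffer size down in 8KB steps. This is only an illustrative sketch of the search space, not part of the original advice; the starting values are assumptions.

```python
# Illustrative sketch (not from the post): candidate values for the tuning
# sweep described above. PORTS and the starting values are assumptions.

PORTS = 12  # physical ports, so simuser steps stay a multiple of 12

def simuser_candidates(start, steps=5):
    """Simuser constraints to try, reduced in multiples of the port count."""
    return [start - PORTS * i for i in range(steps)]

def buffer_candidates(start_kb=64, floor_kb=8):
    """Send/recv buffer sizes (KB) to try, in 8KB decrements."""
    return list(range(start_kb, floor_kb - 1, -8))

print(simuser_candidates(1020))  # [1020, 1008, 996, 984, 972]
print(buffer_candidates())       # [64, 56, 48, 40, 32, 24, 16, 8]
```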
Good luck!
Mike Lowell
- ukiran22_113041
Nimbostratus
hi Mike,
I don't have any switch in between. My Ixia cards are aggregated into 2 10G ports and I connect those ports to the 10G ports on 8800.
Thanks to your detailed suggestions, I disabled flow control on the 10G interfaces on the 8800 and my throughput went up to 6.5Gbps. I increased the number of users to 1000 and it went up to 7Gbps.
I'm happy with my throughput, but since I disabled flow control, I do see a bunch of TCP retries and timeouts on the server-side Ixia. Is the 8800 dropping packets at those throughput levels, or is Ixia simply sending more than the 8800's ~7Gbps capacity, so that anything above it is dropped by the 8800? I just wanted to understand your view of those drops.
Your suggestions have been an excellent help so far. Thanks a lot.
Jay
- Mike_Lowell_108
Historic F5 Account
Hmmm. Sounds like a pretty good setup to me. :) It's probably just a matter of tuning to get what you need.
It's likely that somewhat more than 24 simusers would still reduce throughput, but a lot more than 24 should increase it by ensuring that both Ixia and BIG-IP constantly have something to do. I definitely suggest trying 1020 (you have 1x 12-port client blade and 1x 12-port server blade, right?) since I've run similar tests in the past and had good luck with equivalent settings.
One challenge with tests that have only large responses is that it's hard to keep BIG-IP/Ixia busy with enough work. Like you say, it's only 30% utilized. :) BIG-IP/Ixia doesn't have too much more work for 1500byte packets compared to 64byte packets, but it takes ~24x longer to send/receive the bigger ones (12.1us vs 0.51us as mentioned above, the speed of ethernet). In the end it means that you need a lot more concurrency to make sure there's always a queue of work that's waiting, otherwise BIG-IP/Ixia will be underutilized. On a much bigger scale it's the same reason that throughput for WAN links is often substantially lower than the available bandwidth: latency is the killer. Ethernet is obviously worlds faster than a WAN connection across the country, but the same principle applies.
With ethernet you're often better-off to have more physically-unique clients -- this makes it easier to generate a truly concurrent workload. I can't say that I've tried a test with just 2x cards, but I'm guessing it'll still achieve the goal with some tuning, though it would be easier to ensure the needed parallelism with more blades (because of more unique ethernet clocks -- greater possibility for keeping a stream of "back-to-back" packets flowing). The smaller the number of unique ethernet interfaces on the client/server, the harder you have to push them to ensure they're generating a constant stream of traffic that'll keep BIG-IP busy.
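The latency/concurrency point above is the bandwidth-delay-product idea in miniature: enough data must be in flight to cover the link's latency, or the link idles. A rough numeric sketch (all figures are illustrative assumptions, not from the post):

```python
# Sketch of the bandwidth-delay-product reasoning above. The link speed,
# latency, and window size below are illustrative assumptions.
import math

def in_flight_bytes(link_bps, rtt_seconds):
    """Bytes that must be outstanding to keep the link fully utilized."""
    return link_bps / 8 * rtt_seconds

def concurrent_streams_needed(link_bps, rtt_seconds, window_bytes):
    """Streams needed if each stream keeps at most window_bytes in flight."""
    return math.ceil(in_flight_bytes(link_bps, rtt_seconds) / window_bytes)

# A 10 Gbps test link with 100 microseconds of end-to-end latency:
bdp = in_flight_bytes(10e9, 100e-6)                           # 125,000 bytes
streams = concurrent_streams_needed(10e9, 100e-6, 32 * 1024)  # 32KB windows
print(f"BDP: {bdp:.0f} bytes, streams needed: {streams}")
```

The same arithmetic explains the WAN analogy: as latency grows, the in-flight requirement grows with it, so fixed-size windows cap throughput well below line rate.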
The most common issues I've run into with throughput tests customers are running:
1) Not enough concurrency (i.e., see above)
2) Intermediate switch connecting test equipment to BIG-IP can't do line-rate (dropping packets...)
3) Switch and/or BIG-IP are using flow control too early (try manually disabling flow control on both sides instead of using auto)
4) Not enough client/server capacity (I don't think this could apply to you with 2x 12-port Ixia blades)
5) Bad cables/optics (rather unlikely, but not impossible, given the performance you're already getting)
6) Uneven distribution of clients/servers, causing one BIG-IP CPU or switch uplink to get overwhelmed (this typically only happens with L2 testing equipment where there's a small number of hard-coded MAC/IP/ports -- not likely to be your issue). You can check this by running "tmstat" and making sure the various links have roughly the same throughput (they're usually within 1%, but being within 10% is still not a problem in most cases)
7) Some odd bug. It's always a good idea to make sure you're running the latest version. :)
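The balance check in item 6 can be sketched as a simple tolerance test on per-link throughput readings. The link names, numbers, and the idea of feeding them in as a dict are all hypothetical; this is not tmstat's actual output format.

```python
# Hypothetical sketch of the item-6 check: flag any link whose throughput
# deviates from the mean by more than 10%. Sample values are made up and
# do not reflect real tmstat output.

def unbalanced_links(throughput_by_link, tolerance=0.10):
    mean = sum(throughput_by_link.values()) / len(throughput_by_link)
    return {link: tput for link, tput in throughput_by_link.items()
            if abs(tput - mean) / mean > tolerance}

samples = {"1.1": 1750, "1.2": 1740, "1.3": 1760, "1.4": 1200}  # Mbps, made up
print(unbalanced_links(samples))  # only the lagging link is flagged
```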
Mike Lowell
- ukiran22_113041
Nimbostratus
Thanks Mike for the explanation about throughput calculations. I always take throughput calculation adjustments into account with Ixia. Since my packets are mostly full size, I usually add about 5% to Ixia throughput numbers.
Some answers to your questions -
a) LTM version - 9.4.6
b) All four TMMs running at about 30%
c) I have set a constraint of 24 for simulated users as my response size is higher and any higher number of simulated users is bringing down my throughput.
I'm trying standard http with these profiles - TCP, http, and One connect
I tried http_lan_optimized profile and it actually brought down my throughput a little bit.
I have 10 self IPs assigned to the server-facing VLAN.
I have my buffer sizes set to 32k in Ixia. I tried 64k and it did not make much of a difference.
I'm not sure what I'm doing wrong.
Thanks again for your help
Jay
- Mike_Lowell_108
Historic F5 Account
Jay, internally we use Ixia as our primary testing platform, so I'm hopeful we can help. :)
I'm sure you're already aware, but I feel compelled to mention this anyway for other folks watching the list: Ixia only reports L7 throughput (i.e. bytes transferred over TCP, excluding TCP/IP/Ethernet headers). This means the throughput reported by Ixia is likely to be ~8% "low" for a throughput test (maybe ~12% for a conn/s test). This is an important point because the wire itself is limited to L2 throughput, so L2 throughput is what really matters from a limits-testing point of view.
Every single packet has 18 bytes of Ethernet + 20 bytes of IP + 20 bytes of TCP (IP and TCP could have more depending on options), in addition to the actual TCP data. So for a single packet that contains 100 bytes of TCP data, Ixia only reports the "100 bytes", even though the full packet size is at least 100 + 18 + 20 + 20 == 158 bytes. That's the most extreme case, but it proves the point nicely. :)
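The 158-byte arithmetic above can be checked in a few lines. This is a sketch rather than anything from the post; the full-size-packet figure assumes 1460 bytes of TCP payload in a 1518-byte frame, and it only counts headers on data packets (the ~8% figure quoted for a whole throughput test presumably also covers ACKs and other traffic).

```python
# Checking the per-packet overhead example above: L2 bytes on the wire vs
# the L7 bytes Ixia reports, using the header sizes from the post.

ETH, IP, TCP = 18, 20, 20   # bytes of Ethernet + IP + TCP headers per packet

def wire_bytes(l7_bytes_per_packet):
    return l7_bytes_per_packet + ETH + IP + TCP

assert wire_bytes(100) == 158   # the 100-byte example from the post

# For a full-size packet (1460 bytes of TCP data in a 1518-byte frame),
# the header overhead alone is about 3.8%:
overhead = (wire_bytes(1460) - 1460) / wire_bytes(1460)
print(f"{overhead:.1%}")
```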
An additional relevant detail about gigabit Ethernet is that despite the "1000Mbps" name, only about 996Mbps is technically possible with a standard 1500 byte MTU. The reason for this is simple: gigabit Ethernet takes 12.1us (microseconds) to transmit a full-size 1518 byte packet (0.51us for a 64-byte packet), and Ethernet requires a 0.096us gap between packets. This means the maximum number of full-size packets (1500 bytes + 18 bytes Ethernet) per second is 12.1us + 0.096us divided into 1 second: 1 / (12.1us + 0.096us). The maximum number of packets per second multiplied by 1518 gives you the maximum number of bytes per second, and multiplied by 8, gives you the maximum bits per second, which is roughly 996Mbps.
This means 7 Gbps line-rate is actually ~6,972Mbps, for example. This means 7 Gbps line-rate using a large-file test is likely to be reported by Ixia as ~6,455 Mbps (this is a very rough estimate).
Anyway, moving on to your actual question.... :) I have some questions to start, and some suggestions below:
1) What BIG-IP version are you using?
2) What's TMM CPU utilization?
3) How many simusers are configured? (and do you have a simuser constraint set? I suggest a constraint of 1024).
It should be possible to get > 7Gbps with an 8KB response or larger using FastHTTP with default settings, or 64KB+ with standard mode using default settings, regardless of whether you're using an HTTP profile (w/ or w/o RAMcache) and/or OneConnect.
When using FastHTTP (or anytime you're using SNATs) you'll want to make sure BIG-IP has enough self IP addresses so that it won't run out of ephemeral ports. My standard test config has 20 self IPs on the server-facing VLAN for this purpose. 20 is overkill for almost all needs, but it doesn't hurt. :) Also, this is rarely a factor for large-file throughput tests.
I also suggest trying the "tcp-lan-optimized" profile to see if that helps. It's also worth looking at your Ixia's TCP send/recv buffer settings. Depending on what version of Ixia you have, they might default to 4k, which is clearly far too small to simulate what you'd see from regular Windows/Mac/Linux boxes -- I recommend 64k to start.
Mike Lowell
- ukiran22_113041
Nimbostratus
hi Mike,
I'm trying to test an 8800 in my lab before we move it to production. Before trying our specific requirements, I was trying to baseline 8800's performance to the test report published by you. And I am unable to get 8800 to do 7Gbps of L7 throughput.
Specific cases I tried -
a) fast http mode with 24 servers. 2 Ixia 10G ASM cards. Response size 512K. Throughput is about 6.2Gbps.
b) standard http mode with round robin lb across 24 servers, profiles used oc, tcp, http. automap is set, and response size 512K. Throughput is 5.5Gbps.
Ixia is configured to maximize transactions per connection. Number of users is 24 with 2 concurrent connections per user.
Any suggestions you could provide in terms of 8800 config will be really helpful. Thanks in advance.
Jay
- hoolio
Cirrostratus
Hi Ugur,
This article from Deb is a good place to start:
iRules Optimization 101 - 5 - Evaluating iRule Performance
http://devcentral.f5.com/Default.aspx?tabid=63&articleType=ArticleView&articleId=123
Aaron - ugurtanyildiz_9
Nimbostratus
Hi Mike
As a global customer, my company uses F5.
In a current project we have a redundant LTM 340 topology.
It has been working properly, but now we need to run an iRule, and we are investigating the effect of running an iRule on performance.
Is there a report that shows the CPU usage of the F5 when running a basic iRule?
I checked the forums but could not find the answer.
I would be glad if you could help.
Thanks in advance,
ugur
- Mike_Lowell_108
Historic F5 Account
Hi Zafer,
I encourage you to work with your local field engineer to help size deployments. When a product does more work, it requires more resources, and understanding the performance of a combined feature-set (SSL + compression + ...) is a fairly involved multi-dimensional problem (lots of variables, many things to consider).
Having said that, the relative advantage of the BIG-IP product versus competitors is still strong. If BIG-IP can handle more L4, L7, SSL, compression, and so on for individual tests, that also means BIG-IP can handle more in combined tests. For example, if BIG-IP handles 6Gbps of compression and 6Gbps of SSL in separate tests, where a competitor may handle 3Gbps of compression and 3Gbps of SSL in separate tests, neither vendor will achieve both metrics at the same time, but perhaps BIG-IP will achieve 4Gbps of SSL+compression, and the competitor will achieve 2Gbps of SSL+compression -- BIG-IP's advantage is the same whether you look at individual metrics or combined metrics.
About iRules performance, I've performed extensive tests of L7 performance on both BIG-IP and competitive platforms (including Alteon, Redline, and Foundry). If you compare similar platform models and feature-sets between products, BIG-IP's performance with iRules is consistently higher. However, if you're using iRules to do something uncommon, something that the competitors are unable to do at all, then there's of course no way to compare this directly to the competitors. For the "L7" tests that vendors in our market use as the baseline, it's inspecting an HTTP URL to select between different groups of servers. All vendors support this basic L7 feature-set, so it's a good baseline comparison. The advertised L7 performance of BIG-IP is based on this same sort of test, using iRules. As you've seen in the report(s), BIG-IP's performance using iRules for this task is very competitive. The real key in judging L7 performance is to make sure there's a similar feature-set in use between platforms that are being compared. Based on my extensive testing, I'm very confident BIG-IP will come out ahead. :)
For common tasks like selecting a pool of servers based on the HTTP URI you don't even need iRules -- I suggest using the httpclass profile (HTTP Classification profile) instead. For more advanced tasks, my experience testing Alteon, Redline, Foundry, and more has shown that BIG-IP's performance is very competitive. If you're seeing 60% CPU on the BIG-IP, then you'll see more than 60% CPU on similar competitive hardware. If you see higher utilization on the BIG-IP compared to a similar competitive platform, then I suggest a different set of features must be in use (perhaps BIG-IP is acting as a full proxy, whereas the competitor is not, in which case it's appropriate to use a simpler non-full proxy mode on the BIG-IP).
Hope this helps, good luck!
Mike Lowell