What Makes LineRate so Fast?
Every year F5 has a sales conference. This involves flying well over a thousand sales, pre-sales, marketing, and 'assorted' people to some location, usually a safe distance from civilized society, for the purpose of motivating and educating them. The precision weapons we choose to achieve these worthy aims tend to revolve around PowerPoint and inspirational speeches. I'll leave to your imagination how much actual knowledge or motivation manages to pierce the protective shield of hangoverium in which most of my esteemed colleagues choose to clothe themselves.
But I digress.
I've been lucky to go to a good number of these events as either speaker or attendee, and after a while you learn, either by experience or by imparted lore, which sessions you should not miss no matter how acute the throbbing in your head may be. One of the most famed of this (if I'm honest, small) subset of talks is given by one of our senior software architects. Every year members of the technical sales team flock to his presentation, where, with a loose outline (which is more of a cover story for the event's organizers), he spins tales of the inner workings of the BIG-IP and TMOS and explains why things are how they are and what might be coming next. Observe carefully and you can see the three-year technology roadmap being created before you. The man is a master of his field.
During the session this year, he began talking about LineRate and how the developers had solved some very difficult problems in their architecture. He went on to say that he had consulted with them about some parallel solutions within the TMOS architecture, and that he hoped they hadn't thought him too stupid. Wait, what? If he's impressed, then there must be something going on.
I've read the LineRate data sheets and installed and experimented with it a little (and if you haven't, you should: it's free to try, and being able to build traffic management solutions with node.js is pretty cool). I can see that it is fast, but why?
After plenty of conversations, mostly spent asking people to repeat themselves using smaller words, I think I've got it. It turns out that LineRate's high performance boils down to the age-old adage 'find the bottleneck and fix it' (well, move it). More specifically, in this case the bottleneck turns out to be the locking mechanisms of the operating system kernel and the TCP stack.
The traditional way that applications access the TCP stack to set up connections often involves separate threads waiting for one another to release locks. As multi-core processors have become the norm, this has become a significant limit on network throughput, producing some fairly ugly scalability numbers, with per-core performance dropping rapidly as core counts grow. It matters most in devices designed to terminate and initiate hundreds of thousands of connections per second, such as a load balancer or application proxy. There, TCP processing time is generally going to be as important as, or more important than, any actual payload processing, even when using some of the advanced programmability that the node.js engine inside LineRate offers.
There has been a range of approaches to solving these issues, and the various possible kernel locking mechanisms have been widely discussed, but LineRate has significantly enhanced the way that processes access the TCP stack, removing many of the lock contention issues from which existing methods still suffer in practice. In the LineRate architecture, threads can work on different messages in parallel with almost no lock contention between them. Less locking means far better scalability across cores, and greatly improved performance.
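LineRate's actual kernel changes aren't described here, so what follows is only a generic illustration of the underlying principle, written in Python for readability. It shows a hypothetical connection table guarded first by one global lock, then split into independently locked shards (a well-known technique usually called lock striping; it is not necessarily what LineRate does). With one lock, every thread handling any connection waits on the same mutex; with shards, two threads only contend when their connections happen to hash to the same shard. The class names, the `fill` and `run` helpers, and the addresses are all made up for this sketch, and because CPython's global interpreter lock serializes Python bytecode anyway, it shows the structure of the idea rather than the speedup.

```python
import threading
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical connection key: (src_ip, src_port, dst_ip, dst_port).
ConnKey = Tuple[str, int, str, int]


class GlobalLockTable:
    """One lock guards every connection, so all threads serialize on it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._conns: Dict[ConnKey, str] = {}

    def insert(self, key: ConnKey, state: str) -> None:
        with self._lock:  # every insert, for any connection, takes this lock
            self._conns[key] = state

    def lookup(self, key: ConnKey) -> Optional[str]:
        with self._lock:
            return self._conns.get(key)

    def __len__(self) -> int:
        with self._lock:
            return len(self._conns)


class ShardedTable:
    """Lock striping (illustrative, not LineRate's design): each shard has
    its own lock, so threads touching different shards never wait."""

    def __init__(self, n_shards: int = 64) -> None:
        self._locks = [threading.Lock() for _ in range(n_shards)]
        self._shards: List[Dict[ConnKey, str]] = [{} for _ in range(n_shards)]

    def _index(self, key: ConnKey) -> int:
        # Python's % returns a non-negative result for a positive divisor.
        return hash(key) % len(self._shards)

    def insert(self, key: ConnKey, state: str) -> None:
        i = self._index(key)
        with self._locks[i]:  # only this one shard is locked
            self._shards[i][key] = state

    def lookup(self, key: ConnKey) -> Optional[str]:
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key)

    def __len__(self) -> int:
        # Takes each shard lock in turn; the total is a snapshot, not atomic.
        total = 0
        for lock, shard in zip(self._locks, self._shards):
            with lock:
                total += len(shard)
        return total


Table = Union[GlobalLockTable, ShardedTable]


def fill(table: Table, worker: int, n: int) -> None:
    """Simulate one worker thread accepting n distinct client connections."""
    for port in range(1024, 1024 + n):
        table.insert((f"10.0.0.{worker}", port, "192.0.2.1", 443), "ESTABLISHED")


def run(table: Table, workers: int = 8, per_worker: int = 1000) -> int:
    """Start one thread per worker, wait for all, return the table size."""
    threads = [
        threading.Thread(target=fill, args=(table, w, per_worker))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(table)


if __name__ == "__main__":
    print(run(GlobalLockTable()))  # 8000
    print(run(ShardedTable()))     # 8000
```

Both tables end up holding the same 8,000 connections; the difference is only in how often threads have to queue for a lock to get there. Raising `n_shards` lowers the chance that two busy threads collide, at the cost of more locks to manage, and production designs often go further, for example keeping per-core data so that most operations need no shared lock at all.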
So there you go, it’s fast because the designers of LineRate simply looked at where their use-case specific bottleneck was, and then spent a few short years re-engineering the kernel to eliminate it.
I’d encourage you all to download a copy and see for yourself.