Happy 20th Birthday, BIG-IP TMOS!
I wasn’t in the waiting room with the F5 family, ears and eyes perked for the release announcement of BIG-IP version 9.0. I was a customer back in 2004, working on a government contract at Scott AFB, Illinois. I shared ownership of the F5 infrastructure, pairs of BIG-IPs running version 4.5 on Dell PowerEdge 2250 servers with one other guy. But maybe a month or two before the official first release of TMOS, my F5 account manager dropped off some shiny new hardware. And it was legit purpose-built and snazzy, not some garage-style hacked Frankenstein of COTS parts like the earlier stuff. And you wonder why we chose Dell servers!
Anyway, I was a hard-core network engineer at this time, with very little exposure to anything above layer four, and even there, my understanding was limited to ports and ACLs and maybe a little high-level clarity around transport protocols. But application protocols? Nah. No idea. So with this new hardware and an entirely new full-proxy architecture (what’s a proxy, again?) I was overwhelmed. And honestly, I was frustrated with it for the first few days because I didn’t know what I didn’t know and so I struggled to figure out what to do with it, even to replicate my half-proxy configuration in the “new way”.
But I’m a curious person. Given enough time and caffeine, I can usually get to the bottom of a problem, at least well enough to arrive at a workable solution. And so I did. My typical approach to anything is to make it work, make it work better, make it work reliably better, then finally make it work reliably and more performantly better. And the beauty here with this new TMOS system is that I was armed with a treasure trove of new toys. The short list I dug into during my beta trial, which lasted for a couple of weeks:
- The concept of a profile. When you support a few applications, this is no big deal. When you support hundreds, being able to macro configuration snippets within your application and across applications was revolutionary. Not just for the final solution, but also for setting up and executing your test plans.
- iRules. Yes, technically they existed in 4.x, but they were very limited in scope. With TMOS, F5 introduced the Tcl-based and F5 extended live-traffic scripting environment that unleashed tremendous power and flexibility for network and application teams. I dabbled with this, and thought I understood exactly how useful this was. More on this a little later.
- A host operating system. I was a router, switch, and firewall guy. Nothing I worked on had this capability. I mean, a linux system built in to my networking device? YES!!! Two things I never knew I always needed during my trial: 1) tcpdump ON BOX. Seriously--mind blown; and 2) perl scripting against config and snmp. Yeah, I know, I laugh about perl now. But 20 years ago, it was the cats pajamas.
A fortunate job change
Shortly after my trial was over, I interviewed for an accepted a job offer from a major rental car company that was looking to hire an engineer to redesign their application load balancing infrastructure and select the next gear purchase for the effort. We evaluated Cisco, Nortel/Alteon, Radware, and F5 on my recommendation. With our team’s resident architect we drafted the rubric with which we’d evaluate all the products, and whereas there were some layer two performance issues in some packet sizes that were arguably less than real-world, the BIG-IP blew away the competitors across the board. Particularly, though, in configurability and instrumentation. Tcpdump on box was such a game-changer for us.
Did we have issues with TMOS version 9? For sure. My first year with TMOS was also TMOS's first year. Bugs are going to happen with any release, but a brand new thing is guaranteed. But F5 support was awesome, and we worked through all the issues in due time. Anyway, I want to share three wins in my first year with TMOS.
Win #1
Our first production rollout was in the internet space, on BIG-IP version 9.0.5. That’s right, a .0 release. TMOS was a brand new baby, and we had great confidence throughout our testing. During our maintenance, once we flipped over the BIG-IPs, our rental transaction monitors all turned red and the scripted rental process had increased by 50%! Not good. “What is this F5 stuff? Send it back!!” But it was new, and we knew we had a gem here. We took packet captures on box, of course, then rolled back and took more packet captures, this time through taps because our old stuff didn’t have tcpdump on box.
This is where Jason started to really learn about the implications of both a full proxy architecture and the TCP protocol. It turns our our application servers had a highly-tuned TCP stack on them specific to the characteristics of the rental application. We didn’t know this, of course. But since we implemented a proxy that terminates clients at the BIG-IP and starts a new session to the servers, all those customizations for WAN traffic were lost. Once we built a TCP profile specifically for the rental application servers and tested it under WAN emulation, we not only reached parity with the prior performance but beat it by 10%. Huzzah! Go BIG-IP custom protocol stack configuration!
Win #2
For the next internal project, I had to rearchitect the terminal server farm. We had over 700 servers in two datacenters supporting over 60,000 thin clients around the world for rental terminals. Any failures meant paper tickets and unhappy staff and customers. One thing that was problematic with the existing solution is that sometimes clients would detach and upon reconnect would connect directly to the server, which skewed the load balancers view of the world and frequently overloaded some servers to the point all sessions on that server would hang until metrics (but usually angry staff) would notify. Remember my iRules comment earlier on differentiators? Well, iRules architect David Hansen happened to be a community hero and was very helpful to me in the DevCentral forums and really opened my eyes to the art of possible with iRules. He was able to take the RDP session token that was being returned by the client, read it, translate it from its Microsoft encoding format, and then forward the session on to the correct server in the backend so that all sessions continued to be accounted for in our load balancing tier. This was formative for me as a technologist and as a member of the DevCentral community.
Win #3
2004-2005 was the era before security patching was as visible a responsibility as it is today, but even then we had a process and concerns when there were obstacles. We had an internal application that had a plugin for the web tier that managed all the sessions to the app tier, and this plugin was no longer supported. We were almost a year behind on system and application patches because we had no replacement for this. Enter, again, iRules. I was able to rebuild the logic of the plugin in an iRule that IIRC wasn’t more than 30 lines. So the benefits ended up not only being a solution to that problem, but the ability to remove that web tier altogether, saving on equipment, power, and complexity costs.
And that was just the beginning...
TMOS was mature upon arrival, but it got better every year. iControl added REST-based API access; clustered multi-processing introduced tremendous performance gains; TMOS got virtualized, and all the home-lab technologists shouted with joy; a plugin architecture allowed for product modules like ASM and APM; solutions that began as iRules like AFM and SSLO became products. It’s crazy how much innovation has taken place on this platform!
The introduction of TMOS didn’t just introduce me to applications and programmability. It did that and I’m grateful, but it did so much more. It unlocked in me that fanboy level that fans of sports teams, video game platforms, Taylor Swift, etc, experience. It helped me build an online community at DevCentral, long before I was an employee.
Happy 20th Birthday, TMOS! We celebrate and salute you!
- M_najafikhahNimbostratus
Let’s hope TMOS keeps running strong 💪 and never gets decommissioned.