The table Command: Counting
With version 10.1, we've given the session table some long-sought functionality, and revamped its iRules interface completely to give users a new, cleaner, full-featured way to keep track of global data. We are very proud to introduce the table command...
- Basics
- New Ways To Set Data
- New Ways To Alter Data
- Data Expiration
- Advanced Data Expiration
- Subtables
- Counting
- The Fine Print
- Examples
In this seventh part of the series on the usage of the new table command, we discuss counting techniques.
Counting
It turns out that one popular use of iRules is to count things and then report or act on those counts. It also turns out that there are a lot of different ways of counting things, and they require slightly different techniques. The four major ways of counting that we've found so far are (in order of increasing complexity):
- Events that happened ever (or recently)
- Events that happened in the past 1 second (or other fixed window)
- Events that happened in the past N seconds (or other sliding window)
- Events that are happening right now
We'll cover each of these in turn.
Events that happened ever
- How many requests have been made on this VIP?
- How many POSTs has user X made?
This is easy: just use table incr and table lookup. Most of the time you’ll only care about things that happened recently, so you can specify a different timeout for the entry, or just use the default. If you truly want to keep track of things ever, then you can also specify an indefinite timeout. This does mean that that memory will never go away on its own, though, so you should be careful with that option. :)
Events that happened in the last fixed window
- How many DB queries were made in the last second?
- How many requests has IP X made in the last 10-second window?
This is also pretty easy. Each window can have a name (for example, each 1 second window can be named by the result of [clock second]), and you just need to somehow combine that name with the name of the event you're counting. For example:
table incr "dbquery:[clock second]" set tensecwin "[string range [clock second] 0 end-1]" table incr "ipreq:[IP::client_addr]:$tensecwin"
Events that happened in the last sliding window
- How many hits on the login page in the last hour?
- How many bytes sent to IP X in the last minute?
- How many connections from user Y in the last 15 seconds?
- What users logged in in the last day?
While a fixed window is useful, more often a sliding window is what is desired. This is because a fixed window doesn't move with the clock. For example, if you're using a fixed 60-second window, and it's now 12:30:14, you can't know how many things happened "in the last 60 seconds". You only can know how many things happened from 12:30:00-now, and maybe from 12:29:00-12:29:59, if you kept the result from the previous window. If you had a sliding window instead, it would always contain "the last 60 seconds", meaning that if it's 12:30:14 then the window is from 12:29:15-12:30:14, and when it's 12:30:17, then the window is from 12:29:18-12:30:17.
For something like rate-limiting, this makes a huge difference in user experience. Unfortunately, the things that make it more desirable are the very things that make it more complicated. I'll explain this by considering each window as a bucket. With a fixed window, you only ever add things to the bucket ("user X made a request that gets counted in bucket #12:29"). With a sliding window, sometimes things need to come out of the bucket ("the bucket is now counting from 12:29:18 to 12:30:17, so the request that was made at 12:29:17 now needs to be removed"). This tells us that getting a sliding window means managing specific items (so we need to have a handle or name for each item) and expiring them as they move out of the window. Fortunately, a subtable is exactly up for the job. To implement a sliding window, you would do something like:
set reqno [table incr "reqs:$username"] table set -subtable "reqrate:$username" $reqno "ignored" indefinite 60 log local0. "User $username made [table keys -count -subtable "reqrate:$username"] requests in the past 60 seconds"
This keeps track of the rate of requests for each user. The first line gives each request a serial number (it just increments a variable), and the second line adds an entry to that user's subtable. The entry will expire on its own when it is supposed to leave the window, and counting the keys in the subtable will tell us the number of requests that user made in the window.
The more observant among you will note that the table entry we're using for the serial number is going to expire too, so if this user is idle for a while, then the next request will have the serial number go back to 0. This is perfectly OK, though, since as long as the window size (in this example, it’s 60 seconds) is less than the timeout of the serial number entry (in this example, it’s the default of 180 seconds), then by the time the serial number entry expires, all of the entries in the subtable will have expired also. So it doesn't matter that the serial numbers starts at 0 again. If you want to use a longer window than 180 seconds, then you'll need to change the timeout to be longer after the incr line. Now, there's no rule that says you have to have a separate variable to keep track of the identity of each thing you're counting, or that they have to be consecutive. If you already have a serial number (for example, if you're counting new JSESSIONIDs), then you can just use that.
Events that are happening right now
- How many client connections to this VIP?
- What users are currently logged in?
This is slightly tricky, but doable. It builds on the concepts last example, with some additions. One simple example is:
when HTTP_REQUEST { switch [HTTP::uri] { "/login.html" { table set -subtable userlist $username "ignored" $static::useridletimeout } "/logout.html" { table delete -subtable userlist $username } default { table lookup -subtable userlist $username } } log local0. "[table keys –count –subtable userlist] users currently logged in" }
Once again, we're adding an entry to a subtable to start counting something, but in this case not only are we removing the entry when we want to stop counting it, but we also do a lookup on that entry when the user accesses any page. The reason we do that is to keep the entry alive as long as the user is busy, so if the user doesn't log out via our logout page, we still stop counting that user (because that user's entry in our subtable will expire) when our application does (after the number of seconds in $static::useridletimeout) .
With a similar method, we can do something like limit a VIP to 1000 connections:
when CLIENT_ACCEPTED { set tbl "connlimit" set key "[IP::client_addr]:[TCP::client_port]" if { [table keys -subtable $tbl -count] > 1000 } { event CLIENT_CLOSED disable reject } else { table set -subtable $tbl $key "ignored" 180 set timer [after 60000 -periodic { table lookup -subtable $tbl $key }] } } when CLIENT_CLOSED { after cancel $timer table delete -subtable $tbl $key }
The basics are the same as the previous examples, but now we've added a timer to keep the table entry alive, rather than relying on any particular iRule event. This works because when the connection closes we kill the timer and delete the entry. Now, I know what you're asking: "Why all the extra complexity? Heck, if I had table decr, I could just do something like:"
when CLIENT_ACCEPTED { if { [table lookup "conns"] > 1000 } { event CLIENT_CLOSED disable reject } else { table incr "conns" } } when CLIENT_CLOSED { table decr "conns" }
The answer is: reliability. What happens if CLIENT_CLOSED never fires? Maybe your box runs low on memory and the connection sweeper hits. Maybe you're running on a VIPRION with no HA and a blade goes down. Your user count never gets to be decremented, and so now you start rejecting connections because you think your VIP is busier than it is. And that's why there is no table decr: to discourage exactly this sort of code. If you think you need it, you probably shouldn’t be using increment and decrement to solve your problem. In the real example above, if CLIENT_CLOSED never fires, there's no problem. No matter how the connection dies, the timer will be removed, so it will stop touching the entry, so the entry will expire on its own. In this way, your connection count may only be wrong temporarily; it will settle on the right answer all on its own.
Continue reading part eight in the series: The Fine Print