AI Token Limit Enforcement
Problem
Companies that run AI inference services on-premise instead of using public cloud providers often do so to keep sensitive data local. However, local LLM infrastructure introduces a new challenge: resource control. Without proper limits, users or applications can generate excessive inference requests and consume GPU or CPU capacity uncontrollably. Inference stacks may lack built-in mechanisms for enforcing per-user or per-role token budgets, so organizations need a way to control usage before requests reach the model.
Solution
Our approach uses BIG-IP LTM iRules only to control access and usage:
- JWT validation
- The company issues a JWT for each user request.
- When the request arrives at the iRule, we verify it using a RSA to ensure it hasn’t been tampered with.
- Role-based token limits
- The JWT payload includes the user role.
- We have three roles with different token budgets:
- standard_user → small token budget
- extended_user → medium token budget
- power_user → large token budget
- Token tracking with tables commands
- Budget enforcement
- If a user has already used too many tokens, the iRule returns HTTP 429.
- Otherwise, the token budget is decreased and the request is allowed to proceed.
- Role-change handling
- If the user role changes during a session, the token budget updates accordingly.
Impact
This iRule enables token budget enforcement directly on BIG-IP LTM without requiring additional modules or external gateways. By validating JWTs and extracting user and role information, the iRule applies role-based token limits before requests reach the inference service. This provides a simple, native way to introduce quota control and protect on-premise AI infrastructure from uncontrolled usage.
Authors
Marcio Goncalves <marcio.goncales@concentrade.de>,
Sven Schaefer <sven.schaefer@concentrade.de>
Code
Main iRule, requires the procedure library (proc_lib) below.
# Title: AI Token Limit Enforcement
# Author: Marcio Goncalves <marcio.goncales@concentrade.de>, Sven Schaefer <sven.schaefer@concentrade.de>
# Version: 1.0
# Description:
# This iRule enforces token budgets for AI inference services. The main goal
# is to limit how many tokens a user can consume based on their assigned
# role. Each role has a configurable token budget and a reset timer that
# defines when the budget is refreshed.
# The role information is provided through a JWT. Because the iRule relies
# on the JWT to determine the user identity and role, the token must first
# be validated before any request can be processed.
#
# JWT validation is therefore only a prerequisite. It ensures that the
# request is authenticated and that the role information can be trusted.
# Without a valid JWT the request cannot be processed, since neither the
# user nor the role would be known.
# The iRule validates the RSA signature of the JWT using the public key
# referenced by the key ID (kid) in the JWT header. Multiple keys are
# supported to allow key rollover. The expiration time (exp claim) is also
# verified to ensure the token is still valid.
#
# Once the JWT is validated, the iRule extracts the username and role from
# the payload and applies the corresponding token limits. If a user exceeds
# the allowed token budget, the iRule returns HTTP status code 429 (Too Many
# Requests).
#
# Logging is intentionally very verbose and controlled via debug levels
# ranging from 0 (silent) to 5 (logging like crazy).
#
# The overall goal is to implement a native LTM-only mechanism for enforcing
# token limits for AI workloads, without requiring APM.
#
# Credits / Sources:
# JWT validation logic adapted from:
# https://github.com/JuergenMang/f5-irules-jwt/blob/main/jwt-validate
# (Juergen Mang)
#
# JSON handling techniques inspired by:
# https://community.f5.com/kb/technicalarticles/working-with-json-data-in-
# irules---part-2/345282
# (Jason Rahm)
when RULE_INIT priority 100 {
# SHA256 signing header
set static::jwt_validate_digest_header_sha256 "3031300d060960864801650304020105000420"
# Public key for signature validation
set static::jwt_validate_pubkey_kid1 {-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA1RAIiNKFjm4DEuQet0zN
SQQ1/LDXP1xqUuEWEBWZ7nfhOru/l9eiJibtfoO+F8vUUFBTthm0SdiVWETF/psT
yqoDqKSjobqGquaglGmK63KDQparjnh5nJjtmMELvA4DSz6e5pO5mDdATVRpVXvp
j45rIW7eBoxMGAB0ivVm88ChyGA0UJUuyTSRuZnXyY8sMHz8JkhxWwr6i87i5p+p
E27HJ9WaCikBL2RALJIZLL+ByVknTWuRW785hN1A6V+/o/Yy9Cdqt0hif0zSC2+r
D+hIMHqDSR6WLb07KqCTbbL8q9v2selR8X5lbYYYh0vk9voD3JFvRbTtfz1YystH
qQIDAQAB
-----END PUBLIC KEY-----
}
set static::jwt_validate_pubkey_kid2 {-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAwlik5HcRTfp4c4oP5Jta
Thhqa4EjV+dJB9w9EqQa9dMQzVWXG8O1b3izee1kESICe+YUryVS9I6TbJavqH1t
ut0cM0VHLnWYQJAd7w2nK7qoDYX+uj9Lcq6pTSUH6zM/Sro0D4+/Ha6LAtyiJosx
QzA+yxaFrBwJHzXRgnCd/6crMG3eP/jaz+xid/AecHerQ1C0kRBTZd7FHt+SS677
489emEMwtpjNZCq2YnHgTULxQKjKEKMQGQrD1OOnz8ZyN9wtYSQp24lDmXVw5p6G
a42UqjQ5C6Nbj3qr/FV+49maLrXEw6kowMAb0qWpAui1BrEjxR95WrWQQrdfWZCU
6wIDAQAB
-----END PUBLIC KEY-----
}
array set static::user_role_token_limits {
standard_user 10000
extended_user 50000
power_user 100000
}
set static::user_role_default_token_limit 1000
set static::token_limit_reset_timer 30
}
when HTTP_REQUEST priority 100 {
# Debug
set debug_mode 3
if { not ([HTTP::header value Authorization] starts_with "Bearer ") } {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
log local0. "No bearer token found"
return
}
# Get JWT from authorization header
set jwt_header_b64_url [string range [getfield [HTTP::header value Authorization] "." 1] 7 end]
set jwt_body_b64_url [getfield [HTTP::header value Authorization] "." 2]
set jwt_sig_b64_url [getfield [HTTP::header value Authorization] "." 3]
if { $jwt_header_b64_url eq "" or $jwt_body_b64_url eq "" or $jwt_sig_b64_url eq "" } {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
log local0. "No bearer token found"
return
}
if {$debug_mode > 3}{log local0. "Header: $jwt_header_b64_url"}
if {$debug_mode > 3}{log local0. "Body: $jwt_body_b64_url"}
if {$debug_mode > 3}{log local0. "Sig: $jwt_sig_b64_url"}
# Decode JWT components
set jwt_header [call proc_lib::b64url_decode $jwt_header_b64_url]
if {$debug_mode > 3}{log local0. "JWT Header: $jwt_header"}
set jwt_body [call proc_lib::b64url_decode $jwt_body_b64_url]
if {$debug_mode > 3}{log local0. "JWT Body: $jwt_body"}
set jwt_sig [call proc_lib::b64url_decode $jwt_sig_b64_url]
if { $jwt_header eq "" or $jwt_body eq "" or $jwt_sig eq ""} {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
log local0. "Unable to decode jwt components"
return
}
# Get signing algorithm
set jwt_algo [call proc_lib::get_json_str "alg" $jwt_header]
if {$debug_mode > 3}{log local0. "JWT signing: $jwt_algo"}
if { $jwt_algo ne "RS256" } {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
log local0. "Unsupported signature algorithm"
return
}
# Get expiration
set jwt_exp [call proc_lib::get_json_num "exp" $jwt_body]
if {$debug_mode > 3}{log local0. "JWT expiration: $jwt_exp"}
set now [clock seconds]
if { $jwt_exp < $now } {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
log local0. "JWT expired"
return
}
# Get key id
set jwt_kid [call proc_lib::get_json_str "kid" $jwt_header]
switch -- $jwt_kid {
"kid1" { set jwt_pubkey $static::jwt_validate_pubkey_kid1 }
"kid2" { set jwt_pubkey $static::jwt_validate_pubkey_kid2 }
default {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
log local0. "Unknown kid: $jwt_kid"
return
}
}
# Decrypt signature with public key
if { [catch {
set jwt_sig_decrypted [CRYPTO::decrypt -alg rsa-pub -key $jwt_pubkey $jwt_sig]
binary scan $jwt_sig_decrypted H* jwt_sig_decrypted_hex
if {$debug_mode > 3}{log local0. "Signature: $jwt_sig_decrypted_hex"}
}] } {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
log local0. "Unable to decrypt signature: [subst "\$::errorInfo"]"
return
}
# Create hash from JWT header and payload
set hash [sha256 "$jwt_header_b64_url.$jwt_body_b64_url"]
binary scan $hash H* hash_hex
if {$debug_mode > 3}{log local0. "Calculated: ${static::jwt_validate_digest_header_sha256}${hash_hex}"}
# Compare calculated and decrypted hash
if { "${static::jwt_validate_digest_header_sha256}${hash_hex}" ne $jwt_sig_decrypted_hex } {
HTTP::respond 401 content "Authorization required" "Content-Type" "text/plain" "WWW-Authenticate" "Bearer"
return
}
set jwt_user [call proc_lib::get_json_str "user" $jwt_body]
set jwt_role [call proc_lib::get_json_str "role" $jwt_body]
if {$debug_mode > 0}{log local0. "Signature verified. JWT accepted. User: $jwt_user, Role: $jwt_role"}
}
when JSON_REQUEST {
if {$debug_mode > 4}{log local0. "JSON Request detected successfully."}
# Get JSON data from request body
set json_data [JSON::root]
if {$debug_mode > 4} {
#call proc_lib::print $json_data
log local0. [call proc_lib::stringify $json_data]
}
set user_prompts [call proc_lib::find_key $json_data "messages"]
if {$debug_mode > 4}{log local0. "User-Prompts: $user_prompts"}
if {$debug_mode > 3}{log local0. "JWT-User: $jwt_user"}
if {$debug_mode > 3}{log local0. "JWT-Role: $jwt_role"}
# check if role exists in dict
if {[info exists static::user_role_token_limits($jwt_role)]} {
# get configured token limit
set initial_tokens $static::user_role_token_limits($jwt_role)
} else {
if {$debug_mode > 0}{log local0. "Role \"$jwt_role\" unknown, applying default limit"}
# fallback value
set initial_tokens $static::user_role_default_token_limit
}
if {$debug_mode > 1}{log local0. "Initial Tokens: $initial_tokens"}
set estimated_tokens [expr {[string length $user_prompts] / 4}]
if {$debug_mode > 1}{log local0. "Estimated Tokens: $estimated_tokens"}
# Current time
set now [clock seconds]
# Check last refill for this user
set last_refill [table lookup "last_refill:$jwt_user"]
# If no refill exists or 24h passed
if {$last_refill eq "" || ($now - $last_refill) >= $static::token_limit_reset_timer} {
if {$debug_mode > 1}{log local0. "Refilling tokens for user $jwt_user, because reset timer expired."}
table set "tokens_remaining:$jwt_user" $initial_tokens indef
table set "last_refill:$jwt_user" $now indef
}
set prev_role [table lookup "user_role:$jwt_user"]
if {$prev_role eq ""} {
if {$debug_mode > 1}{log local0. "Role not yet defined for user $jwt_user"}
table set "user_role:$jwt_user" $jwt_role indef
}
elseif {$prev_role ne $jwt_role} {
if {$debug_mode > 0}{log local0. "Role change detected for user $jwt_user: $prev_role -> $jwt_role"}
# Re-calculate token limits based on new role
set tokens_left [table lookup "tokens_remaining:$jwt_user"]
set prev_role_limit $static::user_role_token_limits($prev_role)
set new_role_limit $static::user_role_token_limits($jwt_role)
set new_role_limit_diff [expr {$new_role_limit - $prev_role_limit}]
set tokens_left [expr {$tokens_left + $new_role_limit_diff}]
if {$debug_mode > 1}{log local0. "Adjusting tokens for role change. Previous role limit: $prev_role_limit, New role limit: $new_role_limit, Tokens left adjusted by: $new_role_limit_diff, New tokens left: $tokens_left"}
table set "tokens_remaining:$jwt_user" $tokens_left indef
table set "user_role:$jwt_user" $jwt_role indef
}
else {
if {$debug_mode > 1}{log local0. "Role for user $jwt_user remains unchanged: $jwt_role"}
}
set tokens_left [table lookup "tokens_remaining:$jwt_user"]
# Initialize or reset token count if new session or role has changed
if {$tokens_left eq "" || $prev_role ne $jwt_role} {
set tokens_left $initial_tokens
}
if {$debug_mode > 3}{log local0. "Session table info for user $jwt_user"}
foreach key [list "tokens_remaining:$jwt_user" "tokens_used:$jwt_user" "prompt:$jwt_user" "user_role:$jwt_user"] {
set val [table lookup $key]
if {$debug_mode > 3}{log local0. " $key = $val"}
}
if {$tokens_left < $estimated_tokens} {
if {$debug_mode > 0}{log local0. "Token budget exceeded for user $jwt_user (role: $jwt_role). Remaining: $tokens_left, needed: $estimated_tokens"}
HTTP::respond 429 content "Token budget exceeded for role $jwt_user. Please upgrade your plan." "Content-Type" "text/plain"
return
} else {
# decrease remaining tokens
if {$debug_mode > 1}{log local0. "Decreasing tokens for user $jwt_user (role: $jwt_role). Remaining: $tokens_left, needed: $estimated_tokens"}
set tokens_left [expr {$tokens_left - $estimated_tokens}]
table set "tokens_remaining:$jwt_user" $tokens_left indef
# initialize or update used tokens
if {$debug_mode > 1}{log local0. "Updating used tokens for user $jwt_user (role: $jwt_role). Used: $estimated_tokens"}
set tokens_used [table lookup "tokens_used:$jwt_user"]
if {$tokens_used eq ""} { set tokens_used 0 }
set tokens_used [expr {$tokens_used + $estimated_tokens}]
table set "tokens_used:$jwt_user" $tokens_used indef
}
}
when JSON_REQUEST_MISSING {
if {$debug_mode > 4}{log local0. "JSON Request missing."}
}
when JSON_REQUEST_ERROR {
if {$debug_mode > 4}{log local0. "Error processing JSON request. Rejecting request."}
}
when JSON_RESPONSE {
if {$debug_mode > 4}{log local0. "JSON response detected successfully."}
}
when JSON_RESPONSE_MISSING {
if {$debug_mode > 4}{log local0. "JSON Response missing."}
}
when JSON_RESPONSE_ERROR {
if {$debug_mode > 4}{log local0. "Error processing JSON response."}
}
This is procedure library (proc_lib must be used):
proc b64url_decode { str } {
set mod [expr { [string length $str] % 4 } ]
if { $mod == 2 } {
append str "=="
} elseif {$mod == 3} {
append str "="
}
if { [catch { b64decode [ string map {- + _ /} $str] } str_b64decoded ] == 0 and $str_b64decoded ne "" } {
return $str_b64decoded
} else {
log local0. "Base64URL decoding error: [subst "\$::errorInfo"]"
return ""
}
}
proc get_json_num { key str } {
set value [findstr $str "\"$key\"" [ expr { [string length $key] + 2 } ] ]
set value [string trimleft $value {: }]
return [scan $value {%[0-9]}]
}
proc get_json_str { key str } {
set value [findstr $str "\"$key\"" [ expr { [string length $key] + 2 } ] ]
set value [string trimleft $value {:" }]
set json_value ""
set escaped 0
foreach char [split $value ""] {
if { $escaped == 0 } {
if { $char eq "\\" } {
# next char is escaped
set escaped 1
} elseif { $char eq {"} } {
# exit loop on first unescaped quotation mark
break
} else {
append json_value $char
}
} else {
switch -- $char {
"\"" -
"\\" {
append json_value $char
}
default {
# simply ignore other escaped values
}
}
set escaped 0
}
}
return $json_value
}
proc print { e } {
set t [JSON::type $e]
set v [JSON::get $e]
set p0 [string repeat " " [expr {2 * ([info level] - 1)}]]
set p [string repeat " " [expr {2 * [info level]}]]
switch $t {
array {
log local0. "$p0\["
set size [JSON::array size $v]
for {set i 0} {$i < $size} {incr i} {
set e2 [JSON::array get $v $i]
call proc_lib::print $e2
}
log local0. "$p0\]"
}
object {
log local0. "$p0{"
set keys [JSON::object keys $v]
foreach k $keys {
set e2 [JSON::object get $v $k]
log local0. "$p${k}:"
call proc_lib::print $e2
}
log local0. "$p0}"
}
string - literal {
set v2 [JSON::get $e $t]
log local0. "$p\"$v2\""
}
default {
set v2 [JSON::get $e $t]
if { $v2 eq "" && $t eq "null" } {
log local0. "${p}null"
} elseif { $v2 == 1 && $t eq "boolean" } {
log local0. "${p}true"
} elseif { $v2 == 0 && $t eq "boolean" } {
log local0. "${p}false"
} else {
log local0. "$p$v2"
}
}
}
}
proc stringify { json_element } {
set element_type [JSON::type $json_element]
set element_value [JSON::get $json_element]
set output ""
switch -- $element_type {
array {
append output "\["
set array_size [JSON::array size $element_value]
for {set index 0} {$index < $array_size} {incr index} {
set array_item [JSON::array get $element_value $index]
append output [call proc_lib::stringify $array_item]
if {$index < $array_size - 1} {
append output ","
}
}
append output "\]"
}
object {
append output "{"
set object_keys [JSON::object keys $element_value]
set key_count [llength $object_keys]
set current_index 0
foreach current_key $object_keys {
set nested_element [JSON::object get $element_value $current_key]
append output "\"${current_key}\":"
append output [call proc_lib::stringify $nested_element]
if {$current_index < $key_count - 1} {
append output ","
}
incr current_index
}
append output "}"
}
string - literal {
set actual_value [JSON::get $json_element $element_type]
append output "\"$actual_value\""
}
default {
set actual_value [JSON::get $json_element $element_type]
append output "$actual_value"
}
}
return $output
}
proc find_key { json_element search_key } {
set element_type [JSON::type $json_element]
set element_value [JSON::get $json_element]
switch -- $element_type {
array {
set array_size [JSON::array size $element_value]
for {set index 0} {$index < $array_size} {incr index} {
set array_item [JSON::array get $element_value $index]
set result [call proc_lib::find_key $array_item $search_key]
if {$result ne ""} {
return $result
}
}
}
object {
set object_keys [JSON::object keys $element_value]
foreach current_key $object_keys {
if {$current_key eq $search_key} {
set found_element [JSON::object get $element_value $current_key]
set found_type [JSON::type $found_element]
if {$found_type eq "object" || $found_type eq "array"} {
set found_value [call proc_lib::stringify $found_element]
} else {
set found_value [JSON::get $found_element $found_type]
}
return $found_value
}
set nested_element [JSON::object get $element_value $current_key]
set result [call proc_lib::find_key $nested_element $search_key]
if {$result ne ""} {
return $result
}
}
}
}
return ""
}
Example JWT:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6ImtpZDEifQ.eyJzdWIiOiIxMjM0NTY3ODkwIiwidXNlciI6ImpvaG4uZG9lQGNvbmNlbnRyYWRlLmRlIiwicm9sZSI6InN0YW5kYXJkX3VzZXIiLCJpYXQiOjE3NzU4NzU5MjMsImV4cCI6MTc3NTg3NTkyM30.rV-gaGKOEG1p_1G652_dFUBHT_X4pI-KNgu2W_I0eJevIg3FviO_0c9BOoOOUspBADttCjzEciBhLPJ2P5r_PqIdXu5khUCjH4Sq5P6zV_sTQjbRiPatYirLWtbypamSJby_TfnEFFl7sz642YuDQ7zyvbHbPCllaM4stE_Zsa1QtOy18lUJO3Uy4ngJR8CRZ6flgPhvk79rTOGXAczYNJVo5gwHyKKA6Stdp5_c7FjyEySpCfYNmWQ2AasF3DDFCDiQQpxgW-hr--NnLc0FFBan4IfQ7btn73Pc56mhJC5gAwgRJLnLLe7LbR5chfjZ26COuH0ILYvaBq0w3yCE2gExample POST Data:
{
"model": "llama3.1:8b",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant for security operations."
},
{
"role": "user",
"content": "Analyze this HTTP request and tell me whether it looks malicious."
}
],
"stream": false,
"options": {
"temperature": 0.2
}
}
Help guide the future of your DevCentral Community!
What tools do you use to collaborate? (1min - anonymous)