AI/Bot Traffic Throttling iRule (UA Substring + IP Range Mapping)

Problem

Tags: appworld 2026, vegas, irules

Created by Tim Riker using AI for the DevCentral competition. Written entirely by ChatGPT.

Executive Summary

This iRule provides a practical, production-ready method for throttling AI agents, crawlers, automation frameworks, and other high-volume HTTP clients at the BIG-IP edge. Bots are identified first by User-Agent substring matching and, when no User-Agent key matches, by source IP range mapping.

Solution

Throttling is enforced per bot identity rather than per client IP, which more accurately reflects how modern AI systems operate using distributed egress networks.

The solution is entirely data-group driven, operationally simple, and requires no external systems. Security and operations teams can adjust bot behavior dynamically without modifying the iRule itself.

Why This Matters

Modern AI agents, LLM training bots, search indexers, and automation frameworks can generate extremely high request volumes. Even legitimate AI services can unintentionally:

  • Create excessive origin load
  • Increase bandwidth and infrastructure cost
  • Trigger autoscaling events
  • Impact latency for real users
  • Skew analytics and performance metrics

Rather than blocking AI traffic outright, organizations often need controlled rate limiting. This iRule enables responsible throttling while preserving service availability and fairness.

Contest Justification

Innovation and Creativity

This iRule implements identity-based throttling rather than traditional per-IP rate limiting. Because AI agents frequently operate from multiple IP addresses, shared throttling by canonical bot identity provides significantly more accurate control.

The dual attribution model (User-Agent substring first, IP-range fallback second) allows the system to handle both transparent and opaque clients, including cases where the User-Agent header is missing or does not identify the bot.

Technical Excellence

This implementation uses native BIG-IP primitives only:

  • class match -element -- contains for efficient substring matching
  • class match -value for IP range mapping
  • table incr for shared counters
  • HTTP 429 with Retry-After for standards-compliant throttling

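The iRule parses only the first two whitespace-separated tokens of the datagroup value, so the rest of the value can hold an inline comment. Each token must be a strict integer; a missing or non-numeric token falls back to the default set in RULE_INIT (limit 3, window 30). The datagroup lookups run on every request, but the parsing and counter logic run only after a bot match, which keeps overhead low for ordinary traffic.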
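As a rough illustration of that parsing rule, the sketch below reproduces it in plain Tcl that can be run in tclsh outside BIG-IP. The proc name parse_throttle and its default arguments are hypothetical; the defaults mirror the static::bot_limit and static::bot_window values set in RULE_INIT.

```tcl
# Hypothetical helper: returns {limit window} for a dg_bot_agent value.
# Tokens that are missing or not strict integers fall back to the defaults.
proc parse_throttle {value {def_limit 3} {def_window 30}} {
    # Split on runs of whitespace, as the iRule does.
    set tokens [regexp -inline -all {\S+} $value]
    set limit  $def_limit
    set window $def_window
    set t1 [lindex $tokens 0]
    set t2 [lindex $tokens 1]
    if { [string is integer -strict $t1] } { set limit $t1 }
    if { [string is integer -strict $t2] } { set window $t2 }
    return [list $limit $window]
}

puts [parse_throttle "5 60"]                  ;# 5 60
puts [parse_throttle "3 30 search crawler"]   ;# 3 30
puts [parse_throttle "10 fast"]               ;# 10 30 (window falls back)
puts [parse_throttle "unlimited"]             ;# 3 30  (both fall back)
```

Anything after the second token never reaches the numeric check, which is why free-form comments are safe in the datagroup value.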

Theme Alignment

As AI-generated traffic becomes increasingly common, edge enforcement policies must evolve. This iRule demonstrates a practical, deployable mechanism for managing AI-era traffic patterns directly at the application delivery layer.

Impact

Organizations deploying AI throttling controls can:

  • Protect origin infrastructure from automated traffic surges
  • Maintain consistent performance for human users
  • Reduce infrastructure and bandwidth cost
  • Avoid over-provisioning driven by bot bursts
  • Implement governance policies for AI consumption

Because throttle limits are configured via datagroups, operational adjustments can be made instantly without code changes, reducing risk and change-control friction.

Code

Required Datagroup Configuration

dg_bot_agent (String Datagroup)

Key: User-Agent substring or canonical bot name, in lowercase (the iRule lowercases the User-Agent before matching).

Value format: The first two whitespace-separated integers define <limit> <window>, where <window> is in seconds. Additional text after the first two tokens is ignored.

googlebot = "5 60"
bingbot = "3 30 search crawler"
my-ai-agent = "10 10 internal load test"

"5 60" means allow 5 requests per 60 seconds.

dg_bot_net (Address Datagroup)

Key: IP address or CIDR range.

Value: Must match a key defined in dg_bot_agent.

198.51.100.0/24 = "my-ai-agent"
203.0.113.0/25 = "googlebot"

Deployment Steps

  1. Create dg_bot_agent (string).
  2. Create dg_bot_net (address).
  3. Populate dg_bot_agent using "<limit> <window> optional comment".
  4. Populate dg_bot_net ranges mapping to dg_bot_agent keys.
  5. Attach the iRule to an HTTP virtual server.

Testing Scenario

Set dg_bot_agent entry: my-ai-agent = "3 30 demo".

Send four rapid requests using User-Agent: my-ai-agent. The first three succeed. The fourth returns HTTP 429 with Retry-After: 30.

Map an IP range in dg_bot_net to my-ai-agent. Clients within that range whose User-Agent matches no dg_bot_agent key then share the same throttle counter.
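One way to drive the four-request test is a short tclsh script using Tcl's standard http package. This is only a sketch: the address 192.0.2.10 is a placeholder from the documentation range, not a value from this post, and must be replaced with the virtual server under test.

```tcl
package require http

# Placeholder virtual server address (assumption); replace with your own VIP.
set vip "http://192.0.2.10/"

# Present the User-Agent that matches the my-ai-agent key in dg_bot_agent.
http::config -useragent "my-ai-agent"

for {set i 1} {$i <= 4} {incr i} {
    set tok [http::geturl $vip -timeout 5000]
    set line "request $i: status [http::ncode $tok]"
    # Response headers come back as a flat name/value list.
    foreach {name value} [http::meta $tok] {
        if { [string equal -nocase $name "Retry-After"] } {
            append line " retry-after=$value"
        }
    }
    puts $line
    http::cleanup $tok
}
```

With the my-ai-agent = "3 30 demo" entry in place, the fourth line should report status 429 together with the Retry-After value, matching the scenario described above.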

Operational Notes

  • Throttling is per bot identity, not per IP.
  • Enable logging by setting static::bot_log to 1 in RULE_INIT.
  • Configure table mirroring if cluster-wide counters are required.
  • Validate on BIG-IP v21 to meet contest eligibility requirements.

Architectural Diagram Description

The solution can be visualized as an edge-side decision pipeline on BIG-IP, where each HTTP request is classified and optionally rate-limited before it reaches the application.

Diagram components:

  • Client: Human browser, bot, crawler, AI agent, automation framework, or any HTTP client.
  • BIG-IP Virtual Server (HTTP): Entry point where the iRule executes in the HTTP_REQUEST event.
  • Identification Layer: Determines the bot identity using a two-stage method (User-Agent first, IP fallback).
  • Configuration Datagroups: dg_bot_agent and dg_bot_net provide bot identification and throttle settings.
  • Shared Rate Counter (table): A per-bot bucket that tracks request counts over a time window.
  • Decision Output: Either allow request through to the pool or return HTTP 429 with Retry-After.
  • Application Pool: Origin servers that only receive traffic allowed by the throttle policy.

Diagram flow (left-to-right):

  • Step 1: Client sends HTTP request to BIG-IP VIP.
  • Step 2: BIG-IP extracts User-Agent and client IP.
  • Step 3: User-Agent substring lookup is performed using class match -element -- <ua> contains dg_bot_agent.
  • Step 4: If Step 3 finds a match, the matched dg_bot_agent key becomes the canonical bot identity and its value provides <limit> <window>.
  • Step 5: If Step 3 does not match, BIG-IP checks client IP against dg_bot_net. If the IP matches a range, dg_bot_net returns a canonical bot identity.
  • Step 6: BIG-IP uses that canonical identity to look up throttle values in dg_bot_agent. If no dg_bot_agent entry exists, the iRule exits and does not throttle.
  • Step 7: BIG-IP increments a shared counter in table using the canonical bot identity as the only key (no IP component). All IPs mapped to that bot share the same bucket.
  • Step 8: If the request count exceeds the configured limit within the configured window, BIG-IP returns HTTP 429 with a Retry-After header. Otherwise, the request is forwarded to the application pool.

Key design choice:

This architecture intentionally rate-limits by bot identity rather than by source IP. This is important for AI agents and modern crawlers because they frequently distribute traffic across many IP addresses. A per-IP limiter can be bypassed unintentionally or can fail to represent the true load being generated by the bot as a whole. A shared per-identity bucket enforces a realistic, policy-driven ceiling on aggregate bot traffic.
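To make the difference concrete, here is a small plain-Tcl simulation that can be run in tclsh. The hit proc and the buckets array are illustrative stand-ins for the table incr counter and are not part of the iRule; expiry of the window is deliberately not modeled.

```tcl
# Stand-in for the session table: one counter per key, no expiry modeled.
array set buckets {}

# Count one request against a key and report the decision for a given limit.
proc hit {key limit} {
    global buckets
    if { ![info exists buckets($key)] } { set buckets($key) 0 }
    incr buckets($key)
    if { $buckets($key) > $limit } { return "throttle" }
    return "allow"
}

# Four requests from four addresses in a range mapped to my-ai-agent, limit 3.
foreach ip {198.51.100.7 198.51.100.8 198.51.100.9 198.51.100.10} {
    puts "$ip per-identity=[hit my-ai-agent 3] per-ip=[hit $ip 3]"
}
```

The first three requests are allowed under both schemes. The fourth is throttled only when the counter is keyed by bot identity, because each per-IP counter has seen a single request.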

Code

# ------------------------------------------------------------------------------
# iRule: Bot Throttle via Data Groups
#
# Created by Tim Riker using AI for the DevCentral competition.
# Written entirely by ChatGPT.
#
# DESCRIPTION:
#   Throttles HTTP requests for known bots and AI agents based on configuration
#   stored in datagroups. User-Agent matching is attempted first. If no match
#   is found, client IP is evaluated against a network datagroup to determine
#   the bot identity.
#
# WHY THIS MATTERS:
#   Modern AI agents, crawlers, LLM training bots, search indexers, and
#   automation frameworks can generate extremely high request volumes.
#   Having a controlled throttling mechanism allows organizations to protect
#   infrastructure, manage costs, and preserve UX without blocking outright.
#
# IMPLEMENTATION NOTES:
#   • Throttling is performed per unique bot key (NOT per IP).
#   • All IPs mapped to the same bot share a single counter.
#   • Throttle values are configurable per bot in dg_bot_agent.
#
# REQUIRED DATAGROUP FORMATS
#
# dg_bot_agent (string):
#   Key: UA substring (and/or canonical bot name used by dg_bot_net values)
#   Value: "<limit> <window> [optional comment...]"
#          Only the first two whitespace tokens are used.
#
# dg_bot_net (address):
#   Key: IP/CIDR range
#   Value: MUST match a key in dg_bot_agent
# ------------------------------------------------------------------------------

when RULE_INIT {
    set static::bot_limit  3
    set static::bot_window 30
    set static::bot_log 0
    set static::bot_table "bot_throttle"
}

when HTTP_REQUEST {

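    # Lowercase the User-Agent so it can match the lowercase keys in dg_bot_agent.
    set ua  [string tolower [HTTP::header "User-Agent"]]
    set ip  [IP::client_addr]

    set dg_key ""
    set dg_value ""

    # Stage 1: look for a dg_bot_agent key that occurs anywhere in the User-Agent.
    if { $ua ne "" } {
        set result [class match -element -- $ua contains dg_bot_agent]
        if { $result ne "" } {
            # -element returns the matched record as a {key value} list.
            set dg_key   [lindex $result 0]
            set dg_value [lindex $result 1]
            if { $dg_value eq "" } {
                set dg_value [class lookup $dg_key dg_bot_agent]
            }
        }
    }

    # Stage 2: no User-Agent match, so map the client IP to a canonical bot name
    # and fetch that bot's throttle settings from dg_bot_agent.
    if { $dg_key eq "" } {
        if { [class match $ip equals dg_bot_net] } {
            set net_val [class match -value $ip equals dg_bot_net]
            if { $net_val ne "" } {
                set dg_key   $net_val
                set dg_value [class lookup $dg_key dg_bot_agent]
            } else {
                return
            }
        } else {
            return
        }
    }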

    if { $dg_key eq "" || $dg_value eq "" } {
        return
    }

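    # Split the value on whitespace; only the first two tokens are considered,
    # and each must be a strict integer or the RULE_INIT default is used.
    set vlimit ""
    set vwindow ""
    set tokens [regexp -inline -all {\S+} $dg_value]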

    if { [llength $tokens] >= 1 } {
        set t1 [lindex $tokens 0]
        if { [string is integer -strict $t1] } { set vlimit $t1 }
    }
    if { [llength $tokens] >= 2 } {
        set t2 [lindex $tokens 1]
        if { [string is integer -strict $t2] } { set vwindow $t2 }
    }

    if { $vlimit ne "" } {
        set bot_limit $vlimit
    } else {
        set bot_limit $static::bot_limit
    }

    if { $vwindow ne "" } {
        set bot_window $vwindow
    } else {
        set bot_window $static::bot_window
    }

    set bot_key [string tolower [string trim $dg_key]]

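    # Count the request without touching the entry. A plain table incr resets
    # the idle timeout on every request, so steady traffic would keep the
    # counter alive indefinitely instead of resetting it after bot_window.
    set count [table incr -notouch -subtable $static::bot_table $bot_key]
    if { $count == 1 } {
        # First request of a window: expire the counter bot_window seconds from now.
        table timeout  -subtable $static::bot_table $bot_key $bot_window
        table lifetime -subtable $static::bot_table $bot_key $bot_window
    }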

    if { $count > $bot_limit } {

        if { $static::bot_log } {
            log local0. "BOT_THROTTLED bot=$bot_key limit=$bot_limit window=$bot_window count=$count ip=$ip ua=\"$ua\""
        }

        HTTP::respond 429 content "Too Many Requests\r\n" \
            "Retry-After" $bot_window \
            "Connection" "close"

        return
    }
}

 

Updated Mar 11, 2026
Version 2.0