iRules LX Sideband Connection - Handling timeouts

Introduction

You often need external information to handle traffic. It could be additional user information for authentication, a whitelist for checking whether source IP addresses are legitimate, or whois data from Regional Internet Registries. Such information can be stored locally on the BIG-IP (e.g., in a data group); however, it is often easier to obtain it from elsewhere by accessing an external entity such as a web server or database. This type of implementation is called a sideband connection.

On BIG-IP, a sideband connection can be implemented using iRules or iRules LX. The former has been available since BIG-IP Version 11, so you might have tried it before (see also Advanced iRules: Sideband Connections). The latter became available in v12 and has been gaining popularity recently, thanks to its easy-to-use built-in modules.

In its simplest form, the only thing you need in an iRules LX implementation is a single call to the external entity: for example, an HTTP call to the web server using the Node.js http module (see also iRules LX Sideband Connection). This works most of the time, but you will want to add robustness to cater for corner cases. Otherwise, your production traffic may be disrupted.

ILX::call timeout

One of the most common errors in iRules LX sideband connection implementations is timeout. It occurs when the sideband connection does not complete in time.

An iRules LX RPC implementation (plugin) consists of two types of code: an iRule (TCL) and an extension (Node.js), as depicted below. The iRule handles traffic events and data just like a usual iRule. When external data is required, the iRule sends a request to the extension over a separate channel called MPI (Message Passing Interface) using ILX::call, and waits for the response. The extension establishes the connection to the external server, waits for the response, processes it if required, and sends it back to the iRule.

When the extension cannot complete the sideband task within a predefined period, ILX::call times out. The default timeout is 3s. The following list shows a number of common reasons why ILX::call times out:

  • High latency to the external server
  • Large response data
  • Heavy computation (e.g., cryptographic operations)
  • High system load which makes the extension execution slow

In any case, when a timeout occurs, ILX::call raises an error such as the following:

Aug 31 07:15:00 bigip err tmm1[24040]: 01220001:3: TCL error: /Common/Plugin/rule <HTTP_REQUEST> - ILX timeout.     invoked from within "ILX::call $rpc_handle function [IP::client_addr]"

You may consider increasing the timeout to give the extension time to complete the task. You can do so with the -timeout option of ILX::call. This may work, but it is not ideal in some cases.

Firstly, a longer timeout means longer user wait time. If you set the timeout to 20s, the end user may need to wait up to 20s to see the initial response, which is typically just an error.

Secondly, even after the iRule times out, the extension keeps running to completion. When it finishes, the extension sends back the requested data, but the data is ignored because the iRule is no longer waiting for it. This is not only unnecessary but also consumes resources. The default timeout of a Node.js HTTP get request on a typical Unix box is 120s. Under a burst of incoming connections, the resources for the sideband connections would accumulate and stay for 120s, which is not negligible.

To avoid unnecessary delay and resource misuse, it is recommended to set an appropriate timeout on the extension side and return gracefully to the iRule. If you are using the Node.js bundled http module, it can be done by its setTimeout function. If the modules or classes that your sideband connection utilizes do not have a ready-made timeout mechanism, you can use the generic setTimeout function.
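For a module without its own timeout option, the generic setTimeout approach might look like the following minimal sketch. The helper name withTimeout and the callback-style task signature are assumptions made for illustration; they are not part of f5-nodejs or of any module mentioned here. The point is that the callback fires exactly once, with either the result or a 'timeout' marker.

```javascript
// Hypothetical helper (not part of f5-nodejs): run a callback-style task,
// but give up after `ms` milliseconds. `done` is called exactly once.
function withTimeout(task, ms, done) {
  let finished = false;

  // Generic timer: fires if the task has not completed in time.
  const timer = setTimeout(function () {
    if (finished) { return; }
    finished = true;
    done('timeout');
  }, ms);

  task(function (err, result) {
    if (finished) { return; }      // Too late: the timeout already replied.
    finished = true;
    clearTimeout(timer);           // Release the timer so it does not linger.
    done(err ? 'error' : result);
  });
}
```

In an extension, done would typically wrap res.reply. Note that this helper only stops waiting for the task; whether the underlying work can actually be cancelled still depends on the module in use.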

Sample HTTP sideband connection implementation with timeout handling

The specification of a sample HTTP sideband connection with timeout handling on the extension side is as follows:

  • iRule sends its timeout value to the extension.
  • Extension sets its timeout for the connection to the sideband server. Here, it is set to the iRule's timeout minus 300 ms. For example, if the iRule's timeout is 3s, it is set to 2,700 ms. The 300 ms margin is empirical. You may want to try different values to find the most appropriate one for your environment.
  • Extension catches any error on the sideband connection and replies with an error to the iRule. Note that you must send only one return message for each ILX::call, irrespective of the number of events it catches.
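
The "only one return message" rule in the last item can be enforced with a small guard. The following is a minimal sketch: replyOnce is a hypothetical helper, and fakeRes and fakeRequest are stand-ins that only mimic the reply method of the ILX response object and the event behavior of a sideband request.

```javascript
const EventEmitter = require('events');

// Hypothetical helper: wrap a reply function so that only the first call
// goes through, however many events fire afterwards.
function replyOnce(res) {
  let sent = false;
  return function (message) {
    if (sent) { return false; }    // A reply has already been sent.
    sent = true;
    res.reply(message);
    return true;
  };
}

// Stand-in for the ILX response object; it just records what was replied.
const replies = [];
const fakeRes = { reply: function (message) { replies.push(message); } };

// Stand-in for the sideband request, which may raise several events
// for a single failure (for example 'timeout' followed by 'abort').
const fakeRequest = new EventEmitter();
const reply = replyOnce(fakeRes);
for (const e of ['error', 'abort', 'timeout']) {
  fakeRequest.on(e, function () { reply(e); });
}

fakeRequest.emit('timeout');
fakeRequest.emit('abort');
console.log(replies);              // Only the first event is reported.
```

Keeping the guard in one place means that code added later to an event handler cannot accidentally trigger a second reply.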

Let's move on to a sample implementation.

iRule (TCL) code

     1  when HTTP_REQUEST {
     2    set timeout 3000                 ;# Default
     3    set start [clock seconds]
     4
     5    set RPC_HANDLE [ILX::init "SidebandPlugin" "SidebandExt"]
     6    set response [ILX::call $RPC_HANDLE -timeout $timeout func $timeout]
     7    set duration [expr {[clock seconds] - $start}]
     8    if { $response eq "timeout" } {
     9      HTTP::respond 500 content "Internal server error: Backend server did not respond within $timeout ms."
    10      return
    11    } elseif { $response eq "error" || $response eq "abort" } {
    12      HTTP::respond 500 content "Internal server error: Backend server was not available."
    13      return
    14    } else {
    15      # Success: log how long the sideband call took
    16      log local0. "Response in ${duration}s: $response"
    17    }
    18  }

The iRule kicks off when the HTTP_REQUEST event is raised. It then sends a request to the extension (ILX::call). If an error occurs, it processes accordingly. Otherwise, it just logs the duration of the extension execution.

  • Line #6 sets the timeout of ILX::call to 3s (3,000 ms), which is the default value (change Line #2 if necessary). The timeout value is also sent to the extension (the last argument of the ILX::call).
  • Lines #8 to #9 capture the 'timeout' string that the extension returns when it times out (after 3000 - 300 = 2,700 ms). In this case, it returns the 500 response back to the client.
  • Lines #11 to #12 capture the 'error' or 'abort' string that the extension returns. It returns the 500 response too.
  • Line #16 only reports the time it took for the sideband processing to complete. Change this line to whatever you want to perform.

Extension (Node.js) code

     1  const http = require("http");
     2  const f5 = require("f5-nodejs");
     3
     4  function httpRequest (req, res) {
     5    let tclTimeout = Number((req.params() || [])[0]) || 3000;  // Fall back to 3,000 ms
     6    let thisTimeout = tclTimeout - 300;              // Exit 300ms prior to the TCL timeout
     7    let start = (new Date()).getTime();
     8
     9    let request = http.get('http://192.168.184.10:8080/', function(response) {
    10      let data = [];
    11      response.on('data', function(chunk) {
    12        data.push(chunk);
    13      });
    14      response.on('end', function() {
    15        console.log(`Sideband success: ${(new Date()).getTime() - start} ms.`);
    16        res.reply(Buffer.concat(data).toString());
    17        return;
    18      });
    19    });
    20
    21    // Something might have happened to the sideband connection
    22    let events = ['error', 'abort', 'timeout'];   // Possible erroneous events
    23    let resSent = false;                          // Send reply only once
    24    for(let e of events) {
    25      request.on(e, function(err) {
    26        let eMessage = err ? err.toString() : '';
    27        console.log(`Sideband ${e}: ${(new Date()).getTime() - start} ms. ${eMessage}`);
    28        if (!resSent)
    29          res.reply(e);                           // Send just once.
    30        resSent = true;
    31        if (! request.aborted)
    32          request.abort();
    33        return;
    34      });
    35    }
    36    request.setTimeout(thisTimeout);
    37    request.end();
    38  }
    39
    40  var ilx = new f5.ILXServer();
    41  ilx.addMethod('func', httpRequest);
    42  ilx.listen();

The overall composition is the same as boilerplate iRules LX extension code: require the necessary modules (Lines #1 and #2), create a method for processing the request from the iRule (Lines #4 to #38), and register it with the event listener (Line #41). The differences are:

  • Lines #5 and #6 are for determining the timeout on the sideband connection. As shown in the iRule code, the iRule sends its ILX::call timeout value to the extension. The code here receives it and subtracts 300 ms from there.
  • Line #5 looks unnecessarily complex because it checks whether the timeout has been received properly. If it is not present in the request, it falls back to the default 3,000 ms. It is advisable to check the data received before accessing it. Indexing into a value that is undefined or null raises a TypeError, which kills the Node.js process unless the error is caught; reading a missing element of an actual array does not raise an error but yields undefined, which would turn the timeout into NaN.
  • Line #9 sends an HTTP request to the sideband server. Lines #10 to #19 are the usual routines for receiving the response. The message in Line #15 is for monitoring the latency between the BIG-IP and the sideband server. You may want to adjust the timeout value depending on statistics gathered from the messages. You may also want to remove this line in production for the sake of performance (disk I/O is not exactly cheap).
  • Lines #21 to #35 are for handling the error events: error, abort and timeout. Lines #28 to #30 send the reply to the iRule only once (the if statement is necessary because multiple events may be raised by a single error incident). Line #32 aborts the sideband connection for any of the error events to make sure the connection resources are freed.
  • Line #36 sets the timeout on the sideband connection: The iRule timeout value minus 300 ms.
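
The defensive handling of the timeout argument described for Lines #5 and #6 could also be factored into a small function. This is a sketch under assumptions: sidebandTimeout is a hypothetical helper, and the 100 ms floor is an illustrative choice rather than a value prescribed by iRules LX. The 3,000 ms fallback and 300 ms margin mirror the values used in this article.

```javascript
// Hypothetical helper: derive the sideband timeout from the ILX::call
// arguments. `params` is expected to be the array from req.params().
function sidebandTimeout(params, options) {
  const opts = options || {};
  const fallback = opts.fallback || 3000;   // ILX::call default (ms)
  const margin = opts.margin || 300;        // Empirical head start (ms)
  const floor = opts.floor || 100;          // Assumed minimum (ms)

  // Reading a missing element yields undefined, and Number(undefined) is NaN,
  // so anything unusable falls back to the default.
  const raw = Array.isArray(params) ? Number(params[0]) : NaN;
  const tclTimeout = (Number.isFinite(raw) && raw > 0) ? raw : fallback;

  // Never return a value at or below zero, even for very short timeouts.
  return Math.max(tclTimeout - margin, floor);
}

console.log(sidebandTimeout([3000]));     // 2700
console.log(sidebandTimeout(['5000']));   // 4700 (numeric strings are accepted)
console.log(sidebandTimeout([]));         // 2700 (fallback applied)
console.log(sidebandTimeout([200]));      // 100  (floor applied)
```

The result can then be passed straight to request.setTimeout, as in Line #36.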

Conclusion

A sideband connection is one of the most popular iRules LX use cases. In its simplest form, the implementation is as easy as a few lines of HTTP request handling; however, the iRule side sometimes times out due to latency on the extension side, and the traffic is subsequently disrupted. It is important to control the timeout and handle errors properly in the extension code.

Published Oct 31, 2019
Version 1.0
  • What is the purpose of line 6? I don't see any reference to it in the code. Should that have been in line 36?

  • Thanks for the tip. I contacted the author, and he made an update just now to line 36. Does this make sense now?

  • Good spotting. I forgot to use the "thisTimeout" variable defined in line #6 in line #36. Code fixed. Thank you.

  • Have you done any testing with request.setSocketKeepAlive and its effect? I'm looking at using RPC for some fairly high volume application.

     

  • I believe request.setSocketKeepAlive is for TCP keep-alive, as it calls socket.setKeepAlive. It is not for HTTP's "Connection: Keep-Alive". For reusing the established HTTP socket, you can use the http.Agent class and set maxSockets to 1. If you want to retrieve multiple pages concurrently via multiple sockets, use the default Infinity.

  • Because I spent way too much time on this... I feel like lines 28 and 31 should be considered an anti-pattern because simply adding new code in those blocks doesn't actually work until you also add some curly braces. Darn JavaScript.

  • This example code has been VERY helpful. However, I have run into a performance issue where sometimes an HTTP request would not leave the node runtime nor arrive at the backend service. I would get socket timeout errors. I don't know if this is due to the currently supported versions of node or some other condition that we have not accounted for. I was able to solve this issue by using the request-promise NPM package. However, this has a nasty side-effect of a very large dependency tree. HTTP requests have been rock solid since switching from this raw http module to something more robust.