Forum Discussion

randyrue
Nimbostratus
Mar 03, 2022

robots.txt rule?

Hello,

Bear with me if this has been answered elsewhere; I've found threads that seem similar, but nothing exactly like what I'm trying to do, and none of the related examples seem to work.

I'm trying to present a robots.txt file in front of a VIP using an uploaded ifile and a simple iRule. The VIP in question is actually just a group of other iRules with no actual root directory on a server that I could otherwise drop this file into. In fact, I'd like to use this robots.txt file in front of other web services, whether or not the VIP points at an actual root directory and whether or not it has other iRules in place.

It seems like this should be simple:

when HTTP_REQUEST {
  if { [HTTP::uri] == "/robots.txt" } {
    HTTP::respond 200 content [ifile get robots.txt]
  }
}

And the above works if it's the only iRule on the VIP. If I point a browser at https://host.foo.org I get the real server behind the F5, and if I point it at https://host.foo.org/robots.txt I get the contents of the robots.txt file.
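As an aside, HTTP::respond can take header name/value pairs after the content, so a minimal variant of the same rule that also returns an explicit Content-Type (assuming the ifile is still named robots.txt) would look something like:

when HTTP_REQUEST {
  if { [HTTP::uri] == "/robots.txt" } {
    # serve the uploaded ifile and label it as plain text
    HTTP::respond 200 content [ifile get robots.txt] "Content-Type" "text/plain"
  }
}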

But if I add it to a VIP alongside other iRules, the later iRules work while the first rule in the list fails. That is, if I add it with a second rule like:

when HTTP_REQUEST {
  if { [HTTP::uri] == "/gopher.jpg" } {
    HTTP::respond 200 content [ifile get gopher.jpg]
  }
}

and then point my browser at https://host.foo.org/ I get the real server behind the F5. If I point it at https://host.foo.org/gopher.jpg I get the gopher. But if I point it at https://host.foo.org/robots.txt I get ERR_CONNECTION_RESET.

I'd like to do this with one rule that sits on multiple VIPs, whether or not other iRules are in place. It seems like it should be simple. What am I missing?

Thanks,

Randy in Seattle

3 Replies

  • Hi randyrue

    If you have more than one iRule on a VIP calling HTTP::respond, it is important to manage their execution. Once HTTP::respond has fired, only a limited number of "HTTP" commands can still be used on that request; otherwise you will receive a TCL error. One way to guard against that is to check the HTTP::has_responded command in your code: https://clouddocs.f5.com/api/irules/HTTP__has_responded.html

    You could check your LTM logs for messages like this:
    01220001:3: TCL error: /Common/<iRule_name> <HTTP_REQUEST> - ERR_NOT_SUPPORTED (line 1)     invoked from within "HTTP::host"

    A good breakdown of the issue and solutions is here: https://support.f5.com/csp/article/K23237429 

    In a nutshell, adding "if { [HTTP::has_responded] } { return }" at the beginning of your iRules, right after the when HTTP_REQUEST line, should help.

    Best,

    Josh

  • Not my ideal solution, as it requires me to modify every iRule that my robots.txt rule will sit in front of. That is, the robots.txt rule remains

    when HTTP_REQUEST {
      if { [HTTP::uri] == "/robots.txt" } {
        HTTP::respond 200 content [ifile get robots.txt]
      }
    }

    and my second rule (in this case the gopher rule) becomes:

    when HTTP_REQUEST {
      if { [HTTP::has_responded] } { return }
      if { [HTTP::uri] == "/gopher.jpg" } {
        HTTP::respond 200 content [ifile get gopher.jpg]
      }
    }

    Or is there a way to do this entirely in the robots rule?

    • Well, the simple answer might be to put the "if { [HTTP::has_responded] } { return }" check into the robots iRule itself and place that rule after any other rules, assuming the other rules wouldn't prevent the robots iRule from executing. In some cases you may still need to edit the others to include that statement.
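      Something like the following, placed after the other iRules on the VIP (a sketch combining that guard with the robots rule from above; the ifile name is as in the original post):

      when HTTP_REQUEST {
        # bail out if an earlier iRule has already answered this request
        if { [HTTP::has_responded] } { return }
        if { [HTTP::uri] == "/robots.txt" } {
          HTTP::respond 200 content [ifile get robots.txt]
        }
      }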