What is HTTP Part III - Terminology

Have you watched the construction of a big building over time? For the first few weeks, the footings and foundation are being prepared to support the building. It seems like not much is happening, but that groundwork is vital to the overall success of the project. So it is with this series, so stay with me; the foundation is important. This week, we begin to dig into the HTTP specifications, and we’ll start by defining the related terminology.

World Wide Web, or WWW, or simply “the web” - The collection of resources accessible amongst the global interconnected system of computers.
Resource - An object or service identified by a URI. A Web page is a resource, but images, scripts, and stylesheets for a webpage would also be resources.
Web Page - A document accessible on the web by way of a URI. Example - this article.
Web Site - A collection of web pages. An example would be this site, accessed by the DNS hostname devcentral.f5.com.
Web Client - The software application that requests the resources on a web site by generating, receiving, and processing HTTP messages. A web client is always the initiator.
Web Server - The software application that serves the resources of a web site by receiving, processing, and generating HTTP messages. A web server does not initiate traffic to a client. Examples would be Apache, NGINX, IIS, or even an F5 BIG-IP iRule!
Uniform Resource Identifier, or URI - As made clear in the name, the URI is an identifier, which can mean the resource name, or location, or both. All URLs (locators) and URNs (names) are URIs, but not every URI is a URL. Think Venn diagrams here. Consider https://devcentral.f5.com/wiki/iRules.HomePage.ashx. The key difference between a URI (which would be /wiki/iRules.HomePage.ashx in this case) and a URL is that the URL combines the name of the resource, the location of that resource (devcentral.f5.com), and the scheme used to access that resource (https://). In the URL, the port is assumed to be 80 if the scheme is http, and 443 if the scheme is https. If the default ports are not appropriate for a particular location, the port must be provided immediately after the hostname, like https://devcentral.f5.com:50443/wiki/iRules.HomePage.ashx.
Message - This is the basic HTTP unit of communication.
Header - This is the control section within an HTTP message.
Entity - This is the body of an HTTP message.
User Agent - This is a string of tokens that should be inserted by a web client as a header called User-Agent. The tokens are listed in order of significance. Most web clients perform this on your behalf, but there are browser tools that allow you to manipulate this, which can be good for testing, client statistics, or even client remediation. Keep in mind that bad actors might also manipulate this for nefarious purposes, so any kind of access control based solely on user agents is ill advised.
Proxy - Like in the business world, a proxy acts as an intermediary. A server to clients, and a client to servers, proxies must understand HTTP messages. We’ll dig deeper into proxies in the next article.
Cache - This is web resource storage that can exist at the server, at any number of intermediary proxies, in the browser, or all of the above. The goals of caching are to reduce bandwidth consumption on the networks, reduce compute resource utilization on the servers, and reduce page load latency on the clients.
Cookie - Originally added for managing state (since HTTP itself is a stateless protocol), a cookie is a small piece of data to be stored by the web client as instructed by the web server.
Standards Group Language - I won’t dive deep into this, but as you learn a protocol, knowing how to read RFCs would serve you well. As new protocols, or new versions of existing protocols, are released, there are interpretation challenges that companies work through to make sure everyone is adhering to the “standard.” Sometimes this is an agreeable process, other times not so much. Basic understanding of what must, should, or may be done in a request/response can make your troubleshooting efforts go much more smoothly.
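
To make the URI entry above a little more concrete, here is a minimal Python sketch that splits a URL into its scheme, host, port, and path, and fills in the default port when none is given. The function name describe_url and the www.example.com URLs are made up for illustration; urlsplit comes from Python's standard library.

```python
from urllib.parse import urlsplit

# Default ports per scheme, as described in the URI entry above.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url):
    """Return (scheme, host, port, path) for an http or https URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL carries no explicit port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port, parts.path

# Hypothetical URLs, one relying on the default port and one overriding it.
print(describe_url("https://www.example.com/docs/index.html"))
print(describe_url("https://www.example.com:50443/docs/index.html"))
```

Note where the explicit port sits in the second URL: directly after the hostname and before the path.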

Basic Message Format

Requests

The syntax of an HTTP request message has the following format:

request-line
headers
CRLF (carriage return / line feed)
message body (optional)

An example of this is shown below:

 -----Request Line----
|GET / HTTP/1.1
 ---------------------
 -------Headers-------
|Host: roboraiders.net
|Connection: keep-alive
|User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
|Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
|Accept-Encoding: gzip, deflate
|Accept-Language: en-US,en;q=0.8
 ----------------------

The CRLF (present in the tcpdump capture but not seen here) denotes the end of the headers, and there is no body. Notice that in the request line you see the request method, the URI, and the HTTP protocol version of the client.
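
Since the CRLFs are invisible in the capture above, it may help to see them written out. The following is a minimal Python sketch, not a full client: the helper name build_request is hypothetical, and it simply serializes a request line, headers, and the blank line that ends the header section.

```python
def build_request(method, target, headers, body=b""):
    """Serialize an HTTP/1.1 request: request-line, headers, CRLF, optional body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra bare CRLF marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

# Two of the headers from the example request above.
raw = build_request("GET", "/", {"Host": "roboraiders.net", "Connection": "keep-alive"})
print(raw)
```

Printing the bytes shows each \r\n explicitly, including the final empty line that tells the server no more headers are coming.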

Responses

The syntax of an HTTP response message has a very similar format:

status-line
headers
CRLF (carriage return / line feed)
message body (optional)

An example of this is shown below:

 -----Status Line-----
|HTTP/1.1 200 OK
 ---------------------
 -------Headers-------
|Server: openresty/1.9.15.1
|Date: Thu, 21 Sep 2017 17:19:01 GMT
|Content-Type: text/html; charset=utf-8
|Transfer-Encoding: chunked
|Connection: keep-alive
|Vary: Accept-Encoding
|Content-Encoding: gzip
 ---------------------
 ---Zipped Content----
|866
|...........Y.v.....S...;..$7IsH...7.I..I9:..Y.C`....].b....'.7....eEQoZ..~vf.....x..o?^....|[Q.R/..|.ZV.".*..$......$/EZH../..n.._E..W^..
 ---------------------

Like the request line of an HTTP request, the status line of an HTTP response states the protocol version, in this case that of the server. Also stated is the response code, in this case a 200 OK. You’ll notice that the only header that appears in both the request and the response in this case is the Connection header.
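
Going the other direction, a response can be pulled apart along the same boundaries. This is a simplified Python sketch under a few assumptions: the function name parse_response is hypothetical, the status line is assumed to carry a reason phrase, repeated header names overwrite each other, and the body is returned as-is with no handling of chunked or gzip encoding.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into version, code, reason, headers, body."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

# A hand-written sample response, loosely modeled on the one above.
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html; charset=utf-8\r\n"
          b"Connection: keep-alive\r\n"
          b"\r\n"
          b"<html></html>")
version, code, reason, headers, body = parse_response(sample)
print(version, code, reason, headers, body)
```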

HTTP Request Methods

The URI identifies the resource with which the client wants to interact. The request method provides the “how”: the way the client would like to interact with that resource. There are many request methods, but we’ll focus on the few most popular methods for this article.

GET - This method is used by a client to retrieve resources from a server.
HEAD - Like the GET method, but the server returns only the metadata (status line and headers) that would accompany a GET, with no payload. This is useful in monitoring and troubleshooting.
POST - Used primarily to upload data from a client to a server, by means of creating or updating an object through a process handler on the server. Due to security concerns, there are usually limitations on who can do this, how it’s done, and how big an update will be allowed.
PUT - Typically used to replace the contents of a resource.
DELETE - This method removes a resource.
PATCH - Used to modify but not replace the contents of a resource.

If you are at all interested in using iControl REST to perform automation on your BIG-IP infrastructure, all the methods above except HEAD are instrumental in working with the REST interface.
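
The differences between the methods, PUT versus PATCH in particular, are easier to see against a toy resource store. The Python sketch below is purely illustrative: the handle function, the /things/1 URI, and the status codes it picks are assumptions for the demo, not how any particular server (or iControl REST) is implemented. POST is left out because its behavior depends on the handler behind it.

```python
def handle(store, method, uri, body=None):
    """Apply one request method to an in-memory dict of resources.

    Returns a (status_code, payload) tuple.
    """
    if method in ("GET", "HEAD"):
        if uri not in store:
            return 404, None
        # HEAD reports the same status as GET but never returns the payload.
        return 200, (store[uri] if method == "GET" else None)
    if method == "PUT":
        created = uri not in store
        store[uri] = dict(body)      # replace the whole resource
        return (201 if created else 200), store[uri]
    if method == "PATCH":
        if uri not in store:
            return 404, None
        store[uri].update(body)      # modify only the supplied fields
        return 200, store[uri]
    if method == "DELETE":
        removed = store.pop(uri, None) is not None
        return (204 if removed else 404), None
    return 405, None                 # method not supported by this sketch

store = {}
print(handle(store, "PUT", "/things/1", {"name": "a", "size": 1}))
print(handle(store, "PATCH", "/things/1", {"size": 2}))   # name is kept
print(handle(store, "PUT", "/things/1", {"name": "b"}))   # size is gone
print(handle(store, "HEAD", "/things/1"))
print(handle(store, "DELETE", "/things/1"))
print(handle(store, "GET", "/things/1"))
```

The PATCH call changes one field and leaves the rest alone, while the second PUT discards anything not present in the new body.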

HTTP Headers

There are general headers that can apply to both requests and responses, and then there are specific headers based on whether it is a request OR a response message. Note that there are differences between supported HTTP/1.0 and HTTP/1.1 headers, but I’ll leave the nuances to the reader to study. We’ll cover HTTP/2 at the end of this series. Examples in parentheses are not complete, and do not denote the actual header names. RFC 2616 (since superseded by RFCs 7230 through 7235) documents all of the headers defined for HTTP/1.1.

General Headers

These headers can be present in either requests or responses. Conceptually, they deal with the broader issues of client/server sessions like timing, caching, and connection management. The Connection header in the request and response messages above is an example of a general connection-management header in action.

Request Headers

These headers are for requests only, and are utilized to inform the server (and intermediaries) on preferred response behavior (acceptable encodings,) constraints for the server (range of content or host definition,) conditional requests (resource modification timestamps,) and client profile (user agent, authorization.) The Host header in the request message above is an example of a request header constraining the server to that identity.

Response Headers

Like with requests, response headers are for responses only, and are utilized for security (authentication challenges), caching (timing and validation), information sharing (identification), and redirection. The Server header in the response message above is an example of a response header identifying the server.

Entity Headers

Entity headers exist not to provide request or response messaging context, but to provide specific insight about the body or payload of the message. The Content-Type header in the response message above, for example, tells the client that the payload is HTML text encoded as UTF-8 and should be rendered as such.

A note on MIME types: web clients and servers are for the most part "dumb" in that they do not guess at content types based on analysis; they follow the instructions in the message via the Content-Type header. I've experienced this in both directions. For iControl REST development, BIG-IP returns an error if you send a JSON payload but do not specify application/json in the Content-Type header. I also had a bug in an ASM deployment once where the ASM violation response page was set to text/html but should have been application/json, so the browser client never displayed the error. You had to find it buried in browser tools until we corrected that issue.
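
The "follow the header, don't guess" behavior described above can be sketched in a few lines of Python. The decode_body function is a hypothetical helper written for this illustration; it leans on the standard library's email.message.Message to split the media type from the charset parameter, and it falls back to UTF-8 when no charset is given, which is an assumption of the sketch.

```python
import json
from email.message import Message

def decode_body(content_type, body):
    """Interpret the body bytes strictly according to the Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()             # e.g. "text/html"
    charset = msg.get_content_charset() or "utf-8"  # assumed default
    text = body.decode(charset)
    if media_type == "application/json":
        return json.loads(text)   # treated as structured data
    return text                   # anything else is handed back as plain text

payload = b'{"error": "request blocked"}'
print(decode_body("application/json", payload))
print(decode_body("text/html; charset=utf-8", payload))
```

The same bytes come back as a dictionary in the first call and as an opaque string in the second, which is essentially what happened with the mislabeled response page.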

HTTP Response Status Codes

We will conclude with a brief discussion on status codes. Before getting into the specifics, there are a couple of general concerns that should make status code awareness and analysis a regular part of your system management: security and SEO. On security, there are many things one can learn through status codes (and headers, for that matter) about server patterns and behavior, and information can leak when application errors are not intercepted before being returned to clients. With SEO, how redirects and missing files are handled can hurt or help your overall impressions and ranking power. Moz has a good best practices article on managing status codes for search engines.

But back to the point at hand: status codes. There are five categories and 41 status codes recognized in HTTP/1.1.

Informational - 1xx - This category was added for HTTP/1.1. It is used to inform clients that a request has been received and that the initial request (likely a POST of data) can continue.
Success - 2xx - Used to inform the client that the request was processed successfully.
Redirection - 3xx - The request was received, but the resource needs to be dealt with in a different way.
Client Error - 4xx - Something went wrong on the client side (bad resource, bad authentication, etc.).
Server Error - 5xx - Something went wrong on the server side.
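
Because the category is carried entirely by the first digit, sorting status codes into these five buckets is a one-line calculation. Here is a small Python sketch; the categorize and describe helpers are hypothetical names, while HTTPStatus is the standard library's table of known codes and reason phrases.

```python
from http import HTTPStatus

# First digit of the status code -> category name from the list above.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def categorize(code):
    """Map a numeric status code to its category by its first digit."""
    return CATEGORIES.get(code // 100, "Unknown")

def describe(code):
    """Return 'code phrase (category)', tolerating codes Python does not know."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unrecognized"
    return f"{code} {phrase} ({categorize(code)})"

for code in (100, 200, 301, 404, 503):
    print(describe(code))
```

A helper like this is handy when tallying codes from access logs, since the per-category counts are usually what you watch first.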

Application monitors pay particularly close attention to the 5xx errors. Security practitioners focus on 4xx/5xx errors, but also watch 2xx/3xx messages if baseline volume and accessed resources start to skew from normal. Join us next week when we start to talk about clients, proxies, and servers, oh my!

Published Sep 21, 2017

Version 1.0