How HTTP/2 Compression works under the hood
Introduction
These are HTTP/1.1 and HTTP/2 requests side by side, as seen in Wireshark:
At first glance they look quite similar, and the HTTP/1.1 request even looks simpler.
However, the entire HTTP/2 header block occupies 37 bytes, as opposed to 76 bytes for HTTP/1.1:
This is due to HTTP/2 compression and that's what we're going to explore in this article.
First I'll introduce the 3 methods HTTP/2 uses for compression, and then I'll show you my lab setup. We'll then go through a real packet capture to understand how HTTP/2 header compression (HPACK) works, covering as much detail as reasonably possible.
Lab Test Scenario
For this test, I sent 3 consecutive requests (over the same TCP connection) to download 3 identical 10 MB files, named first_req.img, second_req.img and third_req.img:
I have a fairly simple lab setup:
How HTTP/2 compression works
I'll use the first GET request above as the guinea pig here.
We all know that a character is typically 1 byte, right? Notice that the whole of :method: GET and :scheme: https are only 1 byte each:
That's because compression in HTTP/2 works by not sending header names and values in full whenever possible.
HTTP/2 compresses headers using a static table, dynamic table and Huffman encoding.
We'll go through each of them now using examples.
Static Table
Instead of :method: GET, all the client sends is a 1-byte index, represented by a decimal number.
For example, the index that represents :method: GET is 2 as seen below:
HTTP/2 was designed in such a way that when the receiver reads Index 2, it immediately understands that Index 2 means :method: GET.
The same applies to :scheme: https which is represented by Index 7.
Also note that on Wireshark, when we see Indexed Header Field, it means the whole header + value is represented by an Index.
That's how HTTP/2 achieves "compression" under the hood but there's more to it.
The mappings for header names and for name + value pairs are listed in the static table in Appendix A of RFC 7541, which contains 61 entries.
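To make the lookup concrete, here is a minimal Python sketch of how an Indexed Header Field is put on the wire. The table below is only a handful of entries copied from Appendix A of RFC 7541, and the function names are mine, not part of any library:

```python
# A few entries from the RFC 7541 Appendix A static table
# (index -> (name, value)); the real table has 61 entries.
STATIC_TABLE_SUBSET = {
    1: (":authority", ""),
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    6: (":scheme", "http"),
    7: (":scheme", "https"),
    19: ("accept", ""),
}


def encode_indexed(index):
    """Indexed Header Field: a leading 1 bit followed by a 7-bit index."""
    if not 1 <= index <= 126:
        raise ValueError("this sketch only handles indexes that fit in 7 bits")
    return bytes([0x80 | index])


def decode_indexed(octet):
    """Reverse of encode_indexed, limited to the subset above."""
    if not octet & 0x80:
        raise ValueError("not an Indexed Header Field")
    return STATIC_TABLE_SUBSET[octet & 0x7F]


print(encode_indexed(2).hex())   # 82 -> :method: GET
print(encode_indexed(7).hex())   # 87 -> :scheme: https
print(decode_indexed(0x82))      # (':method', 'GET')
```

One byte on the wire is enough for the receiver to recover the full name + value, which is why :method: GET and :scheme: https cost 1 byte each.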
However, HTTP/2 also has a dynamic table to store values on the go and that's what we're going to see in action now.
Dynamic Table
Let's now pick :authority: 10.199.3.44.
This one is interesting because :authority: is in the static table but 10.199.3.44 (BIG-IP's HTTP/2 virtual server) isn't.
So, how does HTTP/2 solve this problem?
Because :authority: (the header's name) is present in the static table, the client still references it using Index 1.
The value (10.199.3.44) is obviously not in the static table, but BIG-IP assigns a dynamic index, from Index 62 onwards, to the whole ":authority: 10.199.3.44" name + value pair (remember the static table has only 61 indexes!):
How do we know BIG-IP assigned such a value?
Because of the "Incremental Indexing" keyword in the field's representation.
Also note that in this first request, :authority: 10.199.3.44 eats up 10 bytes (1 byte for :authority and 9 bytes for 10.199.3.44)!
In the next request, the whole :authority: 10.199.3.44 is now represented by a single index (63) and takes up only 1 byte:
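Note: :authority: 10.199.3.44 sits at Index 63 rather than Index 62 because of accept: */*. Per Section 2.3.3 of RFC 7541, the newest dynamic table entry always takes Index 62 and older entries shift up by one, so once accept: */* was added after it, :authority: 10.199.3.44 moved from Index 62 to Index 63.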
Impressive, isn't it?
This is dynamic table in action.
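The index numbering can be sketched in a few lines of Python. This is a toy model, not how BIG-IP or any client implements it: the class and method names are hypothetical, there is no size limit, and the insertion order (:authority first, accept afterwards) is my assumption about the capture. The one rule it relies on comes from Section 2.3.3 of RFC 7541: the newest dynamic entry gets Index 62 and older entries shift up.

```python
STATIC_TABLE_SIZE = 61  # number of entries in RFC 7541 Appendix A


class DynamicTable:
    """Toy HPACK dynamic table: newest entry first, no size limit."""

    def __init__(self):
        self.entries = []  # entries[0] is the most recently added entry

    def add(self, name, value):
        # What a "Literal Header Field with Incremental Indexing" triggers.
        self.entries.insert(0, (name, value))

    def index_of(self, name, value):
        """Return the HPACK index of an exact name + value match, or None."""
        try:
            return STATIC_TABLE_SIZE + 1 + self.entries.index((name, value))
        except ValueError:
            return None


table = DynamicTable()
table.add(":authority", "10.199.3.44")
print(table.index_of(":authority", "10.199.3.44"))  # 62

table.add("accept", "*/*")  # assumed to be added later in the same request
print(table.index_of("accept", "*/*"))              # 62
print(table.index_of(":authority", "10.199.3.44"))  # 63
```

Both endpoints keep their own copy of this table and update it in lockstep, so an index sent in a later request resolves to the same name + value on each side.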
Setting HTTP/2 Dynamic table size on BIG-IP
On BIG-IP, the default value for the dynamic table size is 4096 bytes, and it is configurable via the GUI:
Or tmsh:
To expand on what header table size is, I'm now quoting the article I created for AskF5, "Overview of the BIG-IP HTTP/2 profile":
"Specifies the maximum table size, in bytes, for the dynamic table of HTTP/2 header compression. The default value is 4096.
Note: The HTTP/2 protocol compresses HTTP headers to save bandwidth and uses a static table with predefined values and a dynamic table with values that are likely to be reused again in the same HTTP/2 connection. The Header Table Size limits the number of entries of HTTP/2 dynamic table as described in Section 4.2 of RFC 7541. When the limit is reached, old entries are evicted so that new entries are added."
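To show how a byte limit translates into evictions, here is a rough Python sketch of the accounting described in Sections 4.1 and 4.4 of RFC 7541: each entry costs its name length plus its value length plus 32 bytes of overhead, and the oldest entries are dropped until a new one fits. The class name and the x-demo header are made up for illustration, and this is not BIG-IP's implementation.

```python
from collections import deque

ENTRY_OVERHEAD = 32  # per-entry overhead from RFC 7541 Section 4.1


def entry_size(name, value):
    """Size an entry counts for against the table limit, in bytes."""
    return len(name.encode()) + len(value.encode()) + ENTRY_OVERHEAD


class SizedDynamicTable:
    """Toy dynamic table that enforces a maximum size in bytes."""

    def __init__(self, max_size=4096):
        self.max_size = max_size
        self.size = 0
        self.entries = deque()  # left = newest, right = oldest

    def add(self, name, value):
        needed = entry_size(name, value)
        # Evict the oldest entries until the new one fits (Section 4.4).
        while self.entries and self.size + needed > self.max_size:
            self.size -= entry_size(*self.entries.pop())
        # An entry larger than the whole table is not stored at all,
        # which leaves the table empty.
        if needed <= self.max_size:
            self.entries.appendleft((name, value))
            self.size += needed


table = SizedDynamicTable(max_size=100)  # tiny limit to force an eviction
table.add(":authority", "10.199.3.44")   # 10 + 11 + 32 = 53 bytes
table.add("accept", "*/*")               # 6 + 3 + 32 = 41 bytes
table.add("x-demo", "1")                 # hypothetical header, 39 bytes
print(list(table.entries), table.size)
```

With the 100-byte limit, adding the third entry evicts :authority: 10.199.3.44, the oldest one. At the default 4096 bytes, about 77 entries the size of :authority: 10.199.3.44 (53 bytes each) would fit before evictions start.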
Huffman coding
Values that are not compressed using the static or dynamic table are still not sent directly in plain text.
There is a best-effort compression method, Huffman encoding, that achieves around 20-30% improvement over plain text.
Remember the dynamic table example, where in the first request the :authority: header name was compressed using the static table (Index 1) but its value (10.199.3.44) wasn't?
10.199.3.44 wasn't sent in plain text either!
That's right. It was encoded using the Huffman code from Appendix B of RFC 7541.
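Here is a small Python sketch of that encoding. The table is only a subset of the Huffman code in Appendix B of RFC 7541 (just enough characters for the strings in this article plus the RFC's own www.example.com example), and the function name is mine. Frequent characters get 5-bit codes, rarer ones get longer codes, and the last byte is padded with 1 bits.

```python
# Subset of RFC 7541 Appendix B: character -> (code, length in bits).
HUFFMAN_SUBSET = {
    "0": (0x00, 5), "1": (0x01, 5), "2": (0x02, 5),
    "a": (0x03, 5), "c": (0x04, 5), "e": (0x05, 5), "i": (0x06, 5),
    "o": (0x07, 5), "s": (0x08, 5), "t": (0x09, 5),
    ".": (0x17, 6), "/": (0x18, 6),
    "3": (0x19, 6), "4": (0x1A, 6), "5": (0x1B, 6), "6": (0x1C, 6),
    "7": (0x1D, 6), "8": (0x1E, 6), "9": (0x1F, 6),
    "_": (0x22, 6), "f": (0x25, 6), "g": (0x26, 6), "l": (0x28, 6),
    "m": (0x29, 6), "p": (0x2B, 6), "r": (0x2C, 6),
    "q": (0x76, 7), "w": (0x78, 7), "x": (0x79, 7),
}


def huffman_encode(text):
    """Huffman-encode text using the subset above (KeyError otherwise)."""
    acc = 0    # all code bits concatenated into one big integer
    nbits = 0  # number of bits accumulated so far
    for ch in text:
        code, length = HUFFMAN_SUBSET[ch]
        acc = (acc << length) | code
        nbits += length
    pad = -nbits % 8                       # bits missing to reach a full byte
    acc = (acc << pad) | ((1 << pad) - 1)  # pad with 1 bits, per the RFC
    return acc.to_bytes((nbits + pad) // 8, "big")


print(huffman_encode("www.example.com").hex())  # f1e3c2e5f23a6ba0ab90f4ff
print(len("10.199.3.44"), len(huffman_encode("10.199.3.44")))  # 11 8
```

The first line reproduces the encoded bytes given for www.example.com in the examples of RFC 7541. For 10.199.3.44, the 11 plain-text bytes shrink to 8; together with the 1-byte length prefix that precedes every string, that gives the 9 bytes seen for the value in the first request.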
Appendix - Are there values that are not added to Dynamic table?
Yes, implementations may decide not to add certain values to protect sensitive header fields.
In our lab test above, we can see that the :path: name is indexed, but its value is neither indexed nor added to the dynamic table:
If /first_req.img had been added to the dynamic table, Wireshark's Representation field would be Literal Header Field with Incremental Indexing rather than Literal Header Field without Indexing.
The other question I often get asked is about the Name Length and Value Length fields. More specifically, why do they differ from the actual number of bytes sent on the wire?
Name Length and Value Length are simply the sizes, in bytes, that the name and value would have if they were sent uncompressed in plain text.
For example, :path has 5 characters and, as each character is 1 byte, the decompressed :path is 5 bytes.
The same goes for /first_req.img (14 characters = 14 bytes).
However, in reality, the HTTP/2 client sends only Index 4 (which is 1 byte long) to represent :path, and Huffman coding shrinks /first_req.img from 14 bytes to 11 bytes. That's a reduction of about 21% compared to plain text.
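The arithmetic behind those numbers can be checked with a short Python sketch. It uses only the Huffman code lengths (in bits) from Appendix B of RFC 7541 for the characters in /first_req.img, and it assumes the 11 bytes break down as 1 length-prefix byte plus 10 bytes of Huffman data. The function names are hypothetical.

```python
# Huffman code lengths in bits (RFC 7541 Appendix B), only for the
# characters that appear in "/first_req.img".
CODE_BITS = {
    "/": 6, "f": 6, "i": 5, "r": 6, "s": 5, "t": 5,
    "_": 6, "e": 5, "q": 7, ".": 6, "m": 6, "g": 6,
}


def huffman_size(text):
    """Bytes needed for the Huffman-coded text, rounded up to a full byte."""
    bits = sum(CODE_BITS[ch] for ch in text)
    return (bits + 7) // 8


def path_field_sizes(value):
    """Compare the decompressed sizes with what goes on the wire."""
    name_bytes = 1           # static Index 4 stands in for ":path"
    length_prefix_bytes = 1  # Huffman flag + string length, for short values
    value_bytes = length_prefix_bytes + huffman_size(value)
    return {
        "name_length": len(":path"),   # decompressed size of the name
        "value_length": len(value),    # decompressed size of the value
        "wire_name_bytes": name_bytes,
        "wire_value_bytes": value_bytes,
    }


sizes = path_field_sizes("/first_req.img")
print(sizes)
saving = 1 - sizes["wire_value_bytes"] / sizes["value_length"]
print(f"{saving:.0%}")  # 21%
```

The 14 characters of /first_req.img add up to 80 bits, or 10 bytes, so the value costs 11 bytes on the wire including its length prefix, against a Value Length of 14.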