The HTTP/2 CONTINUATION frame attack

There has been a bit of news lately – and more than a couple of advisories – about a new HTTP/2 Denial of Service (DoS) attack using HTTP/2 CONTINUATION frames. You might have seen it called VU#421644, the “HTTP/2 CONTINUATION frame attack” or a number of CVEs (which we’ll discuss later), and I thought it would be worthwhile to write a little about the attack itself, what a VU number is, and why there are so many CVEs for a single “issue”.

First, a quick recap of the main difference between HTTP/2 and HTTP/1 before we get into the specifics of this class of attack:

HTTP/1 (1.1, 1.0 and 0.9) is a clear-text protocol. You will commonly find it wrapped within TLS, of course, for security reasons, but if you decrypt the TLS connection you’ll get right back to the plain text contents. We’re probably all quite used to seeing HTTP represented in that way because we’ve been dealing with it for 30 years or so, and a request might look something like this:

GET /page.html HTTP/1.1
Host: www.example.com
Connection: close

Each of those human-readable words has some meaning to the webserver, and the webserver has to parse them out of the text and into machine-readable elements it can act on. For example, GET is the method (GET information, POST data to the server, etc.), /page.html is the URI (the page we are requesting), and finally HTTP/1.1 tells the server we are speaking HTTP/1.1.

The second line gives the server more information about the client’s request in the form of an HTTP header with the name Host and the value of the specific website we are looking for.

The third line gives the server yet more information in the form of a header named Connection with a value of close, which tells the server to close the connection once this request has been satisfied.

The fact that the server has to parse this human-readable text, one character at a time, into machine-readable elements is what allows attacks like HTTP Request Smuggling to take place. Machines need explicit instructions to understand where one header ends and another begins, where the header name ends and the value begins, and so on. Those rules are all defined in RFCs, but RFCs are huge, complex documents, which means not everyone implements them in exactly the same way.
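The following sketch shows the kind of work a server does to turn the request above into machine-readable elements. It is deliberately naive. It assumes CRLF line endings, single spaces in the request line, and no folded headers or body handling. Each of those assumptions is a place where two real parsers could disagree. The function name `parse_http1_head` is hypothetical and is not taken from any real server.

```python
def parse_http1_head(raw: bytes) -> tuple[str, str, str, dict[str, str]]:
    """Split an HTTP/1.x request head into (method, target, version, headers)."""
    # The head ends at the first empty line (CRLF CRLF); anything after is body.
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")

    # Request line: METHOD SP request-target SP HTTP-version
    method, target, version = lines[0].split(" ")

    headers: dict[str, str] = {}
    for line in lines[1:]:
        # The header name ends at the first colon; the value is the rest,
        # minus optional surrounding whitespace. Names are case-insensitive.
        name, sep, value = line.partition(":")
        if not sep or not name or name != name.strip():
            # Whitespace around the name (e.g. "Host :") is exactly the kind of
            # ambiguity request smuggling exploits; RFC 9112 requires rejecting
            # whitespace between the field name and the colon.
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.lower()] = value.strip()
    return method, target, version, headers


request = b"GET /page.html HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n"
print(parse_http1_head(request))
```

Every `partition` and `split` call above is a decision about where one element ends and the next begins. Those are the decisions HTTP/2's binary framing removes.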

OK, so we’ve talked about HTTP/1 a lot; what about HTTP/2? Just more of the same, right? Actually, no. While HTTP/1 is a text protocol, HTTP/2 is a binary protocol. At its core it is already machine-readable, which means parsing the incoming request is much easier and far less prone to error (though not immune to it). It also allows much more sophisticated handling of multiple requests and responses within a single TCP connection. HTTP/1, by contrast, supports only simple pipelining or one-at-a-time request and response pairs.

With HTTP/2, what we have is a single TCP connection, which can contain one or more HTTP/2 streams. Each of those streams carries one or more messages (a complete request or response). Each message is made up of one or more frames, and each frame contains a specific piece of information, such as headers or message body. If you ignore the lower layers (TCP, TLS, etc.), the communication flow looks something like this:

[Figure: HTTP/2 connection, streams, messages and frames (adapted from https://web.dev/articles/performance-http2)]

The last thing to know is that there are more frame types than just HEADERS and DATA. There’s a whole list (https://webconcepts.info/concepts/http2-frame-type/), which includes:

- RST_STREAM and GOAWAY, which terminate an individual stream or the whole connection, respectively.
- SETTINGS, which lets the client and server negotiate parameters separately from the HTTP streams (like how many concurrent streams to support).
- WINDOW_UPDATE, which implements flow control.
- CONTINUATION.

CONTINUATION, specifically, means “the previously sent header block continues”. The HTTP/2 RFC says you can send any number of CONTINUATION frames, as long as the preceding frame is on the same stream and is a HEADERS, PUSH_PROMISE, or another CONTINUATION frame without the END_HEADERS flag set.
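Every frame, whatever its type, starts with the same fixed 9-octet header defined in RFC 7540 section 4.1. That header holds a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The sketch below packs and parses that header. The type codes and flag values come from the RFC. The class and helper names are illustrative assumptions, not any library's API.

```python
import struct
from typing import NamedTuple

# Frame type codes (RFC 7540 section 6; unchanged in RFC 9113).
DATA, HEADERS, PRIORITY, RST_STREAM, SETTINGS = 0x0, 0x1, 0x2, 0x3, 0x4
PUSH_PROMISE, PING, GOAWAY, WINDOW_UPDATE, CONTINUATION = 0x5, 0x6, 0x7, 0x8, 0x9

# Flag bits relevant to header blocks.
END_STREAM = 0x1   # valid on DATA and HEADERS
END_HEADERS = 0x4  # valid on HEADERS, PUSH_PROMISE and CONTINUATION


class FrameHeader(NamedTuple):
    length: int      # payload length in octets (24 bits)
    frame_type: int  # one of the type codes above (8 bits)
    flags: int       # type-specific flag bits (8 bits)
    stream_id: int   # stream identifier (31 bits; 0 means the connection)


def pack_frame_header(h: FrameHeader) -> bytes:
    """Serialise the fixed 9-octet header that precedes every frame payload."""
    # to_bytes raises OverflowError if the length does not fit in 24 bits.
    length = h.length.to_bytes(3, "big")
    # '>BBI' = big-endian type, flags, then 32 bits holding R + stream id;
    # the reserved top bit R is always sent as 0.
    return length + struct.pack(">BBI", h.frame_type, h.flags, h.stream_id & 0x7FFFFFFF)


def unpack_frame_header(data: bytes) -> FrameHeader:
    """Parse the first 9 octets of a frame, ignoring the reserved bit."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, sid = struct.unpack(">BBI", data[3:9])
    return FrameHeader(length, frame_type, flags, sid & 0x7FFFFFFF)


# A HEADERS frame on stream 1 that ends the stream but NOT the header block,
# so the peer must expect CONTINUATION frames on stream 1 next.
hdr = pack_frame_header(FrameHeader(16, HEADERS, END_STREAM, 1))
print(hdr.hex())  # 000010010100000001
```

A HEADERS frame that does not set END_HEADERS tells the receiver that CONTINUATION frames for the same stream will follow. The attack abuses that open-ended "more is coming" state.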

Anatomy of the attack

That was a lot of theory! So let’s talk about the attack technique Bartek Nowotarski discovered and disclosed via CERT/CC’s VINCE system:

What Bartek noted is that many HTTP/2 implementations allow a malicious client to send an unlimited number of CONTINUATION frames. This is actually permitted by the original RFC: https://datatracker.ietf.org/doc/html/rfc7540#section-6.10. Depending on the implementation, this can cause the server to exhaust the available memory while processing nothing but these (very small) frames.

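Additionally, because CONTINUATION frames carry HPACK-encoded header data, the attacker could send frames that force the server to exhaust available CPU. The server spends that CPU decoding each fragment as it arrives and appending the result to the header data it already holds in memory for that stream.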

This is all described in more detail in Vulnerability Note VU#421644: https://kb.cert.org/vuls/id/421644

Now, you might be asking why correctly following the RFC (which, remember, specifies that a client can send any number of CONTINUATION frames) constitutes a vulnerability. Simply put, doing so may, depending on implementation specifics, allow an attacker to crash the target service and/or deny service to other users of that service. Regardless of what the RFC says, that is an undesirable state, and it constitutes a vulnerability.

You could argue that this vulnerability is inherent in the RFC, and I would suggest you are right. Rather than simply saying “Any number of ...”, perhaps the RFC should have explicitly noted that HTTP/2 implementations are free to limit the number of CONTINUATION frames, but that they “SHOULD accept a reasonable number of CONTINUATION frames”. Instead, it simply says the following, and here we are:

Any number of CONTINUATION frames can be sent, as long as the preceding frame is on the same stream and is a HEADERS, PUSH_PROMISE, or CONTINUATION frame without the END_HEADERS flag set.
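The quoted rule is easy to express as a small state machine. The sketch below enforces what RFC 7540 section 6.10 requires: once a header block is open, the next frame on the connection must be a CONTINUATION on the same stream. Any violation is a PROTOCOL_ERROR. The class and exception names are hypothetical. The sketch also shows the gap. The rule constrains the *order* of frames, not their *number*, so ten thousand CONTINUATION frames pass straight through.

```python
from dataclasses import dataclass
from typing import Optional

# Type codes and flag from RFC 7540 (the only ones this check needs).
HEADERS, PUSH_PROMISE, CONTINUATION = 0x1, 0x5, 0x9
END_HEADERS = 0x4


class ProtocolError(Exception):
    """Stands in for an HTTP/2 connection error of type PROTOCOL_ERROR."""


@dataclass
class HeaderBlockTracker:
    """Enforces the CONTINUATION ordering rule for one HTTP/2 connection."""

    # Stream whose header block is still open (END_HEADERS not yet seen).
    open_stream: Optional[int] = None

    def on_frame(self, frame_type: int, flags: int, stream_id: int) -> None:
        if self.open_stream is not None:
            # While a header block is open, the next frame on the whole
            # connection must be a CONTINUATION on that same stream.
            if frame_type != CONTINUATION or stream_id != self.open_stream:
                raise ProtocolError("header block interrupted")
        elif frame_type == CONTINUATION:
            # A CONTINUATION with no open header block is also an error.
            raise ProtocolError("unexpected CONTINUATION")

        if frame_type in (HEADERS, PUSH_PROMISE, CONTINUATION):
            # The block stays open until a frame carries END_HEADERS.
            # Note that nothing here limits how many frames that takes.
            self.open_stream = None if flags & END_HEADERS else stream_id


tracker = HeaderBlockTracker()
tracker.on_frame(HEADERS, 0, 1)           # header block opened on stream 1
for _ in range(10_000):
    tracker.on_frame(CONTINUATION, 0, 1)  # every one of these is legal
tracker.on_frame(CONTINUATION, END_HEADERS, 1)
print(tracker.open_stream)  # None
```

An implementation that follows only this rule will keep accepting CONTINUATION frames for as long as the client keeps sending them. Any bound on that has to come from an additional, implementation-defined limit.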

Why so many CVEs for one VU?

Think of it this way: In this instance, the VU# describes the method of attack while the CVE numbers describe the impact on specific vendor products.

For example, VU#421644 describes the fact that HTTP/2 CONTINUATION frames can be used to launch a DoS attack and the theoretical mechanism you could employ to do so, while CVE-2024-27983 describes the specific impact to Node.js should an attacker perform the attack against a vulnerable Node.js HTTP/2 server.

In other words, the VU# is methodology-specific while the CVE# is implementation-specific.

Note that many, many vendors (including F5, for both BIG-IP and NGINX products) were contacted and have provided their responses to CERT. Only vendors whose products are vulnerable will assign CVE numbers. A vendor with only non-vulnerable products will simply refer to the VU# (which is the case for F5 at the time of writing).

BIG-IP

Finally, we can get to some good news:

  1. A BIG-IP (of any flavor) configured with an HTTP/2 Virtual Server (BIG-IP) or HTTP/2 Application Service (BIG-IP Next) is not vulnerable to this attack technique.
  2. A BIG-IP or BIG-IP Next instance terminating HTTP/2 (e.g., with the HTTP/2 profile on BIG-IP LTM) will effectively protect a vulnerable back-end server against this attack.

Now, there are some small caveats still under investigation. BIG-IP iRulesLX and iAppsLX use Node.js, so it might be possible to create a vulnerable service if, for example, you roll your own HTTP/2 server in iRulesLX. The core product functionality of a Virtual Server or Application Service is not affected.

Any impact through third-party components will be disclosed via Security Advisories on MyF5. The following are published so far:

K000139236: Apache Traffic Server HTTP/2 CONTINUATION DoS attack vulnerability CVE-2024-31309
K000139229: Tempesta vulnerability CVE-2024-2758
K000139228: Envoy vulnerability CVE-2024-27919
K000139227: amphp/http vulnerability CVE-2024-2653
K000139225: nghttp2 vulnerability CVE-2024-28182
K000139214: Apache httpd vulnerability CVE-2024-27316
K000139532: Node.js vulnerability CVE-2024-27983

CVE-2023-45288 (Go net/http and net/http2) is still under investigation, and the corresponding advisory will be published as soon as possible. Again, core functionality (provided by F5’s Traffic Management Microkernel, or TMM) will not be affected.

Now, the reason BIG-IP itself is not impacted is that, despite what the HTTP/2 RFC suggests, we do actually have limits provided in the configuration. For example, by default we only allow 32 KB of HTTP headers (note that this is defined in the HTTP profile, not the HTTP/2 profile). If that size is exceeded, we send a GOAWAY and reset the HTTP/2 stream. BIG-IP also doesn’t do any further processing on the HTTP/2 request until the complete request has been received. Thanks to this and to delayed binding, no server-side connection is initiated and no data is sent to the pool member, which protects any vulnerable server-side resources.
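The two safeguards described above can be sketched as a small per-stream buffer: a cap on total header size, and not touching the back end until the request is complete. This is a hypothetical illustration, not BIG-IP's actual implementation. The class name, the string results and both limit values are assumptions. The 32 KiB cap simply mirrors the default mentioned above, and the fragment-count cap is an extra that the text does not mention.

```python
END_HEADERS = 0x4             # HTTP/2 flag marking the last header fragment
MAX_HEADER_BYTES = 32 * 1024  # assumed cap, mirroring the 32 KB default above
MAX_FRAGMENTS = 64            # hypothetical cap on HEADERS + CONTINUATION frames


class StreamHeaderBuffer:
    """Buffers one stream's header block and decides what to do next."""

    def __init__(self) -> None:
        self.fragments: list[bytes] = []
        self.size = 0

    def on_fragment(self, payload: bytes, flags: int) -> str:
        """Feed one HEADERS/CONTINUATION payload; return the next action."""
        self.fragments.append(payload)
        self.size += len(payload)
        # Check both limits: a byte cap alone never trips on a flood of
        # zero-length CONTINUATION frames.
        if self.size > MAX_HEADER_BYTES or len(self.fragments) > MAX_FRAGMENTS:
            self.fragments.clear()  # release the buffered memory at once
            return "reject"         # e.g. reset the stream / send GOAWAY
        if flags & END_HEADERS:
            # Only now would a proxy choose a pool member and open a
            # server-side connection (delayed binding).
            return "forward"
        return "wait"               # keep buffering; nothing sent upstream


buf = StreamHeaderBuffer()
print(buf.on_fragment(b"h" * 100, 0))            # wait
print(buf.on_fragment(b"h" * 100, END_HEADERS))  # forward
```

Nothing is forwarded until `forward` is returned. A stream that is rejected therefore never reaches the pool member at all.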

Of course, a malicious client could simply open a large number of TCP connections, but that is not the technique described by VU#421644. To protect against that kind of ‘regular’ DoS attack, configure the Layer 4 DoS protections available in AFM, or the SYN cookie protections available in LTM, appropriately.

NGINX

Thanks to its robust HTTP/2 implementation, NGINX is similarly unaffected by this attack technique. An attacker can consume some server resources. However, as long as the system is sized and configured appropriately, it should continue to serve legitimate clients despite any attack traffic.

We recommend investigating the following configuration elements and ensuring they are set appropriately for the traffic levels you need to service in your environment:

Syntax: worker_rlimit_nofile number;
Default: none
Context: main

Changes the limit on the maximum number of open files (RLIMIT_NOFILE) for worker processes. Used to increase the limit without restarting the main process.

Syntax: worker_connections number;
Default: worker_connections 512;
Context: events

Sets the maximum number of simultaneous connections that can be opened by a worker process.

It should be kept in mind that this number includes all connections (e.g. connections with proxied servers, among others), not only connections with clients. Another consideration is that the actual number of simultaneous connections cannot exceed the current limit on the maximum number of open files, which can be changed by worker_rlimit_nofile.

Syntax: keepalive_timeout timeout [header_timeout];
Default: keepalive_timeout 75s;
Context: http, server, location

The first parameter sets a timeout during which a keep-alive client connection will stay open on the server side. The zero value disables keep-alive client connections. The optional second parameter sets a value in the “Keep-Alive: timeout=time” response header field. Two parameters may differ.

The “Keep-Alive: timeout=time” header field is recognized by Mozilla and Konqueror. MSIE closes keep-alive connections by itself in about 60 seconds.

Syntax: client_header_timeout time;
Default: client_header_timeout 60s;
Context: http, server

Defines a timeout for reading client request header. If a client does not transmit the entire header within this time, the request is terminated with the 408 (Request Time-out) error.

Syntax: worker_processes number | auto;
Default: worker_processes 1;
Context: main

Defines the number of worker processes.

The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it).

client_header_timeout is set to 60s by default, which gives any client (legitimate or malicious) 60 seconds to finish sending its headers. This means that any single attack stream can only send CONTINUATION frames for 60s. Reducing this value means an attacker must cycle through more HTTP/2 streams for the same impact, making any attack more expensive and less effective. Unless you are dealing with particularly slow clients, reducing it from 60s to 10s is unlikely to have any negative impact. It will also reduce the load that an attacker using this technique (or a similar one) can impose.
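The arithmetic behind that recommendation is simple. A stream can stay in the header-reading phase for at most client_header_timeout seconds, so an attacker who wants to keep one such stream busy must open a replacement every timeout interval. The back-of-the-envelope sketch below assumes the timeout is the only thing that ends each stream. The function name is hypothetical.

```python
import math


def streams_per_slot(attack_seconds: float, client_header_timeout: float) -> int:
    """Streams an attacker must open to keep one header-reading slot busy.

    Assumes each stream is terminated exactly when client_header_timeout
    expires and is immediately replaced by a fresh one.
    """
    return math.ceil(attack_seconds / client_header_timeout)


for timeout in (60, 10):
    print(timeout, streams_per_slot(3600, timeout))
# 60 60
# 10 360
```

Under this assumption, dropping the timeout from 60s to 10s means the same hour-long attack needs six times as many streams.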

Similarly, keepalive_timeout means that clients must send traffic at least every N seconds to prevent an idle connection from being closed. Unless you have particularly slow or unusual clients, reducing this value will further reduce the likelihood of this or similar attacks having an impact.

Meanwhile, worker_connections and worker_rlimit_nofile should both be tuned to suit the maximum capacity of your NGINX server, which should be sized appropriately for the traffic you expect to serve.
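As the directive documentation notes, worker_connections counts upstream connections as well as client connections, and it cannot usefully exceed the open-file limit. Those two facts make a rough capacity estimate straightforward. The sketch below is a deliberate simplification. The function name is hypothetical, and it assumes one proxied request in flight per client connection while ignoring descriptors used by logs and listeners.

```python
def max_proxied_clients(worker_processes: int, worker_connections: int,
                        open_file_limit: int) -> int:
    """Rough ceiling on concurrent client connections for a reverse proxy.

    Assumptions (deliberately simple): every client connection has exactly
    one proxied request in flight, each of those needs one upstream
    connection, and descriptors for logs, listeners, etc. are ignored.
    """
    # A worker's usable connections are capped by its open-file limit
    # (worker_rlimit_nofile, or the limit inherited from the environment).
    per_worker = min(worker_connections, open_file_limit)
    # worker_connections counts client AND upstream connections, so each
    # proxied client consumes two of them.
    return worker_processes * (per_worker // 2)


# 4 workers, default worker_connections 512, open-file limit of 1024.
print(max_proxied_clients(4, 512, 1024))   # 1024
# Raising worker_connections alone stops helping once the file limit binds.
print(max_proxied_clients(4, 4096, 1024))  # 2048
```

Real usage can be considerably higher than this estimate. An HTTP/2 client connection can multiplex many streams, and each proxied stream may need its own upstream connection. Treat the result as a lower bound on the headroom you need.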

If you have any questions, feel free to leave a comment, and I will do my best to answer and keep this article updated should any new information become available!

Updated May 31, 2024
Version 3.0