Bug or not? VE LTM resets client connection on consecutive calls to different nodes with same IP
Hi,
I'm not quite sure whether this is a bug I should raise with F5 technical support, so I'd like to get your opinion first.
Problem: LTM erroneously(?) resets client connections in a specific scenario and logs the following message about the HTTPS server-side connection. More details on the error are at the bottom.
Nov 1 00.00.00 HOSTNAME.tine.no warning tmm1[12067]: 01260013:4: SSL Handshake failed for TCP server-ip:80 -> snat-pool-ip:60225
Infrastructure: HA cluster of BIG-IP VE LTM version 14.1.4.6.
VS client-side: Type:Standard, HTTPS, HTTP Compression enabled
VS server-side: HTTP, SNAT:enabled
Pools: FQDN nodes only, no request queueing, only "http" monitors used.
The iRule separates calls by the URI, and for various reasons the two pools used (pool-A and pool-B) for different calls both contain FQDN members that have the same IP address but different FQDN names.
The iRule looks like this:
when HTTP_REQUEST {
    # The host is lowercased before matching, so the pattern must be lowercase too.
    switch -glob -- [string tolower [HTTP::host]] {
        "fqdn-used-by-client.internal" {
            # Both pools are plain HTTP, so server-side SSL is switched off.
            SSL::disable serverside
            switch -glob -- [string tolower [HTTP::uri]] {
                "/api/*" {
                    HTTP::header replace "Host" "backend-fqdn-A.internal"
                    pool pool-A
                    return
                }
                default {
                    HTTP::header replace "Host" "backend-fqdn-B.internal"
                    pool pool-B
                    return
                }
            }
        }
    }
}
When the client calls https://fqdn-used-by-client.internal/foo, the traffic is sent to pool-B (http).
When the URI starts with /api/, the traffic is sent to pool-A (http).
The problem is that LTM sends a RST back to the client after receiving a request when we do the following:
1. First, the client requests https://fqdn-used-by-client.internal/foo 1-3 times. All is well so far, the traffic is SNAT-ed and sent to pool-B without problems.
2. Then, after one second, the client requests https://fqdn-used-by-client.internal/api/bar on the same connection. This causes the iRule to reset the client connection and log the "SSL Handshake failed" message given at the top.
3. If the client only calls /api/bar in its connection, the call succeeds.
4. If we change the server side to HTTPS and use different serverssl profiles to force the correct SNI headers for the two destinations, the problem disappears.
As we use "SSL::disable serverside" the SSL profile defined on the VS shouldn't be relevant when using HTTP server-side.
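One way to check that assumption might be a temporary logging iRule attached to the same VS next to the existing one. This is only a sketch: it assumes a server SSL profile is attached to the VS and that the SERVERSSL_CLIENTHELLO_SEND event is available on this version. The log facility and message texts are my own illustrative choices.

```tcl
when HTTP_REQUEST {
    # Shows which requests share one client-side connection.
    log local0. "client [IP::client_addr]:[TCP::client_port] uri=[HTTP::uri]"
}

when SERVER_CONNECTED {
    # Fires each time TMM opens a new server-side flow.
    log local0. "server-side flow [IP::local_addr]:[TCP::local_port] -> [IP::remote_addr]:[TCP::remote_port]"
}

when SERVERSSL_CLIENTHELLO_SEND {
    # Should not appear if server-side SSL is really disabled for this flow.
    log local0. "TMM is about to send a server-side ClientHello"
}
```

If the ClientHello line shows up right after the /api/bar request, that would suggest the second server-side flow is being set up with SSL still enabled, despite "SSL::disable serverside" having run for that request.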
Tcpdump/Wireshark run on the client-side connection of the BIG-IP shows that LTM returns a RST packet to the client 14 ms after sending the client an ACK packet for the "offending" request.
My theory is that some logic related to whether/when TMM chooses to re-use a server-side HTTP connection causes this, given that it happens when consecutive requests in the same client connection are sent to the same port on two different FQDN nodes which resolve to the same IP address.
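A possible experiment to test this theory is a variant of the iRule that explicitly detaches the server-side connection whenever the target pool changes within one client connection. This is an untested sketch, not a confirmed fix. The variables last_pool, target and backend are hypothetical names I introduced, and calling LB::detach before "SSL::disable serverside" is my own guess at a sensible order.

```tcl
when CLIENT_ACCEPTED {
    # Remember which pool the previous request on this connection used.
    set last_pool ""
}

when HTTP_REQUEST {
    if { [string tolower [HTTP::host]] eq "fqdn-used-by-client.internal" } {
        if { [string tolower [HTTP::uri]] starts_with "/api/" } {
            set target "pool-A"
            set backend "backend-fqdn-A.internal"
        } else {
            set target "pool-B"
            set backend "backend-fqdn-B.internal"
        }
        # Drop the existing server-side flow when the pool changes,
        # so the next request cannot ride on the previous one.
        if { $last_pool ne "" && $last_pool ne $target } {
            LB::detach
        }
        SSL::disable serverside
        HTTP::header replace "Host" $backend
        pool $target
        set last_pool $target
    }
}
```

If the RST disappears with this variant, that would point towards server-side connection handling between the two pools. If it still occurs, the cause is more likely elsewhere, for example in how server-side SSL state is applied to the new flow.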