F5 BIG-IP deployment with Red Hat OpenShift - the no CIS option
This article is a set of recommendations for using F5 BIG-IP as an external load balancer for a Red Hat OpenShift cluster with the default router (HA-proxy) and without CIS.
Overview
There are deployments where F5 BIG-IP is used in front of a Red Hat OpenShift cluster without the Container Ingress Services (CIS) controller. In general, this is not recommended because without CIS there is no Kubernetes integration and automation.
This article is for deployments where, for some reason, CIS is not used. It assumes the default OpenShift deployment, in which HA-proxy (also known as OpenShift's router) handles ingress and ports 80 and 443 are exposed in separate pools.
Implementation
The following recommendations cover appropriate health checking and failure handling of HA-proxy instances in OpenShift:
- Create an HTTP monitor for the readiness endpoint
When a pod is not in the Ready state, CIS automatically removes it from the pool. When not using CIS, we need to mimic this functionality with an HTTP monitor on port 1936 (the readiness endpoint) for both the 443 and 80 pools:
ltm monitor http ocp_haproxy {
    destination *:1936
    recv "^HTTP/1.1 200"
    send "GET /healthz/ready HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
}
This monitor will detect the following failure scenarios:
- There is no IP connectivity to the HA-proxy instance
- HA-proxy is not working properly (it returns an HTTP status code other than 200)
- HA-proxy is performing a graceful shutdown because its pod is being deleted, for example during an upgrade (it returns an HTTP 500 status code).
The graceful shutdown process of HA-proxy is as follows:
- The readiness endpoint immediately returns an HTTP 500 error, but HA-proxy continues processing requests for the applications for 45 seconds. This is indicated in the response of the readiness endpoint with a "[+]backend-http ok" message in the payload.
- After 45 seconds, the endpoint returns "[-]backend-http failed: reason withheld" and HA-proxy sends a TCP RESET to any request for the applications. During these 45 seconds, the interval and timeout of the HTTP monitor leave plenty of time to mark the HA-proxy instance down appropriately.
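As a rough sketch of how the monitor timers fit inside that 45-second window, the monitor can carry explicit timer values. The values below are the BIG-IP defaults for an HTTP monitor (a 5-second interval and a 16-second timeout). They are shown here as an assumption, not as a tuned recommendation. With them, an instance whose readiness check starts failing should be marked down within roughly 16 seconds.

```tmsh
# Sketch only: interval and timeout are the BIG-IP defaults, written out explicitly.
ltm monitor http ocp_haproxy {
    defaults-from http
    destination *:1936
    interval 5
    timeout 16
    recv "^HTTP/1.1 200"
    send "GET /healthz/ready HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
}
```

Shorter timers detect the shutdown sooner at the cost of more probe traffic; any timeout comfortably below 45 seconds keeps the instance out of rotation before it starts resetting connections.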
- Handle non-graceful errors with the pool's reselect tries
To deal better with non-graceful shutdowns or transient errors, we can use the pool's reselect tries setting, which selects a new HA-proxy instance when a request to an application fails. The recommendation is to set the number of tries to the number of HA-proxy instances minus one.
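A minimal sketch of this setting, assuming three HA-proxy instances and therefore two reselect tries. The pool name and the member addresses (taken from a documentation address range) are hypothetical; only the `ocp_haproxy` monitor comes from this article.

```tmsh
# Sketch only: pool name and member addresses are hypothetical.
# Three HA-proxy instances, so reselect-tries is 3 - 1 = 2.
ltm pool ocp_router_443 {
    members {
        192.0.2.11:443 { address 192.0.2.11 }
        192.0.2.12:443 { address 192.0.2.12 }
        192.0.2.13:443 { address 192.0.2.13 }
    }
    monitor ocp_haproxy
    reselect-tries 2
}
```

The same value would apply to the port 80 pool, since both pools front the same set of HA-proxy instances.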
- Set an additional TCP monitor for HA-proxy's application socket
This additional TCP monitor on port 80 or 443 (matching the pool) complements the HTTP monitor of the readiness endpoint by validating that the HA-proxy instances are listening for requests on their designated application socket. Although such failures are already handled by the reselect tries mechanism, this monitor provides visibility that they are happening.
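One way this could look, as a sketch: a TCP monitor that sets no destination probes each pool member on the member's own port, so a single monitor can serve both the 80 and 443 pools. The monitor name `ocp_haproxy_tcp` and the pool name are hypothetical.

```tmsh
# Sketch only: object names are hypothetical.
# No destination is set, so each member is probed on its own port (80 or 443).
ltm monitor tcp ocp_haproxy_tcp {
    defaults-from tcp
}
ltm pool ocp_router_443 {
    monitor ocp_haproxy and ocp_haproxy_tcp
}
```

With `and`, a member is considered up only when both the readiness check and the TCP check succeed.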
- Established connections
For already established connections, the Action on Service Down feature could be used, but the recommendation is to keep the default option, None. This potentially allows already established connections to finish. Using Reset would trigger a faster retry from the HTTP client, with the disadvantage of cutting off requests that could otherwise have completed. The other options are not worth considering.
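In configuration terms, Action on Service Down corresponds to the pool's `service-down-action` property. The fragment below is a sketch with a hypothetical pool name; because `none` is the default, stating it explicitly only documents the intent.

```tmsh
# Sketch only: the pool name is hypothetical; none is already the default value.
ltm pool ocp_router_443 {
    service-down-action none
}
```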
Final configuration
The following image shows the recommended configuration with the key settings highlighted.
Conclusions
F5 BIG-IP has sophisticated features for health checking and handling failures in backend services, including Kubernetes, to operate without any service disruption.
Although in this article we configured BIG-IP for OpenShift manually, it is recommended to use the CIS controller, which brings OpenShift visibility, automation, and more advanced use cases to BIG-IP.
CIS is open-source software and is included in your support contract entitlement.
The following links provide a good introduction to CIS: