OpenShift Service Mesh 2.x/3.x with F5 BIG-IP

 

Overview

OpenShift Service Mesh (OSSM) is Red Hat's packaged version of the Istio Service Mesh. Istio has the Ingress Gateway component to handle incoming traffic from outside the cluster. Like other ingress controllers, it requires an external load balancer to get the traffic to the ingress PODs. This follows the canonical Kubernetes two-tier arrangement for getting traffic inside the cluster, depicted in the next figure:

This article covers how to configure OpenShift Service Mesh 2.x/3.x, expose it to the BIG-IP, and properly monitor its health, either using BIG-IP's Container Ingress Services (CIS) or without it.

 

Exposing OSSM in BIG-IP - VIP configuration

How to publish OSSM in the BIG-IP is a customer choice:

  • A Layer 4 (L4) Virtual Server is simpler, and certificate management is done in OpenShift.

    The advantages of this mode are potentially higher performance and scalability, including connection mirroring; that said, mirroring is not usually used for HTTP traffic because HTTP applications typically retry failed requests. Connection persistence is limited to source IP.

    When using CIS, this is done with a TransportServer CR, which creates a fastL4-type virtual server in the BIG-IP (a minimal sketch is shown after this list).

  • A Layer 7 (L7) Virtual Server requires additional configuration because TLS is terminated in the BIG-IP.

    In this mode, OpenShift can take advantage of BIG-IP's TLS off-loading capabilities and Hardware/Network/SaaS/Cloud HSM integrations, which store private keys securely, including FIPS-level support.

    Working at L7 also allows per-application traffic management, including header and payload rewrites, cookie persistence, etc. It also enables per-application multi-cluster deployments.

    The above features are provided by the LTM (load balancing) module in BIG-IP. The possibilities are further expanded when using modules such as ASM (Advanced WAF) and Access (authentication).

    When using CIS, this is done with a VirtualServer CR, which creates a standard-type virtual server in the BIG-IP (a minimal sketch is also shown after this list).
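
For illustration, minimal sketches of both CRs follow. These are assumptions to adapt, not verbatim configuration: the names, addresses, hostname and TLSProfile reference are placeholders, and istio-ingressgateway is the usual name of the Ingress Gateway Service.

apiVersion: cis.f5.com/v1
kind: TransportServer
[...]
spec:
  virtualServerAddress: 192.0.2.10    # L4 VIP (placeholder)
  virtualServerPort: 443
  type: tcp
  pool:
    service: istio-ingressgateway
    servicePort: 443
---
apiVersion: cis.f5.com/v1
kind: VirtualServer
[...]
spec:
  host: apps.example.com              # placeholder hostname
  virtualServerAddress: 192.0.2.11    # L7 VIP (placeholder)
  tlsProfileName: ossm-tls            # TLSProfile CR that terminates TLS in the BIG-IP
  pools:
  - path: /
    service: istio-ingressgateway
    servicePort: 443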


Exposing OSSM to BIG-IP - pool configuration

There are two options to expose Istio Ingress Gateways to BIG-IP:

  • Using ClusterIP addresses: these are POD IPs, which are dynamic. This requires the use of CIS to discover the IP addresses of the Ingress Gateway PODs.

  • Using NodePort addresses: these are reachable from the external network. When using these, CIS is not strictly necessary, but it is recommended.

Exposing OpenShift Service Mesh using ClusterIP

This requires the use of CIS with the following parameters:

--orchestration-cni=ovn
--static-routing-mode=true

These make CIS create IP routes in the BIG-IP to reach the POD IPs inside the OpenShift cluster. Please note that this only works if all the OpenShift nodes are directly connected to the same subnet as the BIG-IP.

Additionally, the following parameter is required. It is the one that actually makes CIS populate the pool members with Cluster (POD) IPs:

--pool-member-type=cluster

No configuration change is needed in OSSM because ClusterIP is the default mode of Istio Ingress Gateways.
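
Putting it together, a sketch of how these parameters fit in the CIS Deployment follows. Only the parameters discussed here are shown; --bigip-url and --bigip-partition are the usual CIS connection parameters, with placeholder values.

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  template:
    spec:
      containers:
      - name: k8s-bigip-ctlr
        args:
        - --bigip-url=192.0.2.1         # BIG-IP management address (placeholder)
        - --bigip-partition=openshift   # target partition (placeholder)
        - --orchestration-cni=ovn
        - --static-routing-mode=true
        - --pool-member-type=cluster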

Exposing OpenShift Service Mesh using NodePort

Using NodePort provides known IP addresses for the Ingress Gateways, reachable from outside the cluster. Note that when using NodePort, only one Ingress Gateway replica will run per node.

The behavior of NodePort varies depending on the externalTrafficPolicy field:

  • Using the Cluster value, any OpenShift node will accept traffic and will redirect it to any node that has an Ingress Gateway POD, in a load-balancing fashion. This is the easiest to set up, but because each request might be served by a different POD, health checking is not reliable (it cannot be known which POD is down).

  • Using the Local value, only the OpenShift nodes that have an Ingress Gateway POD will accept traffic, and the traffic will be delivered to the local Ingress Gateway PODs, without further indirection. This is the recommended way when using NodePort because of its deterministic behaviour and, therefore, reliable health checking.

Next, it is described how to set up a NodePort with the Local externalTrafficPolicy. There are two options for configuring OSSM:

  • Using the ServiceMeshControlPlane CR method: this is the default method in OSSM 2.x for backwards compatibility, but it does not allow fine-tuning the configuration of the proxy. See this OSSM 2.x link for further details. This method is deprecated and not available in OSSM 3.x.

  • Using the Gateway injection method: this is the only method available in OSSM 3.x and the current recommendation from Red Hat for OSSM 2.x. This method allows you to tune the proxy settings. This article shows why such tuning is of special interest: at present, the Ingress Gateway does not ship with good default values for reliable health checking. These are discussed in the Health Checking section.

When using the ServiceMeshControlPlane CR method, the above is configured as follows:

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
[...]
spec:
  gateways:
    ingress:
      enabled: true
      runtime:
        deployment:
          replicas: 2
      service:
        externalTrafficPolicy: Local
        ports:
        - name: status-port
          nodePort: 30021
          port: 15021
          targetPort: 15021
        - name: http2
          nodePort: 30080
          port: 80
          targetPort: 8080
        - name: https
          nodePort: 30443
          port: 443
          targetPort: 8443
        type: NodePort

When using the Gateway injection method (recommended), the Service definition is created manually, analogously to the ServiceMeshControlPlane CR. The selector shown below (istio: ingressgateway) is an example and must match the labels of your gateway PODs:

apiVersion: v1
kind: Service
[...]
spec:
  externalTrafficPolicy: Local
  type: NodePort
  selector:
    istio: ingressgateway  # example label; must match the gateway Deployment's POD labels
  ports:
  - name: status-port
    nodePort: 30021
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    nodePort: 30080
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    nodePort: 30443
    port: 443
    protocol: TCP
    targetPort: 8443

The ports section is optional but recommended in order to have deterministic nodePort values, and it is required when not using CIS, because the manual BIG-IP configuration needs static ports. The nodePort values can be customised.

When not using CIS, the pool members have to be configured manually in the BIG-IP. It is typical in OpenShift to place the Ingress components (OpenShift Router or Istio) in dedicated infra nodes, which makes the pool member addresses known and stable. See this Red Hat solution for details. When using the ServiceMeshControlPlane method, the configuration is as follows:

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
[...]
spec:
  runtime:
    defaults:
      pod:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/infra
          value: reserved
        - effect: NoExecute
          key: node-role.kubernetes.io/infra
          value: reserved


When using the Gateway injection method, the configuration is added to the Deployment file directly:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  template:
    metadata:
    spec:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved


The configuration above is also a good practice when using CIS.

Additionally, CIS by default adds all node IPs to the Service's pool, regardless of whether externalTrafficPolicy is set to Cluster or Local. The health check will discard the nodes that have no Ingress Gateway. The nodes discovered by CIS can be limited with the following parameter:

--node-label-selector
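
For example, to limit discovery to the infra nodes used above (the value is a standard Kubernetes label selector; adjust the label to your environment):

--node-label-selector=node-role.kubernetes.io/infra=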


Health Checking and retries for the Ingress Gateway


Ingress Gateway Readiness


The Ingress Gateway has the following readinessProbe for Kubernetes' own health checking:

readinessProbe:
  failureThreshold: 30
  httpGet:
    path: /healthz/ready
    port: 15021
    scheme: HTTP
  initialDelaySeconds: 1
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 3


where the failureThreshold value of 30 is considered far too large: it marks the Ingress Gateway as not Ready only after 90 seconds (tested to be failureThreshold * timeoutSeconds). This article recommends marking down an Ingress Gateway no later than 16 seconds after failure.

When using CIS, Kubernetes notifies CIS whenever a POD is not Ready, and CIS automatically removes the associated pool member from the pool. In order to achieve the desired behaviour of marking down the Ingress Gateway within 16 seconds, the default failureThreshold value has to be changed in the Deployment file by adding the following snippet:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  template:
    metadata:
    spec:
      containers:
        - name: istio-proxy
          image: auto
          readinessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz/ready
              port: 15021
              scheme: HTTP
            initialDelaySeconds: 1
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 3

This keeps all other values equal and sets failureThreshold to 5, therefore marking down the Ingress Gateway after 15 seconds (5 * 3 seconds).

When not using CIS, an HTTP health check has to be configured manually in the BIG-IP. An example health check monitor is shown next:
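
The following is a minimal sketch in tmsh, assuming the status-port NodePort 30021 from the examples above; the monitor and pool names, member addresses, interval and timeout are illustrative, with the timing chosen to mark a failed POD down within the 16 seconds recommended earlier. The destination *:30021 makes the monitor probe the status port, while the pool members serve application traffic on their own NodePort (30443 here).

# HTTP monitor probing Istio's readiness endpoint on the status-port NodePort
tmsh create ltm monitor http ossm-ready \
    send "GET /healthz/ready HTTP/1.1\r\nHost: ossm\r\nConnection: close\r\n\r\n" \
    recv "200" destination *:30021 interval 5 timeout 16

# Pool with the infra nodes as members (addresses are illustrative), using the monitor above
tmsh create ltm pool ossm-https-pool monitor ossm-ready \
    members add { 10.0.0.11:30443 10.0.0.12:30443 }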

 

Connection draining


When an Ingress Gateway POD is deleted (because of an upgrade, a scale-down event, etc.), it immediately starts returning HTTP 503 on the /healthz/ready endpoint, yet it keeps serving connections until it is effectively deleted. This is called the drain period, and its default value (3 seconds) is extremely short for any external load balancer. This value has to be increased so that the Ingress Gateway PODs being deleted continue serving connections until they are removed from the external load balancer (the BIG-IP) and the outstanding connections are finalised.

This setting can only be tuned using the Gateway injection method, and it is applied by adding the following snippet to the Deployment file:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          terminationDrainDuration: 45s


The example above uses the default drain period of the OpenShift Router (45 seconds). The value can be customised, keeping in mind that:

  • When using CIS, it should allow CIS to update the configuration in the BIG-IP and drain the connections.

  • When not using CIS, it should allow the health check to detect the condition of the POD and drain the connections.

Additional recommendations

The next recommendations apply to any ingress controller or API manager and have been previously suggested when using OpenShift Router.

  • Handle non-graceful errors with the pool’s reselect tries 

    To deal better with non-graceful shutdowns or transient errors, this mechanism reselects a new Ingress Gateway POD when a request fails. The recommendation is to set the number of tries to the number of Ingress Gateway PODs minus one. When using CIS, this can be set in the VirtualServer or TransportServer CRs with the reselectTries parameter (see the sketch after this list).

  • Set an additional TCP monitor for the Ingress Gateway's application traffic sockets

    This complementary TCP monitor (for both the HTTP and HTTPS listeners) validates that Ready instances can actually receive traffic on the application's traffic sockets. Although these errors are handled by the reselect-tries mechanism, the monitor provides visibility that such types of errors are happening.
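
As a sketch, both recommendations combined in a CIS VirtualServer CR might look as follows. The field placement should be verified against your CIS version; the service name and timing values are illustrative, and reselectTries is set to 2 assuming 3 Ingress Gateway replicas.

apiVersion: cis.f5.com/v1
kind: VirtualServer
[...]
spec:
  pools:
  - path: /
    service: istio-ingressgateway
    servicePort: 443
    reselectTries: 2      # number of Ingress Gateway PODs - 1
    monitor:
      type: tcp           # complementary TCP monitor on the traffic socket
      interval: 5
      timeout: 16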

Conclusion and closing remarks

We hope this article highlights the most important aspects of integrating OpenShift Service Mesh with BIG-IP. A key aspect of a reliable Ingress Gateway integration is modifying OpenShift Service Mesh's terminationDrainDuration and readinessProbe.failureThreshold defaults. F5 has submitted RFE 04270713 to Red Hat to improve these defaults. This article will be updated accordingly.

Whether or not the CIS integration is used, BIG-IP allows you to expose OpenShift Service Mesh reliably, with extensive L4-L7 security and traffic management capabilities. It also allows fine-grained access control, scalable SNAT, or keeping the original source IP, among other features. Overall, BIG-IP is able to fulfill any requirement.

We look forward to hearing your experience and feedback on this article.

 

Published Oct 06, 2025
Version 1.0