When using the F5 Distributed Cloud Platform, you never have to deal with site-to-site IP conflicts again!

One of the most common headaches in NetOps is thoughtfully planning out internal IP address allocations. How many IPs will be needed? Does the site contain microservices? Will the workloads in a site need to connect to other sites? When it comes to supporting deployments in public clouds, many resources use the exact same CIDR blocks, for example, 10.0.0.0/8, 192.168.0.0/16, and the 172.16.0.0/12 range. With services being spun up automatically, the owner of a service often has no idea which IP addresses to use and simply chooses a common default. Having the same IP address used at multiple connected sites poses real challenges for transit routing.

F5 Distributed Cloud Services solve the IP address overlap problem once and for all by connecting apps via services on the F5 Global Network. Distributed Cloud Network Connect establishes connectivity across subnets and sites with orchestrated VPNs, using MPLS to ensure segmentation. Distributed Cloud App Connect adds control and visibility at L4 and L7 by acting as a distributed reverse proxy, allowing full abstraction of IP addresses to simplify routing at each end. All an admin needs to do is deploy Distributed Cloud CEs (Customer Edge nodes) to the sites and configure a Load Balancer.

Before diving into this solution, it helps to understand how Distributed Cloud Services solve this problem; F5 staffer buulam has a whiteboard session covering exactly what Distributed Cloud does at each step of the way.

In the following solution, the topology I use is an AWS TGW Site connected to both an Azure VNET Site and a GCP VPC Site. Because the org subcontracts the development of its portfolio services, independent contractors are able to use whatever CIDR blocks they want. In this case, both the Azure and GCP sites use exactly the same CIDR range, 10.40.0.0/16.

This would ordinarily be a problem for a traditional L3 routed network without NAT. In Distributed Cloud, however, an L7 HTTP or L4 TCP load balancer acts as a reverse-proxy service that steers connections and traffic between sites that would otherwise be unreachable from one another, with no NAT policies to configure.

Consider the following scenario: a company has split development and QA for part of a service it provides between two independent contractors. Both contractors are connected to the company over the Distributed Cloud global network, and both use the same CIDR block, 10.40.0.0/16. Users need to access the part of the app now developed and hosted by Contractor B in GCP, while Contractor A in Azure still needs to be able to connect to the app and continue validating it. The app is distributed in the following way:

  1. The main ingress to the app continues to be in Azure via an HTTP Load Balancer hosted on F5 Distributed Cloud.
  2. A module in AWS calls out to the refer-a-friend module that has just been relocated from Contractor A in Azure to Contractor B in GCP.
  3. Contractor A in Azure needs to continue validating the refer-a-friend module and also needs to reach the module's endpoint now hosted by Contractor B.

Note: Contractors A and B both use the same CIDR block, 10.40.0.0/16.

Distributed Cloud App Connect Configuration
To provide Contractor A with SSH access to the server so it can continue to QA the app now running in GCP, an L4 TCP Load Balancer is deployed internally in F5 XC with the following YAML configuration.

Origin Pool:

metadata:
  name: gcp-remote-ssh
  namespace: dpotter
  labels: {}
  annotations: {}
  description: null
  disable: null
spec:
  origin_servers:
    - private_ip:
        ip: 10.40.0.4
        site_locator:
          site:
            tenant: tme-lab-works-oeaclgke
            namespace: system
            name: dp-gcp-vpc-2
        outside_network: {}
      labels: {}
  no_tls: {}
  port: 22
  same_as_endpoint_port: {}
  healthcheck: null
  loadbalancer_algorithm: LB_OVERRIDE
  endpoint_selection: LOCAL_PREFERRED
  advanced_options: null
resource_version: null
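
The exported YAML above can also be applied as code rather than through the Console. The snippet below is a minimal sketch using vesctl's configuration create verb; it assumes vesctl is installed and authenticated to the tenant, and that the object above is saved as origin-pool.yaml (the file name is illustrative and flags may vary by vesctl version).

# Sketch: create the origin pool from the saved YAML
# (assumes vesctl is configured with tenant credentials; the namespace
# is taken from metadata.namespace in the file)
vesctl configuration create origin_pool -i origin-pool.yaml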

TCP Load Balancer:

metadata:
  name: same-cidr-other-site
  namespace: dpotter
  labels: {}
  annotations: {}
spec:
  domains:
    - same-cidr-other-site.demo.internal
  listen_port: 2022
  origin_pools: []
  origin_pools_weights:
    - pool:
        tenant: tme-lab-works-oeaclgke
        namespace: dpotter
        name: gcp-remote-ssh
      weight: 1
      priority: 1
      endpoint_subsets: {}
  advertise_custom:
    advertise_where:
      - site:
          network: SITE_NETWORK_INSIDE_AND_OUTSIDE
          site:
            tenant: tme-lab-works-oeaclgke
            namespace: system
            name: azure-vnet-wus2
        use_default_port: {}
  hash_policy_choice_round_robin: {}
  idle_timeout: 3600000
  retract_cluster: {}
  dns_info: []
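
The TCP Load Balancer object can be created the same way with vesctl, or via the Distributed Cloud public API. The curl sketch below is an assumption based on the platform's general config API layout (endpoint path, token scheme, placeholder tenant hostname, and the tcp-lb.json file are all illustrative); verify them against the API reference for your tenant.

# Sketch: create the TCP load balancer via the config API
# (endpoint path, APIToken scheme, and file name are assumptions)
curl -X POST \
  -H "Authorization: APIToken $XC_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d @tcp-lb.json \
  https://<tenant>.console.ves.volterra.io/api/config/namespaces/dpotter/tcp_loadbalancers

Note the advertise_custom block in the spec: the VIP is advertised only on the inside and outside networks of the azure-vnet-wus2 site, so the service is reachable from Contractor A's VNET but is not exposed anywhere else.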

With the above configuration, connections to hostname same-cidr-other-site.demo.internal:2022 from Contractor A in Azure will now be proxied and tunneled by the Distributed Cloud Platform to the server running in GCP and hosted by Contractor B on TCP/22. This makes it possible for Contractor A to continue to QA and validate the module now developed by Contractor B, while also preserving the original workflow:

Internet > External Ingress > Azure, and now from Azure > GCP.
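
From a client in Contractor A's VNET, connecting is then a normal SSH session against the advertised hostname and listen port; the username placeholder below is illustrative.

# SSH from Azure (Contractor A) to the GCP-hosted server (Contractor B)
# via the internally advertised VIP on port 2022
ssh -p 2022 <user>@same-cidr-other-site.demo.internal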

Validation

  1. No direct access between Contractors A and B
    root@dpotter-appserver:/home/azureuser# ping 10.40.0.4
    PING 10.40.0.4 (10.40.0.4) 56(84) bytes of data.
    ^C
    --- 10.40.0.4 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2050ms
    
    root@dpotter-appserver:/home/azureuser# ping 10.40.0.5
    PING 10.40.0.5 (10.40.0.5) 56(84) bytes of data.
    ^C
    --- 10.40.0.5 ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2039ms
  2. SSH is allowed from Contractor A to the module's server now owned by Contractor B despite both contractors using the same CIDR block (10.40.0.0/16)
    Client side:

    root@dpotter-appserver:/home/azureuser# telnet same-cidr-other-site.demo.internal 2022
    Trying 10.40.0.5...
    Connected to same-cidr-other-site.demo.internal.
    Escape character is '^]'.
    SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.2
    ^]q
    
    telnet> q
    Connection closed.
    root@dpotter-appserver:/home/azureuser#

    Server side:

    root@docker-ubuntu-20-2-vm:/home/da.potter# tcpdump -ni ens4 tcp port 22
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on ens4, link-type EN10MB (Ethernet), capture size 262144 bytes
    22:26:23.941910 IP 10.40.0.3.51675 > 10.40.0.4.22: Flags [S], seq 3472146549, win 65535, options [mss 1132,nop,wscale 9,sackOK,TS val 2217414740 ecr 0], length 0
    22:26:23.941965 IP 10.40.0.4.22 > 10.40.0.3.51675: Flags [S.], seq 2409916872, ack 3472146550, win 64768, options [mss 1420,sackOK,TS val 316240982 ecr 2217414740,nop,wscale 7], length 0
    22:26:23.953482 IP 10.40.0.3.51675 > 10.40.0.4.22: Flags [.], ack 1, win 2048, options [nop,nop,TS val 2217414750 ecr 316240982], length 0
    22:26:23.963366 IP 10.40.0.4.22 > 10.40.0.3.51675: Flags [P.], seq 1:42, ack 1, win 506, options [nop,nop,TS val 316241004 ecr 2217414750], length 41

Conclusion
IP address overlap is no longer a problem, and never needs to be a concern, when using Network Connect and App Connect to connect sites and the apps that run between them. For example, using TCP Load Balancers with internally advertised VIPs, it's easy to maintain service-level connectivity between sites even when the sites and services use exactly the same IP addresses. Note in the server-side capture above that the connection arrives at the origin server from a local GCP-side address (10.40.0.3) rather than from Contractor A's overlapping network: the proxy re-originates the connection locally, fully decoupling the two address spaces.
