gtm
258 TopicsGTM Pool Members Gone After Maintenance? It's Probably This One Setting
You finish a maintenance window, everything looks good on LTM, and then someone notices Wide IPs are resolving to fewer destinations than before. You check the GTM pools and the members are just... gone. The virtual servers are fine on LTM. GTM just doesn't know about them anymore — and more importantly, it doesn't remember if they were ever pool members. This happens more often than it should, and it almost always comes back to the same thing: virtual-server-discovery enabled doing exactly what it was designed to do, at exactly the wrong moment. What's Actually Going On When virtual-server-discovery is set to enabled on a GTM server object, GTM keeps its view of LTM virtual servers in sync via iQuery. It automatically adds new virtual servers, updates existing ones, and — this is the part that causes problems — deletes virtual servers that LTM stops reporting on. That delete behavior is the issue. Any time iQuery reports zero virtual servers, even temporarily, GTM treats it as a mass deletion event. The virtual servers get pulled from the server object, and with them, their pool memberships. When LTM eventually reports on those virtual servers again, GTM re-discovers them as brand new objects with no memory of which pools they belonged to. Two scenarios trigger this consistently. Scenario 1: LTM Software Upgrade This is the one that catches most people. During an upgrade, LTM reboots and goes through a phase where iQuery can connect but the full configuration hasn't finished loading yet. From GTM's perspective, LTM is reachable but reporting no virtual servers. GTM interprets that as a deletion event, clears out the discovered virtual servers, and empties the pools. When LTM finishes loading and the virtual servers come back, GTM re-discovers them — but the pool memberships are gone. You're left manually rebuilding what was there before the maintenance window started. The telltale sign is pool members coming back in blue/CHECKING state. That only happens to newly discovered objects. GTM treated a returning virtual server as a brand new one — because as far as it's concerned, it is. The GTM log won't show a deletion event, only the re-add. That gap in the logs is a known blind spot with virtual-server-discovery enabled, and it's exactly why the problem is hard to diagnose after the fact. What you'll typically see in /var/log/gtm after the LTM comes back: alert gtmd[xxxxx]: 011a1005:1: SNMP_TRAP: Pool your_pool state change green --> red (No enabled pool members available) alert gtmd[xxxxx]: 011a3004:1: SNMP_TRAP: Wide IP your.wideip.example.com state change green --> red (No enabled pools available) And then shortly after, the virtual servers re-appear in CHECKING state as GTM re-discovers them — but with no pool bindings. Scenario 2: LTM HA Failover This one surprises people because the LTM pair is still running — it's just switching active units. After a failover, the new active device may not have its iQuery connections fully re-established yet. GTM sees the iQuery state as inconsistent, virtual server status updates stop coming through, and members disappear from the discovered list. What makes this harder to diagnose is that tmsh show gtm iquery may show "connected" — but connected doesn't mean the config sync is working correctly. In a GTM sync group, only the device assigned local ID 0 (the GTM with the lowest IP address) is responsible for writing auto-discovery results to the configuration. If that specific device loses its iQuery connection during the failover window, discovery events are missed entirely — even if every other GTM in the group can still reach the LTM. So you can have a situation where five out of six GTMs look perfectly healthy, iQuery shows connected everywhere, and yet pool members are still disappearing — because the one device that matters for discovery is the one with the broken connection. You can check which device in your sync group holds local ID 0 with: tmsh list sys db gtm.peerinfolocalid If that device's iQuery connection to the LTM is the one that dropped during the failover window, that's your answer — even if everything else looks fine. The Fix: enabled-no-delete Both scenarios share the same root cause: GTM's auto-delete behavior treating a temporary iQuery disruption as a permanent deletion event. The fix is the same for both: gtm server /Common/site1-ltm { addresses { 10.1.1.1 { device-name site1-ltm } } datacenter /Common/dc1 monitor /Common/bigip virtual-server-discovery enabled-no-delete } With enabled-no-delete, GTM still auto-discovers new virtual servers and keeps existing ones updated. The only thing that changes is that it will never delete a virtual server just because LTM temporarily stopped reporting it. Your pool memberships survive both scenarios above. Mode Adds new VS Updates VS Deletes VS Pool memberships survive iQuery disruption? disabled No No No Yes — nothing changes enabled Yes Yes Yes No — any disruption can empty pools enabled-no-delete Yes Yes No Yes — preserved The Trade-Off enabled-no-delete won't clean up after you when you intentionally decommission a virtual server on LTM. The stale GTM object stays in the discovered list until you remove it manually. In environments with a lot of VS churn, this can accumulate over time. The question is which failure mode you'd rather manage: pool members silently disappearing during a maintenance window, or occasionally needing to clean up stale objects after a planned decommission. For most production environments, the latter is far easier to deal with — and far less likely to wake someone up at 2am. How to Make the Change Via tmsh: tmsh modify gtm server /Common/site1-ltm \ virtual-server-discovery enabled-no-delete tmsh save sys config Via GUI: Go to DNS → GSLB → Servers Select the server object Set Virtual Server Discovery to Enabled (No Delete) Click Update This takes effect immediately and does not affect existing discovered virtual servers or current pool memberships. Cleaning Up Stale Objects When you intentionally decommission a virtual server on LTM, remove the leftover GTM object manually: # List virtual servers under a GTM server object tmsh list gtm server /Common/site1-ltm virtual-server # Remove a specific stale entry tmsh modify gtm server /Common/site1-ltm \ virtual-servers delete { /Common/old-vs-name } tmsh save sys config Make this part of your standard VS decommission runbook and stale objects will never pile up. Quick Diagnostic When Members Go Missing Before assuming it's a discovery issue, check iQuery health across all GTM devices first: tmsh show gtm iquery Look for: State: should be connected to all entries Reconnects: A high count suggests instability even if the connection looks up Configuration Time: None means the config has never successfully synced from that LTM Then confirm which GTM holds local ID 0 and verify its connectivity specifically: tmsh list sys db gtm.peerinfolocalid If the local ID 0 device is the one with the broken iQuery connection, that's your answer — regardless of what the other devices are showing. Wrapping Up Whether it's an LTM upgrade or an HA failover, the pattern is the same: iQuery goes quiet for a moment, GTM interprets silence as deletion, and your pool memberships are gone. It's working as designed — just not in a way that's useful to you. enabled-no-delete is a one-line change that stops this from happening. The cleanup overhead it introduces is predictable and manageable. The alternative — rebuilding pool memberships after an unplanned event — is not. Have you run into either of these scenarios in your environment? Drop a comment below, especially if you've seen the local ID 0 shift cause issues during a rolling GTM upgrade.90Views1like0CommentsRestore configuration to GTM Sync Group device
I am in the process of writing up a change to delete config from a GTM Sync Group which I am fine with but I am looking to confirm my thoughts on how to restore the configuration should I need to back the change out. For an LTM F5 I would create a UCS file on the Standby device before making any changes as the first step. Then should I need to rollback I would just restore the UCS file to the Standby device, check that the configuration has restored correctly and then sync the Standby device to the Active. I think that a similar approach will work for the devices in the GTM Sync Group but I want to avoid the restored config automatically syncing to the other devices in the GTM Sync Group so my thoughts on how to do this are as follows: 1. Log onto the Standby F5 2. Navigate to DNS > Settings > GSLB > General 3. Untick the 'Synchronize' and 'Synchronize DNS Zone Files' tick boxes 4. Change the Group Name to something unique. 5. Save the config 6. Create the UCS file 7. Reverse the changes made in Steps 3 and 4 8. Delete configuration The thinking here is that should I need to restore the config from the UCS file there is no chance of this automatically syncing to the other devices in the GTM Sync Group since it was not part of the GTM Sync Group when this was taken. Once restored I would then update the description for one of the WideIP's (simply append something like '1234' to it) so that this has the higher 'commit-id' and then add the device back to the GTM Sync Group. Since the newly added device has the highest 'commit-id' this would then push the confog back to the GTM Sync Group and I am back where I started. To my mind this makes sense but it would be very much appreciated if I could get a second opinion on this.Solved199Views1like5CommentsSingle LTM with multiple GTM domains
I am currently working on a Datacenter migration and we are re-IP'ing everything and rebuilding all the network appliances. I am working out the BEST, least impactful, way to migrate the GTM appliances to the new DC's. Here is the overall situation. Everything is the same version running 15.x.x with a mix of rSeries hardware running VE's and iSeries hardware also running VE's. Existing DC's: GTM Domain with two GTM's in different DC's Multiple LTM's all joined to the GTM New DC's: Two GTM's in different DC's, blank configuration Multiple LTM's all joined with the existing DC GTM's I know that I can add the new GTM's to the existing DC GTM domain, let them sync up, then update the NS records to migrate the DNS flows over to the new DC, but that also sync's over all the technical debt and limits my pre-testing abilities. I would like to setup a new GTM Domain in the new DC, build some automation for the WideIP / Pool creation, and manually review / rebuild all the necessary records in the new DC. My hangup is that this is ONLY possible if the LTM appliance can join multiple GTM domains. Can a single LTM appliance join multiple GTM domains and report status to multiple appliances? I don't have an easy way to build a test environment and build this out with VE's and validate so I am hoping for some input from the community.161Views0likes2CommentsTerraform AS3 code for GTM Only.
Hello All, I am really really suffering here :( Have been looking for GTM ONLY code in AS3 form, need a simple code hardcoded values will also work. I have seen documentation and couldn't see exact use case. We are doing POC for where VMs are direct;y added to GTM and NO LTM component are there. I can't post my LTM + GTM code as its in office. Would really appreciate any help and guidance here. Any simple code work snippet using only AS3 please.415Views0likes9CommentsF5 DNS/GTM External Monitor(EAV) with SNI support and response code check
Code is community submitted, community supported, and recognized as ‘Use At Your Own Risk’. The example DNS/GTM health monitor is for versions before 16.1 as BIG-IP supports SNI for default DNS/GTM HTTPS monitor in the latest version but if you have still not upgraded then this is for you! I have used this monitor for XC Distributed Cloud as the HTTP LB share by default the same tenant IP address and SNI support is needed. You can order dedicated public IP addresses for each HTTP LB and enable "Default Load Balancer" ( https://my.f5.com/manage/s/article/K000152902 ) option but it will cost you extra 😉 The script is a modified version of External https health monitor for SNI-enabled pool as to handle response codes and to set the SNI globally for the entire pool and it's members. If you are uploading from Windows machine see External monitor fails to run as you could hit the bug. This could be needed for F5 DNS/GTM below 16.1 that do not support SNI in HTTPS monitors. The only mandatory variable is "SNI" that should be set in the external monitor config that references this uploaded bash script. The "URI" variable by default is set to "/" and "$2" variable by default is empty or 443, the default expected response code 200. #!/bin/sh # External monitoring script for checking HTTP status code # $1 = IP (::ffff:nnn.nnn.nnn.nnn notation or hostname) # $2 = port (optional; defaults to 443 if not provided) # Default SNI to IP if not explicitly provided node_ip=$(echo "$1" | sed 's/::ffff://') # Remove IPv6 compatibility prefix SNI=${SNI:-"$node_ip"} # Assign sanitized IP to SNI # Default variables MON_NAME=${MON_NAME:-"MyExtMon$$"} pidfile="/var/run/$MON_NAME.$1..$2.pid" # PID file path DEBUG=${DEBUG:-0} # Enable debugging if set to 1 EXPECTED_STATUS=${EXPECTED_STATUS:-200} # Default HTTP status code to 200 URI=${URI:-"/"} # Default URI DEFAULT_PORT=443 # Default port (used if $2 is unset) # Set port to default if $2 is not provided if [ -z "${2}" ]; then PORT=${DEFAULT_PORT} else PORT=${2} fi # Kill old process if pidfile exists if [ -f "$pidfile" ]; then kill -9 -$(cat "$pidfile") > /dev/null 2>&1 fi echo "$$" > "$pidfile" # Perform the HTTP(S) request via single curl (fetch status code only) status_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 --resolve "${SNI}:${PORT}:${node_ip}" "https://${SNI}:${PORT}${URI}") # Cleanup rm -f "$pidfile" > /dev/null 2>&1 # Output server status based on HTTP status code match if [ "$status_code" -eq "$EXPECTED_STATUS" ]; then echo "up" else echo "down" fi # Debugging if [ "$DEBUG" -eq 1 ]; then echo "Debugging on..." echo "SNI=${SNI}" echo "URI=${URI}" echo "IP=${node_ip}" echo "PORT=${PORT}" echo "MON_NAME=${MON_NAME}" echo "STATUS_CODE=${status_code}" echo "EXPECTED_STATUS=${EXPECTED_STATUS}" echo "curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 --resolve ${SNI}:${PORT}:${node_ip} https://${SNI}:${PORT}${URI}" fi312Views0likes1CommentDNS/GTM health monitor big3d timeout because of alias config
Hello Everyone, I was testing some experimental config for DNS/GTM where the health monitor does not monitor the pool members but a specific IP address configured in the "alias" and it does not work as the error says bigd timeouts to report the state. For LTM http/https health monitors the "alias" option works but not for gtm/dns. I think I discovered a bug as this is rare use case to not monitor the pool members themselves. I have changed the ip to 1.1.1.1 just for the picture screenshot 😄 Also in the logs after gtm and big3d is enabled I see the logs below and too bad that F5 DNS does not have monitor debug like LTM to just enable a debug for a monitor and not the entire box. ----- Will not probe x.x.x.x:80 ( in DC /Common/niki-dc because will be done by other GTM (<unknown>:<unknown>) Unable to identify which gtm server represents the local device140Views0likes1Commentwhat will happen if local gtm/dns disable the sync with other gtm/dns sync group?
Hi, we want to temporarily remove local gtm/dns from corporate global gtm/dns sync group. What will happen to local dns service? what is the impact? will some applications be marked as down if the application servers are located in other region and learned via gtm sync group? we have gtm/dns in three different regions. Can anyone please advise? thanks in advance!Solved219Views0likes2Commentscross platform migration issue
Hi, we want to migrate the config from iseries 4K to rseries 5k . The current software version on iseries is 13.x.. I tried to run bigip v15.x on rseries, then export the config from iseries and import it into rseries, but not successful, there were some errors. Can someone please advise how should I do to make the migration successful? Thanks in advance!268Views0likes2CommentsSNI Sites not taking correct certificate.
I have configured one VIP with two certificate aks.test.com aks4.test.com On SSL profile for aks.test.com i have enabled SNI feature and aks.test.com is working fine taking correct certificate (aks.test.com). but aks4.test.com having not secure error on browser and taking the certificate of (aks.test.com). Could someone please help what could be the issue in this case.407Views0likes8Comments