# GTM Pool Members Gone After Maintenance? It's Probably This One Setting
You finish a maintenance window, everything looks good on LTM, and then someone notices Wide IPs are resolving to fewer destinations than before. You check the GTM pools and the members are just... gone. The virtual servers are fine on LTM. GTM just doesn't know about them anymore — and more importantly, it doesn't remember that they were ever pool members.

This happens more often than it should, and it almost always comes back to the same thing: `virtual-server-discovery enabled` doing exactly what it was designed to do, at exactly the wrong moment.

## What's Actually Going On

When `virtual-server-discovery` is set to `enabled` on a GTM server object, GTM keeps its view of LTM virtual servers in sync via iQuery. It automatically adds new virtual servers, updates existing ones, and — this is the part that causes problems — deletes virtual servers that LTM stops reporting on.

That delete behavior is the issue. Any time iQuery reports zero virtual servers, even temporarily, GTM treats it as a mass deletion event. The virtual servers get pulled from the server object, and with them, their pool memberships. When LTM eventually reports on those virtual servers again, GTM re-discovers them as brand new objects with no memory of which pools they belonged to.

Two scenarios trigger this consistently.

## Scenario 1: LTM Software Upgrade

This is the one that catches most people. During an upgrade, LTM reboots and goes through a phase where iQuery can connect but the full configuration hasn't finished loading yet. From GTM's perspective, LTM is reachable but reporting no virtual servers.

GTM interprets that as a deletion event, clears out the discovered virtual servers, and empties the pools. When LTM finishes loading and the virtual servers come back, GTM re-discovers them — but the pool memberships are gone. You're left manually rebuilding what was there before the maintenance window started.

The telltale sign is pool members coming back in blue/CHECKING state. That only happens to newly discovered objects. GTM treated a returning virtual server as a brand new one — because as far as it's concerned, it is. The GTM log won't show a deletion event, only the re-add. That gap in the logs is a known blind spot with `virtual-server-discovery enabled`, and it's exactly why the problem is hard to diagnose after the fact.

What you'll typically see in `/var/log/gtm` after the LTM comes back:

```
alert gtmd[xxxxx]: 011a1005:1: SNMP_TRAP: Pool your_pool state change green --> red (No enabled pool members available)
alert gtmd[xxxxx]: 011a3004:1: SNMP_TRAP: Wide IP your.wideip.example.com state change green --> red (No enabled pools available)
```

And then shortly after, the virtual servers re-appear in CHECKING state as GTM re-discovers them — but with no pool bindings.
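If you're doing the post-mortem, those traps are easy to pull back out of the log. A minimal sketch, assuming bash shell access on the GTM (the path and trap text match the samples above; treat the pattern as a starting point, not an exhaustive filter):

```bash
# Pull recent GTM state-change traps out of the log.
# /var/log/gtm is the standard gtmd log; the SNMP_TRAP text matches
# the sample messages above. Widen or narrow the tail as needed.
grep -E 'SNMP_TRAP.*(state change|No enabled)' /var/log/gtm | tail -n 50
```

If the green-to-red transitions cluster around the LTM reboot, and the re-adds follow with no deletion logged in between, this is almost certainly your scenario.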
## Scenario 2: LTM HA Failover

This one surprises people because the LTM pair is still running — it's just switching active units. After a failover, the new active device may not have its iQuery connections fully re-established yet. GTM sees the iQuery state as inconsistent, virtual server status updates stop coming through, and members disappear from the discovered list.

What makes this harder to diagnose is that `tmsh show gtm iquery` may show "connected" — but connected doesn't mean the config sync is working correctly. In a GTM sync group, only the device assigned local ID 0 (the GTM with the lowest IP address) is responsible for writing auto-discovery results to the configuration. If that specific device loses its iQuery connection during the failover window, discovery events are missed entirely — even if every other GTM in the group can still reach the LTM.

So you can have a situation where five out of six GTMs look perfectly healthy, iQuery shows connected everywhere, and yet pool members are still disappearing — because the one device that matters for discovery is the one with the broken connection.

You can check which device in your sync group holds local ID 0 with:

```
tmsh list sys db gtm.peerinfolocalid
```

If that device's iQuery connection to the LTM is the one that dropped during the failover window, that's your answer — even if everything else looks fine.

## The Fix: enabled-no-delete

Both scenarios share the same root cause: GTM's auto-delete behavior treating a temporary iQuery disruption as a permanent deletion event. The fix is the same for both:

```
gtm server /Common/site1-ltm {
    addresses {
        10.1.1.1 {
            device-name site1-ltm
        }
    }
    datacenter /Common/dc1
    monitor /Common/bigip
    virtual-server-discovery enabled-no-delete
}
```

With `enabled-no-delete`, GTM still auto-discovers new virtual servers and keeps existing ones updated. The only thing that changes is that it will never delete a virtual server just because LTM temporarily stopped reporting it. Your pool memberships survive both scenarios above.

| Mode | Adds new VS | Updates VS | Deletes VS | Pool memberships survive iQuery disruption? |
|------|-------------|------------|------------|---------------------------------------------|
| `disabled` | No | No | No | Yes — nothing changes |
| `enabled` | Yes | Yes | Yes | No — any disruption can empty pools |
| `enabled-no-delete` | Yes | Yes | No | Yes — preserved |

## The Trade-Off

`enabled-no-delete` won't clean up after you when you intentionally decommission a virtual server on LTM. The stale GTM object stays in the discovered list until you remove it manually. In environments with a lot of VS churn, this can accumulate over time.

The question is which failure mode you'd rather manage: pool members silently disappearing during a maintenance window, or occasionally needing to clean up stale objects after a planned decommission. For most production environments, the latter is far easier to deal with — and far less likely to wake someone up at 2am.

## Cleaning Up Stale Objects

When you intentionally decommission a virtual server on LTM, remove the leftover GTM object manually:

```
# List virtual servers under a GTM server object
tmsh list gtm server /Common/site1-ltm virtual-servers

# Remove a specific stale entry
tmsh modify gtm server /Common/site1-ltm \
    virtual-servers delete { /Common/old-vs-name }

tmsh save sys config
```

Make this part of your standard VS decommission runbook and stale objects will never pile up.

## How to Make the Change

Via tmsh:

```
tmsh modify gtm server /Common/site1-ltm \
    virtual-server-discovery enabled-no-delete
tmsh save sys config
```

Via GUI:

1. Go to **DNS → GSLB → Servers**
2. Select the server object
3. Set **Virtual Server Discovery** to **Enabled (No Delete)**
4. Click **Update**

This takes effect immediately and does not affect existing discovered virtual servers or current pool memberships.
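If you have more than a handful of server objects to convert, a loop saves some typing. This is only a sketch, not a vetted script: it assumes the standard `tmsh ... one-line` output where the object name is the third field, so print the list first and confirm before letting the `modify` run.

```bash
# Sketch: set enabled-no-delete on every GTM server object.
# Assumes `tmsh list gtm server one-line` prints one object per line
# in the form "gtm server /Common/<name> { ... }" (name = field 3).
for srv in $(tmsh list gtm server one-line | awk '{print $3}'); do
    echo "updating ${srv}"
    tmsh modify gtm server "${srv}" virtual-server-discovery enabled-no-delete
done
tmsh save sys config
```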
## Quick Diagnostic When Members Go Missing

Before assuming it's a discovery issue, check iQuery health across all GTM devices first:

```
tmsh show gtm iquery
```

Look for:

- **State**: should be `connected` for all entries
- **Reconnects**: a high count suggests instability even if the connection is currently up
- **Configuration Time**: `None` means the config has never successfully synced from that LTM

Then confirm which GTM holds local ID 0 and verify its connectivity specifically:

```
tmsh list sys db gtm.peerinfolocalid
```

If the local ID 0 device is the one with the broken iQuery connection, that's your answer — regardless of what the other devices are showing.

## Wrapping Up

Whether it's an LTM upgrade or an HA failover, the pattern is the same: iQuery goes quiet for a moment, GTM interprets silence as deletion, and your pool memberships are gone. It's working as designed — just not in a way that's useful to you.

`enabled-no-delete` is a one-line change that stops this from happening. The cleanup overhead it introduces is predictable and manageable. The alternative — rebuilding pool memberships after an unplanned event — is not.

Have you run into either of these scenarios in your environment? Drop a comment below, especially if you've seen the local ID 0 shift cause issues during a rolling GTM upgrade.