FAILOVER
A device recently failed over, but I don't know why.
Hi, I recently had a device failover, but I can't find anything in the logs to indicate why.

Oct 11 09:13:11 dvice_A notice tmm[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm2[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm2[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm5[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm5[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm6[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm6[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm3[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm3[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm4[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm4[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm7[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm7[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:20 dvice_A notice mcpd[6384]: 0107168c:5: Incremental sync complete: This system is updating the configuration on device group /Common/dg device %cmi-mcpd-peer-/Common/dvice_B from commit id { 81368 7424061905973374382 /Common/dvice_A } to commit id { 85137 7424304469001948692 /Common/dvice_A }.
Oct 11 09:13:20 dvice_A notice mcpd[6384]: 0107168c:5: Incremental sync complete: This system is updating the configuration on device group /Common/dg device %cmi-mcpd-peer-/Common/dvice_B from commit id { 81368 7424061905973374382 /Common/dvice_A } to commit id { 85137 7424304469001948692 /Common/dvice_A }.
Oct 11 09:18:09 dvice_A notice icrd_child[13462]: 13462,13469, iControl REST Child Daemon, INFO,Transaction [1728605588792715] execution time expired.
Oct 11 17:45:40 dvice_A info platform_agent[7260]: 01e10007:6: Token is renewed.
Oct 11 17:45:40 dvice_A info platform_agent[7260]: 01e10007:6: Token is renewed.

Thank you.
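Not an answer from the thread, but a minimal diagnostic sketch of where the failover reason is usually recorded, assuming standard shell/tmsh access to both units; the grep filters are only suggestions, and the reason is often logged on the peer that took over rather than on the unit that gave up the active role.

    # Look for sod (switchover daemon) and failover-related messages around the event time
    grep -iE 'sod|failover|active|standby|go offline' /var/log/ltm | grep 'Oct 11 09:1'

    # Current HA status and which daemons/features are allowed to trigger a failover
    tmsh show sys ha-status all-properties

    # Repeat both checks on the peer (dvice_B) - it usually records why it went active
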
Network failover - peer-offline

Hello, I think I need some advice, or at least some opinions, here. On the F5 cluster we manage, the secondary node became the active unit a month ago. In the GUI, the "force failover" button is greyed out, so it is impossible to trigger a failover from there. Maybe I could force it from the CLI, but I am not sure yet and I have not tried it for now (it is not our cluster, so I must be careful). Anyway, when I ran some checks on the cluster, I found this:

# show cm failover-status
--------------------
Status   STANDBY
(...)
-----------------------------------------------------------------------------------------------------------
address IP1:1026   nodename_Sec   0          1   -                      Error
address IP2:1026   nodename_Sec   0          1   -                      Error
address IP3:1026   nodename_Sec   30334301   3   2024-Sep-09 16:48:55   Ok

(PS: I am not showing the real addresses / node names here, of course.)

# show /cm traffic-group
(...)
-------------------------------------------------------------------------------------------------
traffic-group-1   nodename_Pri   standby   true    false   -
traffic-group-1   nodename_Sec   active    false   false   peer-offline

# show /sys failover
Failover active for 35d 04:03:10

So, there are three addresses used for config sync / failover. The first two are self IPs, configured with port lockdown "none". I know that is not correct; it should be set to "default" or "allow all". BUT the management IP obviously works well; that one shows a status of "Ok". So at first glance I should still be able to force a failover in this case. Except I can't, because the "force failover" button is greyed out. I also see "peer-offline" in the output of "show /cm traffic-group", which means I should be in this situation: https://my.f5.com/s/article/K000137178. But "netstat -pan" does not show me any "sod" being off. So I am not sure of that after all.

So:
1/ Do you know whether seeing "peer-offline" explains, by itself, why the "force failover" button is greyed out?
2/ In your opinion, is it functional to have only the management IP usable for config sync? Could that also explain the whole problem?
3/ I do not see "sod" being off with "netstat -pan" (cf. the KB article shared above). Despite that, do you think I should restart sod?

In short, has someone seen a similar situation and do you have an opinion or a suggestion about it, please?

Have a nice end of day!
Best regards,
Christian
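For reference, a hedged sketch of the checks the linked KB revolves around, plus the CLI equivalent of the greyed-out button, assuming standard tmsh/bash; whether restarting sod is appropriate here is exactly the open question, so treat the last commands as options rather than instructions.

    # Is sod listening on the network failover port (1026)?
    netstat -pan | grep ':1026'
    bigstart status sod

    # Unicast failover addresses configured per device (compare against the failover-status output)
    tmsh list cm device

    # If sod really is wedged, it can be restarted (generally safer on the standby unit;
    # follow your own change process before touching the active one)
    bigstart restart sod

    # CLI equivalent of forcing a failover - run on the currently ACTIVE unit
    tmsh run sys failover standby
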
Force peer to standby using Ansible!?

So, I have read a number of articles online about determining which node is active and then performing the "force to standby" action, which in turn causes the HA peer to become active (same as clicking the button of the same name in the GUI). I have successfully generated a variable that contains the node that needs to be sent to standby; that part works so far. However, when I then use the "f5networks.f5_modules.bigip_node" Ansible module to perform the failover, it tells me the task resulted in a "change", but the node does not go offline and its peer does not become active.

- name: Force the failover of the B unit peer...
  f5networks.f5_modules.bigip_node:
    state: offline
    fqdn: "{{ host_to_failover.name }}"
    name: "{{ host_to_failover.name }}"
    provider:
      server: "{{ host_to_failover.name }}"
      server_port: 443
      validate_certs: false
      no_f5_teem: false
      user: admin
      password: "{{ admin_acct_password }}"
  delegate_to: localhost

Output:

TASK [debug] **********************************************************************
ok: [f5-r5900-a.its.utexas.edu] => {
    "host_to_failover": {
        "name": "f5-r5900-revprox-b.its.utexas.edu",
        "state": "active"
    }
}

TASK [Force the failover of the B unit peer...] ***************************************
changed: [f5-r5900-a.its.utexas.edu]

Some confusion about the parameters (see: Ansible module > bigip_node):
It requires a "name" parameter, even though I am not adding a node, so I just populate it with the hostname I want to force to standby.
It also asks for an fqdn/address. I guess this is the node I want to perform the "force to standby" on? So I also populate that with the hostname I want to force to standby.
There is also the "server" within the "provider"; does that have any effect on this action? I tried putting the node to fail over there, as well as its peer, but it makes no difference.

Nothing I do actually causes a failover from the "B peer" to the "A peer". HELP!? What is the best way (example please?) to force a node to standby, causing a failover so that its HA peer becomes the active peer?
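For what it's worth, a hedged sketch of the distinction that seems to be at play: bigip_node manages LTM node objects (pool member addresses), not device HA state, so "state: offline" marks a node object down rather than forcing the unit itself to standby. The device-level action is the tmsh command below; how you wrap it from Ansible (for example via a command-oriented module against the active unit) is an assumption about your playbook, not something taken from the original post.

    # Run on the currently ACTIVE unit: move its traffic groups to standby,
    # which makes the HA peer take over (equivalent to the GUI "Force to Standby")
    tmsh run sys failover standby

    # Or target a specific traffic group only
    tmsh run sys failover standby traffic-group traffic-group-1
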
How to force close TLS sessions in a failover scenario

Hi,
We have an application behind BIG-IP that doesn't handle failovers well. The BIG-IP keeps all TLS sessions consistent and open during a failover, but the application doesn't support resuming a TLS session, and this causes problems in the app. I'm looking for a way to close TLS sessions for a specific VS in a failover scenario. We're on version 16.1.4.1.
Any suggestions?
Thanks
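One possible direction, offered as a sketch rather than a confirmed fix: if connection/persistence mirroring on that virtual server is what carries the sessions across failover, disabling mirroring there may already be enough; otherwise the newly active unit can flush the connection table for just that VS. The VIP address below is a hypothetical placeholder.

    # On the newly active unit: list, then flush, connections terminating on the VS
    # (10.1.1.100:443 is a placeholder - substitute your virtual server address and port)
    tmsh show sys connection cs-server-addr 10.1.1.100 cs-server-port 443
    tmsh delete sys connection cs-server-addr 10.1.1.100 cs-server-port 443
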
VLAN Failsafe failover settings change on STANDBY device - affect ACTIVE device?

We have two devices in an HA group, and the failsafe is VLAN failsafe, set to fail over on both devices. If I turn off VLAN failsafe on the standby device, does that affect the HA group or the ACTIVE device?
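Not an authoritative answer to the question, just a sketch of how to inspect and change the setting per unit from tmsh. "external" is a placeholder VLAN name; VLAN objects are device-local (they are not carried by config sync), which is why the check has to be run on each unit separately.

    # Inspect the failsafe settings on the VLAN (run on each unit)
    tmsh list net vlan external failsafe failsafe-action failsafe-timeout

    # Disable VLAN failsafe on this unit only
    tmsh modify net vlan external failsafe disabled
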
Pool members not stable after failover

Hi,
Our setup:
- two vCMP guests in HA (VIPRION with two blades)
- ~10 partitions
- simple configuration with LTM and AFM; nodes directly connected to the F5 device (the F5 device is the default gateway for the nodes)
- software 16.1.3.3, 16.1.4 after upgrade

The same setup exists in two data centers. We are hitting interesting behaviour in the first data center only:
- When the second F5 guest is active: pool member monitors (HTTP and HTTPS) respond without problems, and everything is stable. This holds for both F5 devices in the HA pair.
- After failover (first F5 guest active): pool member responses are not stable (unstable for the HTTPS monitor; HTTP is stable). Sometimes all pool members are down, and then the virtual server goes down.

It looks like a problem on the node side, but it is not, because when the second F5 device is active everything is stable. This issue hits almost all partitions. We checked:
- physical interfaces: everything stable, no errors on ports or ether-channels (trunks)
- ARP records: everything looks correct, no MAC flapping
- spanning tree: stable in the environment
- routing: correct; default gateway on the node side correct; subnet masks correct on the nodes and on both F5 devices; floating addresses work correctly (including ARP in the network)
- logs on the F5 devices: no issues related to this behaviour

I don't know what else connected to this issue we can check. The configuration of all four F5 devices (2x DC1, 2x DC2, two independent HA pairs) is the same (built with automation), and the software version is the same (we upgraded to 16.1.4 two days ago). It looks like something is "blocked" on the first F5 device in DC1 (a reboot or upgrade does not solve the issue). Do you have any idea what else to check?
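A hedged diagnostic sketch for the misbehaving guest, run while it is active; the member address and pool name below are placeholders. The idea is to watch the HTTPS monitor probes themselves from the unit that flaps, which usually shows whether the probes leave, whether the node answers, and whether the TLS handshake inside the probe stalls.

    # Capture health-monitor traffic to one flapping pool member (placeholder IP)
    tcpdump -nni 0.0 host 10.2.0.10 and port 443

    # Member status and recent state changes as seen by this guest
    tmsh show ltm pool example_pool members    # example_pool is a placeholder
    grep 10.2.0.10 /var/log/ltm | tail -50
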
GTM - Topology load balancing failover when one pool is down

Hello All,
I am looking for a solution to a problem that has been raised several times, but I cannot find a confirmed solution. The situation I am in is described in the following post: GTM Topology across pools issue when one of the po... - DevCentral (f5.com)

We have two topology records with the same source but different destination pools, with different weights:
SRC: Region X => DEST: Pool A, weight 200
SRC: Region X => DEST: Pool B, weight 20

When Pool A is down, topology load balancing for the Wide IP still selects Pool A, which is down, and no IP is returned to the client. If the topology load balancing selection mechanism is not going to take the status of the destination pool into account and simply stops at the first match, then why have "weight" at all? I do not believe disabling "Longest Match" would help, as that only affects the order in which the topology records are searched; it would still stop at the first match.

The often-mentioned solution is to use a single pool with Global Availability load balancing, as mentioned in this post: GTM and Topology - DevCentral (f5.com). The problem I have is that Pool A and Pool B are pools with multiple generic host servers. I cannot have a single pool containing all the generic hosts, because we want the members in each pool to be Active/Active, not Active/Backup.

Many thanks,
Michael
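Not a confirmed fix, only a hedged sketch of the two places this behaviour is decided, so the configuration can be compared against the linked threads; the wide IP and pool names are placeholders, and the last command is just one variant sometimes discussed (keep both pools, but fall through by order instead of by topology weight), not a recommendation.

    # Topology records and their weights/order as GTM evaluates them
    tmsh list gtm topology

    # How the wide IP currently selects pools (www.example.com is a placeholder)
    tmsh list gtm wideip a www.example.com pools pool-lb-mode

    # Variant sometimes discussed: Global Availability across the two existing pools
    tmsh modify gtm wideip a www.example.com pool-lb-mode global-availability \
        pools modify { PoolA { order 0 } PoolB { order 1 } }
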
BIG-IP VE VMWare Cluster HA triggering configuration

Hi, this is my first step into BIG-IP VE deployments (so far it has always been VIPRION). I have all my test clusters up and running in a VMware environment: Active/Standby using a dedicated vNIC and VLAN, 4 vNICs per device, 2 cluster members, each one running on a different ESXi host. But I would like to double-check which option would be best for triggering HA failover. In physical deployments we use an HA group based on trunks, but that does not work for all cases here. Would a failsafe condition based on VLAN be the best solution? E.g. failover to the standby BIG-IP if no ARP is received from the client_VLAN gateway?
Any comments welcome!
Regards.
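A sketch of the two built-in mechanisms that map to that idea, assuming standard tmsh; the timeout, pool name, and gateway address are placeholders, and whether either mechanism fits this VE design is exactly the open question.

    # VLAN failsafe on the client-facing VLAN: fail over if the VLAN goes silent
    # (client_VLAN is the VLAN from the question; the timeout is just an example)
    tmsh modify net vlan client_VLAN failsafe enabled failsafe-timeout 90 failsafe-action failover

    # Gateway failsafe alternative: monitor the upstream gateway and fail over when it stops answering
    # (pool name and gateway IP are placeholders)
    tmsh create ltm pool gw_failsafe_pool members add { 10.1.10.1:0 } monitor gateway_icmp
    tmsh modify ltm pool gw_failsafe_pool min-up-members 1 min-up-members-checking enabled min-up-members-action failover
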