failover
12 Topics

Configuration Assistance: Configure Email Alerts for HA Failover Events and Device Offline
We have a BIG-IP VE high-availability (HA) pair deployed in Microsoft Azure. We need the BIG-IP to send an email notification to our Operations teams immediately when a failover event occurs (when a unit goes from Active to Standby or Offline). Could you provide the recommended procedure for configuring these email alerts?
69 views | 0 likes | 3 comments

Failover issue between datacenters with GSLB
OK, I have been looking for the best solution for an application requirement. I have an app that will live in two datacenters, with four servers running at each location. They want the application to run in DC1 100% of the time, unless two or more of the DC1 servers fail, in which case it should fail over to DC2. I thought this would be rather easy, but I have been unable to find a way to make the VIP go into a down state when two servers have gone down. Does anyone have an idea how I could implement this? Thanks
252 views | 0 likes | 8 comments

A device recently failed over, but I don't know why.
Hi, one of my devices recently failed over, but I can't find anything in the logs that indicates why.

Log:
Oct 11 09:13:11 dvice_A notice tmm[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm2[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm2[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm5[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm5[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm6[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm6[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm3[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm3[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm4[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm4[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm7[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:11 dvice_A notice tmm7[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.
Oct 11 09:13:20 dvice_A notice mcpd[6384]: 0107168c:5: Incremental sync complete: This system is updating the configuration on device group /Common/dg device %cmi-mcpd-peer-/Common/dvice_B from commit id { 81368 7424061905973374382 /Common/dvice_A } to commit id { 85137 7424304469001948692 /Common/dvice_A }.
Oct 11 09:13:20 dvice_A notice mcpd[6384]: 0107168c:5: Incremental sync complete: This system is updating the configuration on device group /Common/dg device %cmi-mcpd-peer-/Common/dvice_B from commit id { 81368 7424061905973374382 /Common/dvice_A } to commit id { 85137 7424304469001948692 /Common/dvice_A }.
Oct 11 09:18:09 dvice_A notice icrd_child[13462]: 13462,13469, iControl REST Child Daemon, INFO,Transaction [1728605588792715] execution time expired.
Oct 11 17:45:40 dvice_A info platform_agent[7260]: 01e10007:6: Token is renewed.
Oct 11 17:45:40 dvice_A info platform_agent[7260]: 01e10007:6: Token is renewed.

Thank you
370 views | 0 likes | 4 comments

Network failover - peer-offline
Hello, I think I need some advice, or at least some opinions, here.

On the F5 cluster we manage, the secondary node became active one month ago. In addition, the "Force Failover" button in the GUI is greyed out, so it is impossible to fail over from there. Maybe I could force it from the CLI, but I am not yet sure. I haven't tried that yet (it is not our cluster, so I must be careful).

Anyway, when I ran some checks on the cluster, I found the following:

show cm failover-status
--------------------
Status STANDBY
(...)
-----------------------------------------------------------------------------------------------------------
IP1:1026  nodename_Sec  0         1  -                     Error
IP2:1026  nodename_Sec  0         1  -                     Error
IP3:1026  nodename_Sec  30334301  3  2024-Sep-09 16:48:55  Ok

(PS: I have not included the real addresses or node names here, of course.)

# show /cm traffic-group
(...)
-------------------------------------------------------------------------------------------------
traffic-group-1  nodename_Pri  standby  true   false  -
traffic-group-1  nodename_Sec  active   false  false  peer-offline

# show /sys failover
Failover active for 35d 04:03:10

There are three addresses used for ConfigSync. The first two are self IPs, configured with Port Lockdown set to "Allow None". I know that is not correct; it should be set to "Allow Default" or "Allow All". But the management IP obviously works, and it shows the status "Ok".

So at first glance, I should be able to fail over in this situation. Except I can't, because the "Force Failover" button is greyed out.

I also see "peer-offline" in the output of "show /cm traffic-group". That suggests I am in the situation described here: https://my.f5.com/s/article/K000137178. But "netstat -pan" doesn't show me any "sod off", so I am not sure of that after all.

So:
1/ Do you know whether the "peer-offline" status by itself explains why the "Force Failover" button is greyed out?
2/ In your opinion, is it functional to have only the management IP usable for ConfigSync? Could that also explain the whole problem?
3/ I do not see "sod off" with "netstat -pan" (see the KB I shared above). Despite that, do you think I should restart sod?

In short, if anyone has seen a similar situation and has an opinion or a suggestion, please let me know. Have a nice day!

Best regards,
Christian
97 views | 0 likes | 1 comment

Force peer to standby using Ansible!?
So, I have read a number of articles online about determining which node is active and then performing the "Force to Standby" action, which in turn causes the HA peer to become active (the same as clicking the button of that name in the GUI).

I have successfully generated a variable that contains the node that needs to be sent to standby; that part works so far. However, when I use the "f5networks.f5_modules.bigip_node" Ansible module to perform the failover, it reports that the task resulted in a "change", but the node does not go offline and its peer does not become active.

- name: Force the failover of the B unit peer...
  f5networks.f5_modules.bigip_node:
    state: offline
    fqdn: "{{ host_to_failover.name }}"
    name: "{{ host_to_failover.name }}"
    provider:
      server: "{{ host_to_failover.name }}"
      server_port: 443
      validate_certs: false
      no_f5_teem: false
      user: admin
      password: "{{ admin_acct_password }}"
  delegate_to: localhost

Output:

TASK [debug] **********************************************************************
ok: [f5-r5900-a.its.utexas.edu] => {
    "host_to_failover": {
        "name": "f5-r5900-revprox-b.its.utexas.edu",
        "state": "active"
    }
}

TASK [Force the failover of the B unit peer...] ***************************************
changed: [f5-r5900-a.its.utexas.edu]

Some confusion about parameters (see: ansible module > bigip_node):
- It requires a "name" parameter, even though I am not adding a "node", so I just populate it with the hostname I want to force to standby.
- It also asks for an fqdn/address. I guess this is the node I want to perform the "force to standby" on? So I also populate that with the hostname I want to force to standby.
- There is also "server" within "provider". Does that have any effect on this action? I tried putting the node to fail over there, as well as its peer, but it makes no difference.

Nothing I do actually causes a failover from the "B peer" to the "A peer". HELP!?
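The bigip_node module manages LTM node objects (the server addresses that pool members point to), not the HA state of the device, which is likely why the task reports "changed" while no failover happens. The GUI's Force to Standby button corresponds to the tmsh command run sys failover standby, and over iControl REST the same action is commonly sent as a POST to /mgmt/tm/sys/failover with the JSON body {"command": "run", "standby": true}. The Python sketch below builds that request with the standard library only. The endpoint and body are assumptions to verify against your BIG-IP version, and the host name and credentials are hypothetical placeholders. The request is meant for the unit that is currently active (the host_to_failover variable in the playbook).

```python
import base64
import json
import urllib.request


def build_force_standby_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking `host` to go to standby.

    Assumption: /mgmt/tm/sys/failover accepts {"command": "run", "standby": true},
    the REST form of `tmsh run sys failover standby`. Verify on your version.
    """
    url = f"https://{host}/mgmt/tm/sys/failover"
    body = json.dumps({"command": "run", "standby": True}).encode("utf-8")
    # HTTP basic authentication header built from the supplied credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def force_standby(host: str, user: str, password: str, context=None) -> int:
    """Send the request to the active unit and return the HTTP status code."""
    req = build_force_standby_request(host, user, password)
    # For self-signed lab certificates, pass a suitable ssl.SSLContext here
    # instead of disabling certificate checks globally.
    with urllib.request.urlopen(req, context=context, timeout=30) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical placeholder host and credentials; nothing is sent here.
    r = build_force_standby_request("bigip-b.example.com", "admin", "secret")
    print(r.get_method(), r.full_url)
```

Within Ansible, the same tmsh command could likely be issued through the f5networks.f5_modules.bigip_command module (commands: run sys failover standby) instead of bigip_node; that variant is untested here.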
What is the best way (an example, please?) to force a node to standby, causing a failover so that the HA peer becomes the active peer?
248 views | 0 likes | 1 comment

How to force close TLS sessions in a failover scenario
Hi,
We have an application behind BIG-IP that doesn't handle failovers well. The BIG-IP keeps all TLS sessions consistent and open during failover, but the application doesn't support TLS session resumption, and this causes problems in the app. I'm looking for a way to close the TLS sessions for a specific virtual server (VS) in a failover scenario. We're on version 16.1.4.1. Any suggestions? Thanks
703 views | 0 likes | 5 comments

VLAN Failsafe failover settings change on STANDBY device - affect ACTIVE device?
We have two devices in an HA group. Failsafe is VLAN failsafe, set to fail over on both devices. If I turn off VLAN failsafe on the standby device, does that affect the HA group or the active device?
Solved | 525 views | 0 likes | 1 comment

Pool members not stable after failover
Hi,

Our setup:
- two vCMP guests in HA (VIPRION with two blades)
- ~10 partitions
- simple configuration with LTM and AFM; nodes are directly connected to the F5 device (the F5 device is the default gateway for the nodes)
- software 16.1.3.3, upgraded to 16.1.4

The same setup exists in two data centers.

We are seeing interesting behavior in the first data center only:
- when the second F5 guest is active, the pool member monitors (HTTP and HTTPS) respond without problems and everything is stable. This holds for both F5 devices in the HA pair.
- after failover (first F5 guest active), pool member responses are not stable (unstable for the HTTPS monitor; HTTP is still stable). Sometimes all pool members are down, and then the virtual server goes down.

It looks like a problem on the node side, but it's not, because when the second F5 device is active, everything is stable. This issue affects almost all partitions.

We checked:
- physical interfaces: everything is stable, with no errors on ports or EtherChannels (trunks).
- ARP records: everything looks correct, with no MAC flapping.
- spanning tree: stable in the environment.
- routing: correct; the default gateway on the node side is correct; subnet masks are correct on the nodes and on both F5 devices. Floating addresses work correctly (including ARP in the network).
- logs on the F5 devices: nothing related to this behavior.

I don't know what else we can check for this issue. The configuration of all F5 devices (2x DC1, 2x DC2: two independent HA pairs) is the same (configured with automation), and the software version is the same (we upgraded to 16.1.4 two days ago). It looks like something is "blocked" on the first F5 device in DC1 (a reboot or upgrade does not solve the issue). Do you have any idea what else to check?
1.1K views | 0 likes | 2 comments
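The failover log posted earlier on this page (the question about an unexplained failover) repeats the 01340011 "HA unit 1 state change: from 1 to 0." message once or twice per tmm instance (tmm, tmm2 through tmm7), which hides how few distinct events there actually are. A minimal Python sketch that collapses such lines by timestamp, host, process name and message ID, counting the repeats; the line layout is inferred from that excerpt and is an assumption for other log files. It only condenses the log: the state-change message itself does not say why the unit went standby, so the surrounding entries in /var/log/ltm on both units around that timestamp are likely still worth reading.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt on this page, e.g.
# "Oct 11 09:13:11 dvice_A notice tmm2[13231]: 01340011:5: HA unit 1 state change: from 1 to 0."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<level>\w+) "
    r"(?P<proc>[A-Za-z_]+)\d*\[(?P<pid>\d+)\]: "  # tmm2 -> proc "tmm"
    r"(?:(?P<msgid>[0-9a-f]{8}):\d+: )?"  # optional F5 message ID and level
    r"(?P<text>.*)$"
)


def summarize(lines):
    """Count identical events, treating tmm, tmm2, tmm3... as one process."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed layout
        key = (m["ts"], m["host"], m["proc"], m["msgid"] or "-", m["text"])
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Oct 11 09:13:11 dvice_A notice tmm[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.",
        "Oct 11 09:13:11 dvice_A notice tmm2[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.",
        "Oct 11 09:13:11 dvice_A notice tmm2[13231]: 01340011:5: HA unit 1 state change: from 1 to 0.",
        "Oct 11 17:45:40 dvice_A info platform_agent[7260]: 01e10007:6: Token is renewed.",
    ]
    for (ts, host, proc, msgid, text), n in sorted(summarize(sample).items()):
        print(f"{ts} {host} {proc} {msgid} x{n}: {text}")
```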