Forum Discussion

DaveNulty76
May 28, 2024

Issues with F5 appliance blackholing traffic

Hi 

Wondering if anyone can help or has seen an issue similar to this?

Two tenants on one host.

One tenant has the issue, the other doesn't.

All upstream networks and devices have been assurance-checked and their logs analysed.

The issue is only present on hostname dcnn-lb01-int.circlehealthgroup.co.uk.

Key Info:

CPU in the control plane consistently spikes to 100% (5-10 times per minute).

Outbound traffic is blackholed if a session starts during a CPU spike.

All outbound traffic types are affected: ICMP from TMOS, node polling, F5 sync messaging, SNMP messages, etc.

If a session starts outside a CPU spike, the issue isn't present: an ICMP ping started then runs continuously without drops.

Symptoms last under 2 seconds per "CPU event".

Symptoms present as if the default route drops or changes, the gateway becomes unreachable, or traffic is blackholed.

To repeat, because it is essential: this only affects sessions newly established during that window.

Established sessions (e.g. continuous ICMP from tmos) are not affected during the same blackholing event window where new sessions are.
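
One way to confirm the pattern above (new sessions fail only when they start inside a spike) is to timestamp a series of freshly opened probes and bucket them against the spike windows. A minimal Python sketch, assuming spike start/end times and probe start times have already been collected on a common clock; the `Spike`/`Probe` types and the `correlate` helper are hypothetical, and the sample numbers are illustrative rather than taken from the affected tenant:

```python
from dataclasses import dataclass

# Hypothetical record types. Timestamps are seconds on a clock shared by both
# data sources (e.g. epoch seconds from the same NTP-synced host).

@dataclass
class Spike:
    start: float  # when control-plane CPU hit 100%
    end: float    # when it dropped back

@dataclass
class Probe:
    started_at: float  # when a *new* session (e.g. a fresh ping) was opened
    succeeded: bool    # whether that session ever got a reply

def started_during_spike(probe: Probe, spikes: list[Spike], grace: float = 0.0) -> bool:
    """True if the probe began inside any spike window, widened by `grace` seconds."""
    return any(s.start - grace <= probe.started_at <= s.end + grace for s in spikes)

def correlate(probes: list[Probe], spikes: list[Spike], grace: float = 0.0) -> dict[str, int]:
    """Bucket probes by (started during a spike?, succeeded?)."""
    counts = {"spike_fail": 0, "spike_ok": 0, "quiet_fail": 0, "quiet_ok": 0}
    for p in probes:
        window = "spike" if started_during_spike(p, spikes, grace) else "quiet"
        outcome = "ok" if p.succeeded else "fail"
        counts[f"{window}_{outcome}"] += 1
    return counts

if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the affected tenant.
    spikes = [Spike(10.0, 11.5), Spike(20.0, 21.0)]
    probes = [Probe(10.2, False), Probe(15.0, True), Probe(20.5, False), Probe(30.0, True)]
    print(correlate(probes, spikes))
```

If the description holds, failures should land in `spike_fail` with `quiet_fail` close to zero; a small `grace` value allows for clock skew between the two data sources.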

Thanks in advance

Dave

  • That's an intriguing set of symptoms.

    BIG-IP divides its CPU cores between running TMM and running control-plane processes. Most of the cores should be scheduled to run TMM as fast as possible; TMM is the user-space Traffic Management Microkernel, which talks to the network interfaces directly, has its own IP stack, and so on. One core should be running mostly control-plane (non-TMM) processes.

    You can check whether TMM has been starved of CPU, or held up by runaway iRules or other data-plane work, by looking for "clock advanced" errors:

    https://my.f5.com/manage/s/article/K10095
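
    To check a whole log file rather than reading it by eye, a small script can pull out the matching lines with their timestamps so they can be lined up against the CPU spikes. A minimal Python 3 sketch, assuming the messages land in /var/log/ltm and contain text of the form "Clock advanced by <N> ticks"; verify the exact wording against the article above and your own logs. Older TMOS versions may not ship Python 3, so it may be easier to run this on a copy of the log.

```python
import re
import sys

# Assumed message shape (check the K article and your own logs for the exact text):
#   "... tmm1[12345]: ... Clock advanced by 104 ticks"
CLOCK_ADVANCED = re.compile(r"clock advanced by (\d+) ticks", re.IGNORECASE)

def find_clock_advanced(lines):
    """Yield (ticks, line) for every line that looks like a 'clock advanced' message."""
    for line in lines:
        m = CLOCK_ADVANCED.search(line)
        if m:
            yield int(m.group(1)), line.rstrip("\n")

def main(path="/var/log/ltm"):
    with open(path, errors="replace") as f:
        hits = list(find_clock_advanced(f))
    for ticks, line in hits:
        print(f"{ticks:>6} ticks | {line}")  # keep the syslog timestamp for correlation
    print(f"{len(hits)} clock-advanced message(s) found")
    if hits:
        print(f"worst: {max(t for t, _ in hits)} ticks")

if __name__ == "__main__":
    main(*sys.argv[1:2])  # optional first argument: path to a copied log file
```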

    When BIG-IP first sees a flow (the first packet of a TCP connection, an ICMP exchange, or whatever), it adds the flow to the flow table after virtual server selection. It sounds like this step may be getting interrupted somehow: perhaps a large influx of new connections arrives with just the right periodicity and complexity to trigger some bad behavior. BIG-IP doesn't categorize traffic as "inbound" or "outbound"; both are flows that go into the same table, so inbound and outbound traffic should behave the same way.
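
    A toy model may make the first-packet point concrete. The Python sketch below is not BIG-IP's implementation; `ToyFlowTable`, `flow_key`, and the `setup_available` switch are hypothetical names. It shows why a stall in the setup step (the hypothesis above) would hit only new flows: packets for an existing entry skip setup, and both directions of a flow resolve to the same entry.

```python
from dataclasses import dataclass, field

# Toy model only, NOT BIG-IP's real flow table. It illustrates two points:
# (1) only a flow's first packet pays for setup (e.g. virtual server selection);
# (2) both directions of a flow map to the same table entry.

FiveTuple = tuple[str, int, str, int, str]  # (src_ip, src_port, dst_ip, dst_port, proto)

def flow_key(t: FiveTuple) -> FiveTuple:
    """Direction-agnostic key: order the endpoints so A->B and B->A collide."""
    src_ip, src_port, dst_ip, dst_port, proto = t
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (lo[0], lo[1], hi[0], hi[1], proto)

@dataclass
class ToyFlowTable:
    flows: dict = field(default_factory=dict)
    setup_available: bool = True  # hypothetical switch simulating a stalled setup path

    def handle(self, t: FiveTuple) -> str:
        key = flow_key(t)
        if key in self.flows:
            return "fast-path"  # existing flow: no setup needed
        if not self.setup_available:
            return "dropped"    # new flow arrives while setup is stalled
        self.flows[key] = "selected-listener"  # stand-in for virtual server selection
        return "new-flow"
```

For example, a flow created before a stall keeps working during it (including its reply direction), while a new flow opened during the stall is dropped until setup recovers.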

    I have seen symptoms that seem sort of similar to the ones you describe when there are MAC address conflicts.
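
    One way to look for an IP/MAC conflict is to collect ARP observations (for example from the upstream switch's ARP table or a packet capture) and flag any IP address answered by more than one MAC. A minimal Python sketch, assuming the observations have already been gathered as (ip, mac) pairs; `find_ip_mac_conflicts` is a hypothetical helper and the sample addresses are made up:

```python
from collections import defaultdict

def find_ip_mac_conflicts(observations):
    """Given (ip, mac) pairs, return {ip: sorted MACs} for IPs claimed by more than one MAC."""
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower())  # normalise case so "AA:BB" and "aa:bb" match
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}

if __name__ == "__main__":
    # Made-up observations for illustration.
    obs = [
        ("10.1.1.1", "00:11:22:33:44:55"),
        ("10.1.1.1", "00:11:22:33:44:66"),
        ("10.1.1.2", "AA:BB:CC:DD:EE:FF"),
        ("10.1.1.2", "aa:bb:cc:dd:ee:ff"),
    ]
    print(find_ip_mac_conflicts(obs))
```

    A hit on the default gateway or on one of the tenant's self IPs would be worth chasing, since ARP intermittently resolving to the wrong MAC would look much like the gateway briefly becoming unreachable.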