Scaling iControl, race conditions, and other pain points

Question

Hello.

I have written a reporting tool which connects to a cluster leader, determines all the cluster members, and spawns a thread for each load balancer in the cluster. Each thread collects partition, pool, virtual server, and pool member statistics from a load balancer. The final results are merged into a cluster-wide roll-up report. You can see an example of what one of the pool reports looks like in the attached screenshot. Many of these reports are generated -- one for each pool in a cluster. (The screenshot is an example from a lab H/A pair, and not my 12-member cluster.)
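
For readers following along, the thread-per-load-balancer design described above can be sketched roughly as follows. This is a minimal illustration, not the actual tool: `collect_stats` is a hypothetical callable standing in for the per-device iControl queries, and summing counters by key is only an assumption about how the roll-up merge works.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def cluster_rollup(
    members: List[str],
    collect_stats: Callable[[str], Dict[str, int]],
) -> Dict[str, int]:
    """Query every cluster member in its own thread and sum the counters.

    `collect_stats` is hypothetical: it takes a device address and returns
    a flat mapping of statistic name to value for that device.
    """
    totals: Counter = Counter()
    # One worker per load balancer, mirroring the design in the post.
    with ThreadPoolExecutor(max_workers=max(1, len(members))) as pool:
        for per_device in pool.map(collect_stats, members):
            totals.update(per_device)  # adds values key by key
    return dict(totals)
```

Note that each device still sees only one client thread here, so the parallelism is across load balancers, not within one.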

I am running into several issues scaling this approach out on one of our largest clusters -- a 12-member active/active cluster. It is a fairly active cluster, with many hosts leaving and joining pools every few minutes. My first approach was to connect to a load balancer, set the active folder to "/", and set a recursive query state. I would then retrieve ALL POOLS from every folder/partition in a single API call, and then look up all pool members in the pools returned by that call. In my case, there are hundreds of pools and thousands of pool members. I then iterate through a number of pool-member-specific API calls. This introduces a race condition: some pool members are removed from the load balancers while the report is running, so the member-related API calls fail with "node not found" errors. With this approach, there's not much that can be done at that point except rerun the report.

Once the race condition became obvious, I rewrote the tool to crawl the load balancer partition by partition, so that at least the race condition is isolated to a single partition's view. When a race condition is hit and the "node not found" exception is raised, I can then either retry the report generation for that partition or just move on to the next partition. This is my current approach.
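
The retry-or-move-on logic described above might look something like the sketch below. It is an illustration under stated assumptions, not the poster's code: `MemberNotFound` is a hypothetical exception standing in for the "node not found" fault, and `collect` is a hypothetical callable that gathers all statistics for one partition.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class MemberNotFound(Exception):
    """Hypothetical stand-in for the 'node not found' fault."""


def crawl_partitions(
    partitions: Iterable[str],
    collect: Callable[[str], dict],
    max_attempts: int = 3,
    delay: float = 0.5,
) -> Tuple[Dict[str, dict], List[str]]:
    """Collect stats one partition at a time, retrying only the partition
    whose membership changed mid-crawl. Returns (results, skipped)."""
    results: Dict[str, dict] = {}
    skipped: List[str] = []
    for partition in partitions:
        for attempt in range(1, max_attempts + 1):
            try:
                results[partition] = collect(partition)
                break  # this partition is done
            except MemberNotFound:
                if attempt == max_attempts:
                    skipped.append(partition)  # give up and move on
                else:
                    time.sleep(delay)  # brief pause before retrying
    return results, skipped
```

Returning the skipped partitions separately lets the report flag incomplete sections instead of failing as a whole.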

However, I am finding the partition-by-partition crawl to be EXTREMELY SLOW. I also find that iControl becomes mostly unresponsive to other programs that need to use it (e.g., node registration and other reporting processes).

Is iControl a bad approach? If I tried this with SNMP, would we expect it to be significantly faster for gathering pool member, pool, and related statistics? Just looking for some high-level advice from the pros who have tackled this before.
Thanks,
-M

patrick_chang_7 · Answer

Unfortunately, the iControl process on the F5 side is pretty much single-threaded (it can only process one request at a time). If you have a process that ties it up for long periods, it will become unresponsive to other processes that are trying to issue iControl commands. In general, it is better to use SNMP to gather statistics and to use iControl only for actual configuration changes and queries. In older versions of code (pre-v10.2.2) we had some major inefficiencies in the way we processed SNMP requests that made SNMP pretty unusable for gathering statistics on large numbers of objects at one time. If you are running a pre-v10.2.2 TMOS version, you can request an engineering hotfix (HF) that fixes this. Prior to v11.2.0 there were statistics available through iControl that were not available via SNMP. Since v11.2.0, one can create custom MIB entries that make anything obtainable from the command line available via SNMP. The process to do this is described here: http://support.f5.com/kb/en-us/solutions/public/13000/500/sol13596.html?sr=28857189

mhite_60883 · Answer

Thanks, Patrick. I appreciate the response. I will try your suggestion of splitting some of the queries out into SNMP. Do you expect the situation to be different with the introduction of the REST API in 11.4? I would guess that the lower overhead will let iControl process requests more quickly, so the effective capacity will ultimately be higher?

-M

patrick_chang_7 · Answer

The REST API should improve things, but that would have to be tested to be sure, and I have seen no performance testing done on it yet. In addition, the first iteration of the REST API is geared towards making configuration changes and will probably not be able to collect all the statistics you want.
