Mar 08, 2012

Change in CPU measurement?

Two weeks ago, over the weekend of February 25-26, one of our BIG-IP LTMs suddenly appeared to change how it measures CPU usage.



We first noticed a "knee" in the CPU usage graph: over a span of a minute or two, CPU usage dropped by three percentage points. I could find nothing in the logs to explain it, and no one was logged in or making changes at the time. So I took a look at the raw stats from the RRD files, using "rrdtool fetch".
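
For anyone who wants to look at the same kind of data, here is a rough Python sketch for reading the text that "rrdtool fetch <file> AVERAGE --start ... --end ..." prints: a header line of data-source names, then one "timestamp: value value ..." line per sample. The column names and numbers in the sample below are placeholders I made up for illustration, not the real names or values from the BIG-IP RRD files.

```python
def parse_rrd_fetch(text):
    """Parse the text printed by `rrdtool fetch` into (names, rows).

    names is the list of data-source names from the header line;
    rows is a list of (timestamp, [values...]) tuples.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].split()  # header line: data-source names
    rows = []
    for ln in lines[1:]:
        stamp, _, values = ln.partition(":")
        # float() also accepts the "nan" / "-nan" that rrdtool prints for gaps
        rows.append((int(stamp), [float(v) for v in values.split()]))
    return names, rows


# Placeholder sample: hypothetical column names and values.
sample = """\
                  user          idle

1330128000: 1.5000000000e+02 6.5000000000e+02
1330128060: 1.8000000000e+03 1.5000000000e+04
1330128120: nan nan
"""

names, rows = parse_rrd_fetch(sample)
for stamp, values in rows:
    print(stamp, dict(zip(names, values)))
```

Rows with "nan" are kept rather than dropped, so gaps in the data stay visible when lining samples up against the graph.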



What I found was that the raw values behind the CPU utilization figures suddenly went up by a factor of 15 or so. Before the "knee", typical user CPU use was in the 150 to 300 range, and idle time in the 600 to 700 range. After the "knee", these jumped to roughly 1800 to 3000 and 14,000 to 16,000, respectively.



I tried normalizing the data by dividing every value by the sum across all columns for that particular sample (about 1000 before the "knee", about 17,000 after). The graph I ended up with showed the same "knee" at the same time.
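
To make the normalization concrete, here is a minimal Python sketch of what I mean. The column names ("user", "system", "idle") and the numbers are illustrative only, picked to fall inside the ranges I quoted; they are not actual samples from the device.

```python
def normalize(sample):
    """Convert the raw counters of one sample into fractions of the row total."""
    total = sum(sample.values())
    if total == 0:
        return {name: 0.0 for name in sample}
    return {name: value / total for name, value in sample.items()}


# Illustrative values only (hypothetical columns, made-up numbers).
before = {"user": 200.0, "system": 100.0, "idle": 700.0}         # sums to 1000
rescaled = {name: value * 17 for name, value in before.items()}  # pure change of scale
after = {"user": 2000.0, "system": 1000.0, "idle": 14000.0}      # sums to 17000

for label, sample in (("before", before), ("rescaled", rescaled), ("after", after)):
    fractions = normalize(sample)
    print(label, {name: round(frac, 3) for name, frac in fractions.items()})
```

If the only thing that had changed were the scale of the counters (the "rescaled" row), the normalized fractions would come out identical to the "before" row. A "knee" that survives normalization means the proportions between the columns shifted as well, though that by itself does not say whether the load really dropped or the accounting changed.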



It seems like the measurement methodology suddenly changed, all by itself. I can't tell whether the "knee" reflects an actual drop in CPU utilization or is an artifact of that change.



Has anyone else encountered this before?


