Zen_Y
Jan 12, 2025
Solved

Checking the fan status on the device

Hi everyone,

I have a question about checking the fan status on the device.

I checked using the command: system_check -d | grep -E "Units|_fan"

Key            Sys  Data   Unit  Sta  Cur   Tbl    #  Type  Range  Value  Hyst  Sev
1/bld_cpu_fan  0    4928   RPM   1    NORM  I|D|S  0  1     0      750    250   CRIT
cha_fan1_volt  3    12065  mV    1    NORM  I|D|S  0  2     0      10558  241   CRIT
cha_fan1_volt  3    12065  mV    1    NORM  I|D|S  2  2     2      13442  241   CRIT
cha_fan2_volt  3    12100  mV    1    NORM  I|D|S  0  2     0      10558  241   CRIT
cha_fan2_volt  3    12100  mV    1    NORM  I|D|S  2  2     2      13442  241   CRIT
cha_fan_1      3    2882   RPM   1    NORM  I|D|S  0  1     0      500    250   CRIT
cha_fan_2      3    2896   RPM   1    NORM  I|D|S  0  1     0      500    250   CRIT
cha_fan_3      3    2896   RPM   1    NORM  I|D|S  0  1     0      500    250   CRIT
cha_fan_4      3    2906   RPM   1    NORM  I|D|S  0  1     0      500    250   CRIT

Given the results in the table above, I am confused about what the Sev column means, and why every row shows CRIT even though the current sensor values are still above Hyst.

Thank you.

  • Understanding how the threshold system works for sensor monitoring on your F5 device is crucial for maintaining the health and performance of the system. The thresholds determine when the system will flag a sensor reading as normal, warning, or critical. Here’s a detailed explanation of how this typically works:

    Threshold System Overview

    1. Sensor Measurement: The system continuously monitors various sensors, such as fan speeds, temperatures, voltages, etc.
    2. Thresholds: Each sensor has predefined thresholds that specify what values are considered normal, warning, or critical. These thresholds can include:
      • Lower Critical (LC): The value below which the sensor reading is considered critical.
      • Lower Warning (LW): The value below which the sensor reading is considered a warning.
      • Upper Warning (UW): The value above which the sensor reading is considered a warning.
      • Upper Critical (UC): The value above which the sensor reading is considered critical.
    3. Hysteresis: This is a value used to prevent frequent toggling of the sensor status. It creates a buffer zone so that minor fluctuations around the threshold do not cause constant status changes.

    Example of Thresholds

    For a fan speed sensor:

    • LC: 500 RPM (below this is critical)
    • LW: 750 RPM (below this is warning)
    • UW: 5000 RPM (above this is warning)
    • UC: 5500 RPM (above this is critical)
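    The four limits above can be sketched as a simple classifier. This is only an illustration using the example RPM values from this post: the `Thresholds` class and the `classify` function are hypothetical names, not part of any F5 tool, and hysteresis is deliberately left out so that each reading is judged on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the four limits named in the post."""
    lower_critical: float
    lower_warning: float
    upper_warning: float
    upper_critical: float


def classify(reading: float, t: Thresholds) -> str:
    """Return 'critical', 'warning', or 'normal' for a single reading."""
    # Check the outer (critical) limits first, then the inner (warning) ones.
    if reading < t.lower_critical or reading > t.upper_critical:
        return "critical"
    if reading < t.lower_warning or reading > t.upper_warning:
        return "warning"
    return "normal"


# Example limits from the post, in RPM.
FAN = Thresholds(lower_critical=500, lower_warning=750,
                 upper_warning=5000, upper_critical=5500)

for rpm in (400, 600, 2882, 5200, 5600):
    print(rpm, classify(rpm, FAN))
```

    With these example limits, a reading of 2882 RPM is classified as normal, 600 RPM as a warning, and 400 RPM as critical.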

    View Current Sensor Readings:

    tmsh show /sys hardware

     

  • Hi Zen_Y,

    Let me try to break down the terms found in the command output.

    The CRIT (Critical) value and Hyst (Hysteresis) value serve different purposes in system monitoring and are not directly compared to determine if a component is in a critical state. Instead, the critical status (CRIT) is typically determined based on predefined thresholds or conditions set for the component, independent of the hysteresis value.

    1. CRIT (Critical):
      • The CRIT status indicates that a component's current measurement (e.g., fan speed, voltage) is outside the acceptable operational range and is in a critical state. This typically means that the component is either failing or performing in a way that could lead to system instability or damage.
      • Therefore, the actual threshold for what constitutes a critical state is predefined by the system's monitoring policies and can vary depending on the component and its specifications.
    2. Hyst (Hysteresis):
      • Hysteresis is a buffer range used to prevent frequent toggling or cycling of a component. For example, in temperature control, hysteresis might define how much the temperature needs to decrease below a set threshold before a cooling fan turns off, thus avoiding rapid on/off cycles.
      • The hysteresis value helps stabilize the operation by introducing a delay or buffer before triggering actions like turning on/off fans or other controls.

    In the output you provided, the "CRIT" status indicates that the current value (Cur) of the component is outside the predefined acceptable range, triggering a critical alert.

    • For example, the 1/bld_cpu_fan has a current speed of 4928 RPM but is marked as "CRIT," indicating that this speed is outside the acceptable operational range for the CPU fan.

    The hysteresis (Hyst) value, in this context, serves to provide a buffer for operational changes but does not directly influence whether a component is marked as critical. The critical status is determined based on whether the current value exceeds the predefined thresholds, not in comparison to the hysteresis value.
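    To make the buffer idea concrete, here is a minimal sketch of a lower-limit alarm with hysteresis. It assumes the alarm trips when the reading falls below the threshold and clears only once the reading rises above threshold plus hysteresis. That clearing rule is an assumption for illustration, since the thread does not state how BIG-IP applies Hyst. The 750 and 250 figures are taken from the Value and Hyst columns of the 1/bld_cpu_fan row, and `LowLimitAlarm` is a hypothetical name.

```python
class LowLimitAlarm:
    """Latching lower-limit alarm with hysteresis (illustrative only)."""

    def __init__(self, threshold: float, hysteresis: float) -> None:
        self.threshold = threshold
        self.hysteresis = hysteresis
        self.active = False

    def update(self, reading: float) -> bool:
        """Feed one reading and return whether the alarm is active."""
        if not self.active and reading < self.threshold:
            # Reading dropped below the threshold: raise the alarm.
            self.active = True
        elif self.active and reading > self.threshold + self.hysteresis:
            # Assumed rule: clear only after recovering past the buffer.
            self.active = False
        return self.active


alarm = LowLimitAlarm(threshold=750, hysteresis=250)
readings = [4928, 740, 760, 990, 1010, 4928]
states = [alarm.update(r) for r in readings]
print(states)  # [False, True, True, True, False, False]
```

    Note that 760 and 990 RPM are both above the 750 RPM threshold, yet the alarm stays raised until the reading passes 1000 RPM. Small fluctuations around 750 RPM therefore do not flip the state back and forth.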

    If you'd like F5 TAC to take a look at the fan reading and examine if there's an issue, you can submit a support case to F5. 

    K2633: Submit a support case

    Cheers,

    Mo

    • Zen_Y

      Hi MoFaz,

      So in general, the sensor's detected status is shown in the Sev column? And what about the Cur column with the value NORM: does that also indicate a status of the sensor?

  • Hey Zen_Y ,

    In this context, "NORM" in the "Cur" (Current) column indicates that the current status or value of the component is normal. Despite the current value being normal, the presence of CRIT in the "Sev" (Severity) column suggests that there might be a critical concern or threshold that needs attention, possibly related to the hysteresis or range values.

    Cheers,

    Mo

     

  • They mean SEVerity and CRITical.

    I suggest simply using the web admin GUI, which makes the system information easier to read.

  • Hi Zen_Y, which BIG-IP device are you running the fan system check on? Is it an iSeries or an rSeries?

  • f51

    Hi Zen,

    It looks like you've run the system_check -d command on your F5 device and are trying to interpret the results, particularly the Sev (Severity) column, and why all the statuses are marked as CRIT (Critical) even though the current sensor values appear to be normal.

    Here's a breakdown of the columns in your table:

    • Key: Identifier for the specific sensor being monitored.
    • Sys: A system-level field for the sensor (0 or 3 in your output).
    • Data: The sensor reading itself, expressed in the unit from the next column (e.g., 4928 for the CPU fan).
    • Unit: Measurement unit for the reading (e.g., RPM for fan speed, mV for voltage).
    • Sta: Status flag for the sensor (1 on every row here).
    • Cur: Current state of the sensor (NORM on every row here).
    • Tbl: Table identifier (possibly indicating a group or type of sensor; I|D|S on every row here).
    • #: Number or identifier for the entry.
    • Type: Type of entry (1 on the RPM rows, 2 on the mV rows).
    • Range: Range identifier for the entry (0 or 2 in your output).
    • Value: Most likely the threshold the reading is compared against (e.g., 750 RPM for the CPU fan, and 10558 mV and 13442 mV on either side of the 12065 mV fan voltage reading).
    • Hyst: Hysteresis value, a margin applied around that threshold.
    • Sev: Severity label for the row (CRIT for critical).

    The Sev (Severity) column holds the severity label for each row, and in your output every row is marked CRIT (Critical). Since Cur reads NORM on those same rows, Sev may describe the severity of the threshold in that row rather than the sensor's present state. If the system really were interpreting the readings as critical, that would be a concern.

    There are a few potential reasons why the statuses might be marked as CRIT even though the values appear normal:

    1. Threshold Settings: The thresholds for what constitutes a critical state might be set incorrectly or too low. You may need to verify and adjust these thresholds in the system configuration.
    2. Sensor Calibration: The sensors might need calibration. If the sensors are not calibrated correctly, they might report incorrect values leading to false critical alerts.
    3. Firmware/Software Issue: There might be a bug or issue in the firmware or software that is causing incorrect reporting of sensor statuses. Checking for firmware or software updates could resolve this issue.
    4. Interpreting Hysteresis: The Hyst value is a margin around a threshold, not a threshold in its own right. A sensor that has crossed a threshold may stay flagged until its reading moves back past the threshold by that margin, which prevents rapid switching between states.

    To address this, you should:

    1. Verify Thresholds: Check the configuration for the thresholds of each sensor and adjust them if necessary.
    2. Consult Documentation: Refer to the F5 documentation for details on interpreting sensor readings and adjusting thresholds.
    3. Check for Updates: Ensure your device firmware and software are up-to-date.
    4. Contact Support: If the issue persists, consider reaching out to F5 support for further assistance.

    By investigating these areas, you can determine why the statuses are marked as critical and take appropriate action to resolve the issue.
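    A quick way to check which value falls under which header is to split each row on whitespace and pair the fields with the header names. The sketch below does this for four rows copied from the output in the question. It assumes that no field contains a space, and `parse_rows` is a hypothetical helper, not an F5 utility.

```python
# Column names copied from the header row of the pasted output.
COLUMNS = ["Key", "Sys", "Data", "Unit", "Sta", "Cur", "Tbl", "#",
           "Type", "Range", "Value", "Hyst", "Sev"]

# Four rows copied from the output in the original question.
SAMPLE = """\
1/bld_cpu_fan 0 4928 RPM 1 NORM I|D|S 0 1 0 750 250 CRIT
cha_fan1_volt 3 12065 mV 1 NORM I|D|S 0 2 0 10558 241 CRIT
cha_fan1_volt 3 12065 mV 1 NORM I|D|S 2 2 2 13442 241 CRIT
cha_fan_1 3 2882 RPM 1 NORM I|D|S 0 1 0 500 250 CRIT
"""


def parse_rows(text):
    """Split each non-header line into a dict keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header, and anything with an unexpected shape.
        if len(fields) != len(COLUMNS) or fields[0] == "Key":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


for row in parse_rows(SAMPLE):
    print(row["Key"], row["Data"], row["Unit"], row["Cur"], row["Value"], row["Sev"])
```

    Each row splits into exactly 13 fields, matching the 13 header names, so the numeric reading lands under Data while NORM lands under Cur.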

    • Zen_Y

      Thanks, I'm trying to figure out how the threshold system works.
