Forum Discussion

bdraschk_114903
Nimbostratus
Apr 14, 2014

Monitoring RAID status?

We just upgraded our BigIP 6900 from 10.2.3 to 11.2.1. Maybe because the upgrade touched a previously unused part of the disk, maybe because of bad karma, one of the two hard drives decided to go belly up:

 

sd 1:0:0:0: [sdb] Unhandled sense code
sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
...
 cat /proc/mdstat
Personalities : [raid0] [raid1]
md13 : active raid1 dm-28[0]
      3145664 blocks [2/1] [U_]
...

While this is certainly manageable with F5 support, the fact that we noticed it only by pure chance (while accessing the standby device via the web interface) has us concerned. We can monitor a lot of things on the F5 via SNMP, including temperature, memory and disk space, but /proc/mdstat seems to be inaccessible from the SNMP agent.
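For reference, the check we would like to automate is simple. Below is a minimal sketch in Python that reads text in the /proc/mdstat format shown above and lists the degraded arrays. The function name `degraded_arrays` is made up for illustration, and it assumes the classic `[total/active] [U_]` status line as in the excerpt; whether Python is available where the check runs is also an assumption.

```python
import re

# Header line of an md array, e.g. "md13 : active raid1 dm-28[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Status line of an md array, e.g. "3145664 blocks [2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that have missing members.

    Hypothetical helper: an array counts as degraded when fewer
    members are active than expected, or when the member map
    contains an underscore (a missing device).
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded
```

A monitoring wrapper could feed this the contents of /proc/mdstat and raise an alarm whenever the returned list is not empty.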

 

On a vanilla Net-SNMP agent, one could add a custom OID and script, but the big THIS IS AN AUTO-GENERATED FILE -- DO NOT EDIT!!! banner makes this a moot idea.

 

Has anyone solved this yet?

 

  • We are currently looking into deploying a daemon (i.e. caching) version of the check_mk agent on the devices.

     

    Another option would be to use a cron job and ssh pubkey authentication to push the contents of /proc/mdstat and the smartctl output to a remote server, but that would mean adding a ton of monitoring to ensure that the cron job is still running, by looking at the result, comparing modification timestamps, etc.

     

  • I actually opened a case with F5 to add RAID status to the SNMP MIB, and they sent me information that it's already in there:

    config  snmpwalk -v 2c -c public 127.0.0.1 sysPhysicalDiskTable
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskSerialNumber."WD-WCAT1E407050" = STRING: WD-WCAT1E407050
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskSerialNumber."WD-WCAT1E408695" = STRING: WD-WCAT1E408695
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskSerialNumber."B92341DDYGKJ0908HV00" = STRING: B92341DDYGKJ0908HV00
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskSlotId."WD-WCAT1E407050" = INTEGER: 0
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskSlotId."WD-WCAT1E408695" = INTEGER: 0
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskSlotId."B92341DDYGKJ0908HV00" = INTEGER: 0
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskName."WD-WCAT1E407050" = STRING: HD1
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskName."WD-WCAT1E408695" = STRING:
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskName."B92341DDYGKJ0908HV00" = STRING: CF1
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskIsArrayMember."WD-WCAT1E407050" = INTEGER: true(1)
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskIsArrayMember."WD-WCAT1E408695" = INTEGER: true(1)
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskIsArrayMember."B92341DDYGKJ0908HV00" = INTEGER: false(0)
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskArrayStatus."WD-WCAT1E407050" = INTEGER: ok(1)
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskArrayStatus."WD-WCAT1E408695" = INTEGER: missing(3)
    F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskArrayStatus."B92341DDYGKJ0908HV00" = INTEGER: undefined(0)
    

    Indexing by the disks' serial numbers is somewhat strange: you'll have to adapt your checks after replacing a drive.
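One way around the serial-number indexing is to walk the whole table and evaluate every row, rather than querying fixed OIDs. The following is a rough sketch, not a tested plugin: it parses the textual snmpwalk output shown above and reports every array member whose status is not ok(1). The function names `parse_disk_table` and `failed_array_members` are made up for illustration, and the only status values assumed are the ones visible in the output above.

```python
import re

# Matches lines such as:
# F5-BIGIP-SYSTEM-MIB::sysPhysicalDiskArrayStatus."SERIAL" = INTEGER: ok(1)
LINE_RE = re.compile(
    r'::(sysPhysicalDisk\w+)\."([^"]+)"\s*=\s*\w+:\s*(.*)$'
)


def parse_disk_table(walk_output):
    """Group snmpwalk text output into {serial: {column: value}}."""
    disks = {}
    for line in walk_output.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            column, serial, value = match.groups()
            disks.setdefault(serial, {})[column] = value.strip()
    return disks


def failed_array_members(disks):
    """Return (serial, status) pairs for array members that are not ok(1).

    Disks that are not array members (such as the CF card in the
    output above) are skipped, so no serial number is hard-coded.
    """
    failed = []
    for serial, columns in sorted(disks.items()):
        if columns.get("sysPhysicalDiskIsArrayMember") != "true(1)":
            continue
        status = columns.get("sysPhysicalDiskArrayStatus", "unknown")
        if status != "ok(1)":
            failed.append((serial, status))
    return failed
```

Because the check keys on sysPhysicalDiskIsArrayMember instead of on a particular serial, it should keep working after a drive swap, assuming the replacement disk shows up as an array member again.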