Forum Discussion

Drew
Jul 14, 2015

Problem with a monitor

Hi, I wonder if anyone has seen a similar issue to this. We have a pair of 6900s in two data centers, and we have a SQL partition we use to load balance a set of SQL DB servers. We are using a custom monitor to check the health of the servers in each pool.

 

One of the SQL servers appears to be UP in the live DC and DOWN in the standby DC. The monitors are identical. We've successfully synced the boxes, but the offline box still thinks one server is down. I've run a tcpdump on each box and I can see the monitor firing in the live DC, but not in the standby DC.

 

The parent monitor is mssql. The send string is

 

USE ts_74_prod_lsds;SELECT DB_NAME()

 

Usernames and passwords are correct.

 

It seems more to be an issue with the monitor on one box, rather than a problem with the server.

 

Even if the box were down, I would still expect to see some activity in the tcpdump.

 

I can ping the server from both boxes and can telnet on the correct port too. These show up in the tcpdump so I'm guessing the monitor never actually fires.
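
For anyone who wants to script the same reachability check instead of running telnet by hand, here is a minimal Python sketch of a plain TCP connect test. It only shows that the port accepts connections, which is all telnet shows too; it says nothing about whether the mssql monitor itself is firing. The function name `can_connect` and the 3-second default timeout are illustrative choices, not anything from the BIG-IP.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, like telnet does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call for the server in this thread:
# can_connect("172.31.61.51", 50198)
```

Running it from a host in each DC gives a quick yes/no per side, which is easier to compare than two telnet sessions.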

 

To further complicate things, the DBDaemon.log files are the same on both boxes, and it looks like the monitor actually fires. See below:

 

7/14/15 5:00 PM: (Thread-318260): Count : 0
7/14/15 5:00 PM: jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-318260): new pinger created.connect str =jdbc:sqlserver://172.31.61.51:50198;databaseName=;
7/14/15 5:00 PM: jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-318260): attempt =0

 

The server in question has an IP address of 172.31.61.51 and the port is 50198.

 

Any help greatly appreciated.

 

Cheers, Drew

 

5 Replies

  • Have you tried enabling "Debug" in the SQL monitor? Or as an alternative you can enable monitor logging in the properties of the pool member itself beginning with v11.4 or maybe v11.5.

     

  • Drew

    Hi, I guess your second suggestion may only be available in v11.5; I can't see it here. We tried Debug too. Strangely, it looks as though the monitor actually runs, yet nothing shows up in the tcpdump. The SQL DBA says he doesn't see any checks hitting his server from the offline box. Debug output shown below:

    Working:

    ********** Debugging session beginning at: Tue Jul 7 13:59:09 2015

     

    Arguments 1-2: ::ffff:172.31.61.51 50198

     

    Environment variables:
    COUNT=0
    DATABASE=
    DEBUG=yes
    MON_TMPL_NAME=/SQL/MONITOR_PROD_SQL2_INTERNAL_LIVE
    NODE_IP=::ffff:172.31.61.51
    NODE_PORT=50198
    PASSWORD=xxxxxxxxxxxxxxx
    RECVCOLUMN=
    RECVROW=
    RECV_I=
    SEND=USE ts_74_prod_lsds;SELECT DB_NAME()
    USERNAME=bigip

    count='0' converts to '0'
    MakePidFile: ::ffff:172.31.61.51-0..50198
    pidfile exists -- checking for correctness of pid...
    DBDaemon says its pid is 10375
    pid of 10375 is correct!
    Recvd: 'com.microsoft.sqlserver.jdbc.SQLServerDriver '
    Recvd: 'jdbc:sqlserver://172.31.61.51:50198;databaseName=; '
    Recvd: 'bigip '
    Recvd: 'xxxxxxxxxxxxxxx '
    Recvd: 'USE ts_74_prod_lsds;SELECT DB_NAME() '
    Recvd: 'jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-154072): '
    Recvd: '!Up! '
    up

     

    Not working:

    ********** Debugging session beginning at: Tue Jul 7 13:43:48 2015

     

    Arguments 1-2: ::ffff:172.31.61.51 50198

     

    Environment variables:
    COUNT=0
    DATABASE=
    DEBUG=yes
    MON_TMPL_NAME=/SQL/MONITOR_PROD_SQL2_INTERNAL_LIVE
    NODE_IP=::ffff:172.31.61.51
    NODE_PORT=50198
    PASSWORD=xxxxxxxxxxxxxxx
    RECVCOLUMN=
    RECVROW=
    RECV_I=
    SEND=USE ts_74_prod_lsds;SELECT DB_NAME()
    USERNAME=bigip

    count='0' converts to '0'
    MakePidFile: ::ffff:172.31.61.51-0..50198
    pidfile exists -- checking for correctness of pid...
    DBDaemon says its pid is 14013
    pid of 14013 is correct!
    Recvd: 'com.microsoft.sqlserver.jdbc.SQLServerDriver '
    Recvd: 'jdbc:sqlserver://172.31.61.51:50198;databaseName=; '
    Recvd: 'bigip '
    Recvd: 'xxxxxxxxxxxxxxx '
    Recvd: 'USE ts_74_prod_lsds;SELECT DB_NAME() '
    Recvd: 'jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-3448787): '
    Recvd: '!Down! '
    Database down, see /var/log/DBDaemon.log for details.

    DBDaemon.log shown below:

    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50206;databaseName=;(Thread-3630637): closed connection!!!

     

    7/15/15 11:07 AM: (Thread-3630639): Count : 1
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): new pinger created.connect str =jdbc:sqlserver://172.31.61.51:50206;databaseName=;
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): attempt =0
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): closing connection after 1 uses.
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): closed connection!!!
    7/15/15 11:07 AM: (Thread-3630641): Count : 0
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50198;databaseName=;(Thread-3630641): new pinger created.connect str =jdbc:sqlserver://172.31.63.47:50198;databaseName=;
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50198;databaseName=;(Thread-3630641): attempt =0
    7/15/15 11:07 AM: (Thread-3630643): Count : 0
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-3630643): new pinger created.connect str =jdbc:sqlserver://172.31.61.51:50198;databaseName=;
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-3630643): attempt =0
    7/15/15 11:07 AM: (Thread-3630645): Count : 1
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50206;databaseName=;(Thread-3630645): new pinger created.connect str =jdbc:sqlserver://172.31.63.47:50206;databaseName=;
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50206;databaseName=;(Thread-3630645): attempt =0
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50206;databaseName=;(Thread-3630645): closing connection after 1 uses.
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50206;databaseName=;(Thread-3630645): closed connection!!!
    7/15/15 11:07 AM: (Thread-3630647): Count : 1
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630647): new pinger created.connect str =jdbc:sqlserver://172.31.61.51:50206;databaseName=;
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630647): attempt =0
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630647): closing connection after 1 uses.
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630647): closed connection!!!
    7/15/15 11:07 AM: (Thread-3630649): Count : 0
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50198;databaseName=;(Thread-3630649): new pinger created.connect str =jdbc:sqlserver://172.31.63.47:50198;databaseName=;
    7/15/15 11:07 AM: jdbc:sqlserver://172.31.63.47:50198;databaseName=;(Thread-363
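
A log like this is hard to read by eye, so here is a rough Python sketch that tallies, per host:port target, how many connections DBDaemon logged as opened ("new pinger created") and how many as closed ("closed connection"). The regex is inferred only from the entries pasted in this thread, so treat the line format as an assumption; `ENTRY`, `tally` and `sample` are illustrative names. In the excerpt above, the port 50206 targets (logged with Count : 1) each show a close, while the port 50198 targets (Count : 0) show an open and an attempt but no close.

```python
import re
from collections import defaultdict

# Matches entries such as:
# 7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630647): closed connection!!!
# The format is assumed from the log excerpt in this thread.
ENTRY = re.compile(
    r"jdbc:sqlserver://(?P<host>[\d.]+):(?P<port>\d+);databaseName=;"
    r"\(Thread-\d+\): (?P<msg>.+)"
)


def tally(lines):
    """Count opened and closed connections per host:port target."""
    counts = defaultdict(lambda: {"opened": 0, "closed": 0})
    for line in lines:
        m = ENTRY.search(line)
        if not m:
            continue  # "Count : N" lines carry no target, so skip them
        target = f"{m['host']}:{m['port']}"
        msg = m["msg"]
        if msg.startswith("new pinger created"):
            counts[target]["opened"] += 1
        elif msg.startswith("closed connection"):
            counts[target]["closed"] += 1
    return dict(counts)


# A few entries copied from the log above.
sample = """\
7/15/15 11:07 AM: (Thread-3630639): Count : 1
7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): new pinger created.connect str =jdbc:sqlserver://172.31.61.51:50206;databaseName=;
7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): attempt =0
7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): closing connection after 1 uses.
7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50206;databaseName=;(Thread-3630639): closed connection!!!
7/15/15 11:07 AM: (Thread-3630643): Count : 0
7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-3630643): new pinger created.connect str =jdbc:sqlserver://172.31.61.51:50198;databaseName=;
7/15/15 11:07 AM: jdbc:sqlserver://172.31.61.51:50198;databaseName=;(Thread-3630643): attempt =0
""".splitlines()

for target, c in tally(sample).items():
    print(target, c)
```

To run it against the real file, pass `open("/var/log/DBDaemon.log")` to `tally` instead of `sample`. A target whose opened count keeps exceeding its closed count is only a hint worth a closer look, not proof of a stuck connection.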

     

  • Drew

    Strange, that didn't show anything either. I tried tcpdump -vvv -i VLAN81 host 172.31.63.47 and saw some traffic from the ICMP checks, so at least I know which interface the server is reached through. I think I'll open a call with F5.

     

    Thanks for your help :-)

     

  • Drew

    Sorry, that should be tcpdump -vvv -i VLAN81 host 172.31.61.51

     

  • Drew

    We got a fix from F5.

     

    I'd never have guessed this one but here it is.

     

    In the monitor config, change "Count" to anything other than 0. I changed it to 1, and the monitor fired straight away and marked the server up. See the text below from F5:

     

    I've checked over the QKview you've supplied us. It looks like we may be seeing an issue with the SQL monitor that can happen when it's disconnected from the database but doesn't necessarily realize that it's been disconnected. For the future, using a non-0 value for "count" will prevent this from happening. In this instance, I'd like to recommend that you set the "count" value on PROD_SQL2 to a non-0 number, and then reboot your LTM using the procedure found here: https://support.f5.com/kb/en-us/solutions/public/13000/000/sol13030.html. This process causes the LTM to reload its binary database from the configuration. Needless to say, it should not be performed on a system currently passing traffic. I'd recommend a minimum of a failover, preferably a work window.

     

    Thanks again for your help.

     

    Cheers

     

    Drew