GTM Oracle monitor succeeding, but server not marked as up (no reply from big3d)
Env: GTM 11.5.2
We have a wide IP based on pools that use an Oracle monitor. That monitor runs a SQL query to check the read/write status of the database and, if that status is good, marks the queried Oracle server as "up".
This monitor is failing for 3 of our 6 Oracle IPs. The monitor itself seems to succeed: with debugging turned on, I see success in the monitor's own debug log as well as in DBDaemon-0.log (lines from both logs are copied below). But the gtm log reports that there was no reply from big3d, so the 3 servers are being marked as down. The "no reply" error even occurs on the same GTM, which reports it for its own self IP. iqdump shows no issue connecting to big3d; all looks nominal.
These same 3 Oracle servers/IPs are also used in other pools with other Oracle monitors, and all of those are working correctly.
None of the "known" causes that I could find for a "no reply from big3d" error seem to apply: there are no multiple traffic groups involved (in fact, there's no LTM virtual server involved) and no iQuery issues (neither connectivity nor certificates nor anything else). All other big3d-related activities work fine, the rest of our wide IPs are fine, the LTM VIP status is registering correctly on the GTMs, etc.
Any thoughts?
Here are lines from the various log files. First, the monitor debug log (I'm fuzzing our internal IPs and the password, but they are correct):
********** Debugging session beginning at: Wed Aug 22 03:59:06 2018
Arguments 1-2:
::ffff:172.16.XX.XX
1521
Environment variables:
COUNT=0
DATABASE=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=%node_ip%)(PORT=%node_port%))(CONNECT_DATA=(SERVICE_NAME=app_odb)))
DEBUG=yes
MON_TMPL_NAME=/Common/MyAccount-ODB_prod_monitor
NODE_IP=::ffff:172.16.XX.XX
NODE_PORT=1521
PASSWORD=XXXXX
RECVCOLUMN=1
RECVROW=1
RECV_I=READ WRITE
SEND=select open_mode from v$database
USERNAME=XXXXX
--
TMOS_RD: 0 (0)
Daemon port: 1521
count='0' converts to '0'
Command-line PID filename: /var/run/ORACLE__Common_MyAccount-ODB_prod_monitor_::ffff:172.16.XX.XX-0_1521.pid
PID file /var/run/DBDaemon-0.pid exists. Checking for correctness of PID.
DBDaemon on port 1521 says its PID is 9115.
PID matches
Asking daemon to ping remote database.
Recvd: 'oracle.jdbc.OracleDriver
'
Recvd: 'jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.XX.XX)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))
'
Recvd: 'XXXXX
'
Recvd: 'XXXXX
'
Recvd: 'select open_mode from v$database
'
Recvd: 'READ WRITE
'
Recvd: '1
'
Recvd: '1
'
Recvd: 'jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.35)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756097):
'
Recvd: '!Up!
'
up
Now an extract from the DBDaemon log:
2018-08-22 03:59:50.992: (Thread-34756183): Count: 0
2018-08-22 03:59:51.02: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.36)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756183): DB connect succeeded.
2018-08-22 03:59:51.02: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.36)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756183): Query message: select open_mode from v$database
2018-08-22 03:59:51.06: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.36)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756183): Send Query success
2018-08-22 03:59:51.06: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.36)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756183): Response from server: OPEN_MODE: 'READ WRITE'
2018-08-22 03:59:51.06: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.36)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756183): Checking for recv string: READ WRITE
2018-08-22 03:59:51.07: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.36)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756183): Analyze Response success
2018-08-22 03:59:51.07: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.121.36)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=app_odb)))(Thread-34756183): Current count: 199 Count : 0
And GTM:
Aug 22 01:28:17 gc-www-ns-01 alert gtmd[18606]: 011ae0f2:1: Monitor instance /Common/MyAccount-ODB_prod_monitor 172.16.XX.XX:1521 UNKNOWN_MONITOR_STATE --> DOWN from /Common/gc-www-ns-01 (no reply from big3d /Common/gc-www-ns-01(172.23.XX.XX): timed out)
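In case it helps anyone compare the affected IPs against the working ones, here is a minimal Python sketch that pulls the fields out of a gtmd state-change line like the one above. The regex and the names `TRANSITION_RE`, `parse_transition` and `SAMPLE` are my own, inferred only from this single sample line; this is an assumption about the format, not an F5 specification, and other TMOS versions may log it differently.

```python
import re

# Assumed shape of a gtmd monitor state-change line, inferred only from the
# one sample line in this post (hypothetical pattern, not an official format).
TRANSITION_RE = re.compile(
    r"Monitor instance (?P<monitor>\S+) (?P<target>\S+) "
    r"(?P<old_state>\S+) --> (?P<new_state>\S+) "
    r"from (?P<reporter>\S+) \((?P<reason>.*)\)\s*$"
)


def parse_transition(line):
    """Return the fields of a monitor state-change line, or None if no match."""
    match = TRANSITION_RE.search(line)
    return match.groupdict() if match else None


# The log line from this post, with the IPs fuzzed exactly as above.
SAMPLE = (
    "Aug 22 01:28:17 gc-www-ns-01 alert gtmd[18606]: 011ae0f2:1: "
    "Monitor instance /Common/MyAccount-ODB_prod_monitor 172.16.XX.XX:1521 "
    "UNKNOWN_MONITOR_STATE --> DOWN from /Common/gc-www-ns-01 "
    "(no reply from big3d /Common/gc-www-ns-01(172.23.XX.XX): timed out)"
)

if __name__ == "__main__":
    # Print each parsed field on its own line for quick inspection.
    for key, value in parse_transition(SAMPLE).items():
        print(f"{key}: {value}")
```

Run over the whole gtm log, something like this would let you group the "no reply from big3d" transitions by monitor and target, which is how I'd check whether only this one monitor is affected for those 3 IPs.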