Forum Discussion
hui_37443
Nimbostratus
Feb 24, 2010auth_result fired twice?
We have implemented an iRule to do OCSP check, based on the prize winner http://devcentral.f5.com/Default.aspx?tabid=108. When it encounters an error, it doesn't resume the suspended SSL::handshake.
when AUTH_RESULT {
if {[info exists tmm_auth_ssl_ocsp_sid] and $tmm_auth_ssl_ocsp_sid eq [AUTH::last_event_session_id]} {
set tmm_auth_status [AUTH::status]
log local0.debug "authentication status of round 1 $tmm_auth_status"
if {$tmm_auth_status eq 0} {
Success
log local0.debug "OCSP verfication succeeded"
SSL::handshake resume
lset ssl_array 1 "auth_success"
} elseif {$tmm_auth_status eq 1} {
Failure
log local0.debug "OCSP verfication failed"
SSL::handshake resume
lset ssl_array 1 "auth_failure"
} elseif {$tmm_auth_status eq -1} {
Error
log local0.debug "OCSP verfication error"
lset ssl_array 1 "auth_failure"
}
else {
Not authorised
log $tmm_auth_status
log local0.debug "OCSP verfication not authorised"
reject
}
session add ssl "$id any pool" $ssl_array 21600
}
}
From the client browser, we can see the error page shown promptly. However, in the log there is an TCL parsing error:
Feb 22 17:35:10 tmm tmm[1786]: 01220001:3: TCL error: cbz_ocsp_action - can't read "ssl_array": no such variable while executing "lset ssl_array 1 "auth_failure"
From the timestamp, it is more a couple of minutes later than the OCSP check. Looks like to me the event AUTH_RESULT is fired twice. Once right after the OCSP check, and the other a while later.
Not sure whether I need to resume the SSL::handshake?
12 Replies
- hoolio
Cirrostratus
In the example iRule, ssl_array is set in the CLIENTSSL_CLIENTCERT event. Is it possible that this event isn't triggering when a client doesn't present a cert? Else, can you add logging to the rest of the rule, reproduce the problem and post the logs?
Thanks,
Aaron - Craig_Reeve
Nimbostratus
Sorry for hijacking your issue Hui but mine sounds very similar to yours and my investigation my assist us both in getting a resolution :-)
I am running v 10.0.1 Build 354.0.
In my CLIENTSSL_CLIENTCERT event I perform an LDAP Authenticate based on information provided in the Client Certificate. This causes an AUTH_RESULT event to fire and I perform my checks and set variables etc. This all works great for a number of clients, however, a few (3 to be correct) cause a rogue AUTH_RESULT event to fire at exactly the same time as our TCP Idle Timeout setting is reached. This causes nothing but pain as there is in reality no result to check, thus I flag everything as invalid and hence any other data being sent using the SSL Session Id is then dropped.
Apart from doing an AUTH::authenticate, what else would cause the AUTH_RESULT to fire? Why don't the majority of clients get this extra AUTH_RESULT after their TCP Session Idle Timeout is reached?
An interesting side note is that we can get one of the clients to work if we change their environment. They are connecting to the internet via a LinkSys router plugged into an ADSL modem. If the the LinkSys is taken out of the equation, and the client device is plugged directly into the ADSL modem, everthing works as it should. How this small change can affect the events that are fired on the F5 is beyond me and may be coincidental. TCPdumps show no rogue data or flags coming from the device to the F5, but if it did I would expect a CLIENTSSL_DATA event to fire which it doesn't.
We are about to go to v 10.1 in the next week or so and this may resolve this issue as one of the fixes may make the TCP session shutdown cleanly rather than waiting for the F5 Idle Timeout to force a reset. Will post an update to advise the outcome of the update.
Thanks,
Craig - hui_37443
Nimbostratus
Aaron,
It happened on production and almost impossible for me to play around. Anyway, the issue has since been resolved. It has turned out the root cause is DNS server down. The OCSP server URL can not be resolved. However, there is potentially something wrong with the exception handling inside F5 process.
My assumption is that when the DNS resolution fails an exception is thrown so the OCSP authentication fails straight away. However, there is the normal process thread not be cleaned up and still in block waiting status, and finally gets timed out. Hence the "two-time fired" authe_result event.
Craig,
I am not a network expert, but I guess the LnkSys router should be doing some trick that confuses F5 session management...
Regards, - hoolio
Cirrostratus
Hi Hui,
Thanks for that information. A possible workaround is to define a virtual server which load balances a pool of OCSP server(s). If you use a VIP, you can also add a monitor to the pool. In your main OCSP iRule, you can then check the state of the OCSP pool before attempting the remote auth. I put an example of this in the Codeshare:http://devcentral.f5.com/wiki/default.aspx/iRules/client_cert_request_by_uri_with_ocsp_checking.html TODO: If the OCSP responder is an LTM VIP (used for load balancing multiple OCSP servers) you could add a check here of the OCSP server pool before attempting the OCSP validation. Just change my_ocsp_http_pool to the actual OCSP server pool name. Check if the OCSP server pool does not have any if {[active_members my_ocsp_http_pool] == 0}{ OCSP servers are not available!! log local0.emerg "$log_prefix: OCSP auth pool is down! Resuming SSL handshake and blocking HTTP request." Audit logging if {$::ocsp_audit_log_level > 0}{ catch {log -noname local0. "cc_audit: $log_prefix; status_text=OCSP server pool is unavailable. Blocking request."} } We could send an HTTP response from this event, but it doesn't actually get sent until the CLIENTSSL_HANDSHAKE event anyhow. So track that this is an invalid request and set the app auth status code to indicate OCSP validation of the cert failed. set invalidate_session 1 SSL::handshake resume return } TODO END:
I also did an example which handles resolution of the hostname for an OCSP server. I'll anonymize that rule and post it to the codeshare.
Aaron - hui_37443
Nimbostratus
Hi,
Me again. Now we have managed to replicate the issue. Log file shows the following strage behaviour. Our system has recently been upgraded to V10.2
First, the event CLIENT_ACCEPTED fires off,
Dec 13 15:29:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action CLIENT_ACCEPTED: inside event CLIENT_ACCEPTED 3772965817
Dec 13 15:29:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action CLIENT_ACCEPTED: inside event CLIENT_ACCEPTED 2928793149
Then CLIENTSSL_CLIENTCERT fires,
Dec 13 15:29:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action CLIENTSSL_CLIENTCERT: verify result ok
Dec 13 15:29:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action CLIENTSSL_CLIENTCERT: doing OCSP
Dec 13 15:29:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action CLIENTSSL_CLIENTCERT: verify result ok
Dec 13 15:29:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action CLIENTSSL_CLIENTCERT: doing OCSP
Event AUTH_RESULT follows, and so far so good.
Dec 13 15:30:12 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT: authentication status of round 1 0, 3772965817
Dec 13 15:30:12 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT : OCSP verfication succeeded
Dec 13 15:30:12 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT: authentication status of round 1 0, 2928793149
Dec 13 15:30:12 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT: OCSP verfication succeeded
Then, a couple of minutes later, actually 5 minutes sharp after CLIENTSSL_CLIENTCERT , AUTH_RESULT fires again! This time, it reports an verification error.
Dec 13 15:34:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT: authentication status of round 1 -1, 2928793149
Dec 13 15:34:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT: OCSP verfication error, ocsp:response:status good
Dec 13 15:34:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT: authentication status of round 1 -1, 3772965817
Dec 13 15:34:58 local/tmm debug tmm[21217]: Rule cbz_ocsp_action AUTH_RESULT: OCSP verfication error, ocsp:response:status good
It doesn't always happen in that way. There are some SSL sessions just have one .
Here is my iRule,
when CLIENT_ACCEPTED { set tmm_auth_ssl_ocsp_sid [AUTH::start pam default_ssl_ocsp] log local0.debug "inside event CLIENT_ACCEPTED $tmm_auth_ssl_ocsp_sid" } when CLIENTSSL_CLIENTCERT { set ssl_cert [SSL::cert 0] set id [SSL::sessionid] set hash [b64encode [sha1 $ssl_cert]] set ssl_array [list blah1 blah2 $hash] lset ssl_array 0 [X509::verify_cert_error_string [SSL::verify_result]] log local0.debug "verify result [lindex $ssl_array 0]" if {[matchclass 1 equals $::cbz_ocsp_check]} { AUTH::cert_credential $tmm_auth_ssl_ocsp_sid $ssl_cert AUTH::cert_issuer_credential $tmm_auth_ssl_ocsp_sid [SSL::cert issuer 0] AUTH::subscribe $tmm_auth_ssl_ocsp_sid AUTH::authenticate $tmm_auth_ssl_ocsp_sid log local0.debug "doing OCSP" SSL::handshake hold } else { log local0.debug "ocsp bypassed" lset ssl_array 1 "ocsp_bypassed" session add ssl "$id any pool" $ssl_array 21600 } } when AUTH_RESULT { if {[info exists tmm_auth_ssl_ocsp_sid] and $tmm_auth_ssl_ocsp_sid eq [AUTH::last_event_session_id]} { set tmm_auth_status [AUTH::status] log local0.debug "authentication status of round 1 $tmm_auth_status, $tmm_auth_ssl_ocsp_sid" if {$tmm_auth_status eq 0} { Success log local0.debug "OCSP verfication succeeded" SSL::handshake resume lset ssl_array 1 "auth_success" } elseif {$tmm_auth_status eq 1} { Failure log local0.debug "OCSP verfication failed, [AUTH::response_data]" SSL::handshake resume lset ssl_array 1 "auth_failure" } elseif {$tmm_auth_status eq -1} { Error log local0.debug "OCSP verfication error, [AUTH::response_data]" lset ssl_array 1 "auth_failure" } else { Not authorised log local0.debug "OCSP verfication not authorised" reject } session add ssl "$id any pool" $ssl_array 21600 } } when HTTP_REQUEST priority 100 { log local0. "Inside ocsp" if {not([matchclass [string tolower [HTTP::path]] contains $::cbz_data_static])} { if {[matchclass 1 equals $::cbz_ocsp_check]} { set id [SSL::sessionid] set ssl_array1 [session lookup ssl "$id any pool"] set ssl_data0 [lindex $ssl_array1 0] set ssl_data1 [lindex $ssl_array1 1] log local0.debug "session id: $id ssl_data0: $ssl_data0 ssl_data1: $ssl_data1" if { $ssl_data0 contains "expired" } { log local0.debug "Certificate expired" HTTP::path "/common/common.web/html/CertificateExpired.html" } elseif { $ssl_data1 contains "auth_failure" } { log local0.debug "Authentication failure" HTTP::path "/Common/Common.Web/html/OcspFailed.html" } elseif { $ssl_data0 contains "ok" } { HTTP::header insert SSLCLientCertStatus $ssl_data1 } else { log local0.debug "unknow error" HTTP::path "/common/common.web/html/CertificateError.html" } } else { log local0.debug "Generate ocsp bypass header" HTTP::header insert OcspBypassed 1 } } }
I can think of two places specifying the 5 minutes limit. One is the OCSP profile, idle timeout. Default value is 300 secs, i.e. 5 minutes. Another one is "Validity period" of OCSP responder, which is configured to be 300 secs to be default value as well. - Craig_Reeve
Nimbostratus
Hi Hui,
I hope I aren't way off base but have you tried changing your Auth Profile timeout to 310 secs?
Why do this you ask? My problem as I noted in a post above turned out to be a session that didn't get its final TCP FIN packet so it essentially waited for the TCP timeout of 300 secs which then caused the F5 to force a TCP RESET, which sometimes occurred at the same time as the AUTH 300 secs timeout. What I got in the AUTH Event was garbage data. Easy solution was to increase the AUTH timeout by 10 secs to 310 secs. I think the F5 ENE came up with this one as 1 of 3 possible solutions.
I didn't update my findings here as I thought the issue you had was completely different to mine but decided to monitor this thread just in case. Interestingly this issue (Improper FIN closure) was supposed to have been fixed in 10.2 so I am surprised it sounds like it is still there.
Hope this works for you as it took many emails and tcpdumps to get a satisfactory resolution to our issue.
Regards, Craig - hoolio
Cirrostratus
Thanks for the tip Craig. Hui, if that doesn't work for you, can you edit your post or reply again with the iRule in [ code ] [/ code ] blocks (without the spaces) to preserve the iRule spacing?
Aaron - hui_37443
Nimbostratus
Craig, I've tried your solution but with no luck. However, it is quite interesting to notice that the second auth_result event fires 310 secs later, instead of original 300! Therefore it suggests that the auth profile idle time setting dictates when the second auth_result event occurs (if there is one). In other words, the second auth_result is actually a timeout.
Apart from that we try to switch off OneConnect profile. Intial test result is quite encouraging. My observation is that when the incident happens, there are concurrent SSL handshake sessions. Chances are OneConnect confuses the OCSP authentication session, and auth object is not closed properly although the authentication result has come back a while ago.
A while ago, we had a different issue with oneconnect. NTLM authentication module on the web tier, behind LTM, was totally confused by the OneConnect TCP trunck. That's why we suspect whether OCSP gets tricky as well.
Anyway, code posted here.when CLIENT_ACCEPTED { set tmm_auth_ssl_ocsp_sid [AUTH::start pam default_ssl_ocsp] log local0.debug "inside event CLIENT_ACCEPTED $tmm_auth_ssl_ocsp_sid" } when CLIENTSSL_CLIENTCERT { set ssl_cert [SSL::cert 0] set id [SSL::sessionid] set hash [b64encode [sha1 $ssl_cert]] set ssl_array [list blah1 blah2 $hash] lset ssl_array 0 [X509::verify_cert_error_string [SSL::verify_result]] log local0.debug "verify result [lindex $ssl_array 0]" if {[matchclass 1 equals $::cbz_ocsp_check]} { AUTH::cert_credential $tmm_auth_ssl_ocsp_sid $ssl_cert AUTH::cert_issuer_credential $tmm_auth_ssl_ocsp_sid [SSL::cert issuer 0] AUTH::subscribe $tmm_auth_ssl_ocsp_sid AUTH::authenticate $tmm_auth_ssl_ocsp_sid log local0.debug "doing OCSP" SSL::handshake hold } else { log local0.debug "ocsp bypassed" lset ssl_array 1 "ocsp_bypassed" session add ssl "$id any pool" $ssl_array 21600 } } when AUTH_RESULT { if {[info exists tmm_auth_ssl_ocsp_sid] and $tmm_auth_ssl_ocsp_sid eq [AUTH::last_event_session_id]} { set tmm_auth_status [AUTH::status] log local0.debug "authentication status of round 1 $tmm_auth_status, $tmm_auth_ssl_ocsp_sid" if {$tmm_auth_status eq 0} { Success log local0.debug "OCSP verfication succeeded" SSL::handshake resume lset ssl_array 1 "auth_success" } elseif {$tmm_auth_status eq 1} { Failure log local0.debug "OCSP verfication failed, [AUTH::response_data]" SSL::handshake resume lset ssl_array 1 "auth_failure" } elseif {$tmm_auth_status eq -1} { Error log local0.debug "OCSP verfication error, [AUTH::response_data]" lset ssl_array 1 "auth_failure" } else { Not authorised log local0.debug "OCSP verfication not authorised" reject } session add ssl "$id any pool" $ssl_array 21600 } } when HTTP_REQUEST priority 100 { log local0. "Inside ocsp" if {not([matchclass [string tolower [HTTP::path]] contains $::cbz_data_static])} { if {[matchclass 1 equals $::cbz_ocsp_check]} { set id [SSL::sessionid] set ssl_array1 [session lookup ssl "$id any pool"] set ssl_data0 [lindex $ssl_array1 0] set ssl_data1 [lindex $ssl_array1 1] log local0.debug "session id: $id ssl_data0: $ssl_data0 ssl_data1: $ssl_data1" if { $ssl_data0 contains "expired" } { log local0.debug "Certificate expired" HTTP::path "/common/common.web/html/CertificateExpired.html" } elseif { $ssl_data1 contains "auth_failure" } { log local0.debug "Authentication failure" HTTP::path "/Common/Common.Web/html/OcspFailed.html" } elseif { $ssl_data0 contains "ok" } { HTTP::header insert SSLCLientCertStatus $ssl_data1 } else { log local0.debug "unknow error" HTTP::path "/common/common.web/html/CertificateError.html" } } else { log local0.debug "Generate ocsp bypass header" HTTP::header insert OcspBypassed 1 } } } - hoolio
Cirrostratus
Hui,
After reading through some of Craig's case notes, I can see this is a fairly complicated issue. It would probably be most efficient for you to open a case with F5 Support and provide them with tcpdumps and LTM logs showing the issue you're seeing intermittently. If you do so, have them reference C656224 as it may have some related info.
Aaron - Craig_Reeve
Nimbostratus
Hui, It's a shame this didn't work out for you but sounds like you may have another piece in your puzzle. Maybe 310 secs is too still low for your situation but at the end of the day the sessions should terminate cleanly anyway and hence these timeouts shouldn't be invoked. We never found out why most of our customers worked correctly and only a handful consistently failed with the duplicate AUTH_RESULT. Given that the client devices were exactly the same running the same code we put it down to either their internal network configuration, their ADSL modem vendor, or their ISP, or a mixture of those, or something we just hadn't thought of. In other words we have no idea but as the 310 sec timeout change resolved the issue for these very few occurrences we stopped looking. I think Aaron is right in suggesting you open a support case as if your issue and mine are loosely related, maybe having two sets of different data may assist F5 in determining the cicumstances that causes this to happen.
Aaron, you are a brave man hunting down and reading my case notes for this. ;-)
Craig
Recent Discussions
Related Content
DevCentral Quicklinks
* Getting Started on DevCentral
* Community Guidelines
* Community Terms of Use / EULA
* Community Ranking Explained
* Community Resources
* Contact the DevCentral Team
* Update MFA on account.f5.com
Discover DevCentral Connects
