04-Oct-2018 09:07
I'm running into this well known KCD SSO error. I have APM performing the necessary SSO variable definitions using LDAP queries which map certificate IDs (Domain userPrincipalName) to sAMAccountNames and then using the sAMAccountName within the KCD WebSSO profile within the access policy. The service account I am using of course has "use any auth protocol" and the appropriate HTTP/fqdn SPN hard coded to rule out reverse lookup issues for dynamic SPN creation by APM. What I am seeing is:
Upon first login with APM SSO, my service account SPN gets a TGT and then fails to get the HTTP/service ticket with the error "Requesting ticket can't get forwardable tickets (-1765328163)"
I kill the APM session and restart - Now when I log in, I pull the ticket for the user, but IIS throws up a few 401s with a login prompt for three or so URIs. I "cancel" on each and then pass through to the web resource (200 OKs)
I kill the APM session and re-login - Now I see APM debug grabbing the cached ticket and I seamlessly pass through to the desired web resource.
So basically it works... I just need to run through APM three times for everything to work seamlessly. The first time I cannot get a service ticket, the next time IIS doesn't accept the ticket I present, the last time everything is 200 OK and there are no issues.
Any ideas?
04-Oct-2018 09:36
Can you by chance watch the Kerberos traffic between APM and the DC? Capture with tcpdump and load the pcap into Wireshark. I'm wondering if APM is contacting the same KDC every time, or maybe the KDC isn't returning good information.
04-Oct-2018 10:28
Kevin - we will be wiresharking the traffic this afternoon - hopefully it sheds some light on the issue.
04-Oct-2018 11:52 (last edited on 02-Jun-2023 08:16 by JimmyPackets)
Kevin, what we are seeing in Wireshark on the DC is the TGT request and TGS request completing without issue. The F5 is requesting a forwardable ticket per the option fields. We see no KRB errors.
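Both the kdc-options field Wireshark shows for each request and the flags= value in the websso debug output are 32-bit Kerberos bit strings printed in hex. RFC 4120 numbers bit 0 as the most significant bit, so bit n is the mask 1 << (31 - n). A small decoder, sketched below, makes it easy to confirm, for example, that the service account's TGT (flags=40610000 in the log) is forwardable; if it is, a "can't get forwardable tickets" failure more likely concerns the S4U2Self ticket than the TGT. Calling bit 15 of the ticket flags name-canonicalize follows Microsoft's usage and is an assumption (MIT krb5 uses that bit for enc-pa-rep).

```python
# Bit positions follow RFC 4120: bit 0 is the most significant bit of the
# 32-bit field, so bit n corresponds to the mask 1 << (31 - n).
TICKET_FLAGS = {
    1: "forwardable", 2: "forwarded", 3: "proxiable", 4: "proxy",
    5: "may-postdate", 6: "postdated", 7: "invalid", 8: "renewable",
    9: "initial", 10: "pre-authent", 11: "hw-authent",
    12: "transited-policy-checked", 13: "ok-as-delegate",
    15: "name-canonicalize",  # assumption: Microsoft's meaning of bit 15
}

KDC_OPTIONS = {
    1: "forwardable", 2: "forwarded", 3: "proxiable", 4: "proxy",
    5: "allow-postdate", 6: "postdated", 8: "renewable",
    14: "cname-in-addl-tkt",  # MS-SFU; set on S4U2Proxy requests
    15: "canonicalize",       # RFC 6806
    27: "renewable-ok", 28: "enc-tkt-in-skey", 30: "renew", 31: "validate",
}

def decode_bits(value, names):
    """Return the names of the set bits, most significant bit first."""
    if isinstance(value, str):
        value = int(value, 16)  # debug logs and Wireshark print these in hex
    return [name for bit, name in sorted(names.items())
            if value & (1 << (31 - bit))]

print(decode_bits("40610000", TICKET_FLAGS))
# ['forwardable', 'initial', 'pre-authent', 'name-canonicalize']
print(decode_bits("40810010", KDC_OPTIONS))
# ['forwardable', 'renewable', 'canonicalize', 'renewable-ok']
```

The same function works for any flags= or kdc-options value you copy out of a log or capture; only the name table differs.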
With APM WebSSO logging set to debug, we see the following on the first attempt:
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ssoMethod: kerberos usernameSource: session.ldap.last.attr.sAMAccountName userRealmSource: session.logon.last.domain Realm: DOMAIN.NAME KDC: 10.10.227.54 AccountName: HOST/serviceacct spnPatterh: HTTP/somesite.com@DOMAIN.NAME TicketLifetime: 600 UseClientcert: 0 SendAuthorization: 0
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ctx: 0x5a905598, CLIENT: TMEVT_REQUEST
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ctx: 0x5a905598, CLIENT: TMEVT_REQUEST_DONE
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ctx: 0x5a905598, CLIENT: TMEVT_SESSION_RESULT
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ctx: 0x5a905598, CLIENT: TMEVT_SESSION_RESULT
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ctx: 0x5a905598, CLIENT: TMEVT_SESSION_RESULT
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ctx: 0x5a907250, SERVER: TMEVT_REQUEST
Oct 4 14:46:25 F5-APM info websso.1[14917]: 014d0011:6: 4e45c27d: Websso Kerberos authentication for user 'eric.haupt1' using config '/Common/Kerberos_SSO'
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0046:7: 4e45c27d: adding item to WorkQueue
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0021:7: sid:4e45c27d ctx:0x5a905598 SPN = HTTP/somesite.com@DOMAIN.NAME
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0023:7: S4U ======> ctx: 4e45c27d, sid: 0x5a905598, user: eric.haupt1@DOMAIN.NAME, SPN: HTTP/somesite.com@DOMAIN.NAME
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: Getting UCC:eric.haupt1@DOMAIN.NAME@DOMAIN.NAME, lifetime:36000
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: fetched new TGT, total active TGTs:1
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: TGT: client=HOST/serviceacct@DOMAIN.NAME server=krbtgt/DOMAIN.NAME@DOMAIN.NAME expiration=Fri Oct 5 00:46:25 2018 flags=40610000
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: TGT expires:1538714785 CC count:0
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: Initialized UCC:eric.haupt1@DOMAIN.NAME@DOMAIN.NAME, lifetime:36000 kcc:0x5a907aa0
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: UCCmap.size = 1, UCClist.size = 1
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: S4U ======> - NO cached S4U2Proxy ticket for user: eric.haupt1@DOMAIN.NAME server: HTTP/somesite.com@DOMAIN.NAME - trying to fetch
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: S4U ======> - NO cached S4U2Self ticket for user: eric.haupt1@DOMAIN.NAME - trying to fetch
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: S4U ======> - fetched S4U2Self ticket for user: eric.haupt1@DOMAIN.NAME
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: S4U ======> trying to fetch S4U2Proxy ticket for user: eric.haupt1@DOMAIN.NAME server: HTTP/somesite.com@DOMAIN.NAME
Oct 4 14:46:25 F5-APM err websso.1[14917]: 014d0005:3: Kerberos: can't get S4U2Proxy ticket for server HTTP/somesite.com@DOMAIN.NAME - Requesting ticket can't get forwardable tickets (-1765328163)
Oct 4 14:46:25 F5-APM err websso.1[14917]: 014d0024:3: 4e45c27d: Kerberos: Failed to get ticket for user eric.haupt1@DOMAIN.NAME
Oct 4 14:46:25 F5-APM debug websso.1[14917]: 014d0001:7: ctx: 0x5a907250, SERVER: TMEVT_NOTIFY
Oct 4 14:46:25 F5-APM err websso.1[14917]: 014d0048:3: 4e45c27d: failure occurred when processing the work item
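When a debug log runs to hundreds of lines, it can help to pull out only the S4U steps and their errors. A minimal sketch, assuming the message formats in the excerpt above; these are not a documented interface and may differ between TMOS versions.

```python
import re

# Patterns copied from the websso debug lines above; not a documented format.
FAILED = re.compile(r"can't get (S4U2\w+) ticket for \w+ (\S+) - (.+) \((-?\d+)\)")
FETCHED = re.compile(r"S4U ======> - fetched (S4U2\w+) ticket for user: (\S+)")
MISSING = re.compile(r"S4U ======> - NO cached (S4U2\w+) ticket for user: (\S+)")

def s4u_events(lines):
    """Yield (event, ticket_type, detail) for each S4U step, in log order."""
    for line in lines:
        if m := FAILED.search(line):
            yield ("failed", m.group(1), f"{m.group(3)} [{m.group(4)}]")
        elif m := FETCHED.search(line):
            yield ("fetched", m.group(1), m.group(2))
        elif m := MISSING.search(line):
            yield ("not cached", m.group(1), m.group(2))

sample = [
    "S4U ======> - NO cached S4U2Self ticket for user: eric.haupt1@DOMAIN.NAME - trying to fetch",
    "S4U ======> - fetched S4U2Self ticket for user: eric.haupt1@DOMAIN.NAME",
    "Kerberos: can't get S4U2Proxy ticket for server HTTP/somesite.com@DOMAIN.NAME"
    " - Requesting ticket can't get forwardable tickets (-1765328163)",
]
for event in s4u_events(sample):
    print(event)
```

For a real run you could pass an open file handle instead of the sample list, assuming your BIG-IP writes websso messages to /var/log/apm.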
04-Oct-2018 11:58
Then right after that, it seems the F5 finds the newly cached ticket and processes properly. I won't post the debug output, but we can trace quite a few GETs, some of them showing S4U===>OK and others seeming to have no ticket and needing to request one. Our SSO config is pretty vanilla: Send Authorization set to Always and a 600 ticket lifetime. We are on 11.6.1 HF2 and getting ready to go to 13.1.1.
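The numbers in the debug log are consistent with the 600 in the SSO profile being minutes: 600 minutes is 36000 seconds, which matches the UCC "lifetime:36000", and the TGT fetched at 14:46:25 expires at 00:46:25 the next day. A quick arithmetic check, assuming the syslog timestamps are US Eastern daylight time (UTC-4), which is the offset implied by the two expiry values in the log:

```python
from datetime import datetime, timedelta, timezone

# Assumption: syslog timestamps are UTC-4, the offset between
# "expiration=Fri Oct 5 00:46:25 2018" and "TGT expires:1538714785".
EDT = timezone(timedelta(hours=-4))

ticket_lifetime_min = 600                     # "TicketLifetime: 600"
lifetime_s = ticket_lifetime_min * 60         # compare "lifetime:36000"
issued = datetime(2018, 10, 4, 14, 46, 25, tzinfo=EDT)  # "fetched new TGT"
expires = datetime.fromtimestamp(1538714785, tz=EDT)    # "TGT expires:1538714785"

print(lifetime_s)                          # 36000
print(expires.isoformat())                 # 2018-10-05T00:46:25-04:00
print((expires - issued).total_seconds())  # 36000.0

# Assuming "left:" is the remaining lifetime in seconds, the later
# "Found UCC ... lifetime:36000 left:32140" line means the cached
# credentials were created this many seconds earlier:
print(lifetime_s - 32140)                  # 3860 (about 64 minutes)
```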
In the SSO profile, I've defined the KDC by IP to rule out any DNS issues. This is not our primary domain/realm; it is a tenant domain with the F5 fronting a single web service. The realm and KDC have also been defined as a separate realm in /etc/krb5.conf, and of course the SPNs and service accounts are specific to this domain.
I should add that the FQDN for the service SPN is not part of the back-end domain... but this shouldn't matter I don't think, as long as the service account in the application pool on IIS is the SPN holder and part of the Realm I'm operating in with my service account with delegation to that SPN.
08-Oct-2018 03:43
Eric,
I've been staring at this for a while and this just caught my eye, "I should add that the FQDN for the service SPN is not part of the back-end domain... but this shouldn't matter I don't think". It actually does matter. The APM SSO service account MUST be in the same realm as the target service. Users don't have to be in the same realm, but these accounts must be.
09-Oct-2018 06:57
Kevin, so if the realm is "internal.com", the user hits the FQDN app1.external.com, the SPN in the INTERNAL.COM realm is HTTP/app1.external.com, and the F5 service account is delegated as a host/ principal for that service SPN,
then the APM SSO service account should be in INTERNAL.COM? Because it is. The SPN defined for the external service is the only object in the KCD transaction that doesn't reference the internal realm in any way.
09-Oct-2018 07:32 (last edited on 02-Jun-2023 08:14 by JimmyPackets)
Yes, the APM SSO service account and the target resource must always be in the same realm.
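In a principal name the realm is whatever follows the last @, independent of the DNS domain in the host part, so HTTP/app1.external.com@INTERNAL.COM belongs to INTERNAL.COM even though the FQDN does not. A small sketch of the same-realm rule; host/apm-sso.internal.com is a hypothetical service account name, and the case-insensitive comparison is a simplification (Kerberos realms are technically case-sensitive, AD realms are upper case by convention).

```python
def split_principal(principal):
    """Split 'service/host@REALM' into (service, host, realm).

    The realm is whatever follows the last '@'; the host's DNS suffix plays no part.
    """
    name, sep, realm = principal.rpartition("@")
    if not sep:
        raise ValueError(f"{principal!r} has no realm; use the fully qualified name")
    service, _, host = name.partition("/")
    return service, host, realm

def same_realm(sso_account, target_spn):
    """True if the APM SSO account and the target SPN share a realm."""
    return (split_principal(sso_account)[2].upper()
            == split_principal(target_spn)[2].upper())

print(split_principal("HTTP/app1.external.com@INTERNAL.COM"))
# ('HTTP', 'app1.external.com', 'INTERNAL.COM')
print(same_realm("host/apm-sso.internal.com@INTERNAL.COM",
                 "HTTP/app1.external.com@INTERNAL.COM"))  # True
```

Raising on a name without a realm is deliberate: in a multi-realm setup a short name is exactly the ambiguity you want to catch.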
The "Requesting ticket can't get forwardable tickets" error will generally happen if either delegation settings are misconfigured, or there's a duplicate SPN on an endpoint service. Have you checked for duplicate SPNs?
setspn -X
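setspn -X asks the domain for SPNs registered on more than one account. The same check over an exported account-to-SPN mapping is sketched below; the account names and the export itself are hypothetical, and SPNs are compared case-insensitively because AD matches them that way.

```python
from collections import defaultdict

def duplicate_spns(spns_by_account):
    """Return {spn: [accounts]} for every SPN held by more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.lower()].add(account)
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}

# Hypothetical export: the IIS app pool account and a computer account
# that still carries an old HTTP SPN in different case.
export = {
    "svc-app1": ["HTTP/app1.domain.com", "HTTP/app1"],
    "WEB01$": ["HOST/web01.domain.com", "http/APP1.domain.com"],
    "svc-apm": ["host/svc-apm.domain.com"],
}
print(duplicate_spns(export))  # {'http/app1.domain.com': ['WEB01$', 'svc-app1']}
```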
09-Oct-2018 07:52 (last edited on 02-Jun-2023 08:14 by JimmyPackets)
Getting a different error now... the user variables are valid... hrmm. The ticket being requested looks valid. I have an SSO credential object at the end of my policy assigning sAMA... My SSO profile is currently set with an explicit KDC IP and the service SPN explicitly defined. The service SPN is not duplicated; this is a small tenant domain with just one app running.
Oct 9 10:47:19 hostpxnapm13 info websso.1[20055]: 014d0011:6: 27b40f75: Websso Kerberos authentication for user 'eric.haupt1' using config '/Common/ACCS_Kerberos-app1.domain.com'
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0046:7: 27b40f75: adding item to WorkQueue
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0021:7: sid:27b40f75 ctx:0x5ab013f0 SPN = HTTP/app1.domain.com@INT.REALM
Oct 9 10:47:19 hostpxnapm13 info websso.1[20055]: 014d0022:6: 27b40f75: Kerberos: realm for user eric.haupt1 is not set, using server's realm INT.REALM
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0023:7: S4U ======> ctx: 27b40f75, sid: 0x5ab013f0, user: eric.haupt1@INT.REALM, SPN: HTTP/app1.domain.com@INT.REALM
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0001:7: Getting UCC:eric.haupt1@INT.REALM@INT.REALM, lifetime:36000
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0001:7: Found UCC:eric.haupt1@INT.REALM@INT.REALM, lifetime:36000 left:32140
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0001:7: UCCmap.size = 1, UCClist.size = 1
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0001:7: S4U ======> - NO cached S4U2Proxy ticket for user: eric.haupt1@INT.REALM server: HTTP/app1.domain.com@INT.REALM - trying to fetch
Oct 9 10:47:19 hostpxnapm13 debug websso.1[20055]: 014d0001:7: S4U ======> - NO cached S4U2Self ticket for user: eric.haupt1@INT.REALM - trying to fetch
Oct 9 10:47:20 hostpxnapm13 err websso.1[20055]: 014d0005:3: Kerberos: can't get S4U2Self ticket for user eric.haupt1@INT.REALM - Generic error (see e-text) (-1765328324)
Oct 9 10:47:20 hostpxnapm13 err websso.1[20055]: 014d0024:3: 27b40f75: Kerberos: Failed to get ticket for user eric.haupt1@INT.REALM
Oct 9 10:47:20 hostpxnapm13 err websso.1[20055]: 014d0048:3: 27b40f75: failure occurred when processing the work item
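The negative numbers in these errors are MIT krb5 com_err codes. Assuming MIT's krb5 error table keeps its first 128 entries aligned with the RFC 4120 protocol error numbers, subtracting the table base (-1765328384) tells you whether the KDC actually returned an error: -1765328324 is offset 60, KRB_ERR_GENERIC, sent by the KDC, while the earlier -1765328163 is offset 221, a client-library error with no RFC 4120 number. A sketch with a handful of RFC 4120 codes:

```python
# ERROR_TABLE_BASE_krb5 from MIT krb5's generated krb5_err.h.
KRB5_ERROR_TABLE_BASE = -1765328384

# A few RFC 4120 section 7.5.9 error numbers relevant to this thread.
RFC4120_ERRORS = {
    6: "KDC_ERR_C_PRINCIPAL_UNKNOWN",
    7: "KDC_ERR_S_PRINCIPAL_UNKNOWN",
    13: "KDC_ERR_BADOPTION",
    37: "KRB_AP_ERR_SKEW",
    52: "KRB_ERR_RESPONSE_TOO_BIG",
    60: "KRB_ERR_GENERIC",
}

def explain(code):
    """Map an MIT krb5 com_err code to its offset and, if known, its RFC 4120 name."""
    offset = code - KRB5_ERROR_TABLE_BASE
    if not 0 <= offset < 128:
        return f"offset {offset}: not an RFC 4120 protocol error number"
    name = RFC4120_ERRORS.get(offset, "see RFC 4120 section 7.5.9")
    return f"offset {offset}: {name}"

print(explain(-1765328324))  # offset 60: KRB_ERR_GENERIC
print(explain(-1765328163))  # offset 221: not an RFC 4120 protocol error number
```

Code 52, KRB_ERR_RESPONSE_TOO_BIG, is normally harmless on its own: it tells the client to retry the request over TCP.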
09-Oct-2018 08:44 (last edited on 02-Jun-2023 08:14 by JimmyPackets)
Don't use an SSO Credential Mapping agent for Kerberos SSO. You don't need it. The SSO profile has two session variable inputs, session.sso.token.last.username, and session.logon.last.domain. You simply need to make sure these session variables are populated before the end of the policy, and the domain variable is usually statically set.
session.logon.last.domain = expr { "INTERNAL.COM" }
And your username variable can either be the sAMAccountName (preferred) or UPN.
session.sso.token.last.username = expr { "bob" }
In fact you can isolate SSO for testing by simply assigning these values statically in the VPE.
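The two variables end up as the client name APM uses in the S4U2Self request. The sketch below mimics only what the debug logs show, including the fallback in the "realm for user eric.haupt1 is not set, using server's realm INT.REALM" line; it is a model inferred from those log lines, not F5's code, and it does not attempt to model UPN-style usernames.

```python
def user_principal(username, domain, server_realm):
    """Rough model of how websso appears to build the S4U2Self client name.

    Inferred from debug output only: an empty session.logon.last.domain
    falls back to the realm of the target SPN.
    """
    realm = domain or server_realm
    return f"{username}@{realm}"

# Domain variable left empty: matches the "using server's realm" log line.
print(user_principal("eric.haupt1", "", "INT.REALM"))         # eric.haupt1@INT.REALM
# Domain set statically in the VPE, as suggested above.
print(user_principal("bob", "INTERNAL.COM", "INTERNAL.COM"))  # bob@INTERNAL.COM
```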
09-Oct-2018 09:48 (last edited on 05-Jun-2023 21:55 by JimmyPackets)
Okay, so some additional thoughts.
In your APM Kerberos SSO, ensure that the Send Authorization setting is set to "Always" for Microsoft services. For anything else it depends on the version of Kerberos they use (MITv5 or SPNEGO).
Make sure time is good between APM and the KDC and target server.
Set a static SPN in the APM SSO and disable all but one pool member at a time to rule out a single server having an issue.
Expand your Wireshark inspection to include DNS traffic between APM and the DC.
kerberos or dns or http
The KRB_ERR_GENERIC message can happen if, for example, you're using a domain user account as the IIS application pool owner; in that case you need to disable Windows Integrated Authentication kernel mode.
To clear the Kerberos SSO cache between tests:
bigstart restart websso
To add debug logging (remember to turn this off when you're done):
tmsh modify sys db log.sso.level value debug
For even more debug logging:
export KRB5_TRACE=/tmp/krb5trace
tail -f /tmp/krb5trace
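On the time point: Active Directory and MIT krb5 both default to a five-minute maximum clock skew, and anything beyond that typically surfaces as KRB_AP_ERR_SKEW. A trivial sketch for comparing clock readings taken on each system at roughly the same moment; how you collect them is up to you, and the values below are made up.

```python
from datetime import datetime, timezone

MAX_SKEW_S = 300  # default tolerance in both AD and MIT krb5

def skewed_hosts(clocks, reference):
    """Return hosts whose clock differs from the reference host by more than MAX_SKEW_S."""
    ref = clocks[reference]
    return sorted(host for host, t in clocks.items()
                  if abs((t - ref).total_seconds()) > MAX_SKEW_S)

# Made-up readings, all in UTC.
clocks = {
    "kdc": datetime(2018, 10, 9, 14, 47, 19, tzinfo=timezone.utc),
    "apm": datetime(2018, 10, 9, 14, 47, 21, tzinfo=timezone.utc),
    "iis": datetime(2018, 10, 9, 14, 53, 40, tzinfo=timezone.utc),
}
print(skewed_hosts(clocks, "kdc"))  # ['iis']
```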
11-Oct-2018 13:46
Just my 2c, might not be relevant to your situation.
I experienced something similar when I was trying to set up an Office Online Server and attach it to our SharePoint VIP with smart card auth. Turns out I didn't need to mess with SPNs or configure Kerberos at all. SharePoint ACLs were handling access to the files, and the IIS site used anonymous authentication.
12-Oct-2018 08:28
action_, my deployment definitely requires SSO with KCD variables. My APM essentially does client-cert auth for any user anywhere and then proxies that user into an internal SharePoint on a private domain/realm. We use identifiers within the cert the client provides to validate the client against the AD domain. All users need to provide is their cert and the PIN to use it; APM takes care of the rest. From SharePoint's perspective, all the IIS front-ends see is the F5 float IP making connections and sending TGS tickets on behalf of domain users.
I went from 11.6.1 HF2 to 13.1.1 last night with no change to this particular issue, but I really didn't expect things to change. My suspicion is something in this tenant domain, but I can't point fingers just yet...
12-Oct-2018 13:46
By tenant domain, do you mean a separate trusted domain for user accounts?
If so, there are a few things you need to do:
29-Oct-2018 12:43
Kevin,
There won't be any trust between these domains. I call this a "tenant" domain from a network perspective. It was built completely separate from our primary infrastructure, per security directives from a long time ago.
APM SSO account and the target service are in the same domain
DNS resolution: I configured host entries for this domain (6 servers) on the LTM/APM, but it would be better to be able to define DNS "domains" in TMOS for things like this. This cluster uses a GTM DNS Delivery listener, and I use an iRule to select the DNS pool based on the DNS question name, so requests with names matching "*tenant.domain" go to the two tenant DCs. I need to revalidate this. This domain/realm has also been defined in krb5.conf.
The APM SSO account is specific to this domain, but does not contain the full domain name. Is this a requirement or are you suggesting simply for best practice? This will be my only application that is off of my primary domain.
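The DNS split described above, sending queries for the tenant domain to its own DCs and everything else to the default resolvers, comes down to a suffix match on the question name. A sketch of that decision in Python rather than as an iRule; the pool names and the tenant.domain suffix are placeholders. Matching on a label boundary avoids the trap of a "*tenant.domain" glob also catching an unrelated name such as othertenant.domain.

```python
# Placeholder names; on the BIG-IP this decision lives in an iRule on the DNS listener.
TENANT_SUFFIX = "tenant.domain"
TENANT_POOL = "tenant_dc_pool"
DEFAULT_POOL = "primary_dns_pool"

def pick_pool(qname):
    """Route a DNS query by a label-aware suffix match on the question name."""
    name = qname.rstrip(".").lower()
    if name == TENANT_SUFFIX or name.endswith("." + TENANT_SUFFIX):
        return TENANT_POOL
    return DEFAULT_POOL

for q in ("_kerberos._tcp.TENANT.DOMAIN.", "app1.tenant.domain",
          "othertenant.domain", "www.example.com"):
    print(q, "->", pick_pool(q))
```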
I opened a TAC case on this... but so far nothing has been identified as erroneous in the config.
29-Oct-2018 12:49
There won't be any trust between these domains
I'm confused now. The trust I'm referring to would be between whatever domain the target server is in, and whatever domain the users are in. You absolutely MUST have a two-way transitive trust between these domains to enable cross-domain access through APM Kerberos SSO.
And if this is a multi-domain environment, you should use full SPN values everywhere to remove any ambiguity. Yes, the APM SSO account is specific to this domain, but you should still refer to it by its full SPN name (not a short name).
30-Oct-2018 07:21
Kevin, the APM performs auth gateway functions for users who have two things: 1) a PKI cert and 2) a user account in the tenant domain.
The users must have an app domain account to access the service. APM simply takes the cert ID (UPN) and proxies these users to the service via KCD. I do not care where the user is in the world as long as they have a valid cert with a UPN for a valid account in the app domain.
There is no cross domain here at all except for the LTM/APM being configured to support SSO on two domains depending on the APM policy configs. For this application, you can consider it single-domain - but the F5 itself is supporting two domains but not at the same time for the same application. From the Kerberos realm perspective, these users are all local and the communication is from the serverside float and F5 SPN to the IIS front-end.
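The cert-to-account step is an LDAP lookup on the UPN taken from the certificate, after which the sAMAccountName is read from the matching user object. Since that value comes from the client, it is worth escaping before it goes into the search filter. A sketch of building such a filter with RFC 4515 escaping; the attribute names are the standard AD ones, and this does not show how APM's own LDAP Query agent handles its input.

```python
def escape_filter_value(value):
    """Escape an LDAP assertion value per RFC 4515 section 3."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

def upn_filter(upn):
    """LDAP filter for the user object whose UPN matches the certificate."""
    return f"(&(objectClass=user)(userPrincipalName={escape_filter_value(upn)}))"

print(upn_filter("eric.haupt1@tenant.domain"))
# (&(objectClass=user)(userPrincipalName=eric.haupt1@tenant.domain))
print(upn_filter("*)(objectClass=*"))
# (&(objectClass=user)(userPrincipalName=\2a\29\28objectClass=\2a))
```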
31-Oct-2018 02:24
Now it's all starting to come together. 😉
So a few additional questions,
31-Oct-2018 12:16
Kevin,
We can replicate the issue every time. When it works I see "S4U==OK"; note that I see this only after the first attempt. On the first attempt I see "can't get forwardable"... the second time I get "S4U==OK" but I don't get all the way through without a 401 coming back. The third time I get "S4U==OK" and process through to the web app with a 200 OK every time.
In the packet captures we've taken, we see nothing out of the ordinary except a "response too big" or something along those lines for the TGT initially.
I can't allow APM to define the SPN using DNS so my patterns are always fully crafted in the SSO profile - This is just our standard.
I'm working to get access to the KDCs and IIS myself to perform packet captures and hopefully do a more thorough inspection. The config on the F5 side should work: everything is in place and correct, and a similar config has worked flawlessly for us for over 2 years with hundreds of clients per day.
Now: the only difference I can think of is the service account. On the primary domain I can use host/service.account and it works fine. That is how I have the other domain configured. I will set a new service account with the host/service.account.domain.name and retest.
I have a case open... but nothing noted just yet. I'm getting ready to go from 13.1.1 to 13.1.1.2 in the next few days as well.
Thanks
31-Oct-2018 13:16
Hmm.
Can you also look at the clear text traffic to the server? I’m assuming, but it’d be good to validate that on every attempt APM actually does pass a Kerberos AP_REQ ticket to the app, and that for some reason the app doesn’t accept it. If you can get a wireshark view of the APM-server traffic you should be able to see the full Kerberos details. And if that’s true, it might imply that the server is at fault, the KDC is issuing a ticket to the wrong service, or there’s something wrong with the ticket.
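In a cleartext capture of the APM-to-IIS leg, the ticket travels in the HTTP Authorization header. One way to tell what APM actually sent, and whether IIS's 401 responses are steering the exchange toward NTLM, is to look at the first bytes of the decoded token. A sketch based on the GSS-API framing (RFC 2743), the SPNEGO OID (RFC 4178) and the Kerberos mechanism OIDs; it reports only the outer framing, and an SPNEGO token usually still carries a Kerberos AP-REQ inside.

```python
import base64

# DER-encoded OID contents (without the 0x06 tag and length byte).
SPNEGO_OID = bytes.fromhex("2b0601050502")          # 1.3.6.1.5.5.2
KRB5_OID = bytes.fromhex("2a864886f712010202")      # 1.2.840.113554.1.2.2
MS_KRB5_OID = bytes.fromhex("2a864882f712010202")   # 1.2.840.48018.1.2.2

def classify_authorization(header):
    """Say what kind of token an 'Authorization: Negotiate ...' value carries."""
    scheme, _, b64 = header.strip().partition(" ")
    if scheme.lower() != "negotiate":
        return f"{scheme} scheme, not Negotiate"
    token = base64.b64decode(b64)
    if token.startswith(b"NTLMSSP\x00"):
        return "NTLM"
    if len(token) < 4 or token[0] != 0x60:  # [APPLICATION 0] InitialContextToken
        return "not an initial GSS-API token"
    i = 2 if token[1] < 0x80 else 2 + (token[1] & 0x7F)  # skip the DER length
    if i + 2 > len(token) or token[i] != 0x06:
        return "malformed GSS-API token"
    oid = token[i + 2 : i + 2 + token[i + 1]]
    if oid == SPNEGO_OID:
        return "SPNEGO"
    if oid in (KRB5_OID, MS_KRB5_OID):
        return "Kerberos (raw krb5 mechanism)"
    return "unknown mechanism"

print(classify_authorization("Basic dXNlcjpwYXNz"))  # Basic scheme, not Negotiate
```

Seeing NTLM from the client side of a 401 exchange, or no Authorization header at all on the first request, would point at the negotiation rather than at the ticket itself.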
02-Nov-2018 10:58
Kevin, we have time set aside next week to dig back into this and I'll let you know what we see. I'm working another app issue (the 401 response thing I have another question out for)
Thanks for all your guidance.
20-Nov-2018 09:57
I stood up an i2800 on TMOS 13.1.1.2 configured just for this domain and application. Same indications so it appears Kerberos within this domain either has issues or has a configuration outside the bounds of F5's recommended KCD logic. Every application I configured on F5 in my primary domain works without issue for Kerberos. I was kind of hoping something pointed to F5 under my control, but it appears to be domain specific or related to a configuration attribute outside of the baseline for KCD SSO. Frustrating.... I hate this application.
20-Nov-2018 10:02
Eric, you may have answered this already, but what AD version is this domain on? Resource-based delegation was introduced in Windows Server 2012 and made the default setting in later versions. APM Kerberos SSO does not support resource-based delegation.
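Classic constrained delegation, which APM uses, is configured on the front-end account (msDS-AllowedToDelegateTo), whereas resource-based delegation lives on the back-end account (msDS-AllowedToActOnBehalfOfOtherIdentity). For the "can't get forwardable tickets" symptom, the account settings that matter are visible in AD attributes: the APM service account needs the TRUSTED_TO_AUTH_FOR_DELEGATION userAccountControl bit ("use any authentication protocol") and the target SPN in msDS-AllowedToDelegateTo, and the user must not be marked NOT_DELEGATED ("Account is sensitive and cannot be delegated"), which makes the S4U2Self ticket non-forwardable. A sketch over attribute values you would read with your own LDAP tooling; the example values are made up, and Protected Users group membership, which also blocks delegation, is not checked.

```python
# userAccountControl bits as documented by Microsoft for AD.
NOT_DELEGATED = 0x100000                    # "Account is sensitive and cannot be delegated"
TRUSTED_TO_AUTH_FOR_DELEGATION = 0x1000000  # "Use any authentication protocol"

def kcd_problems(service_uac, allowed_to_delegate_to, target_spn, user_uac):
    """Return likely reasons the S4U2Self -> S4U2Proxy chain fails for this user."""
    problems = []
    if not service_uac & TRUSTED_TO_AUTH_FOR_DELEGATION:
        problems.append("service account lacks 'use any authentication protocol'")
    # msDS-AllowedToDelegateTo holds SPNs without the @REALM suffix.
    wanted = target_spn.split("@")[0].lower()
    if wanted not in {spn.lower() for spn in allowed_to_delegate_to}:
        problems.append(f"{wanted} is not in msDS-AllowedToDelegateTo")
    if user_uac & NOT_DELEGATED:
        problems.append("user is marked sensitive and cannot be delegated")
    return problems

# Made-up values: 0x200 is a normal account; 0x1000200 adds protocol transition.
print(kcd_problems(0x1000200, ["HTTP/app1.domain.com"],
                   "HTTP/app1.domain.com@INT.REALM", 0x200))  # []
```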
18-Dec-2018 06:57
Kevin - 2012 R2 for both domains: the one where I have many web apps working fine with KCD, and this one domain that is being problematic. I suspect there are hardening differences in the configs, but that is just speculation. I've taken some time away from this issue due to other things, but I'll have to get back into it after the holiday. I spoke with Jason Wilburn about it and he is going to take a look at my case when I re-open it.