Successful 802.1X Authentication then re-authenticates to MAB

OK, this is a strange one.

I have a functioning 802.1X wired solution using Cisco switches, Cisco ISE and the native Microsoft supplicant (Win7 & Win10). The authentication design uses certificates (machine & user) with PEAP/EAP-TLS. I also have some specific AuthZ profiles that control onward access.

I'm seeing legitimate, corporate machines authenticate via 802.1X with certificates, receiving the correct AuthZ policy and profile. All good, works as per design.

Then, 3 minutes later, the very same machine re-authenticates using MAB! Consequently it receives a different AuthZ policy & profile and different network access.

At the moment, the solution is still in Monitor Mode so there is no impact to users. However the project is under pressure to move to Low Impact Mode as soon as possible.

This could be something I have configured incorrectly and I'd be happy to accept responsibility but I don't understand why a Windows machine that had successfully authenticated to the network via 802.1X with certs and AD credentials would then return to submit a MAB authentication request?

Cheers
Dave

hello,

Please send the switch config

Is your switchport authentication priority order set to MAB then dot1x?

ASKER

Hi,
I'll supply switch configs tomorrow but no, we are running IBNS 2.0 on the switches where I am observing this behaviour. Either ISE is prompting for a (very) quick re-authentication or the endpoint supplicant is taking matters into its own hands.

Have you tried authenticating from another machine?

ASKER

This is happening to many machines. Both Windows 10 desktops & laptops as well as Windows 7 desktops & laptops (yes, there are still Win7 machines in Production!).

As explained, the difficulty is that when the machine re-authenticates via MAB, it receives a completely different AuthZ profile with a different dACL. I need to figure out if it is the MS supplicant that is performing the re-auth, or if it is ISE prompting for a re-auth.

In the "successful" Wired_Dot1X AuthZ profile I have a session re-auth timer set at 28,800 seconds (8hrs) so this should be honoured by the switch but it doesn't seem to happen.

Similarly, for the "failed" DOT1X AuthZ profile I have a session re-auth timer set at 3600 seconds (1hr) because in my opinion a device that has authenticated with a MAC address only is less trusted than a device that has successfully authenticated via 802.1X using certificates...

I'll add some logs tomorrow.

Even with IBNS you have an authentication priority and order. When you have the order set to dot1x then MAB that just tells the switch what to do first. If the priority is set to MAB then dot1x, after a successful dot1x authentication the MAB credentials will be used too, so you get the effect of a dot1x authentication then a subsequent MAB authentication.

ASKER

Default switchport configuration (IBNS 2.0)
interface GigabitEthernet7/0/14
 description ** Universal Endpoint Interface **
 switchport access vlan xx
 switchport mode access
 switchport nonegotiate
 switchport voice vlan yy
 device-tracking attach-policy IPDT_POLICY
 authentication periodic
 authentication timer reauthenticate server
 access-session control-direction in
 access-session port-control auto
 mab
 dot1x pae authenticator
 dot1x timeout tx-period 7
 dot1x max-reauth-req 3
 auto qos trust dscp
 spanning-tree portfast
 service-policy type control subscriber PORT-AUTH-POLICY
 service-policy output AutoQos-4.0-Output-Policy
end
!
Default Policy-Map (IBNS 2.0)
policy-map type control subscriber PORT-AUTH-POLICY
 event session-started match-all
  10 class always do-until-failure
   10 authenticate using dot1x priority 10
 event authentication-failure match-first
  5 class DOT1X_FAILED do-until-failure
   10 terminate dot1x
   20 authenticate using mab priority 20

So the port configuration refers to the overall 'PORT-AUTH-POLICY', a policy-map that holds the events and actions, with class-maps supplying the match conditions.
The policy-map above states: (i) match all conditions, (ii) do until failure, (iii) use dot1x and only drop to MAB if dot1x fails. I am seeing successful dot1x authentications, swiftly followed by a MAB authentication. So dot1x has not failed, it succeeded. Why has MAB been invoked?

Please see attached screen dump as well....

User generated image

Cheers
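
For anyone following along, the fallback logic in that policy-map can be modelled roughly as below. This is only a toy Python sketch: the event, class and method names are copied from the posted config, but the DOT1X_FAILED class-map was not posted, so its match condition here is an assumption, and the evaluation loop is a simplification rather than how IOS-XE actually works.

```python
# Toy model of the PORT-AUTH-POLICY control policy quoted above.
# Event, class and method names mirror the posted config; the evaluation
# logic is a simplification for illustration, not IOS-XE internals.

POLICY = {
    "session-started": ("match-all", [
        ("always", ["authenticate dot1x"]),
    ]),
    "authentication-failure": ("match-first", [
        ("DOT1X_FAILED", ["terminate dot1x", "authenticate mab"]),
    ]),
}


def class_matches(class_name, context):
    """Return True if the named class matches this session context."""
    if class_name == "always":
        return True
    if class_name == "DOT1X_FAILED":
        # Assumption: the class-map (not posted) matches a dot1x failure.
        return context.get("failed_method") == "dot1x"
    return False


def actions_for(event, context=None):
    """List the actions this policy would run for a single event."""
    context = context or {}
    if event not in POLICY:
        return []  # no handler defined, so the policy does nothing
    mode, classes = POLICY[event]
    actions = []
    for class_name, class_actions in classes:
        if class_matches(class_name, context):
            actions.extend(class_actions)
            if mode == "match-first":
                break
    return actions


if __name__ == "__main__":
    print(actions_for("session-started"))
    print(actions_for("authentication-success", {"method": "dot1x"}))
    print(actions_for("authentication-failure", {"failed_method": "dot1x"}))
```

In this model a dot1x success triggers nothing further, and the only route to "authenticate mab" is an authentication-failure event that matches DOT1X_FAILED. If the model is a fair reading of the config, a MAB attempt a few minutes after a success suggests a later dot1x failure on the same session rather than the policy falling through by itself.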

Can you post the output from the following as soon as a client authenticates using dot1x?...

show auth sess int gi7/0/14 detail

Then again after the MAB authentication, please?

Also, please post screenshots of your authz profiles for each authz policy. Just the summary of the RADIUS attributes from the bottom of the page will do.

Your switch policy does look right, so I'm guessing there's an authz failure at the switch.

ASKER

I'll add these extracts tomorrow with a "show clock" leader
Under that command I usually see a dot1x authc status, the port authorized and the device in the data domain with a re-authentication timer of 28800sec
Then minutes later a "dot1x stopped", mab authc success with a new re-authentication timer of 3600sec.

Something is causing that secondary MAB authentication.

I'm starting to think it may be the device profiles. I'm seeing some legitimate machines that AuthC properly with certs getting classed as "Microsoft Workstation" rather than an 'Approved-' profile.

ASKER

A quick update folks. After deeper investigation, we've found a number of things wrong with both the ISE configuration and the resultant GPO delivery to clients.

I have a period of monitoring to do on my pre-Prod platform and if that fixes things, the changes will be promoted into Production.

I will update this page tomorrow evening. It is looking positive at the moment.

Thanks for your help to date..

ASKER

OK, it's been too long since I updated this question. I apologise, things have been developing.

1. The continual reboot was traced down to the default CoA action being 'port bounce' and not, per design, re-auth. Unfortunately, at this stage of delivery, many people have an interest in ISE and I do not have control over them all. CyberArk is my new friend here!
2. The problem with known good assets authenticating with MAB is a deeper, darker topic and one that has driven me to an unhealthy compromise. Since my original design of machine & user certificate (which was working 100%), a second machine certificate has been deployed to all managed assets, unbeknownst to me, which has very different key usages and certificate fields. Due to the processing order of GPOs, this new certificate took precedence in the certificate store and was being offered up by the MS native supplicant as a bona fide machine certificate, which it clearly was not. I've now had to dial down my security design to accommodate this 'other' machine certificate which has annoyed the cr@p out of me.
3. I am committed to resolving this issue, properly and securely once I get the "box tickers" out of my way...

Thanks for your thoughts to date folks. I do appreciate you guys. I'll close this as an 'own solution' even though it has been forced upon me....

A few things don't sound right here. Maybe we can sort it without too much fuss.

First, Port Bounce will effectively shut/no shut the port after profiling, so the process would run dot1x again once the bounce occurs. That should only happen once though as it's a profiler function so unless attributes change at each authorization, you will not see it again. If the CoA does occur every time it may be because you don't have the Endpoint Attribute Filter enabled, or your version of ISE doesn't support it.

Second, the GPO pushing your 802.1x config is configured incorrectly by the sound of it. It is likely using simple certificate selection. You can tie this down by specifying which certificate attributes are used to choose which certificate to present as the identity when responding to EAPoL requests. You should not need to relax security simply because you have multiple certificates with multiple EKU policies.
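
To make the certificate selection point concrete, here is a rough Python sketch of the difference between taking whatever certificate comes first and filtering on attributes. It is a model only: the subjects and issuer names are hypothetical, "simple selection" is crudely represented as "first certificate in the store" (the real Windows behaviour is more involved), and the second certificate's EKU is a guess, since the thread only says it has very different key usages. The one hard fact used is the Client Authentication EKU OID, 1.3.6.1.5.5.7.3.2.

```python
from dataclasses import dataclass

CLIENT_AUTH_EKU = "1.3.6.1.5.5.7.3.2"  # id-kp-clientAuth
SERVER_AUTH_EKU = "1.3.6.1.5.5.7.3.1"  # id-kp-serverAuth


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    ekus: frozenset


def select_first(store):
    """Crude stand-in for unfiltered selection: take the first certificate."""
    return store[0] if store else None


def select_filtered(store, required_issuer, required_eku=CLIENT_AUTH_EKU):
    """Pick the first certificate matching both the issuer and the EKU."""
    for cert in store:
        if cert.issuer == required_issuer and required_eku in cert.ekus:
            return cert
    return None


# Hypothetical store: the later-deployed certificate happens to come first.
STORE = [
    Cert("host01 (other purpose)", "Example Other CA", frozenset({SERVER_AUTH_EKU})),
    Cert("host01 (802.1X)", "Example Issuing CA", frozenset({CLIENT_AUTH_EKU})),
]

if __name__ == "__main__":
    print(select_first(STORE).subject)
    print(select_filtered(STORE, "Example Issuing CA").subject)
```

With the filter in place, the order in which the certificates arrive in the store stops mattering, which is the property you would want from the GPO setting.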

ASKER

Still happening. Annoyingly.
So, machine completes AuthC (/w certificate). Good. Gets MARS. Fine.
User/Colleague doesn't log in for a few minutes. Machine moves to MAB authentication after 4 minutes as posted.
Machine receives a different AuthZ profile /w different access privileges.
If User/Colleague logs in (/w certificate) after machine has done, session stays as "dot1x_wired" AuthC/AuthZ policy and correct AuthZ profile is applied.

Question: If trusted asset (machine only) has completed a dot1x transaction (using internal PKI certs) in the first instance, am I actually interested in any further MAB interactions? Can I safely remove this secondary AuthZ profile?

4 minutes sounds suspiciously like the posture remediation default timer.

Anyway, what's your actual use-case? I'm assuming you're authenticating the computer in order to let it process GPOs, etc? Then after that you authenticate the user in order to authorize them differently (VLAN, dACL, etc.)?

ASKER

Thanks, not doing any posture at all. Customer just wants binary authentication as an initial starting point.

Correct, authenticating the computer first to process GPO's etc. which seems to work fine. I'm starting to doubt my security policy, as mentioned before, in that do I care what the machine (a.k.a. "headless device") does after its successful dot1x AuthC?

My AuthZ policy for wired MAB has definitive use cases for headless devices that cannot take part in 802.1X conversation, so these workstations drop into the "default/deny" rule space :(

Main concern is that I am under a LOT of pressure to move to Low Impact Mode and I don't feel comfortable. So maybe I need to lower the security bar of the design? Really don't want to, but equally don't want to cause unplanned business outages!

You shouldn't need to reduce security to get things the way you want them. ISE is a complicated product with so much granularity (which is a good thing), which means it can be difficult to configure in a way that does exactly what you need it to do. Most of the time it is the logic of Authc/Authz which causes issues, but almost all problems can be sorted with minor tweaks.

Can you post your Authc and Authz policy? Obviously obfuscate any info you don't want to be seen. I think there's some condition in your Authc rules somewhere which is only being matched after the initial computer authentication that is causing the problem.

ASKER

Thanks, will do but tomorrow now.
Not sure if I mentioned but if the machine successfully authenticates with dot1x /w certificate and then the user also authenticates with dot1x /w certificate session stays correct. Re-auth timer of 28800 is deployed to session and things couldn't be better.
Problem seems to centre around successful machine auth via dot1x then no following user login.

Ok cool.

Ah so ok, I think I see what's going on. The bit that's throwing me is why after 4 mins does it do something different. Is the session timer 28800 or the reauth timer? They are actually two distinctly separate things. You can reauth only during a valid session, so the reauth timer is usually less than the session timer. If this is how you have it configured you may have an accounting problem which is causing knock-on issues with session ownership. Which version and patch of ISE are you running and how many PSNs do you have?

You said you just want binary authentication to start with. So does that mean there's no need to actually authenticate the user when they login at the network level? You're not doing any VLAN change, dACL, etc.? If not, set the computer to use "Computer Authentication" rather than the default "User or Computer Authentication". That will simplify things.
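
On the session timer versus reauth timer question: on the wire this usually comes down to two standard RADIUS attributes, Session-Timeout (27) and Termination-Action (29), and the posted port config uses "authentication timer reauthenticate server", so the switch takes its value from the server. The Python sketch below shows how an authenticator would typically interpret the pair per RFC 2865 and RFC 3580. How the ISE AuthZ profile settings map onto these attributes is my assumption, so check the attribute summary at the bottom of the profile page.

```python
def on_timer_expiry(attrs):
    """Describe what happens when the server-supplied timer runs out.

    attrs is a plain dict of RADIUS attribute names to values, as they
    might appear in an Access-Accept. Only two attributes are examined.
    """
    timeout = attrs.get("Session-Timeout")
    if timeout is None:
        return "no server-supplied timer"
    if attrs.get("Termination-Action") == "RADIUS-Request":
        # The session is kept and the endpoint is re-authenticated.
        return f"reauthenticate after {timeout}s"
    # Termination-Action absent or Default: the session is torn down.
    return f"terminate session after {timeout}s"


if __name__ == "__main__":
    dot1x_profile = {"Session-Timeout": 28800, "Termination-Action": "RADIUS-Request"}
    mab_profile = {"Session-Timeout": 3600}
    print(on_timer_expiry(dot1x_profile))
    print(on_timer_expiry(mab_profile))
```

The same 28800 value therefore means two different things depending on whether Termination-Action comes along with it, which is worth confirming in the Access-Accept before blaming the switch.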

ASKER

Thanks,

OK, binary authentication was the customer's words. I do want to authenticate both machine & user via certificates and hope to move the solution forward to use TEAP with ISE 2.7 by Q2 2021.

Currently running ISE v2.6p7 + hotfix. Deployment is global, in 6 datacentres each with x2 PSN's behind F5 load-balancers so x12 PSN's in production plus smaller replica pre-Prod/Test deployment /w x2 PAN's & x4 PSN's.
The pre-Prod deployment is not under any change control so can make policy changes any time by design.

Currently no plans to do VLAN changes but have got pxGrid built out /w Tanium to invoke a policy based quarantine condition via dACL for non-compliant/misbehaving endpoints.

I'm pretty sure I've set the re-authentication timer to 28800 via the AuthZ Profile attribute. No access right now but will check in the morning.

Ok, first thing, best advice I can give you is that if you don't need to authenticate the actual user, don't! There's no need to reauthenticate unless you want to do something different with the endpoint so having to reauth only complicates things and puts unnecessary load on ISE, creates multiple sessions, clogs the database, etc. Believe me, I've seen some pretty catastrophic issues when ISE falls over!

Second thing, I wish you'd said they were behind F5s earlier! :-)

That may be the problem. How are you doing session persistence? If you're following the Cisco F5 guide it's not great. I've done loads of deployments with F5s and had to tell Cisco how to do it in the end as their iRule was broken. Accurate session persistence and accounting is absolutely key in a load-balanced deployment and the iRule in the guide just doesn't do it properly. Basically if you have one PSN doing the RADIUS auth and an accounting update is sent to the other PSN in the pair, it will send a CoA as it thinks it's a new session. This is sometimes referred to as a "phantom" session. In ISE 2.7 this problem should go away as the Light Data Distribution function is enabled. This replicates sessions to other PSNs in the deployment. In your case it would enable sessions to be replicated between PSNs in their own Node Group. You are using Node Groups, right? :-)

Can you run a RADIUS Authentication report for a MAC address that you're seeing the issue on? That may help me see what is happening a little better. Also, the F5 config might be useful. I'd need to see the VIP, iRule and Persistence profile configs.
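
To illustrate why persistence matters so much here, the Python sketch below models a sticky table in front of two PSNs. Everything in it is hypothetical: the node names, the round-robin choice for new keys, and in particular the failure shown (authentication and accounting packets carrying the same MAC in different notations). It is not a reproduction of the published iRule or of the specific fault described above, just one way a persistence key can split a single endpoint across two nodes.

```python
import re

PSNS = ["psn-a", "psn-b"]  # hypothetical node names


def normalise_mac(value):
    """Reduce any common MAC notation to 12 lowercase hex digits."""
    return re.sub(r"[^0-9a-f]", "", value.lower())


class Persistence:
    """Sticky table keyed on a persistence key; new keys go round-robin."""

    def __init__(self, nodes, key_func):
        self.nodes = list(nodes)
        self.key_func = key_func
        self.table = {}
        self.next_index = 0

    def route(self, calling_station_id):
        """Return the node that should receive this packet."""
        key = self.key_func(calling_station_id)
        if key not in self.table:
            self.table[key] = self.nodes[self.next_index % len(self.nodes)]
            self.next_index += 1
        return self.table[key]


if __name__ == "__main__":
    auth_id = "AA-BB-CC-DD-EE-FF"  # as seen in the Access-Request
    acct_id = "aabb.ccdd.eeff"     # same endpoint, different notation

    raw = Persistence(PSNS, lambda value: value)
    print(raw.route(auth_id), raw.route(acct_id))

    fixed = Persistence(PSNS, normalise_mac)
    print(fixed.route(auth_id), fixed.route(acct_id))
```

Keyed on the raw string, the accounting packet lands on the second node, which has never seen the session. Keyed on the normalised MAC, both packets stay on the same node.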

ASKER

OK, would really prefer to auth both machine & user in an ideal sense and when it works, it works sweetly with no issues. PEAP(EAP-TLS) works fine. Yes node groups in each DC with local PSN members only.

F5 session persistence based on calling-station-id and referencing published iRule in the guide by Craig Hyps. F5's managed by separate team, but consistent configs deployed around the globe. Can you detail the changes needed in the iRule at all? Getting a copy of the config might be a stretch but I'll ask.

I don't think I'm seeing the re-auth to a different PSN but I'll check again today.

ASKER

Quick thought. So would disabling/removing the node group help here? That way the calling-station-id session persistence would be honoured by a single PSN throughout. I can change this fairly easily in my pre-Prod environment and monitor behaviour. Also got some switch level RADIUS and Dot1X debugging happening soon -hopefully! Again, I don't have RW access to the network...

I hear what you're saying, but really if there's no technical or operational "need" to do it I would suggest you don't. ISE likes simple :-)

Removing the node-group won't help. Really it's only used for CWA failover where clients are waiting to provide credentials.

What I would do to test is just disable one of the PSNs in a group (disable the node at the F5) and see what happens. That will prove whether it's anything to do with the F5 config or not.

Even if you can just get a copy of the iRule it might help. If you've read any of the Cisco Communities posts related to F5 RADIUS persistence you might see that there was quite a public disagreement around how the iRule worked (or not). Also, the syntax changes slightly between some major versions of LTM code.

ASKER

A couple of excerpts from the machine only successful dot1x authentication followed by MAB re-auth. No change in PSN :(
User generated image

User generated image

User generated image

Re-Auth Timer Config

User generated image

Cheers for the screenshots. I reckon it's the client!

Can you show the supplicant config from that machine, please?

ASKER

Will get them tomorrow. Got a slot for a GPO change coming up soon so if you spot anything, please be quick! :)

ASKER

Is it the 'Fall back to unauthorised network access' option below?

User generated image

User generated image

User generated image

User generated image

User generated image

The fall-back allows the NIC to pass traffic if authentication fails. In open (monitor) mode you want this, or your client won't talk even when the port opens up.

Everything looks ok on the supplicant to be fair. I would try enabling SSO with its default settings and see if that helps. I have seen instances in the past where disabling it caused weird issues.

If you can, get yourself a laptop that doesn't apply the GPO so you can play with the settings.

ASKER

Thanks, appreciate the help. Will research the SSO option. Got some more time to play with now change got binned :(