  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2324
  • Last Modified:

Very Slow Traffic on Wifi Network Using Windows NPS 802.1x Authentication

I am using a Windows 2008 R2 server, and NPS to authenticate all my wireless AP's.  It has worked beautifully until recently.  Now, all my wireless AP's are producing horrible speeds.  I'm getting ping times to local resources of like 400-2000 milliseconds!  If I plug in, those immediately drop to 1-5.

I have done a little testing, and changed some of my AP's to regular WPA, with a key.  When I do that, my ping times are considerably faster.  So it appears to have something to do with NPS, but I could be wrong.

Most of my wireless routers are Linksys boxes.  Either E1000, E2000, and E3000 machines running dd-wrt, or WRT54G routers.

I don't even know where else to look.  Does anyone have any ideas?
Jake Pratt
Asked:
1 Solution
 
Bembi (CEO) commented:
NPS on its own doesn't tell us much; there are many things you can do with NPS, so a lot of factors could be behind your problem.

You have to distinguish whether you are doing machine authentication, user authentication, or both.
Machine authentication can be done with a preshared key or a certificate. User authentication can be done as plain text with RADIUS, with Windows authentication, or with cert-based authentication.

If NPS does nothing other than authentication, I would assume there are authentication errors that cause fallbacks to other methods.
Win 2008 uses 128-bit encryption by default. You should check whether your APs are able to work with 128-bit. Also, depending on the encryption type, there are different levels that the AP has to support.

Usually the NPS log and the event log should show you some errors; you may have to filter the security log to see the events related to your authentication. By default, such authentication attempts are logged, but the security log is crowded by default. The event log shows you how the AD connects (meaning which methods it uses).

The ping time is unusual in any case. What I don't understand is...
> If I plug in, those immediately drop to 1-5.
What do you plug in? The computer to the network?

Ping times over WLAN should be as fast as on the normal network (1ms or less). If they are higher, there is either a routing problem or the NAS machine is at its limit. Once the client is authenticated, there should be no reason for the speed to drop.

Check the event log on the NAS machine (including security).
Check the logs on the router, if there are logs.
Post your configuration details, especially what you configured in NAS.
 
Jake Pratt (Author) commented:
Thanks.  When I said "plug in", I was referring to connecting to the LAN with an Ethernet cable.  So yes, basically, when I am plugged into the network, I get 0-2 ms response times.  When I am on wireless, I get about 200-2000 ms response times.  It is CRAZY slow.  And it is happening to all my AP's that I can tell.

I have gone through the event viewer on the NPS server, and haven't found any errors that really stick out.  The security log is mostly audit success events.  There are a few audit failure events peppered throughout, but they don't say much.  I have also enabled logging on a few of my AP's, but I'm also not seeing any information that is very helpful.

I am using NPS as a RADIUS server.  The AP's are set up with WPA2 enterprise 802.1x authentication.  The NPS server is supposed to allow access based on authenticated user, or authenticated computer.  But it seems to only work based on user.  It is authenticating using PEAP, and a certificate.  Via EAP-MSCHAP v2.

The one thing that might play in as far as routing goes is that on all my AP's, I create a default route that points everything (0.0.0.0 0.0.0.0) to the default gateway for that network.  If I don't add that default route, the AP's cannot see the NPS server.  This is the way I set them up from the beginning.  And they have worked great up until recently.
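As a rough illustration of why that 0.0.0.0 0.0.0.0 entry matters, here is a minimal Python sketch of longest-prefix route selection. The subnet, gateway address, and NPS address are hypothetical and not taken from this network; the point is only that without a default route, a destination outside the AP's own subnet matches nothing.

```python
import ipaddress

# Hypothetical AP routing tables: (destination network, next hop).
# All addresses here are illustrative assumptions.
LOCAL_ONLY = [
    (ipaddress.ip_network("192.168.10.0/24"), "directly connected"),
]
WITH_DEFAULT = LOCAL_ONLY + [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.10.1"),  # default route
]


def next_hop(destination, routes):
    """Return the next hop chosen by longest-prefix match, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None  # no matching route: the destination is unreachable
    # The most specific (longest prefix) matching route wins.
    _, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop


nps_server = "10.20.30.40"  # hypothetical NPS address on another subnet
print(next_hop(nps_server, LOCAL_ONLY))         # no default route
print(next_hop(nps_server, WITH_DEFAULT))       # falls through to the gateway
print(next_hop("192.168.10.50", WITH_DEFAULT))  # same subnet as the AP
```

The default route only affects traffic leaving the AP's subnet, so on its own it would not be expected to slow down client-to-AP pings.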

One more thing is that the NPS server is in a virtual environment.  It is running on VMWare.  But it always has been, so I doubt that is a factor.  The system resource usage is very low for CPU, Memory, and Network.  And I have rebooted the server, in case it was a quirky reboot problem.

Nothing is helping me alleviate or find the problem as of yet.  Any other ideas?

Thanks
 
Craig Beck commented:
The NPS server should not be a factor here once the client is authenticated.  However if the AP is trying to re-authenticate lots of clients (or constant single-client attempts) at frequent intervals there may be an overhead associated with that.  This may be putting pressure on the AP's CPU or memory which pushes the ping times up due to the AP being busy.

I would check the NPS log to see if you're getting constant authentication requests when clients connect.  If so, try adjusting the authentication timers on the APs if there's an option for that.

If your APs work with WPA2-Enterprise they will support 128-bit encryption.  Server 2008 NPS won't enforce 128-bit by default - that's usually specified at the AP end by choosing either WEP, WPA or WPA2 unless you use constraints and settings on the NPS access policy.

If you're checking the logs on the NPS server you don't need to check the security log.  There is a separate Network Policy and Access Services log under Custom Views > Server Roles in Event Viewer.
 
Jake Pratt (Author) commented:
I've been looking at the NPS log quite a bit.  Over the last 2 days, I have over 60,000 events.  I assume they are all authentication events.  If I filter them by 1 particular user, I end up with between 1000 and 2000 events in the 2 day period for that particular user.  And if I just sort them all by time, there are usually about 20-24 events in a row for 1 particular user.

For example, if we just look at "myuser" I might have 24 events for MYDOMAIN\myuser over the period of about 2 seconds.  Then 24 more about an hour later, then 24 more about an hour later, etc.  I don't know if that's excessive, or if that is normal.
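One way to check whether that pattern holds across all users is to group the event timestamps into bursts. The sketch below is a minimal Python example that assumes the timestamps for one user have already been exported from the NPS log; the 5-second gap threshold and the sample data are made up to mirror the "24 events in about 2 seconds, roughly hourly" pattern.

```python
from datetime import datetime, timedelta


def group_bursts(timestamps, max_gap=timedelta(seconds=5)):
    """Group event times into bursts separated by more than max_gap."""
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)  # close enough: same burst
        else:
            bursts.append([ts])    # gap too large: start a new burst
    return bursts


# Synthetic timestamps (assumption, not real log data):
# three bursts of 24 events, one burst per hour.
start = datetime(2012, 1, 1, 8, 0, 0)
events = [
    start + timedelta(hours=h, milliseconds=80 * i)
    for h in range(3)
    for i in range(24)
]

bursts = group_bursts(events)
for burst in bursts:
    span = (burst[-1] - burst[0]).total_seconds()
    print(f"{burst[0]:%H:%M:%S}  {len(burst)} events over {span:.2f}s")

# Time between the starts of consecutive bursts.
gaps = [b[0] - a[0] for a, b in zip(bursts, bursts[1:])]
print(gaps)
```

If the gaps between bursts line up with a session or re-authentication timeout configured on the AP or in the NPS policy, the bursts are probably periodic re-authentication rather than a retry storm.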

I have 24 AP's listed as RADIUS clients on my NPS server.  The weird thing is that this all was working great.  And our infrastructure hasn't really changed.  We have added a few more wireless devices slowly over time, but I don't know where to try and place the blame... on the NPS server or the AP's.  If the problem was just having too many wireless devices in an office, I would expect to see problems with just that AP, and not all the other ones in my organization.  I have taken a look at a few of my AP's, and they are using about 15% CPU, and about 80% memory (far from max).  My server is using about 5% CPU and 15% memory, and 0.2% network.  I don't think it's a problem with maxing out resources.

All of my AP's are running WPA2 Enterprise, which I believe is 128-bit.  And just to note, my server is 2008 R2, not 2008.

Thanks for all your help.  This one really has me stumped.
 
Bembi (CEO) commented:
> And it is happening to all my AP's that I can tell.
Points at least to the fact, that there is an issue on the W2008R2 Server.

The events that may be interesting do not necessarily show up as errors. As long as the authentication works, it is logged as a success, but maybe with additional information about how the authentication was made.

But I guess you have to separate some things...
RADIUS is used mainly for devices that are outside your regular network. The first step is the connection between the AP and your RADIUS server. As this is a machine connection, the method is mainly a preshared key or a certificate (if supported by the device). This is how the AP and the NPS server communicate.

The second question is how clients and APs communicate with each other. From the Windows side it is the same; besides this, some routers provide additional protection methods, mostly MAC-based, to protect the network.

The third question is how the client user authenticates against the AD. Here you have several options: pass-through, cert-based, plain text, etc.

On the NPS side, you can set up how connections are allowed (connection request policy) and how users can authenticate (network policies), i.e. whether they authenticate against the local machine (which has to resolve the authentication request = inside the domain) or whether the request is just forwarded to another RADIUS server that can authenticate.
With health policies, you can enforce some additional tests.

For any kind of cert-based authentication (user or machine), you should make sure that the certificate is resolvable on the other side. That means if you use a certificate on NPS, the AP has to resolve it and should have the root cert in its cert store (if available). Otherwise the device has to resolve the cert externally, and this works only with public certs. This may be an issue with the gateway on the APs.

The last point is the encryption methods, which have to match on both devices (AP + NPS). A preshared key is simple; for certs it is more complex, as there are several encryption methods, which should be set up the same way on both devices (AP and NPS).

I'm not quite sure about the possibilities in your APs, but Cisco and MS were never really friendly to each other, and some defaults are sometimes different. Also, the Linksys routers are home routers; I'm not quite sure how fast they are, and encryption takes some additional resources.

Virtualisation doesn't matter, as long as the machine gets enough resources. In a virtualized environment, the processor load on the machine itself does not really tell the truth. You have to inspect both the virtual and the physical machine.

Nevertheless, as a ping is nothing more than an empty data packet of a few bytes, where there is not really much to encrypt, it cannot produce such a delay. Maybe you can try to run a trace from the client and from the AP, to internal and external targets, just to see where the delay is and whether there are any unusual routes.
 
Jake Pratt (Author) commented:
Thanks for the replies.  Let me try to answer some of your specific questions.

1. The radius authentication between the AP and the NPS is preshared key.  I have a preshared key, or shared secret set up on the AP's WPA2 Enterprise settings, and on the NPS's radius client configuration.

2. I'm not positive how the clients communicate to the AP's.  I believe the AP's use layer 2 traffic, which is basically just MAC's.  The clients just connect using WPA2 Enterprise.

3. The AD authentication should all be cert based.  In NPS, I have a connection request policy and a network policy.  The network policy uses both computer and user authentication, and allows anyone that is a member of "domain users" or "domain computers".  The NAS port type is "Wireless - Other OR Wireless - IEEE 802.11".  The authentication method is "Microsoft: PEAP".  And on the PEAP properties, I have a certificate designated from the NPS server.  And I have an EAP type of "Secured password (EAP-MSCHAP v2)".  Encryption is set to 128-bit.  Hopefully that will tell you a little more about how the AD authentication happens.  I don't believe it is possible to add the NPS cert to the root store on the AP.

4. As stated in point 1, I use a preshared key between the AP and NPS, not a certificate, for authentication.

I know that ICMP requests are just empty packets, but in this situation, they are directly proportional to the speed of the connection, so that is what I am using to diagnose.  If I am plugged into the network, and I have a fast network connection, my ping times are <= 1 ms.  If I am on the wireless, and my whole network is behaving very slowly, my ping times are between 200-2500.

I have tried running traceroutes from various devices, but it isn't really showing me a lot of useful information.  When I run a traceroute from the client, to another device, all hops are very slow.  There are basically 2 hops: the gateway, and the destination.  When I ping from the NPS server to the client, the first hop to the gateway is fast, and the hop from the gateway to the client is slow.

Does that provide any more useful information?  Thanks.
 
Craig Beck commented:
Without being disrespectful, I don't agree with much of what Bembi said (sorry Bembi).

Firstly, Cisco and Microsoft NPS/IAS do play very nicely together.  I've deployed hundreds of Cisco LANs and WLANs and Microsoft IAS/NPS servers together and never had a problem.

Secondly, the APs don't need the certificate from your RADIUS server to successfully authenticate clients, nor does it need the root certificate.  The client/server conversation is important here - the AP just facilitates the RADIUS connection between the two (which doesn't even need an IP connection between client and server).

Third, the AP authentication is not specified by RADIUS usually.  The AP broadcasts its capabilities to the client BEFORE the client connection attempt happens, so this is not reliant on RADIUS.

Fourth...
> And it is happening to all my AP's that I can tell.
> Points at least to the fact, that there is an issue on the W2008R2 Server.
No it doesn't.  It points to the fact that there is an issue affecting the APs, but that could be due to a bad switch, or duplicate IP, slow disk in VM host....... etc.

Fifth, WPA2 allows for a 256-bit encryption key to be used.


The issue here is apparently overhead when using NPS.  However authentication traffic is tiny (just a few bytes) and clients CAN authenticate successfully, so I doubt that the link between the AP and the NPS server is the actual problem.  Taking into account the number of authentication entries in NPS I'd say your APs are sending excessive authentication requests per client, but nothing to worry about.  I'd expect to see a few Information logs per connection, but once the client is authenticated the logs should stop appearing until the authenticated session expires (as per the AP OR NPS access policy setting), or when a client roams to a new AP if you're not using WDS.

If your APs are sending lots of requests to the NPS server the problem could be a CPU or I/O issue even if the AP says its processor is not under any great load.  You have to consider that the AP must process authentication traffic before user traffic, otherwise it won't be able to allow people to connect successfully.  If it's busy doing that you should expect a delay.


If you set the APs to use WPA/TKIP does it get any better?
 
Jake Pratt (Author) commented:
You know, I stated earlier that I tried with WPA (TKIP and AES) and speeds were faster.  But I just tried it again.  I changed one of my AP's back to WPA personal, with both TKIP and AES, and my ping times are still slow.  Not AS slow, but slow nonetheless.  Between 20-300ms, with a few dropped packets here and there.  So maybe it's not authentication.

But it's weird because the AP in question only has 1 client connected to it, so I doubt the AP is overloaded with traffic.

The weird thing is that my infrastructure was working great over 24 AP's and many offices.  Now, all of a sudden, they all seem to be misbehaving, in all offices, regardless of load.

I just started running a continuous ping to my AP, with slow results, then pulled the network connection from the AP.  I immediately started getting consistent 1-2 ms ping times.  It's almost like there's just some kind of traffic on the network that is interfering with my wireless traffic.
 
Bembi (CEO) commented:
craigbeck: I don't want to open a discussion about Cisco, but experiences may differ.
I agree with you that they do of course run with MS, but I have experienced some weird effects often enough; still, when you are familiar with Cisco devices, you know what to do.
But keep in mind we are talking about Linksys, which was bought by Cisco as its cheap segment. So we are talking about low-cost routers.

Back to the topic:
1. AP - NPS: OK
2. CL - AP: You may have a look into the config / log of the AP to find out how the client connects.
3. Have you tried applying the settings step by step?
For testing, you can put an empty test policy in front of the live policy. Then you can add the conditions one by one to see what happens.
If it works with an empty policy, you can at least find out which setting produces the effect.
4. OK

5.) Ping...
This is what I mean. The trace is more or less also a ping, but since you describe that the trace from the NPS to the client is slow on the second hop, and the trace from the client to the NPS is slow on the first hop,
I would assume the problem is between the client and the AP.
Do you have an option to run a ping or trace from the AP?
Can you post all ping times NPS - AP, Client - AP, AP - NPS, AP - Client?
Old method / new method?

6.) Have I understood you right that your ping times with PSK (old method) were 20-300ms?
 
Jake Pratt (Author) commented:
I could try an empty policy a little later today.

When I came in this morning, and there was less traffic on the network, I noticed that the ping times are a little better.  Instead of 100-2000, they are about 10-300.  It seems perhaps they get worse with higher network traffic.

I have tried running traceroute on the AP's, but it just hangs after the first hop, and goes nowhere.  The first hop goes to the gateway with a latency of < 1ms.

This morning my ping times are a little better, but here they are on average.  These ping times are all using WPA2 Enterprise, authenticating off of NPS:
NPS to AP (normal):
=1
<1
<1
<1
=1

Client to AP (slow, but not as slow as yesterday under more traffic):
36
371
3
33
75
84
368
226

AP to NPS (normal):
=1
<1
<1
=1
=2
<1

AP to Client (slow, but not as slow as yesterday with heavier traffic):
97
128
107
50
555
7
120

Now, here are the ping times using WPA/AES without the NPS server authentication.  They are better, but still slower than what I'm used to seeing.

NPS to AP (normal):
=1
=1
<1
<1
=2
<1

Client to AP (slower, but not as slow as yesterday):
10
2
8
3
1
2
13
34

AP to NPS (fairly normal):
=5
<1
<1
<1
=5
<1
=1
=50

AP to Client (slower, but not as slow as yesterday):
18
2
4
18
28
7

So, basically, the slower traffic is definitely happening between the client and AP (they are 5 feet away from each other).  The traffic between the AP and NPS on the wired network is to be expected.  The traffic is a little slower using NPS authentication than just using WPA/AES-TKIP.  And it seems that the more people on the network, the worse the wireless traffic gets.
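To put numbers on that comparison, here is a small Python sketch that summarizes the two client-to-AP lists above. The values are copied from the ping results in this post; nothing else is assumed.

```python
from statistics import mean, median

# Client-to-AP round-trip times in ms, copied from the lists above.
client_to_ap = {
    "WPA2 Enterprise (NPS)": [36, 371, 3, 33, 75, 84, 368, 226],
    "WPA/AES (no NPS)": [10, 2, 8, 3, 1, 2, 13, 34],
}


def summarize(samples):
    """Return min, median, mean and max for a list of RTT samples."""
    return {
        "min": min(samples),
        "median": median(samples),
        "mean": mean(samples),
        "max": max(samples),
    }


summary = {name: summarize(rtts) for name, rtts in client_to_ap.items()}
for name, stats in summary.items():
    print(
        f"{name:24} min={stats['min']} median={stats['median']} "
        f"mean={stats['mean']} max={stats['max']}"
    )
```

With only eight samples per mode, taken at different times of day, these figures show the size of the gap but cannot separate the effect of the authentication mode from the effect of network load.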
 
Jake Pratt (Author) commented:
Here, an hour later, the ping times have become noticeably slower.  Instead of 36-300, we're seeing more like 72-600, and they are consistently more above 100 than an hour ago.
 
Bembi (CEO) commented:
OK, let's exclude the load first, because even without load, the ping times shown are too long in my opinion. That means there is an issue that multiplies with the load. Ping time without load should be 1ms or less.

While searching a bit, I found this; maybe give it a try...
http://www6.nohold.net/Cisco2/GetArticle.aspx?docid=266cc1c7b97c458fb04c2da21f985828_List_of_Common_Issues_with_Wireless_N_Routers.xml

Some of these points I also found in other articles.
By the way, what client do you use for testing? XP, Win7?
 
Jake Pratt (Author) commented:
Thanks, I'll check out that article.  My clients are pretty much all 7 and Vista (mostly 7).  The main client I have been using for testing is 7.
 
Bembi (CEO) commented:
OK, there are some issues with Vista as far as I could see, especially with remote differential compression, but that should not affect Win7. Nevertheless, I disabled this on my Win7 clients too.
 
Jake Pratt (Author) commented:
So I spent some time looking at that article.  The only thing I could find in there that seemed useful for my situation was changing my MTU settings on the routers.  Through testing, I determined that the optimal MTU for the AP's in my office here is 1300, rather than the default 1500.  So I changed all those.  I also changed the ack timing on all the AP's in this office (I have 4 running on alternating channels 1, 6, and 11) from the default 2000 meters to 50 meters.
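For anyone repeating the MTU test: one common approach (not necessarily the one used here) is to send pings with the "don't fragment" flag set and search for the largest payload that still gets a reply. The Python sketch below shows the search only; `probe` is a hypothetical callable that you would back with a real ping, and `fake_probe` stands in for a path whose limit is 1300 bytes.

```python
from typing import Callable

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def largest_unfragmented_payload(
    probe: Callable[[int], bool], low: int = 0, high: int = 1472
) -> int:
    """Binary-search the largest payload size for which probe() succeeds.

    probe(size) should send one ping of `size` payload bytes with
    "don't fragment" set and return True if a reply came back.
    Assumes every size up to the limit succeeds and every size above fails.
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid       # mid fits: the answer is mid or larger
        else:
            high = mid - 1  # mid is too big: the answer is smaller
    return low


def fake_probe(payload: int) -> bool:
    """Stand-in for a real ping: pretend the path MTU is 1300 bytes."""
    return payload + IP_ICMP_OVERHEAD <= 1300


payload = largest_unfragmented_payload(fake_probe)
path_mtu = payload + IP_ICMP_OVERHEAD
print(payload, path_mtu)
```

On Windows a single manual probe is `ping -f -l <size> <host>`; on Linux it is `ping -M do -s <size> <host>`. A real probe should repeat each size a few times, since on a lossy wireless link a dropped ping would otherwise look like a packet that was too large.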

I had also noticed that my DHCP scope for this office building is getting very full.  Most of my addresses were handed out.  We have some suites up on another floor that use the same VLAN and same IP addressing as us.  So I even changed out all the IP addressing on the other floors, and put them on their own VLAN's to try and alleviate some of the strain on our VLAN.

None of these changes seem to have helped the situation.  The problem still is not constant.  It seems to get better and worse throughout the day.  My average ping times to the NPS server may be as low as 12, but later in the day they could be as high as 1800.

I still THINK the problem exists in my other offices, on my other AP's, but for right now, I am trying to focus on the AP's in this building.  But I just can't seem to figure this one out.
 
Bembi (CEO) commented:
OK....
Then I would like to see the following:
a.) Can you take screenshots of the NPS policies and post them?

b.) You could run a performance counter log on the NPS. Let it run for some time.
Server Manager - Diagnostics - Performance.
You could add:
Processor load
Physical disk (read / write bytes/sec, queue length)
TCPv4 (connections active)
System (processor queue length)
Memory (available (M/K) bytes)
Network interface (read / write bytes/sec)

You can also play around with other values from Network Interface, IPv4, IPSec.

Watch whether you see significant changes when the ping times rise.

c.) Try 56 or 40 bit encryption
and compare the values to 128 bit (performance counters).

This is just to see if there is any problem on the NPS.

d.) Can you see which connection speeds the clients are using? Are they all using the same protocols, or are they mixed? You could enforce one protocol, e.g. b or g, just to see if the problem is the same.

e.) You could also try another thing. Network cards usually have autonegotiation set for their speed. Try to manually set the NIC speed to the supported NIC speed of the APs. So if the APs have 100 Mbit, set the NPS NIC manually to 100 Mbit full duplex.
Other NIC settings may also cause trouble if they are not correctly supported by the AP NICs, such as all kinds of task offloading.
Axence Net Tools can help to figure out such problems....
http://www.axencesoftware.com/en/nettools
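For step b.) above, one simple way to check whether a counter moves together with the ping times is a correlation coefficient over samples taken at the same moments. The Python sketch below is only an illustration: the sample values are invented, and it assumes you have exported the counter log and the ping times into two equal-length lists.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Invented samples (assumption): one value per measurement interval.
ping_ms = [12, 40, 150, 600, 1800, 300]
disk_queue = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]   # hypothetical counter values
net_bytes = [2e5, 5e5, 1.5e6, 4e6, 9e6, 2e6]  # hypothetical counter values

print(f"ping vs disk queue:    {pearson(ping_ms, disk_queue):+.2f}")
print(f"ping vs network bytes: {pearson(ping_ms, net_bytes):+.2f}")
```

A value near +1 or -1 means the counter tracks the ping times closely; a value near 0 suggests the server-side counter is unrelated to the slowdown. Correlation alone does not show which one causes the other.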
 
Jake Pratt (Author) commented:
Ok, I haven't had much time to work on this for the last week and a half or so, because I've been working on some big issues with my McAfee Host Intrusion Prevention/Firewall.  It was causing all kinds of problems, so I ended up temporarily disabling them.  I picked up the laptop I've been using to test the wireless problems today and started getting totally normal ping times.  I'm wondering if this is all because of McAfee's host intrusion prevention, and Firewall.  It wouldn't surprise me, it's been causing so many problems lately.

I've been watching the HIPS log, and have noticed that it is blocking a lot of routes from the default gateway.  I wonder if that's keeping devices from learning network routes, and slowing things down.  I'm not sure.
 
Bembi (CEO) commented:
Yes, sure....
Not only intrusion detection; I have seen a lot of cases where firewalls / virus scanners are involved in such problems.
Intrusion inspection sniffs the network for the number of TCP / UDP connections. These systems are mostly preconfigured, so manual customizing may be needed to fit the trigger values to your usual network traffic. If such a value is triggered, the software reacts in some way, maybe by blocking the traffic entirely or just by cutting connections.
But ping times, due to the nature of the packet itself, would point me more to performance issues on the devices that perform the intrusion detection (if exactly this function is the problem). Another possibility is simply that, because such software has a lot of protection levels, using them all together can overload the underlying hardware.

I guess this is not a general problem with McAfee, as long as it doesn't have to do with false triggers caused by the vendor's signatures; it just has something to do with individual customizing, which is sometimes necessary for all such products.

The other point is that virus scanners in particular sometimes scan Windows services or databases, which results in a significant performance decrease, so have a look here at what MS says about exclusion recommendations for several server roles and services:

http://social.technet.microsoft.com/wiki/contents/articles/953.windows-anti-virus-exclusion-list-en-us.aspx