Very weird DHCP problem ... switch related?

So I've got the following, weird problem:

A bunch of workstations and servers are connected to a ProCurve 4204vl switch. SOMETIMES the workstations take ages to get an IP address from the DHCP server (Windows 2008R2, connected to that same ProCurve) and I can't figure out why ... it seems like the DHCP server doesn't get any requests from the workstations, like it can't see them FOR A WHILE ... until suddenly it does, and then the workstations get their IP address and everything works fine again ...

As a test I've now connected a second, small switch to the ProCurve and connected some of the workstations to that second, small switch. Now these workstations ALWAYS get an IP address, even in situations where the other workstations do not! So it seems like the second switch is filtering out something that prevents the DHCP server from seeing the requests made by the workstations?

Does anyone have any ideas what could be the problem?

Note: I've just replaced the ProCurve chassis > unfortunately the problem remains! So it was not a hardware problem (the problem also appears on connections on all 4 port modules so I doubt it's because of a defective module either).
XeronimoAsked:

Zephyr ICTCloud ArchitectCommented:
Hi,

It could help if you can post the (sanitized) config of the switch ("show run" I think for Procurves)...
0
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
The exact configuration is blurry. Do you use 2 ProCurve switches or 1?
Are there VLANs?
0
XeronimoAuthor Commented:
Hi, here's the config:

; J8770A Configuration Editor; Created on release #L.11.43

hostname "ProCurve Switch 4204vl" 
snmp-server contact "" 
snmp-server location "" 
module 1 type J8768A 
module 3 type J8768A 
module 2 type J9033A 
module 4 type J8768A 
interface A1 
   name "02A (AS-003)" 
exit
interface A2 
   name "10A (AS-013)" 
exit
interface A5 
   name "05A (AS-004)" 
exit
interface A7 
   name "I-106 (AS-103)" 
exit
interface A8 
   name "I-125 (couloir PopBio)" 
exit
interface A9 
   name "I-111 (AS-103)" 
   speed-duplex auto-100 
exit
interface A10 
   name "I-104 (AS-104)" 
exit
interface A11 
   name "I-105 (AS-103)" 
   speed-duplex auto-1000 
exit
interface A13 
   name "01A (AS-003)" 
exit
interface A14 
   name "I-103 (AS-104)" 
exit
interface A15 
   name "07B (AS-011)" 
exit
interface A16 
   name "29B? (AS-116)" 
exit
interface A17 
   name "Server Rack" 
exit
interface A18 
   name "Server Rack" 
exit
interface A19 
   name "32B (AS-013)" 
exit
interface A20 
   name "Server Rack" 
exit
interface A21 
   name "I-135 (AS-105)" 
exit
interface A23 
   name "I-109 (AS-103)" 
exit
interface A24 
   name "40A (table couloir Compactus)" 
exit
interface C1 
   name "I-115 (AS-101)" 
exit
interface C3 
   name "17B (AS-109)" 
exit
interface C4 
   name "Server Rack" 
exit
interface C5 
   name "30A (AS-203)" 
exit
interface C6 
   name "29A? (AS-116)" 
exit
interface C7 
   name "Server Rack" 
exit
interface C9 
   name "38A (AS-006)" 
exit
interface C10 
   name "39A (AS-012)" 
exit
interface C13 
   name "02B (AS-003)" 
   speed-duplex auto-1000 
exit
interface C14 
   name "18A (AS-109)" 
exit
interface C15 
   name "19A (AS-109)" 
exit
interface C16 
   name "15A (AS-108)" 
exit
interface C17 
   name "17A (AS-109)" 
exit
interface C19 
   name "14A (AS-009)" 
exit
interface C20 
   name "21B (AS-114)" 
exit
interface C21 
   name "23B (AS-113)" 
exit
interface C22 
   name "Switch Labo Bot" 
exit
interface C23 
   name "22B (AS-113)" 
exit
interface C24 
   name "21A (AS-114)" 
exit
interface B1 
   name "26A (AS-112)" 
   speed-duplex auto-1000 
exit
interface B2 
   name "15B (AS-108)" 
exit
interface B3 
   name "13B (AS-008)" 
exit
interface B4 
   name "Ericsson thingie" 
exit
interface B5 
   name "30B (AS-203)" 
exit
interface B6 
   name "12A (AS-008)" 
exit
interface B8 
   name "13A (AS-008)" 
exit
interface B9 
   name "Server Rack" 
exit
interface B10 
   speed-duplex auto-1000 
exit
interface B11 
   name "I-140 (AS-105)" 
   speed-duplex auto-100 
exit
interface B12 
   name "Server Rack" 
exit
interface B15 
   name "I-117 (AS-101)" 
exit
interface B16 
   name "I-118 (AS-101) " 
exit
interface B17 
   name "FreeWifi-WAP" 
exit
interface B18 
   name "abc > FW" 
exit
interface B19 
   name "FreeWifi > FW" 
   speed-duplex auto-1000 
exit
interface B20 
   name "? > FW" 
   speed-duplex auto-1000 
exit
interface B21 
   name "To CCRN" 
exit
interface B23 
   name "To M" 
exit
interface D1 
   name "I-114 (AS-103)" 
exit
interface D2 
   name "I-101 (AS-104)" 
exit
interface D9 
   name "06A (AS-011)" 
   speed-duplex auto-1000 
exit
interface D10 
   name "I-113 (AS-103)" 
exit
interface D11 
   name "I-126 (couloir PopBio)" 
exit
interface D12 
   speed-duplex auto-1000 
exit
interface D13 
   name "Servers" 
exit
interface D14 
   name "Servers" 
exit
interface D15 
   name "Servers" 
exit
interface D16 
   name "Servers" 
exit
interface D17 
   name "Servers" 
exit
interface D18 
   name "Servers" 
exit
interface D19 
   name "Servers" 
exit
interface D20 
   name "Servers" 
exit
interface D21 
   name "Servers" 
exit
interface D22 
   name "Servers" 
exit
interface D23 
   name "Servers" 
exit
interface D24 
   name "Servers" 
exit
ip default-gateway 192.168.2.1 
snmp-server community "public" Unrestricted 
vlan 1 
   name "VLAN_1" 
   no ip address 
   no untagged A1-A24,B1-B24,C1-C24,D1-D24 
   exit 
vlan 20 
   name "LAN_ABC" 
   untagged A1-A24,B1-B16,B18,B22,B24,C1-C24,D1-D24 
   ip address 192.168.2.9 255.255.248.0 
   tagged B23 
   exit 
vlan 30 
   name "FreeWifi" 
   untagged B17,B19 
   tagged B23 
   exit 
vlan 2 
   name "Internet" 
   untagged B20-B21 
   exit 
fault-finder bad-driver sensitivity high 
fault-finder bad-transceiver sensitivity high 
fault-finder bad-cable sensitivity high 
fault-finder too-long-cable sensitivity high 
fault-finder over-bandwidth sensitivity high 
fault-finder broadcast-storm sensitivity high 
fault-finder loss-of-link sensitivity high 
fault-finder duplex-mismatch-HDx sensitivity high 
fault-finder duplex-mismatch-FDx sensitivity high 
qos device-priority 192.168.4.20 priority 7 
qos device-priority 172.16.10.125 priority 7 
primary-vlan 20 
dhcp-snooping authorized-server 192.168.1.5
dhcp-snooping authorized-server 192.168.1.93
dhcp-snooping vlan 20 
spanning-tree
password manager

0
XeronimoAuthor Commented:
note: the VLAN_1 is a legacy thing and not really used anymore (I think ...) but I don't want to remove it because there still might be some obscure device using it? The main VLAN is the vlan20.
0
XeronimoAuthor Commented:
Qlemo: there's one main switch, the ProCurve 4204vl. I've attached a small (unmanaged) Netgear switch to it for testing purposes.
0
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
So adding an unmanaged switch improved DHCP reliability?
0
XeronimoAuthor Commented:
Yes!
0
XeronimoAuthor Commented:
But only for those workstations connected to the unmanaged switch!
0
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
Strange. I can only guess that the ProCurve has issues with Ethernet broadcasts, and having more than one MAC address per port in the switch port table overcomes that bug.
0
Zephyr ICTCloud ArchitectCommented:
Did you try lowering the setting of "fault-finder broadcast-storm sensitivity high" ? I see also that dhcp-snooping is enabled, it's not a bad thing, but did you try disabling it to see if it improves the situation?
0
XeronimoAuthor Commented:
Qlemo: " I can only guess that the ProCurve has issues with Ethernet broadcasts," > you mean those ProCurves in general? Or that specific ProCurve? Because I've just replaced it and the problem has already occurred again ...
0
XeronimoAuthor Commented:
spravtek: I'll try those then
0
XeronimoAuthor Commented:
spravtek: what about 'multicast filtering'? Should I keep that on 'off'?
0
Zephyr ICTCloud ArchitectCommented:
Thanks, it's indeed a strange thing, the fact that the switch in between doesn't show this symptom ... This could mean that it doesn't apply "settings" on uplink ports (or ports where switches are detected so to speak) ...
0
Zephyr ICTCloud ArchitectCommented:
Keep it off for now, we'll try this step by step ... If nothing seems to help we can enable it again.
0
XeronimoAuthor Commented:
sprav: stupid question but how do I lower the setting of 'fault-finder broadcast-storm sensitivity high' ... I'm used to the GUI but I'm on the switch via Telnet now ... thanks!
0
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
I think it is a general issue, either with firmware or config, and, like spravtek, that it is related to non-uplink ports.
0
Zephyr ICTCloud ArchitectCommented:
Should be "fault-finder broadcast-storm sensitivity medium action warn" in the cli, don't know the menu-driven way though ...
0
XeronimoAuthor Commented:
Qlemo: the switch has the latest firmware, and I've posted the config above.

and what do you mean with 'related to non-uplink ports'?

one more thing: there are additional (managed) switches connected to the ProCurve via fibre. The workstations on those additional switches also NEVER experience these DHCP problems. So it is ONLY those workstations that are connected directly to the ProCurve that SOMETIMES experience these DHCP problems (while, at the same time, DHCP works well for workstations connected to those additional switches).
0
XeronimoAuthor Commented:
spravtek: but isn't that just related to the logging level? why or how would that make a difference?
0
Zephyr ICTCloud ArchitectCommented:
Yes, you are right, my bad ... I thought it was a setting that actually prevented things ... Sorry, it's been a while since I worked on ProCurves, so this wouldn't help at all ... Do you see any messages from it though?

What is meant by the non-uplink and uplink ports is that your problem with DHCP is only noticeable on the directly connected devices like workstations and such, not uplink connected devices like switches ...
0
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
Compare the switch configs to see differences.
Is the "local" switch the one also having the DHCP machine connected directly?
0
XeronimoAuthor Commented:
sprav: no, no error messages

and yes, only the workstations that are connected directly to the ProCurve sometimes experience those DHCP problems. the workstations that connect to this ProCurve via a different managed switch never do.
0
XeronimoAuthor Commented:
qlemo: yes, the DHCP server is directly connected to the ProCurve. and the weird thing is that this DHCP server ALWAYS gives out addresses to workstations connected to an additional switch (managed or unmanaged) but NOT ALWAYS to workstations connected immediately to the same switch as itself ...
0
XeronimoAuthor Commented:
one more thing: I've connected the unmanaged, small switch to a random port on the ProCurve
0
Zephyr ICTCloud ArchitectCommented:
I see you have two DHCP servers, are they both on the same switch/VLAN? If you temporarily disable one, does it work better or the same?
0
XeronimoAuthor Commented:
Hm, true, weird ... there's only ONE active DHCP server ... I'm not sure why there are two in the config? Both are DNS servers though. I'm gonna check that ...
0
XeronimoAuthor Commented:
I've just entered 'show dhcp-snooping' and I get 'DHCP Snooping: no' as a result ... so it's not activated?
0
Zephyr ICTCloud ArchitectCommented:
Aha ... Yes, maybe remove the one that is no longer valid...
0
XeronimoAuthor Commented:
Ok, so I've removed the incorrect DHCP server and I've enabled DHCP snooping (while disabling dhcp-snooping option 82). Let's see if the problem continues to occur then ...
0
Zephyr ICTCloud ArchitectCommented:
I was reading that Option 82 can cause strange issues on Windows clients ... Who knows, it might help.
0
Fred MarshallPrincipalCommented:
The switch has been rebooted?
0
XeronimoAuthor Commented:
Rebooted? Since when? After the disabling of option82? No. But it has been booted before of course.
0
XeronimoAuthor Commented:
Update: same problem still occurs :(

Some workstations connected directly to the ProCurve have to wait, at times, for ages to get an IP address from the DHCP server ...

Any other ideas??
0
Zephyr ICTCloud ArchitectCommented:
Maybe it would be interesting to put a sniffer on the switch to see what is (not) happening with the DHCP requests...
Are you using VLAN tags in your DHCP server?
0
XeronimoAuthor Commented:
sprav:
Maybe it would be interesting to put a sniffer on the switch to see what is (not) happening with the DHCP requests...

I've tried that before but it's not easy because it is not happening all the time! So I have to be present at a 'moment' where the problem occurs ... What I saw, if I remember correctly because this has been going on for ages now, was that the workstation sends a broadcast but the DHCP server doesn't see it ... until it suddenly (x minutes later) does!

Are you using VLAN tags in your DHCP server?

What exactly do you mean with that? Yes, I'm using VLANs but the main network is just one VLAN and all the servers and workstations are in that one VLAN.
0
Zephyr ICTCloud ArchitectCommented:
Ok ...

Can you do a "show dhcp-snooping" and post the results?
Just for my sanity's sake...
0
Zephyr ICTCloud ArchitectCommented:
Also, if possible post the outcome of "show dhcp-snooping stats"  ... Might be there's not a lot there if no Ip requests are being made...
0
XeronimoAuthor Commented:
spravtek: ah, sweet sanity ... what's that again?? ;)

Here are the screenshots (the latter one is unchanged since yesterday ...):

[Two screenshots: output of "show dhcp-snooping" and "show dhcp-snooping stats"; images not preserved]
0
Zephyr ICTCloud ArchitectCommented:
You don't need to trust all ports for DHCP snooping, only the port where the DHCP server is connected, check explanation here.

Maybe that's causing the slow reaction in DHCP reply... Probably not that lucky but still...
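To make the trust idea concrete, here is a deliberately simplified Python model of the forwarding decision a DHCP-snooping switch makes. It is a sketch of the general technique, not the ProCurve implementation: real switches also check option 82 and a binding database. The port name "D13" as the server port is an assumption for illustration; 192.168.1.5 is the authorized server from the posted config.

```python
# Messages that only a DHCP server should ever send.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}


def snooping_forwards(msg_type, ingress_port, source_ip,
                      trusted_ports, authorized_servers=frozenset()):
    """Return True if a simplified DHCP-snooping switch would forward the packet."""
    if msg_type not in SERVER_MESSAGES:
        # Client traffic (DISCOVER, REQUEST, ...) is forwarded in this model.
        return True
    if ingress_port not in trusted_ports:
        # Server replies are only accepted when they arrive on a trusted port.
        return False
    if authorized_servers and source_ip not in authorized_servers:
        # Optional allow-list: replies must come from a listed server address.
        return False
    return True
```

In this model, trusting every port removes the protection without helping the clients, since their DISCOVER and REQUEST packets are forwarded either way.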

Did you try completely disabling DHCP snooping one time to see if it improved the situation?
If not, before going that route and if possible, enable the following debugging on the switch:

debug destination session

debug security dhcp-snooping

Now we need to keep an eye on the log with "show log"
Make sure this doesn't overload your switch (the debugging) ... To disable it just put "no" in front of the debugging command "no debug security dhcp-snooping"
0
XeronimoAuthor Commented:
DHCP Snooping hadn't been active before. So it seems that it's not related to the problem? The problem occurs whether or not snooping is activated. What changed in the config is that there's only one DHCP server now. That doesn't seem to have an impact either though ...

I'll try your other recommendations soon!
0
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
You might be able to provoke the issue by setting up WireShark on a client, using a DHCP and broadcast filter, and then frequently call ipconfig /renew. It does make sense to run WireShark on both client and server at the same time.
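A minimal sketch of such a provocation loop in Python, meant to run on a Windows client while Wireshark is capturing. Releasing the lease before each renew is an added assumption (so that every attempt starts with a fresh broadcast), and the 10-second threshold is arbitrary.

```python
import subprocess
import time


def run_ipconfig(argument):
    """Run `ipconfig <argument>` (Windows) and return its exit code."""
    return subprocess.run(["ipconfig", argument], capture_output=True).returncode


def time_renewals(attempts, run=run_ipconfig, clock=time.monotonic):
    """Release and renew the lease `attempts` times; return seconds per renew."""
    durations = []
    for _ in range(attempts):
        run("/release")              # assumption: drop the lease first
        start = clock()
        run("/renew")                # blocks until a lease arrives or it gives up
        durations.append(clock() - start)
    return durations


def slow_attempts(durations, threshold_seconds=10.0):
    """Return (attempt index, seconds) for renews slower than the threshold."""
    return [(i, d) for i, d in enumerate(durations) if d > threshold_seconds]
```

Calling `slow_attempts(time_renewals(20))` on an affected workstation would give the attempts worth looking up in the capture. Note that this drops the machine's network connection during each cycle.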
0
Zephyr ICTCloud ArchitectCommented:
Well, you also disabled the option 82 insertion, which is known for causing issues with Windows clients sometimes...
0
Zephyr ICTCloud ArchitectCommented:
Another question, is the DHCP server on the same switch or is it connected on another switch somewhere, just to learn the lay of your network somewhat ...
0
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
spravtek, already asked and answered, see http:#a40695548
0
XeronimoAuthor Commented:
spravtek: "debug security dhcp-snooping" gives me an 'invalid input error' for 'security'
0
Zephyr ICTCloud ArchitectCommented:
Might be unavailable on this switch version, I'm not sure ... Can't test it because we don't have this one I'm afraid.
0
Fred MarshallPrincipalCommented:
Can you monitor with Wireshark and a mirror port?
You don't have to be there or know when it might happen as you can capture a lot of data.
Using a capture filter will help keep the files smaller.
Using a round-robin set of files will help keep the individual files manageable.
Then you can capture 24/7.
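As a rough illustration of a filtered, round-robin capture: Wireshark ships with a command-line tool, dumpcap, that can write a ring buffer of files. The Python helper below only assembles the command line; the interface name, file name, and sizes are placeholder assumptions.

```python
import shlex


def build_dumpcap_command(interface, out_file="dhcp.pcapng",
                          file_size_kb=1024, file_count=50):
    """Build a dumpcap command line for a DHCP-only ring-buffer capture."""
    return [
        "dumpcap",
        "-i", interface,                      # capture interface (name or index)
        "-f", "udp port 67 or udp port 68",   # capture filter: DHCP/BOOTP only
        "-b", f"filesize:{file_size_kb}",     # switch to a new file after this many kB
        "-b", f"files:{file_count}",          # keep only the newest N files
        "-w", out_file,                       # base name of the file set
    ]


if __name__ == "__main__":
    # "Ethernet" is a placeholder; `dumpcap -D` lists the real interfaces.
    print(shlex.join(build_dumpcap_command("Ethernet")))
```

With these defaults the capture never grows beyond roughly 50 MB, so it can run unattended; pick the file count so the set still covers the time it takes for someone to report a failure.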
0
XeronimoAuthor Commented:
Ok, I'll experiment with WireShark again. I'll post the results soon.
0
XeronimoAuthor Commented:
Fred: could you explain this a bit more? Right now I'm running Wireshark on the DHCP server with a bootp filter and I see the DHCP stuff that's going on. Not able to test with a laptop on the troubled network segment though.

What do you mean with mirror port and round-robin set?

I'm sorry but I'm kind of the only IT guy around here and I can't know everything in detail ........... thanks!
0
XeronimoAuthor Commented:
And how do I create a DHCP capture filter ... ? I'm only able to apply a DHCP display filter so far.
0
XeronimoAuthor Commented:
Just a quick screenshot from the DHCP server ...

[Screenshot of the Wireshark capture on the DHCP server; image not preserved]
0
Zephyr ICTCloud ArchitectCommented:
I'm sure Fred will chime in again, but to quickly answer the question, the mirror port is something you set on your switch, you put 1 port in mirror mode which will allow you to see all traffic going over your switch (or confined to 1 VLAN), instead of only seeing traffic from your computer connected to a normal port (not in mirror). This allows you to see the DHCP packets going over the switch to all end-points and back again.

The round-robin set of files refers to automatically rolling-over the files, I think he means ... As I said, sure Fred will be chiming in later.

To set a monitor port on your switch you can do that either via web interface or the cli
For the cli this command will set monitoring for only vlan 20:
[config] vlan 20 monitor

to disable:
[config] no vlan 20 monitor

check status/setting:
show monitor

0
Zephyr ICTCloud ArchitectCommented:
To set a capture filter in the "Capture Options" dialog box see here.

The filter you could use is something like this:
 port 67 or port 68

0
XeronimoAuthor Commented:
spravtek: ok, I've applied that capture filter.

as for the port monitoring: why would I want to do that? Isn't capturing all the DHCP traffic on the DHCP server enough? why hook a laptop to that mirror port? wouldn't it capture the same that the DHCP server does? or is it to get those DHCP 'signals' from the workstations that do not reach the DHCP server?
0
XeronimoAuthor Commented:
on a side note: my ProCurve log has a lot of 'CST Root changed' messages now ... something to worry about?
0
Zephyr ICTCloud ArchitectCommented:
get those DHCP 'signals' from the workstations that do not reach the DHCP server?

Yes, something like that :)
At the moment you only get to see what actually arrives at the DHCP server. You don't see what fails to arrive there, or what does (or doesn't) arrive on the workstation ports ... That's why it's better to capture using a mirrored port.
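Once both captures exist, the comparison can be sketched like this in Python. The `DhcpPacket` record is a hypothetical, pre-parsed form of the capture (for example exported from Wireshark), and the sketch assumes each packet shows up only once in the mirror capture.

```python
from collections import Counter
from typing import NamedTuple


class DhcpPacket(NamedTuple):
    time: float        # capture timestamp in seconds
    xid: int           # DHCP transaction ID chosen by the client
    client_mac: str
    msg_type: str      # "DISCOVER", "OFFER", "REQUEST", "ACK", ...


CLIENT_TYPES = {"DISCOVER", "REQUEST"}


def missing_at_server(mirror_packets, server_packets):
    """Count client packets seen on the mirror port but not at the server.

    Returns {(client_mac, xid, msg_type): number_of_missing_packets}.
    """
    def count(packets):
        return Counter((p.client_mac, p.xid, p.msg_type)
                       for p in packets if p.msg_type in CLIENT_TYPES)

    # Counter subtraction keeps only keys with a positive remainder,
    # so retransmissions that did arrive are not reported.
    return dict(count(mirror_packets) - count(server_packets))
```

A non-empty result would point at the switch (or something between the two ports) swallowing the broadcasts; an empty one would shift suspicion to the server itself.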

'CST Root changed' messages now ... something to worry about?
Well, we definitely should look into this as well, could be you have switches "fighting" over who is the root in the Spanning-Tree topology. But that's a whole other question, maybe ...
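For background, a rough Python sketch of how a spanning-tree root is chosen: every bridge has an ID made of a priority and its MAC address, and the numerically lowest ID wins. The switch names and MAC addresses are made up for illustration; 32768 is the usual default priority.

```python
def elect_root(bridges):
    """Return the name of the bridge with the lowest (priority, MAC) bridge ID.

    `bridges` maps a name to a (priority, mac_address) pair.
    """
    def bridge_id(item):
        _name, (priority, mac) = item
        # Compare priority first, then the MAC address as a number.
        return (priority, int(mac.replace(":", ""), 16))

    return min(bridges.items(), key=bridge_id)[0]


# Hypothetical example: with equal priorities the lowest MAC wins,
# which may well be a small edge switch rather than the core.
example = {
    "core": (32768, "00:1b:3f:aa:00:01"),
    "lab":  (32768, "00:14:c2:10:00:05"),
}
```

With everything left at the default priority, any switch joining or leaving can change the winner, which is one plausible way to end up with repeated root-change messages. Giving the intended core switch a lower priority makes the outcome stable.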
0
XeronimoAuthor Commented:
Update: ok, so I've done the following:

- installed WireShark on both the DHCP server and a laptop which I have connected to the mirror port on the switch
- I've enabled capture filtering on both (port 67 and 68)
- the data is being saved in files of 1 MB

- I've logged onto a workstation that's directly connected to the ProCurve and I've released and renewed the IP address, works fine right now ...

- I've removed the dhcp-snooping trust on all ports except those of the DHCP server

So what do I do now? Wait until the problem occurs again?
0
Zephyr ICTCloud ArchitectCommented:
Yes, let's wait and see if we get to see something happening on Wireshark and continue from there ... Or wait until you get complaints of people not getting ip-addresses and check Wireshark accordingly.
0
XeronimoAuthor Commented:
What should I be looking out for on Wireshark?
0
Zephyr ICTCloud ArchitectCommented:
Well, if you see changes in colors start to watch out for things like dropped packets, destination unreachable, things like that, everything that doesn't look normal might be a clue. If in doubt, post it here or send it through personal message if it contains too much sensitive info.
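One concrete thing to check, sketched in Python: how long each DISCOVER waits for its OFFER. The input is a hypothetical list of (timestamp, transaction ID, message type) tuples taken from the capture, and the sketch assumes a client reuses the same transaction ID when it retries; if it doesn't, grouping by client MAC would be needed instead.

```python
def offer_delays(events):
    """Map each transaction ID to seconds from its first DISCOVER to its first OFFER.

    `events` is an iterable of (timestamp_seconds, xid, message_type) tuples.
    A value of None means a DISCOVER was seen but no OFFER followed.
    """
    first_discover = {}
    first_offer = {}
    for timestamp, xid, msg_type in sorted(events):
        if msg_type == "DISCOVER":
            first_discover.setdefault(xid, timestamp)
        elif msg_type == "OFFER" and xid in first_discover:
            first_offer.setdefault(xid, timestamp)
    return {xid: (first_offer[xid] - start if xid in first_offer else None)
            for xid, start in first_discover.items()}


def suspicious(delays, slow_after=3.0):
    """Return the transaction IDs that got no OFFER or a slow one."""
    return sorted(xid for xid, d in delays.items()
                  if d is None or d > slow_after)
```

Transactions flagged here are the ones to open in Wireshark; the 3-second threshold is an arbitrary starting point.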
0
XeronimoAuthor Commented:
Ok. So far the color is all the same ... And thanks for the offer.
0
Fred MarshallPrincipalCommented:
Sorry I missed so much!  Looks like you have it all set up fine.
Be aware that capture filters and display filters in Wireshark have *different* notation.  Unfortunate but true.
So, they won't be the same to get the same effect.

The reason for a switch mirror port is so that you can move it around from port to port depending on your needs or what you find out as you go along.  But, in general, you will find you're looking at the same port for the same problem pursuit.
Another good thing is you can limit the traffic on Wireshark with a switch mirror.
But, having Wireshark on the DHCP server port seems a good alternative.  Often one doesn't have that luxury.

I'm also going to be away more than here for the next while so others please continue to help Xeronimo!  I'm sure it's appreciated.
0
XeronimoAuthor Commented:
Update: the problem hasn't occurred again so far and since I can't force it to occur I'll have to wait and see ... but at least now Wireshark is logging all the DHCP stuff going around the switch!
0
Zephyr ICTCloud ArchitectCommented:
Ok, well let's hope it doesn't happen again :)

Maybe next Monday we will have some issues again. It's an interesting issue though, but it's always interesting when you're working "blind" and don't have anything obvious screaming at you ... Not saying it will be something obvious here though :)

Anyway, carry on, nothing to see here
0
XeronimoAuthor Commented:
> Ok, well let's hope it doesn't happen again :)

I've already hoped that a lot of times! ;) And then, suddenly, the problem popped up again ...

> Maybe next monday we will have some issues again. It's an interesting issue though, but it's always interesting when you're working "blind" and not have anything obvious screaming at you ... Not saying it will be something obvious here though :)

It actually seems to happen more on Mondays than on other days ...

> Anyway, carry on, nothing to see here

Hehe
0
Fred MarshallPrincipalCommented:
At least now you have the Wireshark capture running so when it does happen you should have some data to analyze.
Just make sure that the round-robin file set will contain enough recorded data that you will be able to be alerted to the failure and go back to it in the data.
0
