adamlcohen

TFTP Open Timeout when PXE Booting to WDS Server

Current Config:
1x HP Procurve 5300 series Switch
Vlan 1 = 172.16.x.x Servers
Vlan 2 = 192.168.x.x Clients
IP Helper (DHCP Server address)
Forward Protocol tftp (port 69) to WDS Server
Forward Protocol 4011 to WDS Server

1x DHCP Server = Win 2003 Server
Scopes are set with Option 66 and 67

1x WDS Server = Win Server 2008
Both Servers are on Vlan 1
Client machines are on Vlan 2

Clients on Vlan 1 PXE boot with no problem.
Clients on Vlan 2 receive a DHCP address but then receive a TFTP Open Timeout message.

Any suggestions are welcome on how to get the clients on Vlan 2 to PXE boot without this error.
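
For readers unfamiliar with the two scope options mentioned in the config: DHCP options are carried as a one-byte code, a one-byte length and the value. Option 66 names the TFTP server and option 67 names the boot file. The Python sketch below only illustrates that layout. The two values are assumptions (the post does not list what the scopes contain), and `encode_dhcp_option` is a hypothetical helper, not part of any DHCP server.

```python
def encode_dhcp_option(code: int, value: str) -> bytes:
    """Encode one DHCP option as code, length, value."""
    data = value.encode("ascii")
    if not 0 < code < 255:
        raise ValueError("option code must be in the range 1..254")
    if len(data) > 255:
        raise ValueError("value does not fit in a single option")
    return bytes([code, len(data)]) + data


# Assumed values for illustration only; the original post does not give them.
TFTP_SERVER = "172.16.0.35"
BOOT_FILE = "Boot\\x64\\pxeboot.com"

option_66 = encode_dhcp_option(66, TFTP_SERVER)  # TFTP server name
option_67 = encode_dhcp_option(67, BOOT_FILE)    # boot file name

print(option_66)
print(option_67)
```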
pmasotta

This may be because you are forwarding the TFTP control port (69) but probably not the random port the TFTP server uses for sending DATA packets.
In that case the file request reaches the TFTP server but the TFTP answer never gets to the client.

This behaviour is explained in RFC 1350.
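
As a rough illustration of the RFC 1350 packet layout referred to above, here is a minimal Python sketch that builds a read request (the packet sent to port 69) and decodes the basic packet types a client can get back. The helper names `build_rrq` and `describe` are made up for this example, and the sketch covers only the five opcodes defined in RFC 1350.

```python
import struct

# Opcodes defined in RFC 1350.
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR"}


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def describe(packet: bytes) -> str:
    """Return a short human-readable summary of a TFTP packet."""
    (opcode,) = struct.unpack("!H", packet[:2])
    name = OPCODES.get(opcode, "UNKNOWN")
    if name in ("DATA", "ACK"):
        (block,) = struct.unpack("!H", packet[2:4])
        return f"{name} block {block}"
    if name == "ERROR":
        (code,) = struct.unpack("!H", packet[2:4])
        message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return f"ERROR {code}: {message}"
    return name


print(describe(build_rrq("foobar.txt")))
```

The request goes to the well-known port, but RFC 1350 has each side pick its own transfer ID (a UDP port) for the rest of the exchange, which is why the reply path matters separately from port 69.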
adamlcohen

ASKER

Thanks for the reply.

I assume we are talking about the TIDs here? If so, am I correct in assuming Microsoft WDS is using ports 64000-65000? In which case, do we need to port forward this range from VLAN 2 to VLAN 1? These ports would be open by default, but I guess the clients on VLAN 2 just don't know where to send the traffic.

Shame the dynamic port process doesn't work like FTP, then we wouldn't have to forward all these ports. I guess that is why it is 'trivial'!!!

Cheers.
If the connection to the TFTP server "times out", the clients know where the TFTP server is located and they reach it on UDP port 69.
Next, the TFTP answer tries to come back on a random port; if that port is not open, the answer never reaches the client.
If the clients did not receive the TFTP server address, or could not connect to it, the message would be something other than a TFTP timeout.

If you are using WDS and your client gets as far as the TFTP stage, the WDS ports for its RPC communication seem to be working fine. Your problem is not WDS RPC and not DHCP/Proxy DHCP; it seems to be only the TFTP DATA packets...

If you feel confident sniffing the protocols, give Wireshark a try; you'll quickly see where the TFTP traffic gets stuck.
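
The "request goes to one port, answer comes back from another" behaviour described above can be reproduced on a single machine. The Python sketch below is a loopback toy, not a TFTP server: the listener uses an OS-assigned port as a stand-in for UDP 69 (so no privileges are needed), and the function names are invented for this example. It only shows that the reply's source port differs from the port the request was sent to, which is what a capture of a real transfer would also show.

```python
import socket
import threading
from typing import Tuple


def serve_one_request(listener: socket.socket) -> None:
    """Answer a single request from a fresh socket, the way a TFTP server does."""
    _request, client_addr = listener.recvfrom(516)
    reply_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    reply_sock.bind(("127.0.0.1", 0))  # the OS picks the server-side transfer port
    # Opcode 3 (DATA), block 1, tiny payload.
    reply_sock.sendto(b"\x00\x03\x00\x01hello", client_addr)
    reply_sock.close()


def run_demo() -> Tuple[int, int]:
    """Return (port the request was sent to, port the reply came from)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))  # stand-in for the well-known port 69
    well_known_port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one_request, args=(listener,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5)
    client.sendto(b"\x00\x01foobar.txt\x00octet\x00",
                  ("127.0.0.1", well_known_port))
    _data, reply_addr = client.recvfrom(516)

    worker.join()
    client.close()
    listener.close()
    return well_known_port, reply_addr[1]


request_port, reply_port = run_demo()
print(f"request sent to port {request_port}, reply came from port {reply_port}")
```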

ASKER CERTIFIED SOLUTION
vivigatt
This solution is only available to members.
Thanks for all the updates.

I've just checked the TFTP from the command prompt:
On both VLAN1 and VLAN2, the response is as follows:
"Error on Server: Transfer mode not supported"
Perhaps this is a typical response from WDS, or have I got the wrong root folder?
Is the WDS root folder <drive>\RemoteInstall?

Anyhow, in response to your other tests...

1/ NETSTAT output.
(Is the Error 5 significant?)
x: Windows Sockets initialization failed: 5
  UDP    172.16.0.10:4011       *:*
  WDSServer
 [svchost.exe]
  UDP    172.16.0.35:67         *:*
  WDSServer
 [svchost.exe]
  UDP    172.16.0.35:68         *:*
  WDSServer
 [svchost.exe]
  UDP    172.16.0.35:69         *:*
  WDSServer
 [svchost.exe]
  UDP    172.16.0.35:4011       *:*
  WDSServer

2/Firewall,
I've disabled the Windows firewall and tested again.
It made no difference.

3/VLAN routing.
VLAN2 clients can ping the WDS, DHCP and other servers on VLAN1.

4/Procurve firmware.
Recently updated to E.11.29.

Thanks for all your help thus far.
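
One way to read the NETSTAT output in item 1 is to group the listening UDP ports by local address. The Python sketch below does that on the address lines copied from the post (the owner lines are left out). The set of "ports of interest" (67, 69 and 4011) is an assumption based on the ports discussed in this thread, and `udp_ports_by_address` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Address lines copied from the NETSTAT output in the post.
NETSTAT_OUTPUT = """\
  UDP    172.16.0.10:4011       *:*
  UDP    172.16.0.35:67         *:*
  UDP    172.16.0.35:68         *:*
  UDP    172.16.0.35:69         *:*
  UDP    172.16.0.35:4011       *:*
"""

# Assumed set of ports worth checking: DHCP server, TFTP, proxy DHCP.
PORTS_OF_INTEREST = {67, 69, 4011}


def udp_ports_by_address(text):
    """Map each local IPv4 address to the sorted UDP ports bound on it."""
    ports = defaultdict(set)
    pattern = r"^\s*UDP\s+(\d+(?:\.\d+){3}):(\d+)\s"
    for match in re.finditer(pattern, text, re.MULTILINE):
        ports[match.group(1)].add(int(match.group(2)))
    return {address: sorted(found) for address, found in ports.items()}


bindings = udp_ports_by_address(NETSTAT_OUTPUT)
for address, found in bindings.items():
    missing = sorted(PORTS_OF_INTEREST - set(found))
    print(address, "bound:", found, "not bound:", missing)
```

On this data, 172.16.0.10 shows only port 4011, while 172.16.0.35 shows 67, 68, 69 and 4011.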

The term "forward" here is used in the sense of the first post, where it describes the configuration, and not in the "router" sense.
Of course UDP is "routed" by routers. The term means that the particular port has to be able to "travel" from one net to another without being blocked.

The rest of your explanation seems to forget that "everything works" except the TFTP DATA answer...
Reading @adamlcohen's last post, it seems the problem is not just a "TFTP Open Timeout" message as described in the first post.
And yes, "Windows Sockets initialization failed" could be important if a required socket failed on init.
Good spot!

We actually added the 172.16.0.10 address to this server after it was built; this was the address of our WDS/RIS server, which also had the same problem. Perhaps that is why the bindings on this address are not correct.

So using the IP 172.16.0.35, the response from the TFTP -i command is as follows on both VLANs:
Error on Server : Access Denied.

So I placed foobar.txt in the \Boot\x64\ folder, tried again with
TFTP -i 172.16.0.35 get \Boot\x64\foobar.txt and now get the following error on both VLANs:
'Timeout occured'

I've installed Microsoft Network Monitor on the WDS server as well, and following the TFTP command I can see the following:
245      14:38:44 06/05/2011      17.0791241      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
257      14:38:45 06/05/2011      18.0684730      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
293      14:38:47 06/05/2011      20.0685479      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
358      14:38:51 06/05/2011      24.0685915      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
468      14:38:59 06/05/2011      32.0687085      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
623      14:39:07 06/05/2011      40.0687536      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
810      14:39:15 06/05/2011      48.0689737      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
938      14:39:23 06/05/2011      56.0690752      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \boot\x64\foobar.txt, Transfer Mode: octet       {UDP:113, IPv4:112}
1078      14:39:31 06/05/2011      64.0701653      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Error - ErrorCode: 0, ErrorMessage: timeout on receive       {UDP:113, IPv4:112}
1081      14:39:31 06/05/2011      64.0724239      svchost.exe      172.16.0.35      172.16.0.218      TFTP      TFTP: Error - ErrorCode: 4, ErrorMessage: Illegal operation error.       {UDP:113, IPv4:112}

NOTE: this is on VLAN 1, and we know that PXE booting and imaging work correctly on this network. I get the same log when connecting with TFTP -i on VLAN 2.
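
The timing of the repeated read requests in the TFTP -i capture above can be checked with a few lines of Python. The time offsets are copied from the third column of that capture; the `gaps` helper is just for this example. The result is a retry pattern of 1, 2, 4 and then 8 seconds, ending with the client's "timeout on receive" error. Since the capture was taken on the WDS server, this suggests the requests are arriving and it is the answer that the client never sees, though the capture alone does not say why.

```python
# Time offsets (seconds) of the eight Read Request frames in the capture.
READ_REQUEST_TIMES = [
    17.0791241, 18.0684730, 20.0685479, 24.0685915,
    32.0687085, 40.0687536, 48.0689737, 56.0690752,
]
# Time offset of the client's "timeout on receive" error frame.
CLIENT_TIMEOUT_TIME = 64.0701653


def gaps(times):
    """Return the gap between consecutive timestamps, rounded to whole seconds."""
    return [round(later - earlier) for earlier, later in zip(times, times[1:])]


retry_gaps = gaps(READ_REQUEST_TIMES + [CLIENT_TIMEOUT_TIME])
print(retry_gaps)  # [1, 2, 4, 8, 8, 8, 8, 8]
```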


So I tried a PXE boot from the same device on VLAN 1 (and it still works OK):
287      14:49:53 06/05/2011      17.6962693      svchost.exe      172.16.0.218      172.16.0.10      DHCP      DHCP:Request, MsgType = REQUEST, TransactionID = 0x35870F22      {DHCP:55, UDP:64, IPv4:63}
290      14:49:53 06/05/2011      17.6967747      svchost.exe      172.16.0.10      172.16.0.218      DHCP      DHCP:Reply, MsgType = ACK, TransactionID = 0x35870F22      {DHCP:55, UDP:64, IPv4:63}
291      14:49:53 06/05/2011      17.6988918      svchost.exe      172.16.0.218      172.16.0.10      TFTP      TFTP: Read Request - File: Boot\x86\pxeboot.com, Transfer Mode: octet tsize: 0       {UDP:65, IPv4:63}
292      14:49:53 06/05/2011      17.7005532            172.16.0.10      172.16.0.218      TFTP      TFTP: Option Acknowledgement - tsize: 25772       {UDP:51, IPv4:63}
293      14:49:53 06/05/2011      17.7007692            172.16.0.218      172.16.0.10      TFTP      TFTP: Error - ErrorCode: 0, ErrorMessage: TFTP Aborted       {UDP:51, IPv4:63}
294      14:49:53 06/05/2011      17.7013935      svchost.exe      172.16.0.218      172.16.0.10      TFTP      TFTP: Read Request - File: Boot\x86\pxeboot.com, Transfer Mode: octet blksize: 1456       {UDP:67, IPv4:63}
295      14:49:53 06/05/2011      17.7030575            172.16.0.10      172.16.0.218      TFTP      TFTP: Option Acknowledgement - blksize: 1456       {UDP:52, IPv4:63}
296      14:49:53 06/05/2011      17.7032664            172.16.0.218      172.16.0.10      TFTP      TFTP: Acknowledgement - Block Number: 0      {UDP:52, IPv4:63}
297      14:49:53 06/05/2011      17.7034037            172.16.0.10      172.16.0.218      TFTP      TFTP: Data - Block Number: 1      {UDP:52, IPv4:63}
298      14:49:53 06/05/2011      17.7038913            172.16.0.218      172.16.0.10      TFTP      TFTP: Acknowledgement - Block Number: 1      {UDP:52, IPv4:63}

Then after F12 on the client.....
1243      14:54:07 06/05/2011      19.0573770            172.16.0.218      172.16.0.35      TFTP      TFTP: Acknowledgement - Block Number: 34      {UDP:56, IPv4:33}
1244      14:54:07 06/05/2011      19.0825033      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \hiberfil.sys, Transfer Mode: octet tsize: 0       {UDP:57, IPv4:33}
1245      14:54:07 06/05/2011      19.0840979      svchost.exe      172.16.0.35      172.16.0.218      TFTP      TFTP: Error - ErrorCode: 4, ErrorMessage: Access violation.       {UDP:57, IPv4:33}
1274      14:54:10 06/05/2011      22.0073586      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \Boot\x86\Images\boot.wim, Transfer Mode: octet tsize: 0       {UDP:68, IPv4:33}
1275      14:54:10 06/05/2011      22.0092173            172.16.0.35      172.16.0.218      TFTP      TFTP: Option Acknowledgement - tsize: 145399718       {UDP:69, IPv4:33}
1276      14:54:10 06/05/2011      22.0093577            172.16.0.218      172.16.0.35      TFTP      TFTP: Error - ErrorCode: 0, ErrorMessage: TFTP Aborted       {UDP:69, IPv4:33}
1277      14:54:10 06/05/2011      22.0094823      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \Boot\Boot.SDI, Transfer Mode: octet tsize: 0       {UDP:70, IPv4:33}
1278      14:54:10 06/05/2011      22.0111048            172.16.0.35      172.16.0.218      TFTP      TFTP: Option Acknowledgement - tsize: 3170304       {UDP:71, IPv4:33}
1279      14:54:10 06/05/2011      22.0113579            172.16.0.218      172.16.0.35      TFTP      TFTP: Error - ErrorCode: 0, ErrorMessage: TFTP Aborted       {UDP:71, IPv4:33}
1280      14:54:10 06/05/2011      22.0113579      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Read Request - File: \Boot\Boot.SDI, Transfer Mode: octet tsize: 0 blksize: 1422 windowsize: 4       {UDP:72, IPv4:33}
1281      14:54:10 06/05/2011      22.0132645      svchost.exe      172.16.0.35      172.16.0.218      TFTP      TFTP: Option Acknowledgement - blksize: 1422 windowsize: 4 tsize: 3170304       {UDP:73, IPv4:33}
1282      14:54:10 06/05/2011      22.0524825      svchost.exe      172.16.0.218      172.16.0.35      TFTP      TFTP: Acknowledgement - Block Number: 0      {UDP:73, IPv4:33}
1283      14:54:10 06/05/2011      22.0529849      svchost.exe      172.16.0.35      172.16.0.218      TFTP      TFTP: Data - Block Number: 1      {UDP:73, IPv4:33}
1284      14:54:10 06/05/2011      22.0530058      svchost.exe      172.16.0.35      172.16.0.218      TFTP      TFTP: Data - Block Number: 2      {UDP:73, IPv4:33}

Even with the Access Violations, the WDS imaging process still works as expected from VLAN 1.
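
The working captures above show the client and server negotiating TFTP options (tsize, blksize, windowsize) through an Option Acknowledgement before any data flows. As a sketch of what those frames contain on the wire, the Python below builds a read request with options and parses an OACK (opcode 6). The function names are invented for this example, and the values are the ones visible in the Boot.SDI frames of the capture.

```python
def build_rrq_with_options(filename, options, mode="octet"):
    """Build a read request followed by NUL-terminated option name/value pairs."""
    fields = [filename, mode]
    for name, value in options.items():
        fields += [name, str(value)]
    return b"\x00\x01" + b"".join(f.encode("ascii") + b"\x00" for f in fields)


def parse_oack(packet):
    """Parse an Option Acknowledgement (opcode 6) into a dict of integers."""
    if packet[:2] != b"\x00\x06":
        raise ValueError("not an OACK packet")
    fields = packet[2:].split(b"\x00")[:-1]  # drop the empty tail after the last NUL
    names = [f.decode("ascii").lower() for f in fields[0::2]]
    values = [int(f.decode("ascii")) for f in fields[1::2]]
    return dict(zip(names, values))


# Values taken from the Boot.SDI request and acknowledgement in the capture.
request = build_rrq_with_options(
    "\\Boot\\Boot.SDI", {"tsize": 0, "blksize": 1422, "windowsize": 4})
oack = b"\x00\x06blksize\x001422\x00windowsize\x004\x00tsize\x003170304\x00"
negotiated = parse_oack(oack)
print(negotiated)
```

The OACK itself is sent from the server's transfer port, so it is subject to the same reachability question as the DATA packets.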

I am on the WDS server on VLAN 1 and I don't see any activity when I monitor the traffic from a client on VLAN 2.

??

The tftp utility in Windows creates a read-only file on a get operation.
A subsequent get of the same file will fail with an "access is denied" error.
Hmm, I could never get the file transferred, so that does not explain the error?
I think the timeout from TFTP -i GET \Boot\x64\foobar.txt
is closest to the problem we are getting (PXE-E32 TFTP Open Timeout).

What I cannot figure out is why the connection is timing out.
OK, more information: we have just noticed that if we include scope options 66 and 67 on VLAN 1, we see the following PXE messages:
WDSNBP Started Using DHCP Referral
Contacting Server: 172.16.0.35 (Gateway 0.0.0.0) <---- WHAT!
Contacting Server: 172.16.0.35
TFTP Download: Boot\x64\pxeboot.com

Press F12 for network service boot.

------
OK, so why is the gateway 0.0.0.0? If we get this gateway on VLAN 2 as well, then there is no way the traffic can route between VLAN 1 and VLAN 2.

Does anyone know where WDS is getting this gateway address from, as it is not configured on the server NIC?
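
To make the concern about a 0.0.0.0 gateway concrete, here is a small Python sketch of the usual next-hop decision: a destination on the client's own subnet is reached directly, anything else needs a gateway. The `next_hop` function is a simplified model, not the PXE ROM's actual logic. The client address 192.168.0.50 is hypothetical, and the netmasks are assumptions taken from the switch interface table posted later in this thread.

```python
import ipaddress


def next_hop(client_ip, netmask, gateway, destination):
    """Return the address a client would send to, or None if it has no route."""
    interface = ipaddress.ip_interface(f"{client_ip}/{netmask}")
    dest = ipaddress.ip_address(destination)
    if dest in interface.network:
        return str(dest)  # same subnet: deliver directly
    if ipaddress.ip_address(gateway).is_unspecified:
        return None  # off-subnet and gateway is 0.0.0.0: nowhere to send
    return gateway


# Client on the server VLAN: a 0.0.0.0 gateway does no harm.
print(next_hop("172.16.0.218", "255.255.252.0", "0.0.0.0", "172.16.0.35"))
# Hypothetical client on the client VLAN with no gateway: unreachable.
print(next_hop("192.168.0.50", "255.255.248.0", "0.0.0.0", "172.16.0.35"))
# Same client with the VLAN 2 router address as gateway.
print(next_hop("192.168.0.50", "255.255.248.0", "192.168.0.1", "172.16.0.35"))
```

This would be consistent with PXE working on VLAN 1 even when the gateway shows as 0.0.0.0, but it does not by itself prove that the VLAN 2 clients lack a gateway.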

Cheers ?
It is NOT WDS that sets this gateway, but the ship server. The DHCP scope for VLAN 2 should set the correct gateway for the hosts in this LAN.
Can you connect a PC to VLAN 2 and make it boot from its HDD, getting its IP config by DHCP? Then check that it can ping its router (the interface on the ProCurve which is in VLAN 2), and then that it can ping the WDS server.
After all, it may be a routing problem!
Of course I meant DHCP server, not ship server. Stupid autocorrect in Android!
Outside of the PXE boot issue, all our desktops on VLAN 2 can connect to, ping and map drives on any of the servers on VLAN 1. I can also release and renew the IP address of a client on VLAN 2.

Cheers.
Just to clarify your other point:
I can ping the gateways of VLAN 1 and VLAN 2 from either VLAN.
I can also ping the SHIP :) server from a client on VLAN 2.

So I don't think this is a routing issue?
I seem to recall that there used to be some trouble with PXE and the DHCP-provided gateway.
Is there a BIOS update for the computers that are supposed to PXE boot? You could try that.
Also, you may want to try a "PXE on floppy" just to see if the issue is related to your PXE implementation.
Can you post a trace (a real .cap file...) of what happens on UDP ports 67, 68, 69 and 4011 on the WDS server when a client from VLAN 2 is booting?
Sorry, I have not updated this recently.
However, as you suggested, I attempted it from the PXE boot disk and the whole process worked perfectly from the 192.168.0.0 network.

So now I am confused as to why the integrated PXE fails to work on all our clients, but the same clients work with the bootable disk...?
This could be a bug in the PXE implementation.
It may not be getting or using the "gateway".
What is the NIC in the clients? And the PXE code version/level?
Check for BIOS updates for your clients...
On this particular machine, I've just updated the BIOS, but Intel UNDI was always PXE 2.1 (Build 082), which I think is the latest. This one is from a Realtek controller, although I think the issue will affect all clients on the 192.168.0.0 network, and we have 500+ of them.
Intel base code 082 is not bugged, but the issue may be in the UNDI driver.
Just to make sure that I understand:
When you use a PXE boot disk on a client which cannot boot through its own embedded PXE, then it works?
Yes, that's correct.
Correct me if I am wrong, but the boot disk may be using PXE 0.99 (if it is a LanWorks or Argon Technology boot disk). Can you confirm? This could help us understand what actually happens.
The boot disk is from http://rom-o-matic.net/.

The image reports itself as gPXE 1.0.1+.

Cheers.
Thanks for the update. I believe that HP does have a configurable monitor port, so I'll see what I can capture for you.

With regards to STP, I know that this is not enabled on the switches. We were advised by HP to disable it as it causes too many problems. Just so I understand, can you fill me in on what spanning tree will do to help DHCP/PXE traffic?

Many thanks,
Adam.
Spanning tree will prevent loops in your network (by detecting them and disabling the corresponding ports).
If configured correctly, this is usually not a problem.

I know the ProCurve hardware a little. Can you tell me what firmware is installed on your 5300?
Current firmware is E.11.29.

This device was updated about a month ago.

Cheers.
Here you go:
 Internet (IP) Service

  IP Routing : Enabled


  Default TTL     : 64
  Arp Age         : 20

  VLAN         | IP Config  IP Address      Subnet Mask     Proxy ARP
  ------------ + ---------- --------------- --------------- ---------
  default      | Manual     172.16.1.100    255.255.252.0   No
  VLAN2        | Manual     192.168.0.1     255.255.248.0   No
  VLAN3        | Manual     192.168.8.1     255.255.248.0   No
  VLAN2100     | Disabled
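
The interface table above can be turned into networks to double-check which VLAN each address in this thread belongs to. The Python sketch below uses only the addresses and masks shown in the table; the `vlan_for` helper and the dictionary layout are made up for this example.

```python
import ipaddress

# Interface address and mask per VLAN, copied from the switch output.
VLAN_INTERFACES = {
    "default": "172.16.1.100/255.255.252.0",
    "VLAN2": "192.168.0.1/255.255.248.0",
    "VLAN3": "192.168.8.1/255.255.248.0",
}

NETWORKS = {
    name: ipaddress.ip_interface(value).network
    for name, value in VLAN_INTERFACES.items()
}


def vlan_for(address):
    """Return the VLAN whose network contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORKS.items():
        if ip in network:
            return name
    return None


for name, network in NETWORKS.items():
    print(name, network)
print("WDS server 172.16.0.35 is in:", vlan_for("172.16.0.35"))
```

With these masks the server VLAN is 172.16.0.0/22 and the client VLAN is 192.168.0.0/21, so the two WDS addresses (172.16.0.10 and 172.16.0.35) sit in the default VLAN and a VLAN 2 client has to go through 192.168.0.1 to reach them.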
Thanks, yes, clients on VLAN 2 get 192.168.0.1 as the gateway.
I verified this from gPXE and also from the OS.

I'll set up the port monitoring tomorrow and see what we get.

VLAN 3 is our wireless network, so we don't generally PXE boot on it.

Thanks for all your help thus far.

Could not get any usable info from HP port monitoring.
Even HP could not solve this problem.
Had to resort to a dual-NIC/VLAN configuration.