Possible MAC / ARP problem.

This is the scenario:
I have six different networks with approx. 20 Sun servers on each network. All six networks are connected to one Sun server with dual quad-port NICs (2x4 ports), acting as a boot server for all networks.
The boot server has a unique MAC on each interface, and the servers within each LAN have unique MACs. Servers on different LANs, however, have identical MACs.

I am having trouble booting these servers from the boot server. Sometimes they look for the DHCP server for a very long time, and most often they fail to get an IP. When looking at the boot server interfaces I see the DHCP request arrive, but the answer never reaches the requesting server.

I believe the problem is caused by the arp table on the boot server. It contains the same MAC on all interfaces. For example:

qfe4       00:80:37:0e:06:22
qfe5       00:80:37:0e:06:22
qfe2       00:80:37:0e:06:22
qfe1       00:80:37:0e:06:22

There is a lot of broadcast traffic: approx. 3000 packets per 8 seconds.

Am I correct in assuming that the arp table is the problem?
Is there any configuration or network equipment that could solve my problem without having to change the MAC on all these servers?
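
One way to check the observation above is to group the `arp -a` entries by MAC address and list every MAC that shows up on more than one interface. The sketch below is only an illustration. It assumes each line starts with the interface name and ends with the MAC address, as in the excerpt above. The function names are made up for this example, and the last sample line is an invented entry, added to show that a MAC seen on a single interface is not reported.

```python
import re
from collections import defaultdict

# Six colon-separated hex octets; Solaris may print octets without leading zeros.
MAC_PATTERN = re.compile(r"[0-9a-fA-F]{1,2}(:[0-9a-fA-F]{1,2}){5}")


def normalize_mac(mac):
    """Pad every octet to two lowercase hex digits (8:0:20:... -> 08:00:20:...)."""
    return ":".join(f"{int(octet, 16):02x}" for octet in mac.split(":"))


def macs_on_multiple_interfaces(lines):
    """Return {mac: [interfaces]} for MACs seen on more than one interface."""
    seen = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Skip headers and incomplete entries that do not end in a MAC address.
        if len(fields) < 2 or not MAC_PATTERN.fullmatch(fields[-1]):
            continue
        seen[normalize_mac(fields[-1])].add(fields[0])
    return {mac: sorted(ifaces) for mac, ifaces in seen.items() if len(ifaces) > 1}


sample = """\
qfe4       00:80:37:0e:06:22
qfe5       00:80:37:0e:06:22
qfe2       00:80:37:0e:06:22
qfe1       00:80:37:0e:06:22
qfe1       00:80:37:0e:06:23
""".splitlines()  # the last line is a hypothetical entry, not from the real table

print(macs_on_multiple_interfaces(sample))
```

On a live system the same function could be fed the captured output of `arp -a` instead of the sample text.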

I don't know the answer, but a couple of observations*:

Sun OpenBoot has an option "Local Ethernet Address=(True/False)" (use the `eeprom` command to check). Although you may have allocated Ethernet (MAC) addresses to each interface, if it's set to "False", all interfaces will use the "system" MAC address; `ifconfig -a` should bear this out. In that case, the arp table is correct.

I guess this is to allow redundancy/load balancing etc. However, if a boot client on network#1 has the same MAC address as a boot client on network#2, I can see the boot server getting confused about which interface it should send the response back through. So a first step might be to ensure the boot server displays unique MAC addresses.

* gained from watching a colleague try to boot a Sun server on the network at about 2am - So I wasn't at my most alert ;-)
If the network config is sane it shouldn't matter that the same MAC is used on multiple interfaces. By sane I mean that each of the interfaces connects to a physically separate network. In a switched environment this would mean separate switches for each network, or VLANs. Is that the case?

What do 'ifconfig -a' and 'netstat -nr' show?
A few questions for you:

1. Does your boot server have six IPs, one in each of your subnets? E.g.
    for subnet 172.18.8, do you have an IP 172.18.8.x on the boot server?

2. Does your boot server know all the clients' hostnames, IPs and MACs?
    Have you put all the client info in a database (e.g. NIS+, or files such as /etc/ethers)?

3. When you add a client to the boot server, use the following command line:
    /path-to/add_install_client -i new_machine_ip -e network_card_addr machine_name platform

   If you think you have done all of the above correctly, please check your boot server
and client setup against the Solaris "Advanced Installation Guide" (it comes with your
media) or have a look at the online book (you can download the PDF file).

MikaelErikssonAuthor Commented:
Thanks for all the input. Here are some answers to your questions.

The Sun OpenBoot option "Local Ethernet Address=(True/False)" is set to True.
As I understand it, this gives each interface on the boot server a unique MAC.

Output from [ifconfig -a]
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet netmask ff000000  
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet netmask ffffff00 broadcast
        ether 0:3:ba:6e:e9:15
qfe0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet netmask ffff0000 broadcast
        ether 8:0:20:bd:77:48
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet netmask ffff0000 broadcast
        ether 8:0:20:bd:77:49
qfe2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
        inet netmask ffff0000 broadcast
        ether 8:0:20:bd:77:4a
qfe3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
        inet netmask ffff0000 broadcast
        ether 8:0:20:bd:77:4b
qfe4: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
        inet netmask ffff0000 broadcast
        ether 8:0:20:b7:3e:c8
qfe5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 8
        inet netmask ffff0000 broadcast
        ether 8:0:20:b7:3e:c9
qfe6: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 9
        inet netmask ffff0000 broadcast
        ether 8:0:20:b7:3e:ca
qfe7: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 10
        inet netmask ffff0000 broadcast
        ether 8:0:20:b7:3e:cb

(had to change the IP on bge0 for policy reasons, sorry.)

Output from [netstat -nr]
Routing Table: IPv4
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
                                          UGH      1      3
                                          UGH      1      3
                                          UGH      1      0
                                          UGH      1      0
                                          UGH      1      0
                                          UGH      1      0
                                          UGH      1      3
                                          UGH      1     17
                                          U        1   2347  bge0
                                          U        1      0  qfe7
                                          U        1    509  qfe4
                                          U        1    526  qfe5
                                          U        1      0  qfe6
                                          U        1   2021  qfe0
                                          U        1    508  qfe1
                                          U        1    512  qfe2
                                          U        1   1779  qfe3
                                          U        1      0  bge0
default                                   UG       1   5021
                                          UH       2    718  lo0

Answer to yuzh,
1. Yes, my boot server has an IP on each of the six nets. See above.

2. The boot server knows the IPs and MACs. Here is a short version of the output from [arp -a]:
[arp -a | grep 00:80:37:0e:02:22]
qfe5       00:80:37:0e:02:22
qfe4       00:80:37:0e:02:22
qfe2       00:80:37:0e:02:22
qfe0       00:80:37:0e:02:22
qfe1       00:80:37:0e:02:22

Does it really matter at this level whether I have the clients' hostnames in DNS or in files?
I thought that is only necessary later on, when the installation is complete and the OS is installed.

3. We use "dhtadm" to add the client networks to the boot server.
Output from: [dhtadm -P]
wild_172.22.0.0         Macro           :CDefFile=/gsn/nodes/172_22_0_0/boot.def:Include=              Macro           :Subnet=
wild_172.21.0.0         Macro           :CDefFile=/gsn/nodes/172_21_0_0/boot.def:Include=              Macro           :Subnet=
wild_172.18.0.0         Macro           :CDefFile=/gsn/nodes/172_18_0_0/boot.def:Include=              Macro           :Subnet=
wild_172.19.0.0         Macro           :CDefFile=/gsn/nodes/172_19_0_0/boot.def:Include=
SUNW.sparc.SUNW,UltraSPARC-IIi-cEngine.SunOS    Macro           :CDefFile=boot.def:BootSrvA=

Just to try to explain the network layout:
Site 1:
Approx. 20 Sun servers (blade servers in a magazine) connected via the magazine backplane to a built-in switch.
The built-in switch is connected to another switch (an ordinary Netgear) on port 25. Port 1 on that switch is connected directly to one of the interfaces on the boot server.

The other sites have exactly the same config but are connected to other interfaces on the boot server. The MAC addresses on the blade servers are unique within a site but exactly the same as at the other sites.

For various reasons we cannot change the MAC addresses on the blade servers or use a separate boot server for each site.

I will keep trying to configure the Netgear switch not to pass the MAC addresses of the servers on to the boot server.
Maybe tagged VLANs are a solution?

Thanks again for the input; I hope to hear more from you.

// Mike

Do site1 & site2 connect through a common switch to the boot server? DHCP routing of return packets for multiple interfaces will only work if each interface connects to a physically separate network.
MikaelErikssonAuthor Commented:
No, there is one switch per site, connected directly to a boot server interface. Each interface on the boot server is configured as a separate Class B network. That means that if I unplug the Ethernet cables from the boot server, there is no physical connection between the different sites.

When I attach the cables, all broadcast packets from the different sites are received by the boot server and the arp table starts to update.
Since the MAC addresses are the same across the sites, the arp table contains the same MAC on multiple interfaces. My guess is that when the boot server replies to a DHCP request, the reply ends up on the wrong interface, or, even worse, the server recognizes the MAC, looks in the DHCP table and hands out an IP address already in use at another site.
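
The guess above can be illustrated with a toy model. This is not how the Solaris DHCP service is actually implemented, which is unknown here; it only shows what happens to any table that identifies clients by MAC address alone when two sites present the same MAC, compared with a table keyed by interface and MAC together. The function names are hypothetical.

```python
MAC = "00:80:37:0e:02:22"

# (interface, MAC) pairs in the order the broadcasts reach the boot server.
arrivals = [("qfe4", MAC), ("qfe5", MAC)]


def table_keyed_by_mac(arrivals):
    """Remember the interface each MAC was last seen on, keyed by MAC only."""
    table = {}
    for iface, mac in arrivals:
        table[mac] = iface  # a later site overwrites the earlier one
    return table


def table_keyed_by_interface_and_mac(arrivals):
    """Same bookkeeping, but the interface is part of the key."""
    table = {}
    for iface, mac in arrivals:
        table[(iface, mac)] = iface  # each site keeps its own entry
    return table


shared = table_keyed_by_mac(arrivals)
scoped = table_keyed_by_interface_and_mac(arrivals)

# With the MAC-only key, the client on qfe4 is now believed to be on qfe5.
print(shared[MAC])
# With the combined key, both clients are still known separately.
print(sorted(scoped))
```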
[arp -a | grep 00:80:37:0e:02:22]
qfe5       00:80:37:0e:02:22
qfe4       00:80:37:0e:02:22

Indicates that the vendor of the MAC is Ericsson Business Comm. Is there a router in the path from the blade magazines to the boot server's interfaces?
Check your /etc/netmasks file to see if it has netmasks for all your subnets.

Did you run add_install_client for the client?

see http:#12469216

Could you please check your boot server config against:

Full doc: http://www.bu.edu/systems-support/admin/network/sol/bootserver.html
MikaelErikssonAuthor Commented:
jlevie, no there is no router between the blade servers and the boot server.

yuzh,  I am not responsible for installing the blade servers. I'm not sure if the command "add_install_client" is used.
If it's important to know if we are using that, I can find out for you. I think there is another set of scripts to run when installing the clients. Maybe one of the scripts calls for "add_install_client". Thanks for the links, I will have a look at them today.

Don't you think my problem is because of the identical MAC addresses on the interface?
I think this is a network error, not an OS configuration error.
Are you trying to use the boot server to perform OS installation for your client boxes? If that is
the case, you do need to run "add_install_client".

You can check the /etc/bootparams file (a text file) to see if your client box is defined in
the file. The file holds info about the client hostname, MAC address, boot image etc.
(you can have multiple OS image versions installed on your boot server).
Are you sure the server has been set up as a boot server? Please ask the person
who set up the server what has been done.
MikaelErikssonAuthor Commented:
The server has no "/etc/bootparams" file or "add_install_client".
The boot function is custom-made for this environment. The clients (blade servers) do not install Sun Solaris; instead they install another OS that is also custom-made. To install a client, DHCP is used, combined with a lot of macros and scripts.
I understand that it's almost impossible for you to help me with the setup of the OS and boot-install scripts, because I don't know how it works myself and so can't explain it to you.

Anyway, I still think this is more of a network issue caused by the MAC address problem. The problem first shows up when the clients request an IP address from the DHCP service on the boot server. The IP packets arrive at the boot server (via broadcast), but the clients do not receive any IP addresses. That is before the OS installation starts. When we do succeed in getting an IP address from the boot server, the installation works fine.

I looked for info about setting up multiple DHCP servers on the same host, forcing each to listen on only one interface, and then creating a separate arp table for each interface. Unfortunately that does not seem to be possible with Sun Solaris 8. While looking for that info I discovered another possible solution at http://ebtables.sourceforge.net/. That could be a solution to my problems.

For now we will work around the problem by disabling all the network interfaces on the boot server and enabling just one at a time, when we need to install or re-install the clients at a given site. That's not a good solution in the long run, but it works for now. Maybe ebtables is the way to go.

Thank you all for your effort in trying to solve this!
If you have more ideas, they are very welcome.
> Don't you think my problem is because of the identical MAC addresses on the interface?

Yes, and I don't understand why the boot server sees the same MAC from multiple clients. Hence the question about the router. If there was a router in the network path to the blade servers, the boot server would see the requests as coming from the router's MAC.

However, if there's no router it becomes a bit more mysterious. You say that the network path is unique from a qfe interface to a bank of blade servers, yet the arp table shows the same MAC on multiple interfaces on the boot server. Unless the blade servers are defective in that more than one has the same MAC, there shouldn't be any way to see what you've observed if each qfe uniquely connects to a bank of blades.
MikaelErikssonAuthor Commented:
There is no router in between, and the qfe connects directly to the built-in switch (in the backplane) of the magazine. No connection between the sites exists, except for the physical connection via the boot server. No routing takes place between the interfaces.

As I wrote before:
"The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites. "
As far as I know there is only one arp table for all interfaces. This must be a problem since all sites are connected to the same boot server.
Just to summarise:

qfe5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 8
        inet netmask ffff0000 broadcast
        ether 8:0:20:b7:3e:c9  # This is unique

[arp -a |grep  00:80:37:0e:02:22|grep qfe5]
qfe5       00:80:37:0e:02:22

- This extract does _not_ show the MAC address of qfe5, but that of a system on the network qfe5 is connected to. So the arp table is fine: the system with that IP address is on the network that qfe5 is connected to. The confusion arises because there's another (client) system with the same MAC address on another network, as shown by:

 [arp -a | grep 00:80:37:0e:02:22]
qfe5       00:80:37:0e:02:22
qfe4       00:80:37:0e:02:22
> The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites.

Which says to me that there's a problem with the blade servers or magazines. Since each blade server is in fact a different computer, the MACs must be unique across all blades on all sites. Given what you see regarding the MACs and the nature of the blade servers, I'm wondering if it is a result of the magazines being the actual holders of the MACs. That would make sense in that it allows a blade to be replaced with no other changes. Could it be that there's a config setting for each magazine that allows the base MAC to be set? If there is, and each magazine hasn't been configured for a unique range of MACs, it would explain why each blade's MAC is unique within a magazine but duplicated on another magazine.
MikaelErikssonAuthor Commented:

You are right. It's probably the magazines that hold the MACs. And unfortunately it must be that way because of redundancy.
As I wrote before, we are not allowed to change the MACs of the blade servers. That's why I'm looking for network equipment that solves this problem.
There is a boot ROM parameter to assign unique addresses to multi-port adapters; I will look around for it. Maybe you can dig it up by typing printenv at the boot console (Stop-A to get one at boot).
If you can't change the base MAC on the magazines so that each blade slot has a unique ID, I think there are only two possible solutions for DHCP. One would be a separate DHCP server for each magazine. The other would be to run an instance of dhcpd on each interface of the boot server. I haven't examined the dhcpd code to see what would happen to the reply packets when dhcpd is listening on a single interface, so I don't know if modifications would be needed to force the replies back out that interface. Obviously, since the MACs are duplicated, each instance of dhcpd must have its own private lease database.

Using multiple instances of dhcpd will solve the DHCP issue, but if other things on the boot server must talk to a blade you'll still run into trouble with the arp table. Only separate boot servers will solve that.
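
To make the "private lease database per instance" point concrete, here is a minimal sketch in Python. It is a model of the idea only, not dhcpd code. The class and function names are hypothetical, and the pairing of qfe4 with 172.18.0.0/16 and qfe5 with 172.19.0.0/16 is an assumption made for the example (the thread does not say which interface serves which network).

```python
import ipaddress


class LeaseDB:
    """A private lease table for one interface: MAC -> leased address."""

    def __init__(self, network):
        # iter() keeps this working whether hosts() returns a generator or a list.
        self._free = iter(ipaddress.ip_network(network).hosts())
        self._leases = {}

    def lease(self, mac):
        """Return the existing lease for this MAC, or hand out the next free address."""
        if mac not in self._leases:
            self._leases[mac] = next(self._free)
        return self._leases[mac]


# One database per interface (assumed interface-to-network mapping).
databases = {
    "qfe4": LeaseDB("172.18.0.0/16"),
    "qfe5": LeaseDB("172.19.0.0/16"),
}


def handle_discover(iface, mac):
    """Serve a request from the database of the interface it arrived on."""
    return databases[iface].lease(mac)


mac = "00:80:37:0e:02:22"
print(handle_discover("qfe4", mac))  # address from the qfe4 network
print(handle_discover("qfe5", mac))  # same MAC, independent lease on qfe5
```

Because the arriving interface selects the database, the duplicate MAC never collides inside any single table. As noted above, this only covers DHCP; anything else on the boot server that identifies a blade by MAC is unaffected by it.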
MikaelErikssonAuthor Commented:
Using multiple dhcpd instances, one for each interface, has crossed my mind, but as you wrote, the problem will still exist for other services communicating with the blade servers. That's why I want to know if there is a network product that could handle this.
I can't think of a way of solving the general problem just on the boot server. However, if you placed a two-port router (that can relay DHCP) between each magazine and the boot server it would solve the problem.

I guess I don't understand why you can't change the base MAC for each magazine, assuming it works the way I suspect it does. Blades would still be unit replaceable with no config changes.
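
For the router suggestion, it may help to see why a DHCP relay sidesteps the duplicate MACs. A relay agent puts its own address on the client's subnet into the `giaddr` field of the request, and the server can pick the address pool from that field instead of from anything tied to the client's MAC. Replies go back to the relay, so the boot server only ever needs the router's MAC in its arp table. The sketch below assumes the four networks visible in the `dhtadm -P` output, each as a Class B (/16). The function name and all host addresses used with it are hypothetical.

```python
import ipaddress

# Networks taken from the dhtadm macros in the thread, as /16 (Class B) subnets.
POOLS = [
    ipaddress.ip_network(net)
    for net in ("172.18.0.0/16", "172.19.0.0/16", "172.21.0.0/16", "172.22.0.0/16")
]

UNSET = ipaddress.ip_address("0.0.0.0")


def pool_for_request(giaddr, receiving_interface_addr):
    """Choose the pool for a request.

    A relayed request carries the relay's address in giaddr; a request that
    arrived directly has giaddr 0.0.0.0, so the receiving interface's own
    address decides instead. Returns None if no pool matches.
    """
    addr = ipaddress.ip_address(giaddr)
    if addr == UNSET:
        addr = ipaddress.ip_address(receiving_interface_addr)
    for net in POOLS:
        if addr in net:
            return net
    return None


# Relayed by a (hypothetical) router at 172.21.0.1, received on some other interface.
print(pool_for_request("172.21.0.1", "10.0.0.5"))
# Not relayed: the receiving interface address decides.
print(pool_for_request("0.0.0.0", "172.18.0.10"))
```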

MikaelErikssonAuthor Commented:
Technically it's probably possible to change the MAC addresses; my reasons for not doing so lie elsewhere.
I will try the router solution. In fact, I already started on that a few days ago.

Anyway, I should not keep you guys busy with this question any longer.
Thanks for all your input; I hope I can help you someday.

jlevie, thanks for the help with this and my other questions!