Solved

Possible MAC / ARP problem.

Posted on 2004-11-01
Last Modified: 2013-12-23

This is the scenario:
I have six different networks with approx. 20 Sun servers on each network. All six networks are connected to one Sun server with dual quad-port (2x4 ports) NICs, acting as a boot server for all networks.
The boot server has a unique MAC on each interface, and the servers within each LAN have unique MACs. Servers on different LANs, however, have identical MACs.

I am having trouble booting these servers from the boot server. Sometimes they look for the DHCP server for a very long time and most often fail to get an IP. When watching the boot server interfaces I see the DHCP IP request, but the answer never reaches the requesting server.

I believe that the problem is because of the arp table on the boot server. It contains the same MAC on all interfaces. For example:

qfe4   172.20.8.98          255.255.255.255       00:80:37:0e:06:22
qfe5   172.21.8.98          255.255.255.255       00:80:37:0e:06:22
qfe2   172.18.8.98          255.255.255.255       00:80:37:0e:06:22
qfe1   172.17.8.98          255.255.255.255       00:80:37:0e:06:22

A lot of broadcast traffic exists. Approx. 3000 packets per 8 seconds.

Am I correct in assuming that the arp table is the problem?
Is there any configuration or network equipment that could solve my problem without having to change the MACs on all these servers?
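
One way to confirm how widespread the duplication is would be to group the `arp -a` rows by MAC address. The following is a minimal Python sketch, assuming the column layout shown in the excerpt above (device, IP address, mask, MAC); the function name is made up for illustration and the sample rows are copied from the question.

```python
from collections import defaultdict

# Sample rows copied from the `arp -a` excerpt in the question.
ARP_OUTPUT = """\
qfe4   172.20.8.98          255.255.255.255       00:80:37:0e:06:22
qfe5   172.21.8.98          255.255.255.255       00:80:37:0e:06:22
qfe2   172.18.8.98          255.255.255.255       00:80:37:0e:06:22
qfe1   172.17.8.98          255.255.255.255       00:80:37:0e:06:22
"""


def macs_on_multiple_interfaces(arp_text):
    """Return {mac: sorted device names} for MACs seen on more than one device."""
    seen = defaultdict(set)
    for line in arp_text.splitlines():
        fields = line.split()
        # Skip headers, separators and anything whose last column is not a MAC.
        if len(fields) < 4 or ":" not in fields[-1]:
            continue
        device, mac = fields[0], fields[-1].lower()
        seen[mac].add(device)
    return {mac: sorted(devs) for mac, devs in seen.items() if len(devs) > 1}


duplicates = macs_on_multiple_interfaces(ARP_OUTPUT)
for mac, devices in duplicates.items():
    print(mac, "seen on", ", ".join(devices))
```

Feeding it the full `arp -a` output instead of the four sample rows would list every MAC that the boot server has learned on more than one qfe interface.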
Question by:MikaelEriksson
22 Comments
 
LVL 21

Expert Comment

by:tfewster
ID: 12467417
I don't know the answer, but a couple of observations*:

Sun OpenBoot has an option "Local Ethernet Address=(True/False)" (Use `eeprom` command to check); Although you may have allocated Ethernet (MAC) addresses to each interface, if it's set to "False", all interfaces will use the "system" MAC address - `ifconfig -a` should bear this out.  In which case, the arp table is correct.

I guess this is to allow redundancy, load balancing, etc. However, if a boot client on network #1 has the same MAC address as a boot client on network #2, I can see the boot server getting confused about which interface it should send the response back through. So a first step might be to ensure the boot server displays unique MAC addresses.


* gained from watching a colleague try to boot a Sun server on the network at about 2am - So I wasn't at my most alert ;-)
 
LVL 40

Expert Comment

by:jlevie
ID: 12468765
If the network config is sane it shouldn't matter that the same MAC is used on multiple interfaces. By sane I mean that each of the interfaces connects to a physically separate network. In a switched environment this would mean separate switches for each network or VLAN's. Is that the case?

What do 'ifconfig -a' and 'netstat -nr' show?
 
LVL 38

Expert Comment

by:yuzh
ID: 12469216
A few questions for you:

1. Does your boot server have six IPs, each sitting in one of your subnets,
    e.g.:
    for subnet 172.18.8, do you have an IP 172.18.8.x on the boot server?

2. Does your boot server know all the clients' hostnames, IPs and MACs?
    Have you put all the client info in a database (e.g. NIS+, or files, /etc/ethers, etc.)?

3. When you add a client to the boot server, use the following command line
    syntax:
    /path-to/add_install_client -i new_machine_ip -e network_card_addr machine_name platform

   If you think you have done all of the above correctly, please check your boot server
and client setup against the Solaris "Advanced Installation Guide" (it comes with your
media) or have a look at the online book (you can download the PDF file)

   http://docs.sun.com/db/coll/214.7
   
   
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12470510
Thanks for all the input. Here are some answers to your questions.

Sun OpenBoot option "Local Ethernet Address=(True/False)", set to True.
I understand this gives me a unique MAC for all the interfaces on the boot server.

Output from [ifconfig -a]
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000  
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 111.111.11.11 netmask ffffff00 broadcast 111.111.11.255
        ether 0:3:ba:6e:e9:15
qfe0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 172.16.0.1 netmask ffff0000 broadcast 172.16.255.255
        ether 8:0:20:bd:77:48
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 172.17.0.1 netmask ffff0000 broadcast 172.17.255.255
        ether 8:0:20:bd:77:49
qfe2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
        inet 172.18.0.1 netmask ffff0000 broadcast 172.18.255.255
        ether 8:0:20:bd:77:4a
qfe3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
        inet 172.19.0.1 netmask ffff0000 broadcast 172.19.255.255
        ether 8:0:20:bd:77:4b
qfe4: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
        inet 172.21.0.1 netmask ffff0000 broadcast 172.21.255.255
        ether 8:0:20:b7:3e:c8
qfe5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 8
        inet 172.22.0.1 netmask ffff0000 broadcast 172.22.255.255
        ether 8:0:20:b7:3e:c9
qfe6: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 9
        inet 172.23.0.1 netmask ffff0000 broadcast 172.23.255.255
        ether 8:0:20:b7:3e:ca
qfe7: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 10
        inet 172.24.0.1 netmask ffff0000 broadcast 172.24.255.255
        ether 8:0:20:b7:3e:cb

(had to change the IP on bge0 for policy reasons, sorry.)

Output from [netstat -nr]
Routing Table: IPv4
  Destination           Gateway           Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
172.20.110.150       172.21.5.66           UGH      1      3
172.20.110.80        172.18.5.66           UGH      1      3
172.20.110.81        172.18.5.66           UGH      1      0
172.20.110.190       172.21.5.66           UGH      1      0
172.20.110.191       172.21.5.66           UGH      1      0
172.20.110.34        172.16.5.66           UGH      1      0
172.20.110.33        172.16.5.66           UGH      1      3
172.40.110.150       172.20.5.66           UGH      1     17
111.111.11.0         111.111.11.1         U        1   2347  bge0
172.24.0.0           172.24.0.1            U        1      0  qfe7
172.21.0.0           172.21.0.1            U        1    509  qfe4
172.22.0.0           172.22.0.1            U        1    526  qfe5
172.23.0.0           172.23.0.1            U        1      0  qfe6
172.16.0.0           172.16.0.1            U        1   2021  qfe0
172.17.0.0           172.17.0.1            U        1    508  qfe1
172.18.0.0           172.18.0.1            U        1    512  qfe2
172.19.0.0           172.19.0.1            U        1   1779  qfe3
224.0.0.0            111.111.11.11         U        1      0  bge0
default              111.111.11.1          UG       1   5021
127.0.0.1            127.0.0.1             UH       2    718  lo0

Answer to yuzh,
1. Yes, my boot server has six IPs for each net. See above.

2. The boot server knows of the IP and MAC. I paste a short version of the output from [arp -a]:
[arp -a | grep 00:80:37:0e:02:22]
qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22
qfe4   172.21.8.34          255.255.255.255       00:80:37:0e:02:22
qfe2   172.18.8.34          255.255.255.255       00:80:37:0e:02:22
qfe0   172.16.8.34          255.255.255.255       00:80:37:0e:02:22
qfe1   172.17.8.34          255.255.255.255       00:80:37:0e:02:22

Does it really matter if I have the clients' hostnames in DNS or in files at this level?
I thought that it's only necessary later on, when the installation is complete and the OS is installed.

3. We use "dhtadm" to add the client networks to the boot server.
Output from: [dhtadm -P]
wild_172.22.0.0         Macro           :CDefFile=/gsn/nodes/172_22_0_0/boot.def:Include=172.22.0.0:BootSrvA=172.22.0.1:BootSrvN=gprs_qfe5:SrootIP4=172.22.0.1:SrootNM=gprs_qfe5:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
172.22.0.0              Macro           :Subnet=255.255.0.0:Broadcst=172.22.255.255:Router=172.22.0.1:MTU=1500:
wild_172.21.0.0         Macro           :CDefFile=/gsn/nodes/172_21_0_0/boot.def:Include=172.21.0.0:BootSrvA=172.21.0.1:BootSrvN=gprs_qfe4:SrootIP4=172.21.0.1:SrootNM=gprs_qfe4:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
172.21.0.0              Macro           :Subnet=255.255.0.0:Broadcst=172.21.255.255:Router=172.21.0.1:MTU=1500:
wild_172.18.0.0         Macro           :CDefFile=/gsn/nodes/172_18_0_0/boot.def:Include=172.18.0.0:BootSrvA=172.18.0.1:BootSrvN=gprs_qfe2:SrootIP4=172.18.0.1:SrootNM=gprs_qfe2:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
172.18.0.0              Macro           :Subnet=255.255.0.0:Broadcst=172.18.255.255:Router=172.18.0.1:MTU=1500:
wild_172.19.0.0         Macro           :CDefFile=/gsn/nodes/172_19_0_0/boot.def:Include=172.19.0.0:BootSrvA=172.19.0.1:BootSrvN=gprs_qfe3:SrootIP4=172.19.0.1:SrootNM=gprs_qfe3:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
SUNW.sparc.SUNW,UltraSPARC-IIi-cEngine.SunOS    Macro           :CDefFile=boot.def:BootSrvA=150.132.90.20:BootS

Just to try to explain the network layout:
Site 1:
Approx. 20 Sun servers (blade servers in a magazine) connected via the magazine backplane to a built-in switch.
The built-in switch is connected to port 25 on another switch (an ordinary Netgear). Port 1 on the switch is connected directly to one of the interfaces on the boot server.

Site2:
Exactly the same config but connected to another interface on the boot server. The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites.

For various reasons we cannot change the MAC addresses on the blade servers or use different boot servers for all the sites.

I will keep on trying to configure the Netgear switch to not send the MAC address of the servers to the boot server.
Maybe tagged VLANs are a solution?

Thanks again for the input I hope to hear more from you.

// Mike



 
LVL 40

Expert Comment

by:jlevie
ID: 12472568
Do site1 & site2 connect through a common switch to the boot server? DHCP routing of return packets for multiple interfaces will only work if each interface connects to a physically separate network.
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12473510
No, there is one switch per site, connected directly to the boot server interface. Each interface on the boot server is configured as a separate Class B network. That means if I unplug the Ethernet cable from the boot server, there is no physical connection between the different sites.

When I attach the cable, all broadcast packets from the different sites are received by the boot server and the arp table starts to update.
Since the MAC addresses are the same between the sites, the arp table contains the same MAC on multiple interfaces. My guess is that when the boot server replies to a DHCP request, the DHCP reply ends up on the wrong interface, or even worse, it recognizes the MAC, looks in the DHCP table and gives out an IP address already in use by another site.
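
The failure mode suspected here can be illustrated with a toy model. This is only a sketch of the reasoning, not a description of how the Solaris DHCP server or ARP cache is actually implemented; the table layouts and function names are hypothetical. It compares a table keyed on the MAC alone with one keyed on the (interface, MAC) pair.

```python
# Toy model only: table layouts and function names are hypothetical.
# Two DHCP requests from blades that share a MAC but sit behind different interfaces.
requests = [
    ("qfe4", "00:80:37:0e:02:22"),  # blade at one site
    ("qfe5", "00:80:37:0e:02:22"),  # blade with the same MAC at another site
]


def last_seen_by_mac(reqs):
    """One shared table keyed on MAC: a later request overwrites an earlier one."""
    table = {}
    for iface, mac in reqs:
        table[mac] = iface
    return table


def seen_by_interface_and_mac(reqs):
    """Table keyed on (interface, MAC): identical MACs on different sites stay apart."""
    return {(iface, mac): iface for iface, mac in reqs}


shared = last_seen_by_mac(requests)
per_iface = seen_by_interface_and_mac(requests)

# A reply for the qfe4 blade that is looked up by MAC alone resolves to qfe5.
reply_iface_for_first_blade = shared["00:80:37:0e:02:22"]
print("shared table sends the reply out of:", reply_iface_for_first_blade)
print("entries kept per (interface, MAC):", len(per_iface))
```

If the server behaves like the shared table, the second request silently replaces the first, which would match the symptom of replies never reaching the requesting blade.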
 
LVL 40

Expert Comment

by:jlevie
ID: 12476064
[arp -a | grep 00:80:37:0e:02:22]
qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22
qfe4   172.21.8.34          255.255.255.255       00:80:37:0e:02:22

Indicates that the vendor of the MAC is Ericsson Business Comm. Is there a router in the path from the blade magazines to the boot server's interfaces?
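
The vendor is identified by the first three octets of the MAC (the OUI). Note that `ifconfig` in this thread prints MACs without leading zeros (8:0:20:...) while `arp` pads them (00:80:37:...), so addresses need normalizing before they are compared. A minimal Python sketch follows; the function names are illustrative, and the lookup table holds only the one prefix named above, where a real lookup would consult the IEEE registry.

```python
# Only the prefix named in this thread is listed here.
KNOWN_OUIS = {"00:80:37": "Ericsson Business Comm."}


def normalize_mac(mac):
    """Return the MAC as six zero-padded, lowercase, colon-separated octets."""
    octets = mac.strip().split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    values = [int(octet, 16) for octet in octets]
    if any(value > 0xFF for value in values):
        raise ValueError(f"octet out of range in {mac!r}")
    return ":".join(f"{value:02x}" for value in values)


def oui(mac):
    """Return the first three octets (the vendor prefix) of a MAC."""
    return normalize_mac(mac)[:8]


def vendor(mac):
    """Look the prefix up in the small table above."""
    return KNOWN_OUIS.get(oui(mac), "unknown")


print(normalize_mac("8:0:20:bd:77:48"))
print(vendor("00:80:37:0e:02:22"))
```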
 
LVL 38

Expert Comment

by:yuzh
ID: 12479822
Check your /etc/netmasks file to see if it has netmasks for all your subnets.

Did you run:
/path-to/add_install_client  

for the client?

See my earlier comment (ID: 12469216).

Could you please check your boot server config against:
http://www.bu.edu/systems-support/admin/network/sol/bootserver.html#configuring

Full doc: http://www.bu.edu/systems-support/admin/network/sol/bootserver.html
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12480621
jlevie, no there is no router between the blade servers and the boot server.

yuzh,  I am not responsible for installing the blade servers. I'm not sure if the command "add_install_client" is used.
If it's important to know if we are using that, I can find out for you. I think there is another set of scripts to run when installing the clients. Maybe one of the scripts calls for "add_install_client". Thanks for the links, I will have a look at them today.

Don't you think my problem is because of the identical MAC addresses on the interface?
I think this is a network error, not an OS configuration error.
 
LVL 38

Expert Comment

by:yuzh
ID: 12480911
Are you trying to use the boot server to perform OS installation for your client box? If that is
the case, you do need to run "add_install_client".

You can check the /etc/bootparams file (a text file) to see if your client box is defined in
the file. The file has the info about the client hostname, MAC address, boot image, etc.
(you can have multiple OS image versions installed on your boot server).
 
 
LVL 38

Expert Comment

by:yuzh
ID: 12480923
Are you sure the boot server has been set up as a boot server? Please ask the person
who set up the server what has been done.
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12482347
The server has no "/etc/bootparams" file or "add_install_client".
The boot function is custom made for this environment. The clients (blade servers) do not install Sun Solaris; instead they install another OS that's also custom made. To install a client, DHCP is used combined with a lot of macros and scripts.
I understand that it's almost impossible for you to help me with the setup of the OS and boot-install scripts, because I don't know how it works myself and can't tell you.

Anyway, I still think this is more of a network issue because of the MAC address problem. The problem is first noticed when the clients request an IP address from the DHCP service on the boot server. The IP packets come in to the boot server (via broadcast) but the clients do not receive any IP addresses. That's before the OS installation starts. When we succeed in getting an IP address from the boot server, the installation works fine.

I looked for some info about setting up multiple DHCP servers on the same host, forcing each to listen to only one interface, and then creating different arp tables for each interface. Unfortunately it does not seem possible to do with Sun Solaris 8. While looking for that info I discovered another possible solution at: http://ebtables.sourceforge.net/. That could be a solution to my problems.

For now we will work around this problem by disabling all the network interfaces on the boot server and using just one at a time, when we need to install/re-install the clients on the different sites. That's not a good solution in the long run, but it works for now. Maybe ebtables is the way to go.

Thank you all for your effort in trying to solve this!
If you have more ideas, they are very welcome.
 
LVL 40

Expert Comment

by:jlevie
ID: 12483197
> Don't you think my problem is because of the identical MAC addresses on the interface?

Yes, and I don't understand why the boot server sees the same MAC from multiple clients. Hence the question about the router. If there was a router in the network path to the blade servers, the boot server would see the requests as coming from the router's MAC.

However, if there's no router it becomes a bit more mysterious. You say that the network path is unique from a qfe interface to a bank of blade servers, yet the arp table shows the same MAC on multiple interfaces on the boot server. Unless the blade servers are defective in that more than one has the same MAC, there shouldn't be any way to have what you've observed, if each qfe uniquely connects to a bank of blades.
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12484766
There is no router in between, and the qfe connects directly to the built-in switch (in the backplane) of the magazine. No connection between the sites exists, except for the physical connection via the boot server. No routing takes place between the interfaces.

As I wrote before:
"The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites. "
As far as I know there is only one arp table for all interfaces. This must be a problem since all sites are connected to the same boot server.
 
LVL 21

Expert Comment

by:tfewster
ID: 12486306
Just to summarise:

qfe5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 8
        inet 172.22.0.1 netmask ffff0000 broadcast 172.22.255.255
        ether 8:0:20:b7:3e:c9  # This is unique

[arp -a |grep  00:80:37:0e:02:22|grep qfe5]
qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22

- This extract does _not_ show the MAC address of qfe5, but of a system on the network qfe5 is connected to. So the arp table is fine: The system with IP address 172.22.8.34 is on the network that qfe5 is connected to.   The confusion arises because there's another (client) system with the same MAC address on another network, as shown by:

 [arp -a | grep 00:80:37:0e:02:22]
qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22
qfe4   172.21.8.34          255.255.255.255       00:80:37:0e:02:22
 
LVL 40

Expert Comment

by:jlevie
ID: 12487342
> The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites.

Which says to me that there's a problem with the blade servers or magazines. Since each blade server is in fact a different computer, the MAC's must be unique across all blades on all sites.  Given what you see regarding the MAC's and the nature of the blade servers, I'm wondering if it is a result of the magazines being the actual holders of the MAC's. That would make sense in that it allows a blade to be replaced with no other changes. Could it be that there's a config setting for each magazine that allows the base MAC to be set? If there is, and each magazine hasn't been configured for a unique range of MAC's, it would explain why each blade's MAC is unique within a magazine but duplicated on another magazine.
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12490764
jlevie,

You are right. It's probably the magazines that are the holders of the MAC's. And unfortunately it must be that way because of redundancy.
As I wrote before, we are not allowed to change the MAC's of the blade servers. That's why I'm looking for network equipment that solves this problem.
 
LVL 62

Expert Comment

by:gheist
ID: 12492470
There is some boot ROM parameter to assign unique addresses to multitail adaptors. I will look around for it; maybe you can dig it up by typing printenv in the boot console (Stop-A to get one at boot).
 
LVL 40

Expert Comment

by:jlevie
ID: 12496340
If you can't change the base MAC on the magazines so that each blade slot has a unique ID I think there are only two possible solutions to DHCP. One would be a separate DHCP server for each magazine. The other would be to run an instance of dhcpd on each interface of the boot server. I haven't examined the dhcpd code to see what would happen on the reply packets when dhcpd is listening on a single interface so I don't know if modifications would be needed to force the replies back out that interface. Obviously, since the MAC's are duplicated each instance of dhcpd must have its own private lease database.

Using multiple instances of dhcpd will solve the DHCP issue, but if other things on the boot server must talk to a blade you'll still run into trouble with the arp table. Only separate boot servers will solve that.
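
The "private lease database per instance" idea can be sketched as follows. This is a toy Python model under stated assumptions, not dhcpd code: the `LeaseDB` class, the `handle_request` function and the starting addresses are all hypothetical, and the sketch does not skip broadcast or reserved addresses. It only shows that the same MAC can hold a separate lease on each interface when every interface has its own table.

```python
import ipaddress


class LeaseDB:
    """Leases for one interface; a stand-in for a per-instance lease file."""

    def __init__(self, network, first_host):
        self.network = ipaddress.ip_network(network)
        self._next = ipaddress.ip_address(first_host)
        self.leases = {}  # MAC -> IP address

    def allocate(self, mac):
        """Return the existing lease for this MAC, or hand out the next free address."""
        mac = mac.lower()
        if mac not in self.leases:
            if self._next not in self.network:
                raise RuntimeError("address pool exhausted")
            self.leases[mac] = self._next
            self._next += 1
        return self.leases[mac]


# One database per boot server interface (subnets taken from the ifconfig output).
lease_dbs = {
    "qfe4": LeaseDB("172.21.0.0/16", "172.21.8.34"),
    "qfe5": LeaseDB("172.22.0.0/16", "172.22.8.34"),
}


def handle_request(iface, mac):
    """Answer a request using only the database of the interface it arrived on."""
    return str(lease_dbs[iface].allocate(mac))


mac = "00:80:37:0e:02:22"
print(handle_request("qfe4", mac))
print(handle_request("qfe5", mac))
```

As noted above, this only addresses lease bookkeeping; anything else on the boot server that resolves a blade by MAC would still see the duplicates.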
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12502951
Using multiple dhcpd instances, one for each interface, has crossed my mind, but as you wrote, the problem will still exist for other services communicating with the blade servers. That's why I want to know if there is a network product that could handle this.
 
LVL 40

Accepted Solution

by:
jlevie earned 750 total points
ID: 12504629
I can't think of a way of solving the general problem just on the boot server. However, if you placed a two-port router (that can relay DHCP) between each magazine and the boot server it would solve the problem.
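
Why a relaying router helps can be shown at the packet level. A BOOTP/DHCP relay agent writes its own address into the `giaddr` field of the client's request and forwards it, so the server can tell the originating subnet from `giaddr` and sends its reply to the relay rather than to the client's MAC. The Python sketch below builds a bare BOOTP header and applies that step; the relay addresses are hypothetical, DHCP options are omitted, and the field offsets follow the standard 236-byte BOOTP layout.

```python
import socket
import struct

BOOTP_FIXED_LEN = 236  # fixed BOOTP header, before the options field
HOPS_OFFSET = 3
GIADDR_OFFSET = 24
CHADDR_OFFSET = 28


def build_request(mac, xid=0x12345678):
    """Build a bare BOOTREQUEST header (no DHCP options) for illustration."""
    chaddr = bytes(int(octet, 16) for octet in mac.split(":"))
    zero_ip = b"\x00" * 4
    return struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1, 1, 6, 0,                          # op=BOOTREQUEST, htype=Ethernet, hlen, hops
        xid, 0, 0x8000,                      # xid, secs, flags (broadcast bit)
        zero_ip, zero_ip, zero_ip, zero_ip,  # ciaddr, yiaddr, siaddr, giaddr
        chaddr, b"", b"",                    # chaddr, sname, file (zero padded)
    )


def relay(packet, relay_ip):
    """Relay-agent step: fill giaddr if it is still zero, then increment hops."""
    buf = bytearray(packet)
    if buf[GIADDR_OFFSET:GIADDR_OFFSET + 4] == b"\x00" * 4:
        buf[GIADDR_OFFSET:GIADDR_OFFSET + 4] = socket.inet_aton(relay_ip)
    buf[HOPS_OFFSET] += 1
    return bytes(buf)


def giaddr(packet):
    """Read the relay address back out of a packet."""
    return socket.inet_ntoa(packet[GIADDR_OFFSET:GIADDR_OFFSET + 4])


# Two blades with the same MAC, each behind its own (hypothetical) relay address.
mac = "00:80:37:0e:02:22"
via_site_a = relay(build_request(mac), "172.21.0.254")
via_site_b = relay(build_request(mac), "172.22.0.254")
print(giaddr(via_site_a), giaddr(via_site_b))
```

The client hardware address is identical in both packets, but the two `giaddr` values differ, which is what would let a single server keep the sites apart. On the wire the server also only ever exchanges frames with the routers, so the duplicate blade MACs never reach its arp table.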

I guess I don't understand why you can't change the base MAC for each magazine, assuming it works the way I suspect it does. Blades would still be unit replaceable with no config changes.
 
LVL 1

Author Comment

by:MikaelEriksson
ID: 12521514
It's probably technically possible to change the MAC addresses; my reasons for not doing so are of another kind.
I will try the router solution. In fact I already started on that a few days ago.

Anyway, I should not keep you guys busy with this question anymore.
Thanks for all your input and I hope that I can help you someday.

jlevie, thanks for the help with this and with my other questions!