Solved

Possible MAC / ARP problem.

Posted on 2004-11-01
Last Modified: 2013-12-23

This is the scenario:
I have six different networks with approx. 20 Sun servers on each network. All six networks are connected to one Sun server with dual quad-port NICs (2x4 ports), which acts as a boot server for all networks.
The boot server has a unique MAC on each interface, and the servers within one LAN have unique MACs. The servers on the different LANs, however, have identical MACs: the same set of MACs is reused on every LAN.

I am having trouble booting these servers from the boot server. Sometimes they look for the DHCP server for a very long time, and most often they fail to get an IP. When watching the boot server interfaces I see the DHCP request arrive, but the answer never reaches the requesting server.

I believe that the problem is because of the arp table on the boot server. It contains the same MAC on all interfaces. For example:

qfe4   172.20.8.98          255.255.255.255       00:80:37:0e:06:22
qfe5   172.21.8.98          255.255.255.255       00:80:37:0e:06:22
qfe2   172.18.8.98          255.255.255.255       00:80:37:0e:06:22
qfe1   172.17.8.98          255.255.255.255       00:80:37:0e:06:22

A lot of broadcast traffic exists. Approx. 3000 packets per 8 seconds.

Am I correct when assuming that the arp table is the problem?
Is there any configuration or network equipment that could solve my problem without having to change MAC on all these servers?
Question by:MikaelEriksson

    Expert Comment

    by:tfewster
    I don't know the answer, but a couple of observations*:

    Sun OpenBoot has an option "Local Ethernet Address=(True/False)" (the OpenBoot variable is `local-mac-address?`; use the `eeprom` command to check it). Although you may have allocated Ethernet (MAC) addresses to each interface, if it's set to "False", all interfaces will use the "system" MAC address; `ifconfig -a` should bear this out. In which case, the arp table is correct.

    I guess this is to allow redundancy/load balancing etc. However, if a boot client on network#1 has the same MAC address as a boot client on network#2, I can see the boot server getting confused about which interface it should send the response back through. So a first step might be to ensure the boot server displays unique MAC addresses.


    * gained from watching a colleague try to boot a Sun server on the network at about 2am - So I wasn't at my most alert ;-)

    Expert Comment

    by:jlevie
    If the network config is sane it shouldn't matter that the same MAC is used on multiple interfaces. By sane I mean that each of the interfaces connects to a physically separate network. In a switched environment this would mean separate switches for each network or VLAN's. Is that the case?

    What do 'ifconfig -a' and 'netstat -nr' show?

    Expert Comment

    by:yuzh
    A few questions for you:

    1. Does your boot server have an IP in each of your six subnets,
        e.g.:
        for subnet 172.18.8, do you have an IP 172.18.8.x on the boot server?

    2. Does your boot server know every client's hostname, IP, and MAC?
        Have you put all the client info in a database (e.g. NIS+, or files such as /etc/ethers)?

    3. When you add a client to the boot server, use the following command line
        syntax:
        /path-to/add_install_client -i new_machine_ip -e ethernet_address machine_name platform

       If you think you have done all of the above correctly, please check your boot server
    and client setup against the Solaris "Advanced Installation Guide" (it comes with your
    media) or have a look at the online book (you can download the PDF file)

       http://docs.sun.com/db/coll/214.7
       
       

    Author Comment

    by:MikaelEriksson
    Thanks for all the input. Here are some answers to your questions.

    Sun OpenBoot option "Local Ethernet Address=(True/False)", set to True.
    I understand this gives me a unique MAC for all the interfaces on the boot server.

    Output from [ifconfig -a]
    lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
            inet 127.0.0.1 netmask ff000000  
    bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
            inet 111.111.11.11 netmask ffffff00 broadcast 111.111.11.255
            ether 0:3:ba:6e:e9:15
    qfe0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
            inet 172.16.0.1 netmask ffff0000 broadcast 172.16.255.255
            ether 8:0:20:bd:77:48
    qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
            inet 172.17.0.1 netmask ffff0000 broadcast 172.17.255.255
            ether 8:0:20:bd:77:49
    qfe2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
            inet 172.18.0.1 netmask ffff0000 broadcast 172.18.255.255
            ether 8:0:20:bd:77:4a
    qfe3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
            inet 172.19.0.1 netmask ffff0000 broadcast 172.19.255.255
            ether 8:0:20:bd:77:4b
    qfe4: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
            inet 172.21.0.1 netmask ffff0000 broadcast 172.21.255.255
            ether 8:0:20:b7:3e:c8
    qfe5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 8
            inet 172.22.0.1 netmask ffff0000 broadcast 172.22.255.255
            ether 8:0:20:b7:3e:c9
    qfe6: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 9
            inet 172.23.0.1 netmask ffff0000 broadcast 172.23.255.255
            ether 8:0:20:b7:3e:ca
    qfe7: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 10
            inet 172.24.0.1 netmask ffff0000 broadcast 172.24.255.255
            ether 8:0:20:b7:3e:cb

    (had to change the IP on bge0 for policy reasons, sorry.)

    Output from [netstat -nr]
    Routing Table: IPv4
      Destination           Gateway           Flags  Ref   Use   Interface
    -------------------- -------------------- ----- ----- ------ ---------
    172.20.110.150       172.21.5.66           UGH      1      3
    172.20.110.80        172.18.5.66           UGH      1      3
    172.20.110.81        172.18.5.66           UGH      1      0
    172.20.110.190       172.21.5.66           UGH      1      0
    172.20.110.191       172.21.5.66           UGH      1      0
    172.20.110.34        172.16.5.66           UGH      1      0
    172.20.110.33        172.16.5.66           UGH      1      3
    172.40.110.150       172.20.5.66           UGH      1     17
    111.111.11.0         111.111.11.1         U        1   2347  bge0
    172.24.0.0           172.24.0.1            U        1      0  qfe7
    172.21.0.0           172.21.0.1            U        1    509  qfe4
    172.22.0.0           172.22.0.1            U        1    526  qfe5
    172.23.0.0           172.23.0.1            U        1      0  qfe6
    172.16.0.0           172.16.0.1            U        1   2021  qfe0
    172.17.0.0           172.17.0.1            U        1    508  qfe1
    172.18.0.0           172.18.0.1            U        1    512  qfe2
    172.19.0.0           172.19.0.1            U        1   1779  qfe3
    224.0.0.0            111.111.11.11         U        1      0  bge0
    default              111.111.11.1          UG       1   5021
    127.0.0.1            127.0.0.1             UH       2    718  lo0

    Answer to yuzh,
    1. Yes, my boot server has an IP on each net. See above.

    2. The boot server knows of the IP and MAC. I paste a short version of the output from [arp -a]:
    [arp -a | grep 00:80:37:0e:02:22]
    qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22
    qfe4   172.21.8.34          255.255.255.255       00:80:37:0e:02:22
    qfe2   172.18.8.34          255.255.255.255       00:80:37:0e:02:22
    qfe0   172.16.8.34          255.255.255.255       00:80:37:0e:02:22
    qfe1   172.17.8.34          255.255.255.255       00:80:37:0e:02:22

    Does it really matter at this level whether I have the clients' hostnames in DNS or in files?
    I thought that's only necessary later on, when the installation is complete and the OS is installed.

    3. We use "dhtadm" to add the client networks to the boot server.
    Output from: [dhtadm -P]
    wild_172.22.0.0         Macro           :CDefFile=/gsn/nodes/172_22_0_0/boot.def:Include=172.22.0.0:BootSrvA=172.22.0.1:BootSrvN=gprs_qfe5:SrootIP4=172.22.0.1:SrootNM=gprs_qfe5:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
    172.22.0.0              Macro           :Subnet=255.255.0.0:Broadcst=172.22.255.255:Router=172.22.0.1:MTU=1500:
    wild_172.21.0.0         Macro           :CDefFile=/gsn/nodes/172_21_0_0/boot.def:Include=172.21.0.0:BootSrvA=172.21.0.1:BootSrvN=gprs_qfe4:SrootIP4=172.21.0.1:SrootNM=gprs_qfe4:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
    172.21.0.0              Macro           :Subnet=255.255.0.0:Broadcst=172.21.255.255:Router=172.21.0.1:MTU=1500:
    wild_172.18.0.0         Macro           :CDefFile=/gsn/nodes/172_18_0_0/boot.def:Include=172.18.0.0:BootSrvA=172.18.0.1:BootSrvN=gprs_qfe2:SrootIP4=172.18.0.1:SrootNM=gprs_qfe2:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
    172.18.0.0              Macro           :Subnet=255.255.0.0:Broadcst=172.18.255.255:Router=172.18.0.1:MTU=1500:
    wild_172.19.0.0         Macro           :CDefFile=/gsn/nodes/172_19_0_0/boot.def:Include=172.19.0.0:BootSrvA=172.19.0.1:BootSrvN=gprs_qfe3:SrootIP4=172.19.0.1:SrootNM=gprs_qfe3:SrootPTH=/gsn/sw/nib/nib-r1f/nib_R1F:
    SUNW.sparc.SUNW,UltraSPARC-IIi-cEngine.SunOS    Macro           :CDefFile=boot.def:BootSrvA=150.132.90.20:BootS
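For anyone who wants to compare the per-network macros side by side, here is a minimal Python sketch that splits one macro row of the `dhtadm -P` listing into its options. The function name `parse_dhtadm_macro` is made up for illustration, and the parser assumes the simple `:Key=value:` layout seen in the rows pasted here; it would not cope with quoted values that contain a colon.

```python
def parse_dhtadm_macro(line):
    """Split one `dhtadm -P` macro row into (name, {option: value})."""
    # Columns as pasted in the thread: macro name, the word "Macro", definition.
    name, kind, definition = line.split(None, 2)
    if kind != "Macro":
        raise ValueError(f"not a macro row: {line!r}")
    options = {}
    # The definition is a colon-delimited list of Key=value items.
    for item in definition.strip().strip(":").split(":"):
        key, _, value = item.partition("=")
        options[key] = value
    return name, options


# Row copied from the listing in this thread.
row = ("172.22.0.0              Macro           "
       ":Subnet=255.255.0.0:Broadcst=172.22.255.255:Router=172.22.0.1:MTU=1500:")
name, options = parse_dhtadm_macro(row)
print(name, options["Subnet"], options["Router"])
```

Parsing every row this way makes it easy to confirm that each network macro points at the boot server address of its own interface.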

    Just to try to explain the network layout:
    Site 1:
    Approx. 20 Sun servers (blade servers in a magazine) connected via the magazine backplane to a built-in switch.
    The built-in switch is connected to another switch (an ordinary Netgear) on port 25. Port 1 on the switch is connected directly to one of the interfaces on the boot server.

    Site2:
    Exactly the same config but connected to another interface on the boot server. The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites.

    For different reasons we can not change the MAC addresses on the blade servers or use different boot servers for all the sites.

    I will keep on trying to configure the Netgear switch so that it does not pass the servers' MAC addresses on to the boot server.
    Maybe tagged VLANs are a solution?

    Thanks again for the input I hope to hear more from you.

    // Mike




    Expert Comment

    by:jlevie
    Do site1 & site2 connect through a common switch to the boot server? DHCP routing of return packets for multiple interfaces will only work if each interface connects to a physically separate network.

    Author Comment

    by:MikaelEriksson
    No, there is one switch per site, connected directly to a boot server interface. Each interface on the boot server is configured as a separate Class B network. That means that if I unplug the Ethernet cables from the boot server, there is no physical connection between the different sites.

    When I attach the cables, all broadcast packets from the different sites are received by the boot server and the arp table starts to update.
    Since the MAC addresses are the same between the sites, the arp table contains the same MAC on multiple interfaces. My guess is that when the boot server replies to a DHCP request, the DHCP reply ends up on the wrong interface, or even worse, it recognizes the MAC, looks it up in the DHCP table, and gives out an IP address already in use by another site.
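To make that guess concrete, here is a small Python sketch of the suspected failure mode. It is only an illustration under an assumption: that some lookup on the boot server is keyed by the client MAC alone. Nothing in this thread confirms how the Solaris DHCP service actually keys its tables. The interface names, MAC, and addresses are taken from the arp output quoted in this thread.

```python
# Requests as (arrival interface, client MAC). Both clients reuse one MAC,
# as the blades on different sites do.
requests = [
    ("qfe4", "00:80:37:0e:02:22"),
    ("qfe5", "00:80:37:0e:02:22"),
]

# Address each client should get on its own network (from the arp listing).
ADDRESS_FOR = {"qfe4": "172.21.8.34", "qfe5": "172.22.8.34"}

leases_by_mac = {}            # assumed MAC-only key: the sites collide
leases_by_iface_and_mac = {}  # (interface, MAC) key: the sites stay apart

for iface, mac in requests:
    leases_by_mac[mac] = (iface, ADDRESS_FOR[iface])
    leases_by_iface_and_mac[(iface, mac)] = ADDRESS_FOR[iface]

# With the MAC-only key the second request overwrites the first one.
print(len(leases_by_mac))            # prints 1
print(len(leases_by_iface_and_mac))  # prints 2
```

If something on the server behaves like the first table, a reply for the qfe4 client could be built from the qfe5 entry, which would match the symptom of answers never reaching the requesting server.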

    Expert Comment

    by:jlevie
    [arp -a | grep 00:80:37:0e:02:22]
    qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22
    qfe4   172.21.8.34          255.255.255.255       00:80:37:0e:02:22

    Indicates that the vendor of the MAC is Ericsson Business Comm. Is there a router in the path from the blade magazines to the boot server's interfaces?
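The vendor comes from the first three octets of the MAC (the OUI). A rough Python sketch of that lookup is below; the helper names are made up for illustration, and the vendor table holds only the one pairing stated in this thread, where a real lookup would use the full IEEE registry. The normalisation step is there because Solaris `ifconfig` prints octets without leading zeros (e.g. `8:0:20:bd:77:48`).

```python
def normalize_mac(mac):
    """Zero-pad and lower-case each octet: '8:0:20:bd:77:48' -> '08:00:20:bd:77:48'."""
    octets = mac.strip().split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    values = [int(octet, 16) for octet in octets]
    if any(value < 0 or value > 0xFF for value in values):
        raise ValueError(f"octet out of range in: {mac!r}")
    return ":".join(f"{value:02x}" for value in values)


def oui(mac):
    """Return the first three octets (the vendor prefix) of a MAC address."""
    return normalize_mac(mac)[:8]


# Only the prefix-to-vendor pairing mentioned in this thread.
KNOWN_VENDORS = {"00:80:37": "Ericsson Business Comm."}

for mac in ("00:80:37:0e:02:22", "8:0:20:bd:77:48"):
    prefix = oui(mac)
    print(mac, prefix, KNOWN_VENDORS.get(prefix, "not in this table"))
```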

    Expert Comment

    by:yuzh
    Check your /etc/netmasks file to see if it has netmasks for all your subnets.

    Did you run:
    /path-to/add_install_client  

    for the client?

    see comment #12469216

    Could you please check your boot server config against:
    http://www.bu.edu/systems-support/admin/network/sol/bootserver.html#configuring

    Full doc: http://www.bu.edu/systems-support/admin/network/sol/bootserver.html

    Author Comment

    by:MikaelEriksson
    jlevie, no there is no router between the blade servers and the boot server.

    yuzh,  I am not responsible for installing the blade servers. I'm not sure if the command "add_install_client" is used.
    If it's important to know if we are using that, I can find out for you. I think there is another set of scripts to run when installing the clients. Maybe one of the scripts calls for "add_install_client". Thanks for the links, I will have a look at them today.

    Don't you think my problem is because of the identical MAC addresses on the interface?
    I think this is a network error, not an OS configuration error.

    Expert Comment

    by:yuzh
    Are you trying to use the boot server to perform OS installation for your client box? If that is
    the case, you do need to run "add_install_client".

    You can check the /etc/bootparams file (a text file) to see if your client box is defined in
    the file. The file has the info about the client hostname, boot image etc.; the MAC address mapping is kept in /etc/ethers.
    (You can have multiple versions of OS images installed on your boot server.)
     

    Expert Comment

    by:yuzh
    Are you sure the boot server has been set up as a boot server? Please ask the person
    who set up the server what has been done.

    Author Comment

    by:MikaelEriksson
    The server has no "/etc/bootparams" file or "add_install_client".
    The boot function is custom made for this environment. The clients (blade servers) do not install Sun Solaris; instead they install another OS that's also custom made. To install a client, DHCP is used combined with a lot of macros and scripts.
    I understand that it's almost impossible for you to help me with the setup of the OS and boot-install scripts, because I don't know myself how they work and can't tell you.

    Anyway, I still think this is more of a network issue because of the MAC address problem. The problem is first noticed when the clients request an IP address from the DHCP service on the boot server. The requests come in to the boot server (via broadcast) but the clients do not receive any IP addresses. That's before the OS installation starts. When we succeed in getting an IP address from the boot server, the installation works fine.

    I looked for some info about setting up multiple DHCP servers on the same host, forcing each to listen on only one interface, and then creating different arp tables for each interface. Unfortunately it does not seem possible to do with Sun Solaris 8. While looking for that info I discovered another possible solution at: http://ebtables.sourceforge.net/. That could be a solution to my problems.

    For now we will solve this problem by disabling all the network interfaces on the boot server and using just one at a time, when we need to install/re-install the clients on the different sites. That's not a good solution in the long run, but it works for now. Maybe ebtables is the way to go.

    Thank you all for your effort in trying to solve this!
    If you have more ideas, they are very welcome.

    Expert Comment

    by:jlevie
    > Don't you think my problem is because of the identical MAC addresses on the interface?

    Yes, and I don't understand why the boot server sees the same MAC from multiple clients. Hence the question about the router. If there was a router in the network path to the blade servers the boot server would see the requests as coming from the router's MAC.

    However, if there's no router it becomes a bit more mysterious. You say that the network path is unique from a qfe interface to a bank of blade servers, yet the arp table shows the same MAC on multiple interfaces on the boot server. Unless the blade servers are defective in that more than one has the same MAC, there shouldn't be any way to have what you've observed, if each qfe uniquely connects to a bank of blades.

    Author Comment

    by:MikaelEriksson
    There is no router in between, and each qfe connects directly to the built-in switch (in the backplane) of the magazine. No connection between the sites exists, except for the physical connection via the boot server. No routing takes place between the interfaces.

    As I wrote before:
    "The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites. "
    As far as I know there is only one arp table for all interfaces. This must be a problem since all sites are connected to the same boot server.

    Expert Comment

    by:tfewster
    Just to summarise:

    qfe5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 8
            inet 172.22.0.1 netmask ffff0000 broadcast 172.22.255.255
            ether 8:0:20:b7:3e:c9  # This is unique

    [arp -a |grep  00:80:37:0e:02:22|grep qfe5]
    qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22

    - This extract does _not_ show the MAC address of qfe5, but of a system on the network qfe5 is connected to. So the arp table is fine: the system with IP address 172.22.8.34 is on the network that qfe5 is connected to. The confusion arises because there's another (client) system with the same MAC address on another network, as shown by:

     [arp -a | grep 00:80:37:0e:02:22]
    qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22
    qfe4   172.21.8.34          255.255.255.255       00:80:37:0e:02:22
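To spot every MAC that shows up behind more than one interface, rather than grepping for one address at a time, something like the following Python sketch could be run over saved `arp -a` output. The function name is made up, and the column layout (interface first, IP second, MAC last) is an assumption based on the rows pasted in this thread.

```python
from collections import defaultdict

# Sample rows copied from the `arp -a | grep` output quoted in this thread.
ARP_OUTPUT = """\
qfe5   172.22.8.34          255.255.255.255       00:80:37:0e:02:22
qfe4   172.21.8.34          255.255.255.255       00:80:37:0e:02:22
qfe2   172.18.8.34          255.255.255.255       00:80:37:0e:02:22
qfe0   172.16.8.34          255.255.255.255       00:80:37:0e:02:22
qfe1   172.17.8.34          255.255.255.255       00:80:37:0e:02:22
"""


def macs_seen_on_multiple_interfaces(arp_text):
    """Return {mac: sorted [(interface, ip), ...]} for MACs seen on more than one interface."""
    by_mac = defaultdict(set)
    for line in arp_text.splitlines():
        fields = line.split()
        # Skip headers and anything whose last column is not a MAC address.
        if len(fields) < 4 or ":" not in fields[-1]:
            continue
        iface, ip, mac = fields[0], fields[1], fields[-1].lower()
        by_mac[mac].add((iface, ip))
    return {
        mac: sorted(seen)
        for mac, seen in by_mac.items()
        if len({iface for iface, _ in seen}) > 1
    }


duplicates = macs_seen_on_multiple_interfaces(ARP_OUTPUT)
for mac, seen in duplicates.items():
    print(mac, "->", ", ".join(f"{iface} ({ip})" for iface, ip in seen))
```

Each reported MAC then corresponds to one blade slot that exists on several sites.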

    Expert Comment

    by:jlevie
    > The MAC addresses on the blade servers are unique within the site but exactly the same as the other sites.

    Which says to me that there's a problem with the blade servers or magazines. Since each blade server is in fact a different computer, the MACs must be unique across all blades on all sites. Given what you see regarding the MACs and the nature of the blade servers, I'm wondering if it is a result of the magazines being the actual holders of the MACs. That would make sense in that it allows a blade to be replaced with no other changes. Could it be that there's a config setting for each magazine that allows the base MAC to be set? If there is, and each magazine hasn't been configured for a unique range of MACs, it would explain why each blade's MAC is unique within a magazine, but duplicated on another magazine.

    Author Comment

    by:MikaelEriksson
    jlevie,

    You are right. It's probably the magazines that hold the MACs. And unfortunately it must be that way because of redundancy.
    As I wrote before, we are not allowed to change the MACs of the blade servers. That's why I'm looking for network equipment that solves this problem.

    Expert Comment

    by:gheist
    There is some boot ROM parameter to assign unique addresses to multitail adaptors. I will look around for it; maybe you can dig it up by typing printenv in the boot console (Stop-A to get one at boot).

    Expert Comment

    by:jlevie
    If you can't change the base MAC on the magazines so that each blade slot has a unique ID I think there are only two possible solutions to DHCP. One would be a separate DHCP server for each magazine. The other would be to run an instance of dhcpd on each interface of the boot server. I haven't examined the dhcpd code to see what would happen on the reply packets when dhcpd is listening on a single interface so I don't know if modifications would be needed to force the replies back out that interface. Obviously, since the MAC's are duplicated each instance of dhcpd must have its own private lease database.

    Using multiple instances of dhcpd will solve the DHCP issue, but if other things on the boot server must talk to a blade you'll still run into trouble with the arp table. Only separate boot servers will solve that.

    Author Comment

    by:MikaelEriksson
    Using multiple dhcpd instances, one for each interface, has crossed my mind, but as you wrote, the problem will still exist for other services communicating with the blade servers. That’s why I want to know if there is a network product that could handle this.

    Accepted Solution

    by:
    I can't think of a way of solving the general problem just on the boot server. However, if you placed a two-port router (that can relay DHCP) between each magazine and the boot server it would solve the problem.

    I guess I don't understand why you can't change the base MAC for each magazine, assuming it works the way I suspect it does. Blades would still be unit replaceable with no config changes.
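A sketch of why the router suggestion helps: a DHCP relay agent puts the address of its client-facing interface into the `giaddr` field of the request it forwards, so the server can pick the address pool from `giaddr` and send the reply back to the relay instead of relying on the client's MAC. The Python below illustrates only that selection step. The subnets are taken from the boot server's interface list in this thread; the relay addresses ending in `.254` and the function name are hypothetical.

```python
import ipaddress

# Client subnets from the boot server's interface list in this thread.
CLIENT_SUBNETS = [
    ipaddress.ip_network(net)
    for net in ("172.16.0.0/16", "172.17.0.0/16", "172.18.0.0/16",
                "172.21.0.0/16", "172.22.0.0/16")
]


def subnet_for_relayed_request(giaddr):
    """Pick the address pool from the relay's giaddr, not from the client MAC."""
    address = ipaddress.ip_address(giaddr)
    for subnet in CLIENT_SUBNETS:
        if address in subnet:
            return subnet
    return None  # request was not relayed from a known client network


# Hypothetical relay addresses: one router interface facing each magazine.
print(subnet_for_relayed_request("172.21.0.254"))
print(subnet_for_relayed_request("172.22.0.254"))
```

Two clients with the same MAC then arrive with different `giaddr` values, and the boot server's arp table only ever holds the routers' MACs, as noted earlier in the thread.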

    Author Comment

    by:MikaelEriksson
    It's probably technically possible to change the MAC addresses; my reasons for not doing so lie elsewhere.
    I will try the router solution. In fact I already started on that a few days ago.

    Anyway, I should not keep you guys busy with this question anymore.
    Thanks for all your input and I hope that I can help you someday.

    jlevie, thanks for the help with this and others of my questions!