Solved

Unable to ping until arp cache cleared

Posted on 2014-02-04
149
3,763 Views
Last Modified: 2014-02-16
Hello Experts

I have 3 servers connected to a Cisco Catalyst C2960 switch. This switch has an uplink to one of our access switches, which in turn connects to our core switch.

Today I encountered an issue: these servers are unreachable from VLANs other than their own. I cleared the ARP cache and they started responding to ping again.

Can you please help me work out what the issue could be, so I can avoid it in future?

Thanks
0
Comment
Question by:cciedreamer
149 Comments
 
LVL 14

Expert Comment

by:BlueCompute
ID: 39832077
Did you dump the arp cache before you cleared it?  What were the incorrect entries in the ARP table?  How did they get there?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39832092
Actually, I was able to see the MAC address of the server on the core switch before clearing it.

Thanks
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39832167
This occurred again a few minutes ago.

I cleared the ARP cache and it started working.

Please help me to solve this.

Thanks
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39832196
Just wanted to let you know that we have HSRP running.

Thanks
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39832985
You state you are running HSRP, yet you state the access switch is uplinked to the core switch. Is there another core switch that isn't being mentioned?

How are these HSRP devices talking to each other? Is there a direct link between them?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833004
Sorry for the missing information.
The access switch has redundant links to Core 1 and Core 2.

And both cores have a direct link between them.
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833023
When this issue occurs do you notice any changes in STP or HSRP in your logs? How is your STP setup? Are you load balancing  VLANS between HSRP members, or just sending all to the primary switch?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833126
The most amazing part is that I don't receive any logs on the switch.
Actually we are using 2 core switches in an Active/Standby scenario.
Currently our Core 1 is active and all traffic is coming to Core 1. I have the default configuration of STP.

Thanks
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833158
Do you have a layer 3 or Layer 2 link between the cores? If layer two, are all vlans spanned across it?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833171
It's Layer 2 and all VLANs are spanned across it.
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833242
I think the key here is you need to collect some logs to determine if there is anything going on during this ARP issue.
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833247
Can you post your hsrp config?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833260
The problem has occurred now.

For the last 4 hours I was pinging these servers from a workstation in another VLAN and there was no problem.
Once I stopped the ping, the problem happened after 5 min.

If I clear the ARP cache on Core 1 it will work.

Thanks
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833312
This is HSRP config of that vlan

Core1

interface Vlan1 description 
 ip address 10.1.1.254 255.255.255.0
 ip route-cache flow
 standby delay minimum 20 reload 25
 standby 1 ip 10.1.1.1
 standby 1 priority 110
 standby 1 preempt



Core 2

interface Vlan2
 ip address 10.1.1.253 255.255.255.0
 ip route-cache flow
 standby delay minimum 20 reload 25
 standby 2 ip 10.1.1.1
 standby 2 priority 95
 standby 2 preempt
end


0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833345
Why do you have two different vlan interface numbers and standby numbers?

Can you post the sh standby output from each switch?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833363
Sorry, I just copied the wrong interface from Core 1.

interface Vlan2
 ip address 10.1.1.254 255.255.255.0
 ip route-cache flow
 standby delay minimum 20 reload 25
 standby 1 ip 10.1.1.1
 standby 1 priority 110
 standby 1 preempt



sh standby Core 1

Vlan2 - Group 2
  Local state is Active, priority 110, may preempt
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 1.143
  Virtual IP address is 10.1.1.1 configured
  Active router is local
  Standby router is 10.1.1.253 expires in 9.844
  Virtual mac address is 0000.0c07.ac02
  1 state changes, last state change 8w1d
  IP redundancy name is "hsrp-Vl2-2" (default)



sh standby Core 2

Vlan2 - Group 2
  Local state is Standby, priority 95, may preempt
  Hellotime 3 sec, holdtime 10 sec
  Next hello sent in 1.931
  Virtual IP address is 10.1.1.1 configured
  Active router is 10.1.1.254, priority 110 expires in 7.440
  Standby router is local
  43 state changes, last state change 8w1d
  IP redundancy name is "hsrp-Vl2-2" (default)


0
 
LVL 12

Expert Comment

by:Infamus
ID: 39833406
Why is vlan1 and vlan 2 virtual ip the same?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833416
No, it was a typo. I have rectified it.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833464
Please help me this case is repeatedly happening.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39833555
Are you using any dynamic routing? if yes, which one?
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833556
Would it be possible to disconnect the second core and see if the issue reoccurs?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833559
Disconnect the core from where? I mean, from the access switch or from Core 1 itself?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833563
I don't have any dynamic routing I have 1 default static route
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39833575
when you do sh ip route, what is the difference between when it is working and not working?

Is it possible to do what Soulja recommended?

If not, can you have one workstation configure the gateway to actual IP of the VLAN interface and do a continuous ping and see what happens?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833622
I have a lot of workstations in this VLAN 2 but no one is facing any issues except these 3 servers.

One thing to note: when I run a continuous ping to the server from a workstation on another VLAN I don't face the issue. Once I stop the ping it appears again.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833636
The problem appeared again.
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833658
What are your logs saying. Can you post recent logs?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833664
No logs. It's empty.

Thanks
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39833666
did you try "sh log" on the switch?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833671
Yes. There are old logs, nothing recent.
0
 
LVL 26

Expert Comment

by:Soulja
ID: 39833675
Are you logging to a syslog server?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39833681
Yes. I have seen the logs there but nothing related to this issue.
0
 
LVL 45

Accepted Solution

by:
Craig Beck earned 500 total points
ID: 39834310
Do you have a duplicate IP address anywhere?

If you check the logs on the active HSRP router what do you see?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39834893
Hi,

No, I don't have a duplicated IP address. There are no logs on the core switches related to this issue.

I keep a continuous ping running to the server; once it is stopped I have to clear the ARP cache to bring the connection back up.

There are 2 physical servers with IPs 10.1.1.14 and 10.1.1.15 in an HA setup. The third IP, 10.1.1.17, is the VIP.

I have to keep a continuous ping running to 10.1.1.17 to keep connections alive.



Thanks
0
 
LVL 45

Assisted Solution

by:Craig Beck
Craig Beck earned 500 total points
ID: 39835148
That really does sound like a duplicate IP issue.  Do you have Proxy-ARP running on any devices on that VLAN?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835187
As I mentioned, there are 2 physical servers and their IPs are 10.1.1.14 and 10.1.1.15. They are in an HA setup tied to the virtual IP 10.1.1.17.

The clients see the VIP.

Now on my core I am seeing 2 IPs (10.1.1.15 and 10.1.1.17) with the same MAC.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39835217
Seeing two IPs with the same MAC is fine, but seeing two MACs with the same IP would be bad.

So what I think is happening is the core is seeing the wrong MAC address mapped to 10.1.1.17 IP when the server is inaccessible from the other VLAN.

When you try to get to the server from a different VLAN and it fails, instead of clearing the ARP cache can you do:

show ip arp | include 10.1.1.

instead, then clear the ARP cache, and re-run the command and post the two results?
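
The before/after comparison requested here can also be done mechanically. The following is a minimal Python sketch, assuming the two outputs have been saved as text; the function names `parse_arp` and `diff_arp` and the sample rows are hypothetical and only illustrate the idea of diffing two ARP snapshots.

```python
import re

# Matches rows such as:
# Internet  10.1.1.17               5   0014.5ebc.7466  ARPA   Vlan2
ARP_ROW = re.compile(
    r"^Internet\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<age>\S+)\s+"
    r"(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+ARPA\s+(?P<intf>\S+)"
)


def parse_arp(text):
    """Return {ip: mac} for every ARP row found in pasted 'show ip arp' output."""
    table = {}
    for line in text.splitlines():
        match = ARP_ROW.match(line.strip())
        if match:
            table[match["ip"]] = match["mac"].lower()
    return table


def diff_arp(before, after):
    """Compare two {ip: mac} tables taken before and after clearing the cache."""
    return {
        "removed": sorted(ip for ip in before if ip not in after),
        "added": sorted(ip for ip in after if ip not in before),
        "changed": sorted(
            ip for ip in before if ip in after and before[ip] != after[ip]
        ),
    }


# Illustrative sample rows (made up for this sketch, not taken from the switch).
sample_before = """
Internet  192.0.2.10             12   0000.5e00.5301  ARPA   Vlan2
Internet  192.0.2.11              3   0000.5e00.5302  ARPA   Vlan2
"""
sample_after = """
Internet  192.0.2.10              0   0000.5e00.5303  ARPA   Vlan2
"""

print(diff_arp(parse_arp(sample_before), parse_arp(sample_after)))
```

An entry listed under "changed" is the interesting case: the same IP resolving to a different MAC after the cache is cleared.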
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835326
Hi,
Here is the result during the issue

Core1 # sh arp | in 10.1.1.17          
Internet  10.1.1.17               7   0014.5ebc.7466  ARPA   Vlan2

Core1 # sh arp | in 10.1.1.15        
Internet  10.1.1.17               7   0014.5ebc.7466  ARPA   Vlan2

I cleared the arp-cache on Core then I did sh arp again. I received same result as above.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39835520
Can you post what I asked for and what 'actually' comes back with no edits?
Core1 # sh arp | in 10.1.1.15        
Internet  10.1.1.17               7   0014.5ebc.7466  ARPA   Vlan2
The output you posted isn't consistent with what you asked the switch, and I asked for the complete output for the 10.1.1.x subnet, not just the IP in question.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835532
Ok I'll post it.

Meanwhile the server team has flushed the ARP cache on both servers. So far it's working. I'll have to monitor it for an hour or so and get back to you.

Thanks
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39835545
Ok that's a step in the right direction.  You shouldn't have to specifically flush the ARP cache though, ever.  If you do there might be something wrong with the way the vIP is working.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835700
Sir,

The problem appeared again

Here is the result as you requested

Before flushing arp cache

Core1#sh ip arp vlan 2   
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.1.1.11               0   0015.1736.a65d  ARPA   Vlan2
Internet  10.1.1.9               34   001e.4a16.edf5  ARPA   Vlan2
Internet  10.1.1.14               5   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.15              13   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.13               0   0015.1736.a3fd  ARPA   Vlan2
Internet  10.1.1.1                -   0000.0c07.ac02  ARPA   Vlan2
Internet  10.1.1.26               7   001a.6424.88b0  ARPA   Vlan2
Internet  10.1.1.27               0   0019.9903.c098  ARPA   Vlan2
Internet  10.1.1.24               0   0014.5efc.2328  ARPA   Vlan2
Internet  10.1.1.25               8   001a.6424.87e0  ARPA   Vlan2
Internet  10.1.1.30               0   0019.9903.bf69  ARPA   Vlan2
Internet  10.1.1.31              35   0019.9903.c093  ARPA   Vlan2
Internet  10.1.1.28               0   0019.9902.166c  ARPA   Vlan2
Internet  10.1.1.29               0   0019.9903.bfac  ARPA   Vlan2
Internet  10.1.1.19              34   0015.1736.a3fd  ARPA   Vlan2
Internet  10.1.1.17               5   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.22              34   0015.1736.a65d  ARPA   Vlan2
Internet  10.1.1.41              34   0030.057a.52ff  ARPA   Vlan2



After flushing arp cache

Core1#sh ip arp vlan 2          
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.1.1.11               0   0015.1736.a65d  ARPA   Vlan2
Internet  10.1.1.9                0   001e.4a16.edf5  ARPA   Vlan2
Internet  10.1.1.14               0   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.15               0   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.13               0   0015.1736.a3fd  ARPA   Vlan2
Internet  10.1.1.1                -   0000.0c07.ac02  ARPA   Vlan2
Internet  10.1.1.26               0   001a.6424.88b0  ARPA   Vlan2
Internet  10.1.1.27               0   0019.9903.c098  ARPA   Vlan2
Internet  10.1.1.24               0   0014.5efc.2328  ARPA   Vlan2
Internet  10.1.1.25               0   001a.6424.87e0  ARPA   Vlan2
Internet  10.1.1.30               0   0019.9903.bf69  ARPA   Vlan2
Internet  10.1.1.28               0   0019.9902.166c  ARPA   Vlan2
Internet  10.1.1.29               0   0019.9903.bfac  ARPA   Vlan2
Internet  10.1.1.19               0   0015.1736.a3fd  ARPA   Vlan2
Internet  10.1.1.17               0   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.22               0   0015.1736.a65d  ARPA   Vlan2
Internet  10.1.1.41               0   0030.057a.52ff  ARPA   Vlan2


0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835718
Just wanted to inform you that when this occurs, the servers cannot ping the VLAN VIP, but I can ping the SVI (the L3 VLAN interface).
0
 
LVL 45

Assisted Solution

by:Craig Beck
Craig Beck earned 500 total points
ID: 39835722
Ok can you get the ARP output from the servers when the issue happens?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835769
Actually the server administration team is in Germany and we are in KSA, so it might be delayed.

Different time zones.
Is there any remote utility I can use to get the output from the server?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39835806
If you can use PSEXEC or something similar you could execute the command on the server remotely.  Or, RDP to the server?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835836
It's a Linux server and I don't have access to these servers.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39835906
Ah ok so someone should be able to do that remotely via SSH.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835940
I have got the credentials. How can I get the ARP output on Linux?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39835980
I will post the ARP result once the problem appears.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836007
This is the interface configuration on the 2 servers.

Server 1

bond0     Link encap:Ethernet  HWaddr 00:14:5E:BC:0C:84
          inet addr:10.1.1.14  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:319772441 errors:0 dropped:0 overruns:0 frame:0
          TX packets:438580960 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:99762580571 (95141.0 Mb)  TX bytes:438533096008 (418217.7 Mb)

bond1     Link encap:Ethernet  HWaddr 00:14:5E:BC:0C:85
          inet addr:192.168.128.30  Bcast:192.168.128.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4117467 errors:0 dropped:0 overruns:0 frame:0
          TX packets:94902 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:482387910 (460.0 Mb)  TX bytes:4648806 (4.4 Mb)



Server 2

bond0     Link encap:Ethernet  HWaddr 00:14:5E:BC:74:66
          inet addr:10.1.1.15  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:916867429 errors:0 dropped:0 overruns:0 frame:0
          TX packets:765935729 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:900507151734 (858790.5 Mb)  TX bytes:538632706090 (513680.1 Mb)

bond0:3   Link encap:Ethernet  HWaddr 00:14:5E:BC:74:66
          inet addr:10.1.1.17  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1



Server 2
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836124
Sir,
I came to a conclusion:
the problem is only with server 2, which has 10.1.1.15 (interface details as mentioned above).
I have no problem with server 1.

Here is the arp -n result from server 2 (taken when the problem occurred again):

sdm2ha:~ # arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
10.1.1.29                ether   00:19:99:03:BF:AC   C                     bond0
10.1.1.30                ether   00:19:99:03:BF:69   C                     bond0
10.1.1.22                ether   00:15:17:36:A6:5D   C                     bond0
10.1.1.11                ether   00:15:17:36:A6:5D   C                     bond0
10.1.1.24                ether   00:14:5E:FC:23:28   C                     bond0
10.1.1.18                ether   00:14:5E:BC:0C:84   C                     bond0
10.1.1.28                ether   00:19:99:02:16:6C   C                     bond0
10.1.1.81                ether   00:19:99:80:38:8A   C                     bond0
10.1.1.1                 ether   00:E0:81:B6:50:1B   C                     bond0

sdm2ha:~ #
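
The switch and server outputs in this thread use different MAC notations (Cisco dotted versus Linux colon-separated), which makes them awkward to compare by eye. Below is a minimal Python sketch that normalises both forms and compares the server's ARP entry for the gateway 10.1.1.1 with the HSRP virtual MAC shown in the earlier `sh standby` output. The helper names are hypothetical. The `0000.0c07.acXX` pattern is the standard HSRP version 1 virtual MAC format; whether the result of the comparison explains the fault is not established in this thread, so treat it only as one more data point to check.

```python
def normalize_mac(mac):
    """Reduce a MAC in Cisco (0000.0c07.ac02) or Linux (00:E0:81:B6:50:1B)
    notation to 12 lowercase hex digits."""
    digits = "".join(ch for ch in mac if ch not in ".:-").lower()
    if len(digits) != 12 or any(ch not in "0123456789abcdef" for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def hsrp_v1_virtual_mac(group):
    """HSRP version 1 virtual MAC: 0000.0c07.acXX, XX = group number in hex."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP version 1 groups range from 0 to 255")
    return f"00000c07ac{group:02x}"


def same_mac(a, b):
    """True when two MAC strings refer to the same address, whatever the notation."""
    return normalize_mac(a) == normalize_mac(b)


# Values copied from the outputs posted in this thread.
switch_virtual_mac = "0000.0c07.ac02"     # 'sh standby' on Core 1, group 2
server_gateway_mac = "00:E0:81:B6:50:1B"  # 'arp -n' entry for 10.1.1.1 on server 2

print(same_mac(switch_virtual_mac, hsrp_v1_virtual_mac(2)))  # True
print(same_mac(switch_virtual_mac, server_gateway_mac))      # False
```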
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836128
Here are the interface details on server 2:

bond0     Link encap:Ethernet  HWaddr 00:14:5E:BC:74:66
          inet addr:10.1.1.15  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:916867429 errors:0 dropped:0 overruns:0 frame:0
          TX packets:765935729 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:900507151734 (858790.5 Mb)  TX bytes:538632706090 (513680.1 Mb)

bond0:0   Link encap:Ethernet  HWaddr 00:14:5E:BC:74:66
          inet addr:10.1.1.21  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0:2   Link encap:Ethernet  HWaddr 00:14:5E:BC:74:66
          inet addr:10.1.1.23  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0:3   Link encap:Ethernet  HWaddr 00:14:5E:BC:74:66
          inet addr:10.1.1.17  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1


0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836143
Once the problem occurs I cannot ping any of the interface IPs of server 2.

Thanks
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39836552
Someone correct me if I'm wrong.

I was reading through the post and found out that you assigned same IP (.17) on both of the servers, is that correct?

How is HA configured to use the VIP (.17)?

Can you remove .17 from server 2 and see what happens?  (meaning able to ping .15).

Also the difference on the ARP table from before and after is that this ip is no longer there after clearing the cache.

Internet  10.1.1.31              35   0019.9903.c093  ARPA   Vlan2

Do you know what this device is?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836811
No. After discussing with the server team, .17 is not assigned to server 1.

Server 1 has no issues.

The only issue is with server 2. This is confirmed.

No, I cannot remove .17. They have some bonding on the Linux server.

No idea about this device. It may be one of the workstations in this VLAN.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39836849
Can you post sh run int gixx/xx for the switchport that server 2 is connected to?

I know they are connected to multiple ports, right?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836856
I have no access to this switch.
Only the information I have for switch is


WS-C2960G-24TC-L
Catalyst 2960G IOS 12.2(25)SEE

Maybe the IOS version could be the cause?

Thanks
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39836862
I'm thinking more of hardware NIC issue.

I was thinking about bpduguard or port security might be blocking since there are multiple IP's on the same MAC but that might not be the case.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836886
A continuous ping to this server keeps the connection running, but that's not a fix :)
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836942
This is a weird issue and it's driving us mad. Please bear with me.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39836988
I have noticed one more thing: while doing a continuous ping to 10.1.1.17 the reply time is 1 ms, but at some intervals it rises to 10 ms and then goes back to 1 ms.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39837014
Just a thought, is there a way to compare the NIC driver version between Server 1 and Server 2 if they are same hardware?

Any chance you can have server admin to update server 2's NIC driver?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39837034
I think they've just got the bonding config wrong.  It happens quite a lot.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39837037
But it was working and no changes took place. This started happening yesterday.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39837045
Yeah, I'm trying to think of all the possibilities since this thread is going nowhere.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39837048
Hardware always goes bad

:P
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39837055
See, once I stop the ping to any of the server's interface IPs, the problem appears.

Thanks
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39837071
It looks to me like the two servers have different bonding configurations.

Maybe server 1 was owning the VIP until you started experiencing issues, and when server 2 assumed the VIP that's when you started to see issues.

Thinking about it, that just looks like a pure bond.  We can't tell how the bond is working from your outputs.  Is it 802.3ad, or active/passive, etc?

But yes, hardware could be causing an issue too.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39837219
How about Cisco IOS version ?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39837247
I don't think this is a switch issue.  Granted though the version of IOS could do with an update, but generally it's fine.

I can't see any issues with that version of code related to ARP.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39838167
Please let me know any questions that will help us identify the problem and I will forward them to the server team.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39839229
Since this is a physical server and have multiple NICs, they can disable one of the NICs one at a time and try to find out which NIC is having an issue.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39840405
+1

I'd ask the server team to disable bonding and VIP and see if issues persist.
0

 
LVL 3

Author Comment

by:cciedreamer
ID: 39844369
Any more suggestions on this?

The problem is that we can't reboot the servers and switches.
Rebooting the server requires 45 min of downtime and we can't afford that.

Is there any troubleshooting we can do on the core and access switches?

(2 Servers- Active/Active Clustering)<--->(2960 SW)<--->(Access Switch)
                                                          |         |
                                                          |         |
                                                       Core 1    Core 2
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39844380
You don't need to reboot the servers, just disconnect all NICs apart from the primary (or just leave one connected) and see if that has an effect.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39844384
Is there anything else I can check apart from disconnecting NICs? Thanks for your understanding, I appreciate your help.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39844760
I think you should really try above recommendations first to isolate the possible major cause.

As Craig mentioned, you don't need any down time by removing one NIC from the bond.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39844982
Actually I received wrong information from one of the admins.

The configuration is Active/Passive.

They are saying it's a duplicate IP address but I cannot see any duplication.

Thanks
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39845337
Please see the attached interfaces configuration. The cluster is Active/Passive.

Thanks for the help.
interfaces-configuration.docx
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39845457
Why do your servers have multiple IP addresses?

Can you show the routing table from the servers?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39845474
I really have no clue, but the default gateway is our core HSRP IP.
The servers have a range of IPs in the 10.1.1. subnet so that if one IP fails they can use another.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39845492
So the servers have multiple IPs on the 10.1.1. subnet in case one fails??

That's not how bonding is supposed to be used.  You assign one IP to the bond and it chooses which NIC to use.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39845514
This is what I meant, but maybe I couldn't explain it properly.
So far we have done most of the troubleshooting on the switching side but didn't get any clue.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39845526
I really don't think it's a switch issue.  If normal clients are accessible there can't really be a switch problem.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39845537
I am totally going crazy and trying everything possible.

Why does pinging server 2's IP addresses (any of the multiple IPs in 10.1.1.X) keep the connection alive, while once I stop I cannot ping after a few minutes and then have to flush the ARP cache for this VLAN on Core 1?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39845556
I don't know, but you really need to just connect each server with one NIC.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39845704
I have noted something.

I have observed an ARP entry for only one of server 2's IPs on our Core 2.

Is that normal ?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39845708
For example, server 2 has the following IP addresses, but on Core 2 I can see only 10.1.1.15,

whereas on Core 1 I can see them all:

10.1.1.15 > 0014.5ebc.7466

10.1.1.17 > 0014.5ebc.7466

10.1.1.23 > 0014.5ebc.7466

10.1.1.21 > 0014.5ebc.7466
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39845722
That IP will be the primary IP address.  Things like broadcast traffic will come from that address so it's normal to see only that IP address especially if the services using the other IP addresses aren't being used for a period of time.

What would be bad is:

1] Seeing the same IP address with multiple MAC addresses.
2] Seeing the same MAC address via multiple ports on the core switches.

If [1] is true you have a duplicate IP address.
If [2] is true you have a loop somewhere.
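
Both checks can be run over collected table entries. This is a minimal Python sketch, assuming the ARP and MAC address table entries have already been gathered (for example over several snapshots) into tuples; the function names and the sample values are hypothetical. Note that with a failover VIP, an IP may legitimately move from one MAC to another over time, so a hit from the first check is a prompt to look closer rather than proof of a conflict.

```python
from collections import defaultdict


def ips_with_multiple_macs(arp_entries):
    """arp_entries: iterable of (ip, mac) pairs.
    Returns {ip: sorted MACs} for every IP seen with more than one MAC."""
    seen = defaultdict(set)
    for ip, mac in arp_entries:
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


def macs_on_multiple_ports(mac_table_entries):
    """mac_table_entries: iterable of (vlan, mac, port) tuples.
    Returns {(vlan, mac): sorted ports} for every MAC seen on more than one port."""
    seen = defaultdict(set)
    for vlan, mac, port in mac_table_entries:
        seen[(vlan, mac.lower())].add(port)
    return {key: sorted(ports) for key, ports in seen.items() if len(ports) > 1}


# Illustrative values only (documentation addresses, hypothetical port names).
arp_samples = [
    ("192.0.2.17", "0000.5e00.5301"),
    ("192.0.2.17", "0000.5E00.5302"),  # same IP, second MAC: check [1]
    ("192.0.2.15", "0000.5e00.5301"),
]
mac_samples = [
    (2, "0000.5e00.5301", "Gi1/0/1"),
    (2, "0000.5e00.5301", "Gi1/0/24"),  # same MAC, second port: check [2]
    (2, "0000.5e00.5302", "Gi1/0/1"),
]

print(ips_with_multiple_macs(arp_samples))
print(macs_on_multiple_ports(mac_samples))
```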
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39845744
Would running Wireshark on a workstation outside this VLAN and sending ping packets to the server during the problem help?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39845800
It might do, but I think there'd be more worth in trying to check the NIC/bond config on the servers at this point.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855689
We'll shut down server 2 and see if that makes any difference.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39855768
Are the servers redundant, or just the NICs in each server?  What I mean is, does server1 provide the same services as server2?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855773
Yes. they do.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39855798
Ok so how is the redundancy configured?  Are the servers active/passive?  Is the redundancy facilitated by DNS?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855820
They are active/passive. It's DNS redundancy.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855821
The problem appeared again after switching off server 2 :( :(.

I cannot ping server 1's real IP or the VIP.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855849
What if I decrease the ARP timeout to 20 min?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39855887
No you shouldn't need to touch ARP timers at all.

Why does each server have multiple IP addresses on the bond?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855895
After shutting down server 2, all the IP addresses that were on server 2 have now switched to server 1. The IP addresses that switched are in bold.

Internet  10.1.1.14               1   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.18               4   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.17               4   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.23               4   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.21               4   0014.5ebc.0c84  ARPA   Vlan2
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39855901
Ok so let's just confirm...

Does server1 have any of the same IP addresses as server2?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855909
The active server takes these IP addresses:

Internet  10.1.1.17               4   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.23               4   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.21               4   0014.5ebc.0c84  ARPA   Vlan2
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855914
Now we will shut down server 1 and bring server 2 up.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39855944
Ok so I don't understand how each server knows that it's active or passive?

Your issue is duplicate IP addresses as I said right at the start of this thread.  Here you've just said that the server picks-up the addresses...
Internet  10.1.1.17               4   0014.5ebc.0c84  ARPA   Vlan2
...and earlier you showed us that same IP address with a different MAC...
Internet  10.1.1.17               5   0014.5ebc.7466  ARPA   Vlan2
So, we need to understand how the active/passive is configured between the servers?  If I'm correct there's no way you can do this with a plain bond on the NICs - they're server independent so each server doesn't know about the other's configuration.  Therefore it could easily be the case that the switch is seeing the wrong server with the IP address, or even worse - both servers with the same IP at the same time.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39855951
Also there's just something that's sitting at the back of my mind, but are you running ARP inspection on the switches?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855971
They are running Veritas cluster software for managing the clustering.

The respective active server takes the IPs 10.1.1.17, .21 and .23.

The switch shows the MAC address of the active server.

Server 1 ( Active) 0014.5ebc.0c84      
10.1.1.14
10.1.1.17
10.1.1.18
10.1.1.21
10.1.1.23

Server 2 ( if active)  0014.5ebc.7466
10.1.1.15    
10.1.1.17
10.1.1.21
10.1.1.23


If the server is on standby the switch will show only the real ip and mac address associated with it.
Real IP addresses of servers

Server 1 10.1.1.14 and 10.1.1.18    0014.5ebc.0c84          
Server 2 10.1.1.15                           0014.5ebc.7466

Once
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39855975
Ok so why does server1 have two real IP addresses but server2 only has one?

The servers are supposed to be the same.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39855993
Sorry, this is also a logical IP address, not a real one.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856011
So why are there two redundancy/logical IP addresses?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856012
ARP inspection is disabled on our Core

Core1#sh ip arp inspection vlan 2

Source Mac Validation      : Disabled
Destination Mac Validation : Disabled
IP Address Validation      : Disabled

 Vlan     Configuration    Operation   ACL Match          Static ACL
 ----     -------------    ---------   ---------          ----------
    2     Disabled         Inactive                      

 Vlan     ACL Logging      DHCP Logging
 ----     -----------      ------------
    2     Deny             Deny
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856016
Each IP address has some services associated with it. It's called the PAC Radiology Server.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856033
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856039
I turned on ARP inspection on this VLAN.

The problem appeared. I cleared the ARP cache on the core, and this time it didn't work.

I disabled ARP inspection again and cleared the cache. Then it worked and started pinging.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856059
We found something: the problem is with both servers.
Whichever of them is active, it stops responding to pings. I tested this by shutting down the servers one at a time.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856093
You don't want ARP inspection - that's why I was asking if you had it enabled.

In the link I posted the problem seems to be similar, so you should try the solution in that post.

After working with Veritas and Redhat, we have concluded that the problem is our router does not reply to "ARP REPLY" type Gratuitous ARP packets.  This is the default behavior for VCS.  We had to modify the 'online' script within VCS so that it sends out "ARP REQUEST" type Gratuitous ARP packets.

I would also check on the core switches that you don't have the no ip gratuitous-arp command configured.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856104
Should no ip gratuitous-arp be configured or not?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856105
OK, we'll try the above solution as well.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856128
You don't want gratuitous ARP to be turned off.  So if you issue the following command...

show run | inc gratu

...and get nothing back you are ok.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856135
We tried the above solution but got the same result.

 ip gratuitous-arp is disabled.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856148
I think you would want gratuitous arp to be enabled on the switch.

conf t
ip gratuitous-arp
end
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856171
But it was working before without this command.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856259
It should be enabled by default, so you won't see that line in the config.  So I think it's enabled, not disabled.

Gratuitous ARP will help to update the ARP table on other devices when the passive server takes the IP address from the other server.  Without this you won't be able to see the newly-active server for a few minutes once it assumes the IP address as the MAC-IP mapping will be wrong.

AFAIK the switch will only respond to ARP-REQUEST packets, not ARP-REPLY packets like the servers are sending, so you need to modify the server-side to send ARP-REQUEST packets instead.

As to why it was working before - I don't know.  What changed?
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39856487
Have you tried to disable NIC's one at a time?

If everything was working fine and nothing was changed then I only can think of hardware failure.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856495
I tried shutting down the servers one by one, and both have the problem.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856716
Here is the latest update on this problem.

Today we tried shutting down the servers one at a time to see if the problem remained. First we turned off Server 2 so that all the resources transferred to Server 1. The problem was still there: after a certain period I could not ping the real or logical IP addresses of Server 1.

Then we did the reverse, with Server 1 down and Server 2 up, and saw the same problem.

Then we ran both servers at the same time (active/passive), with the same problem.

Just for more info:

Server 1 has real IP address 10.1.1.14

Server 2 has real IP address 10.1.1.15

Logical IP addresses: 10.1.1.18, 10.1.1.21, 10.1.1.23 (services are associated with these)

VIP: 10.1.1.17

When one of the servers is active, it takes the above logical addresses, and the core switch's ARP table shows them all against that server's single MAC address.

The standby server shows an ARP entry on the core only for its real IP address.

For example, if Server 2 is active, this is the ARP table on the core:

Internet  10.1.1.15               0   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.18              96   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.17              96   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.23              96   0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.21              96   0014.5ebc.7466  ARPA   Vlan2

Server 1 ARP entry on the core:

Internet  10.1.1.14               0   0014.5ebc.0c84  ARPA   Vlan2

And if Server 1 is active, this will be the ARP table:

Internet  10.1.1.14               0   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.18              96   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.17              96   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.23              96   0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.21              96   0014.5ebc.0c84  ARPA   Vlan2

Server 2 ARP entry on the core:

Internet  10.1.1.15               0   0014.5ebc.7466  ARPA   Vlan2
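
A quick way to read ARP tables like the ones above is to group the entries by MAC address: whichever MAC holds the VIP is the currently active node. The following is only a minimal Python sketch of that idea. The function names `ips_by_mac` and `vip_owner` are made up for illustration, and it assumes the `show ip arp` output has been pasted in as text with the same column layout as above.

```python
from collections import defaultdict

# Sample rows copied from the ARP tables in this post.
SAMPLE = """\
Internet  10.1.1.14              0  0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.18              96  0014.5ebc.0c84 ARPA   Vlan2
Internet  10.1.1.17              96  0014.5ebc.0c84 ARPA   Vlan2
Internet  10.1.1.23              96  0014.5ebc.0c84 ARPA   Vlan2
Internet  10.1.1.21              96  0014.5ebc.0c84 ARPA   Vlan2
Internet  10.1.1.15               0   0014.5ebc.7466  ARPA   Vlan2
"""


def ips_by_mac(arp_text):
    """Group the IP addresses in pasted ARP output by hardware address."""
    groups = defaultdict(list)
    for line in arp_text.splitlines():
        fields = line.split()
        # Expected columns: Protocol, Address, Age, Hardware Addr, Type, Interface
        if len(fields) >= 6 and fields[0] == "Internet":
            groups[fields[3].lower()].append(fields[1])
    return dict(groups)


def vip_owner(groups, vip):
    """Return the MAC that currently answers for `vip`, or None if absent."""
    for mac, ips in groups.items():
        if vip in ips:
            return mac
    return None


groups = ips_by_mac(SAMPLE)
for mac, ips in sorted(groups.items()):
    print(mac, ips)
print("VIP 10.1.1.17 is held by", vip_owner(groups, "10.1.1.17"))
```

Running the same check against the ARP table taken during an outage would show whether the VIP is still mapped to one of the two known server MACs.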
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39856738
Have you discussed the fix from the link Craig mentioned with your server admin?

Here is the modification we had to make to /opt/VRTSvcs/bin/IP/online:

     `$arping -A -c 5 -I $Device $Address`;
 
to
 
    `$arping -U -c 5 -I $Device $Address`;
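
For anyone wondering what the -A to -U change above actually alters on the wire: with iputils arping, -A announces the address using ARP REPLY packets and -U does the same using ARP REQUEST packets. The Python sketch below only builds the 28-byte ARP payloads to show that difference; it does not send anything. The helper names are invented for illustration, and setting the target hardware address to zeros is a simplification in this sketch, not necessarily what arping itself puts in that field.

```python
import socket
import struct

ARP_REQUEST = 1  # opcode sent by `arping -U`
ARP_REPLY = 2    # opcode sent by `arping -A`


def mac_to_bytes(mac):
    """Accept 0014.5ebc.0c84, 00:14:5e:bc:0c:84 or 00-14-5e-bc-0c-84."""
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    return bytes.fromhex(digits)


def gratuitous_arp(sender_mac, ip, opcode):
    """Build an ARP payload in which sender IP and target IP are the same."""
    sha = mac_to_bytes(sender_mac)
    addr = socket.inet_aton(ip)
    tha = b"\x00" * 6  # simplification: target hardware address left as zeros
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6,        # hardware address length
        4,        # protocol address length
        opcode,   # 1 = request, 2 = reply
        sha, addr,  # sender MAC and IP
        tha, addr,  # target MAC and IP (IP equals sender IP)
    )


request = gratuitous_arp("0014.5ebc.0c84", "10.1.1.17", ARP_REQUEST)
reply = gratuitous_arp("0014.5ebc.0c84", "10.1.1.17", ARP_REPLY)
print("request opcode:", struct.unpack("!H", request[6:8])[0])
print("reply opcode:  ", struct.unpack("!H", reply[6:8])[0])
```

In this sketch the two payloads differ only in the opcode field, which is why a device that ignores unsolicited replies can still update its cache from the request form.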
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856745
They have tried but no luck.

Within the same subnet, we are not facing any issue.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856757
What if I create a static ARP entry on the core switch?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856773
You can't create a static ARP entry if you have redundancy between the two servers - the MAC will change every time the active drops and the passive takes over.

If you don't see the issue on the same subnet this is definitely an ARP problem - it has to be.  Can you show us the config for the SVI on each core?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856785
Core 1

interface Vlan2
 ip address 10.1.1.254 255.255.255.0
 ip route-cache flow
 standby delay minimum 10 reload 25
 standby 2 ip 10.1.1.1
 standby 2 priority 110
 standby 2 preempt
end



Core  2

interface Vlan2
 ip address 10.1.1.253 255.255.255.0
 ip route-cache flow
 standby delay minimum 20 reload 25
 standby 2 ip 10.1.1.1
 standby 2 priority 95
 standby 2 preempt
end



I have also been reading forums. What if I add a static ARP entry for the default gateway on cluster servers 1 and 2?
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39856826
I have found something very interesting.

On my core switch, the ARP entry for the HSRP VIP (10.1.1.1) is 0000.0c07.ac02.

The ARP entry for the gateway on the active server is also 0000.0c07.ac02.


But when the problem occurs, the ARP entry on the server changes to 00:E0:81:B6:50:1B.
Tracing this MAC leads to one of the access switches in the network.

Any clues?
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856853
That MAC address is a TYAN device - they typically make server and PC motherboards.

If you use the show mac address-table | inc 501b command that should tell you which port on the access switch it's connected to.  Find out what that device is and disconnect it if you can, then test again.
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39856859
I have been also reading forums what if add static arp entry on Cluster Server 1 and 2 for default gateway ??
You could do that, but you shouldn't have to.  I don't think it would help anyway.

It's technically OK to do this. I advised against static entries on the switch for the servers because the servers' shared IPs move between two different MAC addresses, whereas the switches use a single virtual MAC address for the standby (HSRP) address.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39858678
Voila! I added a static ARP entry on the servers pointing to the default gateway (HSRP IP) and the problem disappeared.

As I mentioned above, I have found something very interesting.

On my core switch, the ARP entry for the HSRP VIP (10.1.1.1) is 0000.0c07.ac02.

The ARP entry for the gateway on the active server is also 0000.0c07.ac02.


But when the problem occurs, the ARP entry on the server changes to 00:E0:81:B6:50:1B.
Tracing this MAC leads to one of the access switches in the network.


Thanks
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39859222
Basically, I don't understand whether it's a clustering issue or a switch issue.

Thanks
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39859261
It's not the switch as far as I'm concerned.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39859264
Did you look for the Tyan device as Craig suggested?
0
 
LVL 45

Assisted Solution

by:Craig Beck
Craig Beck earned 500 total points
ID: 39859279
I think you're just masking a problem.  What you've done is force the servers to ignore the MAC address of the other 'rogue' device when it talks using the 10.1.1.1 address.  This is more of a workaround than a permanent fix, and it doesn't indicate that the switch is causing a problem.

You really should find the device with the 00:E0:81:B6:50:1B MAC address and disconnect it, then test without the static ARP mapping.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39859303
I totally agree and that's why I asked.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39859352
I deleted the static ARP entry.
Then the problem occurred again.

I traced that MAC address and found it on one of the access switches. I disabled the port.

The client started pinging the server, this time without flushing the cache or adding static entries.

Let me investigate this device.
0
 
LVL 45

Assisted Solution

by:Craig Beck
Craig Beck earned 500 total points
ID: 39859384
That device must have the 10.1.1.1 address on it - or it's running Proxy ARP.
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39859501
What I have found so far is that an unmanaged switch (TP-Link) is connected to the access switch port, and a host with the problematic MAC address is connected to this unmanaged switch.

I am now trying to find this host in the network. In any case, I have disabled this port.
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39859528
That's scary...

:P
0
 
LVL 3

Author Comment

by:cciedreamer
ID: 39862527
Well, found it. There was a PC connected to this unmanaged switch, and the gateway (HSRP) IP was assigned to this PC.

Very scary :(
0
 
LVL 45

Expert Comment

by:Craig Beck
ID: 39862579
Excellent, so I was right with my first and second post then!

Oh well, glad you got it sorted now :-)
0
 
LVL 3

Author Closing Comment

by:cciedreamer
ID: 39862672
Thanks craigbeck and Infamus for your help. Finally putting an end to this thread.

Just a summary of the solution:

- During the problem, I collected the ARP table on the server.
- I noticed the server's entry for the gateway was pointing to the wrong MAC address.
- We traced that MAC address through the network.
- Once the device was found, we saw that the PC had the HSRP IP configured on it as a duplicate.
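
The first two steps of the summary above could be turned into a small check. This is just a rough Python sketch under one assumption: the gateway is an HSRP version 1 group, whose virtual MAC is 0000.0c07.acXX with XX being the group number in hex (that matches standby group 2 and the 0000.0c07.ac02 entry seen in this thread). The function names are made up for illustration, and the observed MAC would have to come from the server's own ARP table.

```python
HEX_DIGITS = "0123456789abcdef"


def normalize_mac(mac):
    """Reduce any common MAC notation to 12 lowercase hex digits."""
    digits = "".join(c for c in mac.lower() if c in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError("not a MAC address: %r" % mac)
    return digits


def hsrp_v1_mac(group):
    """Virtual MAC for an HSRP version 1 group (assumption: v1 is in use)."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP v1 group must be 0-255")
    return "00000c07ac%02x" % group


def gateway_mac_ok(observed_mac, group):
    """True if the MAC seen for the gateway IP is the expected virtual MAC."""
    return normalize_mac(observed_mac) == hsrp_v1_mac(group)


# The two gateway MACs seen on the server in this thread, HSRP group 2.
for observed in ("0000.0c07.ac02", "00:E0:81:B6:50:1B"):
    status = "OK" if gateway_mac_ok(observed, 2) else "MISMATCH"
    print(observed, status)
```

A mismatch here does not say what the other device is, only that something other than the HSRP group is answering ARP for the gateway address.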
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39862911
Glad it is resolved.

Good Luck!!!
0
 
LVL 12

Expert Comment

by:Infamus
ID: 39862974
On second thought, you should enable port security and BPDU guard on the access ports if they are not already enabled.
0
