Dropped packets from switch to Appliance on VLan after restart

Hello:

I needed to restart our telephone system last week and noticed IP packets being dropped on our voicemail VLan (VLan-101).  The problem is consistent and so is the workaround, but there must be a better and more permanent fix.  The telephone appliance is directly connected to this HP ProCurve Gigabit Ethernet switch.

Here is the scenario:

1.  There are times that I must restart/power off our Voicemail appliance.  Before I properly power off the appliance, I make sure that I can ping all appropriate telephone IP addresses.
          a.  Main Voicemail Config IP = 172.21.1.21(VLan-101)
          b.  Voicemail IP for Unified messaging = 172.21.1.23 (VLan-101)
          c.   Corporate Web Interface for management = XXX.XX.XX.20 (VLan-1)
          d.  I run continuous pings on the above IP addresses.

2.  I then proceed to properly power off the voicemail appliance.
         a.  I wait 1 minute and then power the appliance back on.
         b.  The approved documentation that I have has been verified by the voicemail appliance vendors (re-verified recently as well).
         c.  I have used the documented approach to power off the appliance, many times in the past without incident.

3.  It may be important to note that no dropped packets occur with the other VLans.
         a.  VLan-1
         b.  VLan-100

4.  When I turn the voicemail appliance back on, the following happens:
         a.  After 1 minute I can ping the VLan-1 management IP address XX.XX.XX.20.
         b.  I cannot ping the 2 VLan-101 IP addresses even if I wait longer than 5 minutes.
         c.  Previously, everything was operational and pingable about 4 minutes after I turned on the appliance.
         d.  I have to leave a voicemail for an extension before I begin getting replies from the 172.21.1.23 IP address.
         e.  So I guess leaving a voicemail helps get the data packets flowing.
         f.  But 172.21.1.21 is still not pingable.

5.  Then I must unplug the Ethernet cable for 172.21.1.23 from the HP switch and plug the 172.21.1.21 cable into another port on the switch.
        a.  I keep moving the 172.21.1.21 cable to other VLan-101 ports on the same switch until I get replies.
        b.  At this point, when I plug 172.21.1.23 back into a VLan-101 port, it does not reply, but the previously problematic 172.21.1.21 address keeps replying.
        c.  Now I have to leave another voicemail message, and that triggers 172.21.1.23 to begin replying to pings.

6.  My question is: why are VLan-101 data packets being dropped after the appliance is turned off and then turned back on?
       a.  No other devices on the other VLans have this problem.
       b.  Even the management ethernet connection on VLan-1, from the same appliance, comes up quickly and consistently.
       c.  Should I clear the ARP table or any other cached information on that switch?
       d.  Could it be because of the Quality of Service assigned to VLan-101?
       e.  The dropped-packet problem has not always occurred, but the QoS setting has been on that switch since before I began working here.

The configuration of the HP ProCurve Switch is shown below:

hostname "MKE"
time timezone -6
module 1 type J9147A
interface 1
   speed-duplex 100-full
exit
ip default-gateway XX.XX.XX.254
ip routing
vlan 1
   name "DEFAULT_VLAN"
   untagged 5-46,48
   ip address XX.XX.XX.249 255.255.255.0
   no untagged 1-4,47
   exit
vlan 100
   name "WAN"
   untagged 1,47
   ip address XX.XX.100.3 255.255.255.248
   exit
vlan 101
   name "Voice"
   untagged 2-4
   qos priority 5
   ip address 172.21.1.254 255.255.255.0
   exit
timesync sntp
sntp unicast
sntp server priority 1 XX.XX.XX.X 3
ip route 0.0.0.0 0.0.0.0 XX.XX.XX.254
no autorun
password manager
password operator

MKE(config)#
Asked by Pkafkas (Network Engineer)
Pkafkas (Network Engineer, Author) commented:
Once the telephone system is up and running, there are no dropped packets.  Getting it back online is the tricky part.
noci (Software Engineer) commented:
Pulling and re-inserting a cable seems to hint at ARP issues, as pulling cables makes entries disappear so that they need to be re-requested.  (Waiting 1 minute will do the same, by the way.)
On the other hand, as the equipment has not changed, the MAC address should still be the same, so it should not make a difference.

Did you look at the forwarding tables on the switch? Do they change when the PBX is powered down, or forward to a different port?
Pkafkas (Network Engineer, Author) commented:
I did not look at the forwarding tables on the switch.  How can I check them?

Then I can see the before-and-after effect.
noci (Software Engineer) commented:
Under status / mac table in the web interface?
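For the before-and-after comparison, one option is to copy the MAC table entries into a small script before powering down the PBX and again once it is back, and let the script list what moved or vanished. The following is a minimal sketch in Python, not a tool that talks to the switch: the snapshots are typed in by hand, and the MAC addresses and port numbers are made up for illustration.

```python
import re


def normalize_mac(mac):
    """Strip separators and case so '001122-334421' equals '00:11:22:33:44:21'."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def diff_mac_tables(before, after):
    """Return {mac: (old_port, new_port)} for every address whose port changed.

    None means the address was absent from that snapshot.
    """
    before = {normalize_mac(m): p for m, p in before.items()}
    after = {normalize_mac(m): p for m, p in after.items()}
    changes = {}
    for mac in sorted(set(before) | set(after)):
        old, new = before.get(mac), after.get(mac)
        if old != new:
            changes[mac] = (old, new)
    return changes


# Hypothetical snapshots (MAC address -> switch port), not taken from the thread.
before_restart = {"001122-334421": "2", "001122-334423": "3", "001122-3344aa": "10"}
after_restart = {"00:11:22:33:44:21": "4", "00:11:22:33:44:AA": "10"}

for mac, (old, new) in diff_mac_tables(before_restart, after_restart).items():
    print(f"{mac}: port {old} -> {new}")
```

An address that shows up with `None` as its new port is one the switch has not yet relearned, which would be worth comparing against the IP address that is not answering pings.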
Pkafkas (Network Engineer, Author) commented:
I really do not want to restart the PBX appliance unless I have to.  Is there another way to investigate this problem?
noci (Software Engineer) commented:
When there is trouble, dump all traffic to a copy port, log from that copy port using Wireshark, and save to files. Then the traffic patterns can be analysed for missing bits or wrong info.
The copy port can be set up beforehand and connected to a PC with Wireshark installed, and Wireshark can be started when the problem is happening.
Note that the copy port cannot be used for normal traffic.

Also, the trouble needs to be active for you to notice anything.
While the trouble is active, take a note of the timestamps when you perform actions. The recording of the network traffic should show what is lacking, such as missing ARP or wrong info.
You also need a list of the relevant MAC and IP addresses (of the system the pings come from, the PBX, the router, etc.).
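To line up a capture with what was done at the keyboard, it can help to log the moment each address starts or stops answering. Below is a rough Python sketch of that idea. It shells out to the operating system's `ping` command; the flags shown assume Linux or Windows (macOS interprets `-W` differently), and the exit code is only a rough signal of reachability. The function names are invented for this example.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request with the OS ping command; True if it exited with 0."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:  # assumes Linux-style flags
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def transitions(samples):
    """Reduce (timestamp, host, is_up) samples to the moments a host changed state.

    The first sample for each host is always reported as its initial state.
    """
    last_state = {}
    events = []
    for timestamp, host, is_up in samples:
        if last_state.get(host) != is_up:
            events.append((timestamp, host, "UP" if is_up else "DOWN"))
            last_state[host] = is_up
    return events


def watch(hosts, rounds, interval_s=1.0):
    """Ping every host once per round and return the state changes seen."""
    samples = []
    for _ in range(rounds):
        now = time.strftime("%H:%M:%S")
        for host in hosts:
            samples.append((now, host, ping_once(host)))
        time.sleep(interval_s)
    return transitions(samples)


# Offline demonstration with made-up samples (seconds since power-on).
demo = [
    (0, "172.21.1.21", False), (0, "172.21.1.23", False),
    (60, "172.21.1.21", False), (60, "172.21.1.23", True),
    (120, "172.21.1.21", True), (120, "172.21.1.23", True),
]
for event in transitions(demo):
    print(event)
```

During a real outage, something like `watch(["172.21.1.21", "172.21.1.23"], rounds=600)` would produce the list of state changes to set beside the notes on when a voicemail was left or a cable was moved.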
Pkafkas (Network Engineer, Author) commented:
Is there any type of maintenance that can be done, such as updating the version of the switch, restarting the switch, or clearing out some old cached entries?
noci (Software Engineer) commented:
For a solution you need to know the problem.
The problem you describe doesn't exactly translate to something sane with respect to network components.
Probing and recording the actual traffic during an outage means one can try to find the root cause.

For example, after leaving a voicemail you can access a service on the PBX, i.e. the PBX misses something during reboot or fails to start something. So how can the switch compensate for that? And are you sure that leaving the voicemail is the trigger, or is waiting some time sufficient?
Another thought: how about the amount of storage on the PBX? Missing some processes up front might be caused by not being able to write log records, but that is just wild guessing. Is there new firmware for the PBX?
Another thing: it supposedly used to work more straightforwardly. What was changed in the PBX's surroundings between the last time it worked normally and now?
Pkafkas (Network Engineer, Author) commented:
Yes, we can think of theoretical scenarios until the cows come home.  I want to focus this question on what I can do from a networking perspective.  Perhaps clear the ARP table or something like that.

1.  This firmware is very new (only 1 year old).
         a.  The vendor stated that new firmware does not come out very often.
         b.  They also stated that they have not seen this problem with this firmware elsewhere.
         c.  That is not to say that the firmware on this box is not the problem; but it did work about 6 months ago.

2.  It appears that this is a networking issue, though.
          a.  Perhaps it is the switch?

3. It is not a timeout issue, because I can trigger the problem and fix it with consistent results.
         a.  It makes no difference whether I wait 5 minutes or longer, or try sooner, before I leave a voicemail.
         b.  Even well past 2 hours nothing changes, unless I unplug the cable, plug it into another port, and leave it for a little while; then, to my surprise, the 172.21.1.21 IP gets replies.
         c.  But the other IP address (the one that needed a new voicemail) is now off-line until I leave another voicemail.
                i. Right on cue.

I do want to focus on the Ethernet switch. Is there anything that I can do for maintenance?

Or perhaps bring a different switch into the equation and see how it behaves.
noci (Software Engineer) commented:
The biggest problem I see is that a network switch has NO knowledge of what data it passes; it just uses the MAC address to forward data. ARP is an IP construct, not an Ethernet construct. So is ping (ICMP), etc.

So leaving a voicemail is just a bunch of packets going through (either analog lines or VoIP), which only changes state on the PBX, not on the switch. If leaving a voicemail makes a difference, then it must be something on the PBX that changes.
(It apparently causes a restart of some service on the PBX.)

Now, moving a cable from port A to port B on a switch does change state on the switch: it moves the forwarding for a specific (set of) MAC address(es) from port A to port B, and it changes the state of the ports (port A down, port B up).
It also changes state on the PBX port, which goes down and up again, causing routing tables to be updated and possibly more scripts or interface-management code (DHCP etc.) to run.
My estimate (read: wild guess) is that the state change on the PBX is more invasive and far-reaching than the one on the switch.
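The forwarding behaviour described above can be sketched as a toy model. This is a deliberately simplified illustration in Python, not how the ProCurve firmware is implemented: it covers a single VLAN at layer 2 only, with no address aging, and it assumes that entries learned on a port are dropped when that port loses link. The class and the device names are invented for the example. Note also that the posted configuration has `ip routing` enabled, so for traffic routed between VLANs the switch would additionally keep its own ARP table, which this model leaves out.

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs per port and forwards by destination MAC."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the sender's port, then return the set of ports the frame leaves on."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            return self.ports - {in_port}  # unknown destination: flood
        if out_port == in_port:
            return set()  # destination is on the ingress port: nothing to forward
        return {out_port}

    def link_down(self, port):
        """Forget every address learned on a port that lost link (model assumption)."""
        self.mac_table = {m: p for m, p in self.mac_table.items() if p != port}


switch = LearningSwitch(ports=[2, 3, 4, 5])
print(switch.receive(5, "laptop", "pbx"))  # PBX not learned yet: flooded to 2, 3, 4
print(switch.receive(2, "pbx", "laptop"))  # reply from port 2 goes only to port 5
print(switch.receive(5, "laptop", "pbx"))  # now forwarded only to port 2
switch.link_down(2)                        # cable pulled from port 2
print(switch.receive(5, "laptop", "pbx"))  # flooded again until the PBX is relearned
print(switch.receive(4, "pbx", "laptop"))  # PBX replugged into port 4 and sends a frame
print(switch.receive(5, "laptop", "pbx"))  # forwarded only to port 4
```

In this model a frame for an unlearned address is flooded rather than dropped, so the switch relearns the PBX from the first frame the PBX sends on its new port.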

Pkafkas (Network Engineer, Author) commented:
From a troubleshooting perspective: while 172.21.1.21 was not pingable, we took a laptop and assigned it the IP address 172.21.1.19.

We then tried pinging the PBX IP address with the laptop connected directly to the PBX.  In the process we obviously disconnected the PBX from the Ethernet switch.  The PBX was pingable directly from the laptop, and hence the PBX's Ethernet card was not to blame.

We then plugged the PBX back into the switch, but 172.21.1.21 was still not pingable via the switch until we started swapping ports and unplugging and reconnecting cables.

What is interesting is that once the connection has been established, there are no problems.  I wonder whether the same problems would happen with another HP ProCurve Ethernet switch if it were configured the same way?
noci (Software Engineer) commented:
Well, if it is the same switch but not the same port, then the cause is port-related, not a global switch setting.
So what is the difference between a working and a non-working port?

Did you check for full/half duplex issues (or auto setup for speed and duplex)?
The duplex might get miscommunicated (10 and 100 Mbps interfaces); 1 Gbps on both ends should not suffer from this.
Some cards/switches do not always set up duplex correctly, but that would be noticeable with massive traffic, not with ping.
Ping seems to work then, but heavier traffic will get disrupted if duplex is half on one side and full on the other.
Pkafkas (Network Engineer, Author) commented:
The ports are set to auto-negotiate, and the ports that do not work and the ports that do work are the same ones.  If I eventually unplug the cable and then replug it into another port, it may or may not work.  But then when I plug it back into the original port, it may start working if I wait a little bit, or after I unplug it and plug it into another port.

The same ports that did not work do work later after I replug the same Ethernet cable into them, and the ports that were not used before will eventually work if I try the same process.  Once I get it working and pingable I leave it alone; but I have noticed that once it is working (pingable), if I unplug from a port and then plug the Ethernet cable into a different port on the same VLan, the pings still work.

It seems as if the PBX has difficulty getting started after a restart, but once it does, it is more consistent.  Whether or not the same thing happens on another network switch will tell a lot, don't you think?
noci (Software Engineer) commented:
Is this a 100 Mbps interface? If so, try setting a fixed speed with fixed duplex, the same on both sides, just to rule out possible issues with auto-negotiation.
Pkafkas (Network Engineer, Author) commented:
I will try some of this out with another switch that is nearby.  I doubt a fixed speed will help, since the ports already auto-negotiate to the same setting.  I am not sure what the PBX requires anyway, or how to change that setting on the PBX.

As I mentioned before, this worked better the previous time I restarted the PBX, about 6 months ago.
noci (Software Engineer) commented:
Auto-negotiation might fail on 10 and 100 Mbps, especially if not both sides are set to auto.
Speed will match but duplex might not.
This will cause the half-duplex (HD) side to stop sending packets when the full-duplex (FD) side starts to send.
This will cause a lot of PacketTooShort and CRC errors on the FD side,
and a higher collision count on the HD side (hard to notice).

So the impact depends on traffic patterns; also, auto-negotiation might not always fail.
Disconnecting and reconnecting cables triggers a new negotiation, giving new chances to get it right or wrong.
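The "not both sides are auto" case can be made concrete with a small sketch. It models only the textbook outcome for 10/100 Mbps links: two auto-negotiating ends agree on full duplex, while an auto-negotiating end facing a fixed-setting peer can detect the speed but not the duplex and falls back to half. It assumes both ends are capable of full duplex and ignores the flaky-negotiation cases mentioned above; the function name is invented for the example.

```python
def resolve_duplex(side_a, side_b):
    """Return the duplex each end settles on, given 'auto', 'full' or 'half' settings."""
    def settle(mine, peer):
        if mine != "auto":
            return mine  # a fixed setting is used as configured
        # An auto end negotiates full duplex only with another auto end;
        # against a fixed peer it cannot learn the duplex and falls back to half.
        return "full" if peer == "auto" else "half"

    return settle(side_a, side_b), settle(side_b, side_a)


for switch_port, pbx_nic in [("auto", "auto"), ("auto", "full"),
                             ("auto", "half"), ("full", "full")]:
    a, b = resolve_duplex(switch_port, pbx_nic)
    verdict = "ok" if a == b else "MISMATCH"
    print(f"switch={switch_port:<4} pbx={pbx_nic:<4} -> {a}/{b} {verdict}")
```

Under these assumptions the only mismatch in the list is an auto port facing a device fixed at full duplex, which is why setting both ends to the same fixed values, or both to auto, removes the ambiguity.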
Pkafkas (Network Engineer, Author) commented:
I am going to try this out later with another switch, and I do not want to hold off awarding the points to the participants.  If there were a duplex mismatch, I believe there would be a world of problems, and the problems would be consistent, not intermittent.

Thanks,
noci (Software Engineer) commented:
A duplex mismatch shows itself when there are bursts of traffic, especially if the full-duplex side is sending.
It expresses itself as slowness in communication. TCP has a bigger problem with this than UDP.
Voice traffic may be bad from the HD side to the FD side. (This is because the HD side stops sending when a packet from the FD side comes in; the FD side doesn't need to pause before sending to allow the HD side to complete a transmission.)

So the problems might be well hidden until a certain threshold is passed.
Question has a verified solution.
