Solved

Network packet storm created by backup over routed Cisco network

Posted on 2008-10-20
Last Modified: 2012-05-05
We have an obvious problem in that whenever a backup is initiated using IBM Tivoli and its new super fast VTL tape library, network problems occur due to some sort of packet / broadcast storm.

We have recently moved this backup service to a building 100 metres away, connected by 1Gb fibre, to serve as an immediate off-site backup until we sort out replication. The backup server now sits on another subnet and must traverse a 'vlan interface' default gateway to back up all the servers in the computer room (when the server was local a gateway was obviously not required).

Our HQ building and the large branch office building are connected via a routed vlan. The vlan at the branch building is vlan 29 (on the 172.29 subnet) - and the HQ building here with the server room is on vlan 20 (172.20 subnet). In order for this backup server to reach vlan 20 from its access vlan 29 Cisco 3560 switch, it must now go to the 172.29.1.252 address of the vlan 29 interface before leaping onto vlan 20. The vlan 29 interface is configured on our core switch (comprising 6 stacked 3750s as one logical unit) in the HQ central computer room. The core switch acts as the server vlan db - whereas the branch building's floor switches are the client vlan db.
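
As a rough illustration of the forwarding decision described here, the sketch below shows why the backup server now hands every frame for a vlan 20 host to the gateway. It assumes /16 masks (the post only names the "172.29" and "172.20" subnets), and the function name is made up for the example; only the gateway address 172.29.1.252 comes from the post.

```python
import ipaddress

# Assumed /16 mask; the post only says "the 172.29 subnet".
VLAN29 = ipaddress.ip_network("172.29.0.0/16")
# VLAN 29 interface on the core switch, as given in the post.
GATEWAY = ipaddress.ip_address("172.29.1.252")


def next_hop(destination: str):
    """Return the address a VLAN 29 host sends to in order to reach `destination`."""
    dst = ipaddress.ip_address(destination)
    # On-link destinations are reached directly; everything else goes via the gateway.
    return dst if dst in VLAN29 else GATEWAY
```

With these assumptions, any server address in 172.20.x.x resolves to the gateway, so every backup stream is routed by the core stack rather than switched locally.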

So as soon as the backup starts from access vlan 29, addressing all of the servers it backs up on vlan 20, all hell breaks loose on the vlan 20 network. Unfortunately vlan 20 includes some WAN bridged connections on the same subnet. Because they are on the same subnet and not partitioned off, they take on all this excess traffic too, which swamps the slow bridged 2Mb link, and they drop off.

Of course I will partition off these critical bridged connections in response to this now - but we never had this problem before when the backup server was plugged into access vlan 20 at HQ, straight on the same subnet as the servers. So what has inter-vlan routing got to do with changing things so dramatically like this? And how can I solve it?
Question by:klwn

Assisted Solution

by:kyleb84
kyleb84 earned 500 total points
I'm afraid that an issue such as this requires quite a bit more attention and testing before a conclusion can be made as to what the direct issue is.

If I were you, I would isolate the backup server even more by putting it on VLAN 21 for example, setting the VLAN priority to 2 across all the switches, and making a VLAN 21 path all the way to the server, with another network card in it.

So the backup server would be 172.21.0.10 and the main server's second network adaptor's IP would be 172.21.0.20.

This would completely isolate all backup traffic, and since the VLAN's priority is 2, all other VLAN traffic would default to Pri-3 and even when the backup is processing it should not take as much of a hit on your network.

--------------------

I doubt it's a broadcast storm, but it's possible there could be a routing loop.

Some things I would look at are:
- The bytes/second rate on the server's NIC, compared to your switches' uplinks/trunks along the data path.
- Plug a laptop into switches along the way, doing a port mirror + short packet capture on each switch for its trunking ports. Look for duplicate packets + decreasing TTL values.
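
The "duplicate packets + decreasing TTL" check can be sketched in Python as below. This is only an illustration: it assumes the capture has already been reduced to `(src, dst, ip_id, ttl)` tuples in capture order (a made-up record format, not the output of any particular tool), and it treats a repeated source/destination/IP ID with a falling TTL as a hint of a loop, not as proof.

```python
from collections import defaultdict


def find_looping_packets(packets):
    """Return (src, dst, ip_id) keys seen more than once with strictly decreasing TTL.

    `packets` is an iterable of (src, dst, ip_id, ttl) tuples in capture order.
    """
    ttls = defaultdict(list)
    for src, dst, ip_id, ttl in packets:
        ttls[(src, dst, ip_id)].append(ttl)

    looping = []
    for key, seen in ttls.items():
        # The same packet crossing the capture point again after extra routed hops
        # shows up as a repeat of the key with a lower TTL each time.
        decreasing = all(a > b for a, b in zip(seen, seen[1:]))
        if len(seen) > 1 and decreasing:
            looping.append(key)
    return looping
```

A repeat with an identical TTL is more likely a retransmission or a mirrored copy than a routing loop, which is why only strictly decreasing TTLs are flagged.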

If it's not a routing loop issue, time to look at bottlenecks / interface issues:
- Is it GbE all the way through?
- Are there packet errors on any interface on the way?
- Check for large % of CRC errors on interfaces
- Check for duplex mismatches
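
For the bottleneck and error checks, the arithmetic is simple enough to sketch. The helpers below are hypothetical (not part of any monitoring tool) and assume you have two samples of an interface's octet counter taken a known number of seconds apart, plus the CRC error and input packet counts read from the switch.

```python
def counter_delta(old: int, new: int, bits: int = 32) -> int:
    """Difference between two counter samples, allowing for a single wrap."""
    return (new - old) % (1 << bits)


def utilisation_percent(octets_old, octets_new, seconds, link_bps, bits=32):
    """Average link utilisation over the sampling interval, as a percentage."""
    bits_sent = counter_delta(octets_old, octets_new, bits) * 8
    return 100.0 * bits_sent / (seconds * link_bps)


def crc_error_percent(crc_errors: int, packets_in: int) -> float:
    """CRC errors as a percentage of input packets (0.0 if nothing was received)."""
    return 100.0 * crc_errors / packets_in if packets_in else 0.0
```

Note that a 32-bit octet counter can wrap in well under a minute at gigabit speeds, so the sampling interval needs to be short (or a 64-bit counter used, with `bits=64`) for the utilisation figure to mean anything.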

Lastly double check your configs:
- Though it's easier to do it via VTP, maybe manually configure each switch for VLAN membership
- Disable ip routing on switches that don't need to perform it
- Use more VLANs!


Good luck, and let me know how you go!

Author Comment

by:klwn
Good advice and yes segmenting traffic into more VLANs is the way to go except, I still don't understand why this has just started after moving to a different VLAN?
And with your example about creating a dedicated VLAN to the server - I'm afraid it's not just the one server - but a Computer Room full of servers (about 80 of them). So the TSM backup starts from VLAN 29 and inter-vlan routes immediately to VLAN 20 where all these servers (not just the one server) get backed up very very quickly (due to the new VTL). Unfortunately this same VLAN 20 extends throughout the rest of the building and extends out further to some bridged WAN connections affecting all these also. Basically anything on VLAN 20 is hammered! And this seems to have happened since moving the backup server to another VLAN (or it might be also due to the new VTL library making backup streams run very quickly thus possibly intensifying network traffic in doing so).
Can you think of a quick fix to control this traffic for now? The backup server is connected via a single 1Gb NIC. Can I, for example, lower the priority of the traffic coming from the 3560 port it's plugged into until I find quality time to get to the bottom of it?

Accepted Solution

by:kyleb84
kyleb84 earned 500 total points
Hmm, since it's on VLAN 20 with many other devices, you'd still have to isolate the backup server to reduce its priority across most of the network.

A few "get by" solutions that might help:

Move the backup server back to the other servers?
Create another VLAN all the way to the Backup server from the core switches?
Create ACLs in the routing switch to set the 802.1p value of the backup traffic?
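
For context on the last option: the 802.1p value is the 3-bit priority field (PCP) inside the 4-byte 802.1Q tag, so it only exists on tagged frames such as those on trunk links. The Python sketch below just shows the bit layout; the function names are made up for illustration, and how a given switch maps the value to queues depends on its QoS configuration.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_dot1q_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag: TPID, then PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must be 0-7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    if not 0 <= vid <= 4095:
        raise ValueError("VLAN ID must be 0-4095")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_pcp(tag: bytes) -> int:
    """Extract the 3-bit priority from a 4-byte 802.1Q tag."""
    _, tci = struct.unpack("!HH", tag)
    return tci >> 13
```

For example, `build_dot1q_tag(pcp=1, vid=20)` produces a tag for VLAN 20 carrying priority value 1.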

----------------------------

From a diagnostic point, when you say the "WAN" devices become swamped with data because they're on the same VLAN as the Backup server - how do these WAN bridges connect to the network in relation to the Backup server?

How does this VTL work?
- Does it use broadcast/multicast to transfer the data?

Author Comment

by:klwn
Thanks for the suggestions... because we are looking for a workaround, we are certainly going to move the backup server back onto the same network as the other servers, thus keeping the pack on the same VLAN 20. Although theoretically inter-vlan routing should have nothing to do with the increased traffic (unless you think otherwise), putting it back on the same VLAN just puts it back in its original location, where problems were not prevalent.
A VTL is a 'Virtual Tape Library' - a bunch of spinning hard disks pretending to be tapes - hence the much quicker backup times now that we have changed from backing up to tapes to backing up to disks, which is why I think the traffic is so much more intense. However, if things go back to normal by bringing the backup back onto VLAN 20 then I guess it's nothing to do with this factor.
Your other suggestion of putting the backup server onto its own VLAN surely won't eliminate the fact that it would still have to inter-vlan route out to VLAN 20, where all of the 80 servers it has to back up exist? So why would you do that?
I think changing the 802.1p value of the backup traffic would have merit if it means lowering the priority of the packets coming out of the switchport. For example our backup server sits on switchport Gi0/3 - and I would like to lower the priority of all the packet streams coming out of that port - what command would you put in to achieve this? I don't think you would need an ACL, just a direct command on that switchport but an example would be nice.
The WAN bridges are simply connections to other very small branch offices containing no more than 5 people. At the time it seemed prudent just to bridge these connections, which effectively puts them on the same subnet, which of course I see as something 'not' to do in future. They have never been affected before - but we have never had such a busy backup traffic problem before either. Nevertheless - lesson learnt there despite the small numbers involved.
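
To put rough numbers on why the bridged links are so exposed, here is a small worked calculation. It assumes the backup server transmits at the full rate of its gigabit NIC and that the "2mb" link means 2 Mbit/s; the thread does not establish how much of the backup traffic actually reaches the bridges, so this only shows how little it would take.

```python
def saturating_fraction(slow_link_bps: float, source_bps: float) -> float:
    """Fraction of the source stream that is enough to fill the slow link."""
    return slow_link_bps / source_bps


GIGABIT = 1_000_000_000   # backup server NIC speed, from the post
BRIDGED_WAN = 2_000_000   # the "2mb" bridged link, taken as 2 Mbit/s

share = saturating_fraction(BRIDGED_WAN, GIGABIT)
print(f"{share:.1%} of a line-rate backup stream fills the bridged link")
```

In other words, if even a fraction of a percent of the backup traffic is flooded or broadcast across VLAN 20, the bridged link has no headroom left.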

Expert Comment

by:kyleb84
Hmm you seem to have misunderstood some of my questions..

"From a diagnostic point, when you say the "WAN" devices become swamped with data because they're on the same VLAN as the Backup server - how do these WAN bridges connect to the network in relation to the Backup server?"

Where I was going with this is that if the WAN devices are plugged directly into the switches that handle this backup data, it could still be the case that the switches are being overutilised.

"How does this VTL work?
- Does it use broadcast/multicast to transfer the data?"

I know what VTL is, I want to know its method of communication when backing up.
TCP? UDP? Multicast?

Where I was going with this is that if it uses multicast, a configuration error could lead to network chaos.
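
One way to answer that from a packet capture is to look at the destination addresses of the backup traffic. The Python sketch below classifies an IPv4 destination; it assumes a /16 mask for the server VLAN (the thread only says "172.20 subnet"), and the function name is made up for the example.

```python
import ipaddress

# Assumed /16 mask for the server VLAN; the thread only says "172.20 subnet".
SERVER_VLAN = ipaddress.ip_network("172.20.0.0/16")
LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")


def classify_destination(address: str, network=SERVER_VLAN) -> str:
    """Label a destination IPv4 address as broadcast, multicast or unicast."""
    dst = ipaddress.ip_address(address)
    if dst == network.broadcast_address or dst == LIMITED_BROADCAST:
        return "broadcast"
    if dst.is_multicast:  # 224.0.0.0/4
        return "multicast"
    return "unicast"
```

If nearly everything classifies as unicast yet still shows up on ports that are not involved in the backup, the switches are flooding it, which points at the switching path rather than at the backup protocol.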



Author Comment

by:klwn
After all this the problem has been solved!

It ended up being the network card rather than anything else. The backup administrator had plugged the backup server into the switch whilst it was still configured as a team pair - but without the second pair member. While he was there he also upgraded the NIC with the latest drivers and, hey presto, absolutely no problems exist at all now.

After this, I will be looking for respectable, easy-to-use network monitoring software that lets me be more proactive in finding problems like this - so the search starts now. Nothing too expensive if I can avoid it.

Assisted Solution

by:kyleb84
kyleb84 earned 500 total points
Wow, how odd is that?

OpenNMS is a favourite of mine for SNMP/Net monitoring.


Have a look at:
http://en.wikipedia.org/wiki/Network_monitoring_comparison

It's got a feature list / licensing info for some common NMS applications.