Network packet storm created by backup over routed Cisco network

We have an obvious problem in that whenever a backup is initiated using IBM Tivoli and its new super fast VTL tape library, network problems occur due to some sort of packet / broadcast storm.

We have recently moved this backup service to a building 100 metres away, connected by 1 Gb fibre, to serve as an immediate off-site backup until we sort out replication. The backup server now sits on another subnet and must traverse a 'vlan interface' default gateway to back up all the servers in the computer room (when the server was local, a gateway was obviously not required).

Our HQ building and the large branch office building are connected via a routed vlan. The vlan at the branch building is vlan 29 (on the 172.29 subnet) - and the HQ building here with the server room is on vlan 20 (172.20 subnet). In order for this backup server to reach vlan 20 from its access vlan 29 Cisco 3560 switch, it must now go to the 172.29.1.252 address of the vlan 29 interface before leaping onto vlan 20. The vlan 29 interface is configured on our core switch (comprising 6 stacked 3750s as one logical unit) in the HQ central computer room. The core switch acts as the server vlan db - whereas the branch building's floor switches are the client vlan db.
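To make the routing decision described above concrete, here is a minimal sketch in Python. It assumes /16 masks for both subnets (the post only says "172.29" and "172.20"), and the host addresses are hypothetical; only the gateway address 172.29.1.252 comes from the post.

```python
import ipaddress

# Assumption: both subnets use a /16 mask; the post does not state the masks.
BACKUP_NET = ipaddress.ip_network("172.29.0.0/16")   # vlan 29, branch building
SERVER_NET = ipaddress.ip_network("172.20.0.0/16")   # vlan 20, HQ server room
VLAN29_GATEWAY = ipaddress.ip_address("172.29.1.252")  # vlan 29 interface on the core


def next_hop(src, dst, local_net, gateway):
    """Return where the sender delivers a packet at layer 3.

    If the destination is inside the sender's own subnet the packet goes
    straight to the destination; otherwise it goes to the default gateway.
    """
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    if src not in local_net:
        raise ValueError(f"{src} is not in {local_net}")
    return dst if dst in local_net else gateway


if __name__ == "__main__":
    # Hypothetical addresses for the backup server and one server in vlan 20.
    backup_server = "172.29.1.10"
    target_server = "172.20.1.50"
    hop = next_hop(backup_server, target_server, BACKUP_NET, VLAN29_GATEWAY)
    print(f"{backup_server} -> {target_server} is sent via {hop}")
```

With the server back on vlan 20, the same function would return the target address itself, which is the "no gateway required" case from before the move.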

So as soon as the backup starts from access vlan 29, addressing all of the servers it backs up on vlan 20, all hell breaks loose on the vlan 20 network. Unfortunately this network includes some bridged WAN connections on the same subnet. Because they are on the same subnet and not partitioned off, they take on all this excess traffic too, which swamps the slow bridged 2 Mb link, and they drop off.

Of course I will now partition off these critical bridged connections in response - but we never had this problem when the backup server was plugged into access vlan 20 at HQ, on the same subnet as the servers. So what has inter-vlan routing got to do with changing things so dramatically? And how can I solve it?

SOLUTION

kyleb84

This solution is only available to members.

klwn

ASKER

Good advice and yes segmenting traffic into more VLANs is the way to go except, I still don't understand why this has just started after moving to a different VLAN?
And with your example about creating a dedicated VLAN to the server - I'm afraid it's not just the one server - but a Computer Room full of servers (about 80 of them). So the TSM backup starts from VLAN 29 and inter-vlan routes immediately to VLAN 20 where all these servers (not just the one server) get backed up very very quickly (due to the new VTL). Unfortunately this same VLAN 20 extends throughout the rest of the building and extends out further to some bridged WAN connections affecting all these also. Basically anything on VLAN 20 is hammered! And this seems to have happened since moving the backup server to another VLAN (or it might be also due to the new VTL library making backup streams run very quickly thus possibly intensifying network traffic in doing so).
Can you think of a quick fix to control this traffic for now? The backup server is connected via a single 1 Gb NIC. Can I lower the priority of the traffic coming from the 3560 port it's plugged into, for example, until I find quality time to get to the bottom of it?
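As a rough illustration of why the bridged links drop off, the sketch below compares the two figures given in this thread: the 1 Gb NIC on the backup server and the 2 Mb bridged WAN link. It is only arithmetic on those two numbers, and the function names are made up for the example. It also only matters if backup frames are actually being flooded across vlan 20; a switch normally forwards known unicast traffic only to the destination port, so the bridge should not see it at all.

```python
# Figures taken from the thread: a single 1 Gb NIC and a 2 Mb bridged WAN link.
NIC_BPS = 1_000_000_000
WAN_BPS = 2_000_000


def oversubscription(source_bps, link_bps):
    """How many times faster the source can send than the link can carry."""
    return source_bps / link_bps


def share_to_saturate(source_bps, link_bps):
    """Fraction of the source's line rate that is enough to fill the link."""
    return link_bps / source_bps


if __name__ == "__main__":
    ratio = oversubscription(NIC_BPS, WAN_BPS)
    share = share_to_saturate(NIC_BPS, WAN_BPS)
    print(f"The NIC can send {ratio:.0f}x what the WAN link can carry")
    print(f"{share:.1%} of the NIC's line rate is enough to fill the link")
```

So if even a tiny fraction of the backup stream ends up on the bridged link, the link is full, which is an argument for partitioning those connections off regardless of the root cause.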

ASKER CERTIFIED SOLUTION

kyleb84

This solution is only available to members.

klwn

ASKER

Thanks for the suggestions... because we are looking for a workaround, we are certainly going to move the backup server back onto the same network as the other servers, thus keeping them all on VLAN 20. Although theoretically inter-vlan routing should have nothing to do with the increased traffic, unless you think otherwise - putting it back on the same VLAN just returns it to its original location, where problems were not prevalent.
A VTL is a 'Virtual Tape Library' - which is a bunch of spinning hard disks pretending to be tapes - hence the much quicker backup times now that we have changed from backing up to tapes to backing up to disks - which is why I think the traffic is so much more intense. However, if things go back to normal by bringing the backup back onto VLAN 20 then I guess it's nothing to do with this factor.
Your other suggestion of putting the backup server onto its own VLAN surely won't eliminate the fact that it would still have to inter-vlan route out to VLAN 20, where all of the 80 servers it has to back up exist? So why would you do that?
I think changing the 802.1p value of the backup traffic would have merit if it means lowering the priority of the packets coming out of the switchport. For example our backup server sits on switchport Gi0/3 - and I would like to lower the priority of all the packet streams coming out of that port - what command would you put in to achieve this? I don't think you would need an ACL, just a direct command on that switchport but an example would be nice.
The WAN bridges are simply connections to other very small branch offices, each containing no more than 5 people. At the time it seemed prudent just to bridge these connections, which effectively puts them on the same subnet, which of course I now see as something 'not' to do in future. They have never been affected before - but we have never had such a busy backup traffic problem before either. Nevertheless - lesson learnt there despite the small numbers involved.

kyleb84

Hmm you seem to have misunderstood some of my questions..

"From a diagnostic point, when you say the "WAN" devices become swamped with data because they're on the same VLAN as the Backup server - how do these WAN bridges connect to the network in relation to the Backup server?"

Where I was going with this is that if the WAN devices are plugged directly into the switches that handle this backup data, it could still be that the switches are being overutilised.

"How does this VTL work?
- Does it use broadcast/multicast to transfer the data?"

I know what a VTL is, I want to know its method of communication when backing up.
TCP? UDP? Multicast?

Where I was going with this is that if it uses multicast, a configuration error could lead to network chaos.

klwn

ASKER

After all this the problem has been solved!

It ended up being the network card rather than anything else. The backup administrator had plugged the backup server into the switch whilst it was still configured as a team pair - but without the second member of the pair. Also, while he was there he upgraded the NIC with the latest drivers and hey presto, absolutely no problems exist at all now.

After this event I will be looking for reputable, easy-to-use network monitoring software that allows me to be more proactive in finding problems like this - so the search starts now. Nothing too expensive if I can avoid it.

SOLUTION

kyleb84

This solution is only available to members.