
Backup very SLOW in Linux using CommVault

I have a Dell 2850 running CentOS 3.9 (a flavor of RH Linux). I am running a SCSI PowerVault 124T LTO-3 backup system on a different Dell server.
All the connections go through a Gb switch, and the Ethernet cards are set to 1000/Full Duplex.
So the path is: from the Linux server, via the switch, to the backup server, and via SCSI to the tape drive.
I am backing up 6 servers. All the Windows servers perform exceptionally well (very fast), but when the job gets to the Linux server it takes forever.
Monitoring the switch port with SolarWinds shows over 150 mb/s in the first 15 minutes of the backup job, which then drops to a sustained rate of 20-25 mb/s for the next 18 hours or so.
This makes no sense, because what I back up is only about 250 GB of data.
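
As a rough cross-check of the figures above, the average rate implied by 250 GB in about 18 hours can be worked out directly. This is only a sketch: it assumes decimal gigabytes and that the whole 250 GB moved during those 18 hours, and the function name `average_rate` is made up for illustration.

```python
def average_rate(total_gb, hours):
    """Return (megabytes per second, megabits per second) for a transfer.

    Assumes decimal units: 1 GB = 1000 MB, 1 byte = 8 bits.
    """
    seconds = hours * 3600
    mbytes_per_s = total_gb * 1000 / seconds
    return mbytes_per_s, mbytes_per_s * 8


mb_s, mbit_s = average_rate(250, 18)
print(f"{mb_s:.2f} MB/s = {mbit_s:.1f} Mbit/s")
```

The result is a few megabytes per second. That is in the same range as the 20-25 figure on the graph only if that figure is in megabits per second.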

From the backup server to the Linux box, I can copy files at a rate of 10 Mb/s.
Attached is a graph of the entire backup process as seen at the switch port.
I would appreciate any help in this matter.

Thank you.

Bandwidth.png
Asked by: atvrocks

squify Commented:
Hi,

I would try setting the NICs on the client, the backup server, and the switch to Auto Negotiate. The client's NIC will not transmit data until it can negotiate transfer at 1000 Mb/s. By changing them all to Auto Negotiate, you will get data transferring regardless of the negotiated speed.

Also look at the type of data you are copying. If the data consists of many small files at varying path depths, you will get slower backup speeds. If you have a few large files, you will get good speeds. You also have a problem if a single client sending 10 MB/s is feeding an LTO-3 tape drive, which streams at 80 MB/s natively. If the drive does not receive data at a consistent rate, it has to stop, reposition the tape, and start again, over and over. This is called shoe-shining, and it will ruin your tapes and drives in a matter of months.
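
One way to check whether the data really is dominated by small files is to profile the file sizes on the client. The following is a minimal sketch, not part of any CommVault tooling; the function name `size_profile` and the 64 KiB cut-off for "small" are arbitrary assumptions.

```python
import os
import stat


def size_profile(root, small_limit=64 * 1024):
    """Walk `root` and return (file_count, total_bytes, small_file_count).

    Only regular files are counted; `small_limit` is an arbitrary threshold.
    """
    count = total = small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, device nodes
            count += 1
            total += info.st_size
            if info.st_size < small_limit:
                small += 1
    return count, total, small
```

Calling `size_profile("/path/to/data")` on the directory being backed up gives the file count and average size (`total / count`), which makes it easier to judge whether per-file overhead is likely to matter.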

Let me know how changing the network speed helps.

atvrocks (Author) Commented:
The current network setup is:
---------------------------------------------------------------------
Settings for eth2:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes
------------------------------------------------------------------------

mii-tool, which is known to show the wrong speed on CentOS, shows the following:
-------------------------------------------------------------------------
eth2: negotiated 100baseTx-FD flow-control, link ok
-------------------------------------------------------------------------

The speed on the switch port is set for 1000/Full - Autonegotiation enabled
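
The two tools above disagree (1000 Mb/s versus 100baseTx), so it can help to compare their reports mechanically. The sketch below parses text in the formats quoted above; the function names `parse_link` and `mii_speed` are made up for illustration, and it assumes the output has already been captured into strings.

```python
import re


def parse_link(ethtool_text):
    """Extract Speed, Duplex and Auto-negotiation from `ethtool <iface>` output."""
    fields = {}
    for line in ethtool_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("Speed", "Duplex", "Auto-negotiation"):
            fields[key] = value.strip()
    return fields


def mii_speed(mii_text):
    """Return the speed in Mb/s from a mii-tool 'negotiated ...' line, or None."""
    match = re.search(r"negotiated (\d+)base", mii_text)
    return int(match.group(1)) if match else None


link = parse_link("Speed: 1000Mb/s\nDuplex: Full\nAuto-negotiation: on\n")
ethtool_speed = int(re.match(r"\d+", link["Speed"]).group())
reported = mii_speed("eth2: negotiated 100baseTx-FD flow-control, link ok")
if reported != ethtool_speed:
    print(f"mismatch: ethtool says {ethtool_speed}, mii-tool says {reported}")
```

A mismatch like this one does not by itself prove the link is running at 100 Mb/s. mii-tool was written for 10/100 hardware, so on a gigabit NIC the ethtool figure is usually the one to trust.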

What about the "flow-control"? Do I need that?

On this specific Linux system I back up a lot of images, all the same size and all small files. I have yet to destroy a tape, and this has been running for a while now.

Thank you


squify Commented:
Does the Linux server have its mounts spread across multiple partitions/LUNs? If so, I would increase the number of readers in the properties of the default subclient to equal the number of LUNs presented to the server. You can also increase the number of streams going to a storage policy so that multiple clients can back up at once. You might also want to look at multiplexing the data, by a factor of 2 initially.

I would also set up perfmon on the MediaAgent to measure disk read/write times, network bandwidth, and CPU load, to check that the MediaAgent has enough resources to cope with your backups. You can also look at using Synthetic backups for the Linux client so that your backup window is reduced during full backups.

Hope these suggestions help.

atvrocks (Author) Commented:
The Linux server is standalone, with no external storage.
I did try setting the NIC and the switch port to 1000/full with no autonegotiation, without any change in performance. It is back to autonegotiation now.
I was looking at multiplexing, and for that I need a license from CommVault. I have already contacted them; I have to see what the price is.
I took the liberty of running a sniffer (tethereal), and I found that when I run the backup I get this:
----------------------
0.003498 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=33580 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003505 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=35040 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003513 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=36500 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003519 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=37960 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003523 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=39420 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003527 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=40880 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003532 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=42340 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003536 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=43800 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003541 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=45260 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003567 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=46720 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.003859 192.168.0.201 -> 192.168.0.204 TCP 2712 > 40451 [ACK] Seq=0 Ack=4380 Win=65535 Len=0
  0.023360 192.168.0.204 -> 192.168.0.201 TCP [TCP Previous segment lost] 40451 > 2712 [ACK] Seq=63144 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043345 192.168.0.204 -> 192.168.0.201 TCP [TCP Previous segment lost] 40451 > 2712 [ACK] Seq=188096 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043352 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=189556 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043358 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=191016 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043363 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=192476 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043368 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=193936 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043372 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=195396 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043377 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=196856 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043381 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=198316 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043385 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=199776 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043390 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=201236 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043394 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=202696 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043398 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=204156 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043410 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=205616 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043414 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=207076 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043418 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=208536 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043422 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=209996 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043427 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=211456 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043432 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=212916 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043436 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=214376 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043442 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=215836 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043446 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=217296 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
  0.043451 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] Seq=218756 Ack=0 Win=5840 [TCP CHECKSUM INCORRECT] Len=1460
------------------------------------------

And I get a lot of them. .201 is the backup server - .204 is the Linux server
Again - I get this ONLY when the backup is running.
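
A capture like the one above is easier to reason about once the flags are counted per sending host. The sketch below does that for text in the format shown; the function name `summarize` is hypothetical. One explanation worth ruling out, though it is not confirmed anywhere in this thread, is TCP checksum offload. When the NIC computes checksums in hardware, a sniffer running on the sending host sees outgoing packets before the checksum is filled in and flags them as incorrect, even though they are fine on the wire. That would fit a capture taken on .204, which is an assumption here, because the post does not say where the sniffer ran.

```python
import re
from collections import Counter

# Matches lines such as:
#   0.003498 192.168.0.204 -> 192.168.0.201 TCP 40451 > 2712 [ACK] ...
LINE = re.compile(r"^\s*\d+\.\d+\s+(?P<src>\S+)\s+->\s+\S+\s+TCP\s+(?P<rest>.*)$")


def summarize(capture_text):
    """Count packets, checksum flags and lost-segment flags per source address."""
    totals, bad_sum, lost = Counter(), Counter(), Counter()
    for line in capture_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # not a TCP summary line
        src = match.group("src")
        totals[src] += 1
        if "TCP CHECKSUM INCORRECT" in match.group("rest"):
            bad_sum[src] += 1
        if "Previous segment lost" in match.group("rest"):
            lost[src] += 1
    return totals, bad_sum, lost
```

If every flagged packet has the capturing host as its source and none come from the other side, offload is a likely cause and the checksum messages are probably not related to the slow backup.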

Thank you

atvrocks (Author) Commented:
Anyone? Why would I get all these checksum errors?

atvrocks (Author) Commented:
It's been a while... no expert comments?

JIEXA Commented:
Well, I would start not with NIC/network tricks but with the Linux machine's configuration.
When you copy all the files you want to back up (~250 GB, you said) to /dev/null or to another directory, or create a tar file from them, what data rate do you get?
What is in /var/log/messages on the Linux machine? Some filesystems might be corrupted, or too fragmented.
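
The local read test suggested above can be approximated with a short script that reads every file and discards the data, much like copying to /dev/null. This is a sketch under assumptions: the function name `read_rate` is made up, and it needs Python 3, which a system as old as CentOS 3.9 will not have by default (there, `tar` to /dev/null with `time` would serve the same purpose).

```python
import os
import time


def read_rate(root, block_size=1024 * 1024):
    """Read every regular file under `root`, discard the data.

    Returns (total_bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            try:
                with open(path, "rb") as handle:
                    while True:
                        chunk = handle.read(block_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                continue  # unreadable file; skip it
    return total, time.monotonic() - start
```

Dividing the byte count by the elapsed seconds gives the local read rate. If that rate is already close to the slow backup rate, the bottleneck is on the client's disks or filesystem rather than the network. A second run may look faster because of the page cache.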

I would ignore the TCP checksum errors for now. They might appear because something is sending raw packets, because the NIC driver is faulty, or because the MTUs of the source, destination, and hub do not match (or because of a hub/switch bug when re-parsing packets). Try setting the same MTU on all 3 components...

JIEXA Commented:
You might also want to experiment with hdparm on the Linux machine to check the DMA/UDMA/PIO modes and the transfer/read rates.

atvrocks (Author) Commented:
hdparm works only on SATA/ATA HDDs... not on RAID 5.

NE-Grouse Commented:
eth2: negotiated 100baseTx-FD flow-control, link ok
-------------------------------------------------------------------------

The speed on the switch port is set for 1000/Full - Autonegotiation enabled

I've seen this a lot, on both Windows and Linux. If you can't set the NIC to 1000/Full and turn off autonegotiation, you may want to consider a different brand of NIC. You can also try nailing the switch port to 1000/Full.

atvrocks (Author) Commented:
That didn't change anything. Considering that there are thousands of small files, do you think multiplexing will solve the problem?

In order to do that, I have to upgrade the software.

JIEXA Commented:
A 1000 Mbps connection is very fragile. Try replacing the cables with better and shorter ones. I have no other ideas at the moment.

atvrocks (Author) Commented:
Upgrading the backup software fixed the problem: MULTIPLEXING.
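
The thread does not explain why multiplexing helped. One plausible reading, in line with squify's earlier remark about keeping the tape drive fed, is that interleaving several data streams onto one drive keeps it supplied even when each stream alone is slow. The toy sketch below shows only the general idea of round-robin interleaving; it is not CommVault's implementation, and the function name `multiplex` and the data layout are assumptions made for illustration.

```python
def multiplex(streams, chunk_size):
    """Interleave chunks from several byte streams in round-robin order.

    `streams` maps a stream id to a bytes object. Yields (stream_id, chunk)
    pairs until every stream is exhausted.
    """
    offsets = {sid: 0 for sid in streams}
    active = list(streams)
    while active:
        still_active = []
        for sid in active:
            start = offsets[sid]
            chunk = streams[sid][start:start + chunk_size]
            if chunk:
                offsets[sid] = start + len(chunk)
                yield sid, chunk
            if offsets[sid] < len(streams[sid]):
                still_active.append(sid)
        active = still_active


for stream_id, chunk in multiplex({"client-a": b"AAAA", "client-b": b"BB"}, 2):
    print(stream_id, chunk)
```

Tagging each chunk with its stream id is what would let a restore separate the streams again. A real product also has to handle buffering, ordering on tape, and error recovery, none of which is modeled here.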