How to get SCSI HDD throughput > 200 MB/sec sustained

We have a backup solution which uses D2D2T (VMware VM backup using VCB to transfer the VMDK files to the VCB backup proxy server for backing up to tape).

I need to increase the hard disk throughput of the staging disks on the proxy/backup server to greater than 200 MB/sec sustained (a combination of sequential read and sequential write) to ensure our backups complete within the backup window.
The source data is on a SAN which is fibre-attached to the backup server. We have confirmed we can read data from the SAN to the staging disks (holding tank) at 110 MB/sec sustained.
We have also confirmed we can write from the staging disks to the backup tape at a sustained 110 MB/sec. (The tape drives are rated at 240 MB/sec native.)
However, since the backup server can multiplex jobs and be reading and writing from the staging disks at the same time, the staging disks need to achieve around 220 MB/sec combined when reading from the SAN and writing to tape simultaneously. At the moment, when reading and writing at the same time, read and write throughput are both halved or slightly worse (as expected).
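
To make the target explicit, here is the arithmetic as a quick sketch (Python used purely as a calculator; all figures are the ones quoted above):

# Combined throughput the staging array has to sustain when both streams overlap.
san_read_mb_s = 110    # measured SAN -> staging sequential read
tape_write_mb_s = 110  # measured staging -> tape sequential write

required_combined = san_read_mb_s + tape_write_mb_s
print("staging array must sustain ~%d MB/sec combined" % required_combined)  # ~220

# What we see today: each stream drops to roughly half when both run at once,
# i.e. about 55 MB/sec in and 55 MB/sec out, so the total stays near 110 MB/sec.
observed_per_stream = san_read_mb_s / 2
print("observed per-stream rate when overlapped: ~%d MB/sec" % observed_per_stream)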

Now for the tech stuff....
OS = Windows Server 2003.
Dual 2000 MHz CPUs + 2 GB memory. Neither CPU nor memory is significantly loaded during file transfers.
Staging disks are 3 x Hitachi Ultrastar 300 GB SCSI Ultra320 15K drives with 16 MB cache each, rated at 72~123 MB/sec sustained, configured as software RAID 0 and attached to an LSI Logic Ultra320 PCI-X 133 MHz SCSI card.
The SCSI card for the staging disks is on a dedicated 133 MHz PCI-X bus (the fibre card to the SAN and the SCSI card for the tape device are also on separate dedicated 133 MHz PCI-X buses and are both rated at 133 MHz, so all buses are running at best speed).

RAID 0 has been selected as redundancy of the staging area is not needed and, to my knowledge, striping will give the best I/O throughput. I have tried different RAID block sizes from 64 KB up to 8 MB, but with only marginal improvements.

Windows Performance Monitor shows the disks are constantly being accessed, disk queue length on the staging disks is around 20, and disk access time is over 20 ms, so these are definitely reaching their limits.
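
For anyone who wants to reproduce the simultaneous read/write figure, below is a rough Python sketch (the paths are placeholders, not our actual layout) that overlaps one sequential reader and one sequential writer on the staging volume and reports the aggregate rate; the files should be well over the server's 2 GB of RAM so the OS cache doesn't flatter the numbers:

# Minimal sketch: overlap one sequential reader and one sequential writer on the
# staging volume and report the combined MB/s. Paths and sizes are placeholders.
import threading, time, os

BLOCK = 1024 * 1024             # 1 MiB per I/O
SIZE = 4 * 1024 * 1024 * 1024   # 4 GiB per stream, larger than RAM

def writer(path, total, result):
    buf = os.urandom(BLOCK)
    done = 0
    with open(path, "wb") as f:
        while done < total:
            f.write(buf)
            done += BLOCK
        f.flush()
        os.fsync(f.fileno())
    result["written"] = done

def reader(path, total, result):
    done = 0
    with open(path, "rb") as f:
        while done < total:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            done += len(chunk)
    result["read"] = done

res = {}
t0 = time.time()
threads = [
    threading.Thread(target=writer, args=(r"E:\stage\write_test.bin", SIZE, res)),
    threading.Thread(target=reader, args=(r"E:\stage\existing_big_file.bin", SIZE, res)),
]
for t in threads: t.start()
for t in threads: t.join()
elapsed = time.time() - t0
total_mb = (res.get("written", 0) + res.get("read", 0)) / (1024.0 * 1024.0)
print("combined throughput: %.0f MB/sec over %.0f sec" % (total_mb / elapsed, elapsed))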

Before investing in additional hardware, my questions are:
1) Will adding more disks (identical models) into the RAID 0 increase the throughput considerably? I have seen articles both for and against this. Has anyone actually proven this? Not just a theoretical answer, please.
2) If yes to 1) above, theoretically, how many more of the same model disks would be needed to get to 200 MB/sec sustained?
3) Will changing the Ultra320 SCSI card to an Ultra320 SCSI hardware RAID controller (hardware RAID versus software RAID) increase throughput considerably? Remember, neither CPU nor memory is heavily utilised on the server during backups, and I believe this is the biggest difference between hardware and software RAID, other than data-loss prevention on power failure, which is not a concern.

We do not wish to change our backup topology or invest in 100K worth of large SSDs, so please keep answers within the scope of the questions being asked.

Thank you




Asked by: gsawan
nappy_d (There are a 1000 ways to skin the technology cat) commented:
Three things you have not mentioned: how much data is being backed up (with, I assume, VCB), how long does the backup take, and what is the currently allotted backup window?
nappy_d (There are a 1000 ways to skin the technology cat) commented:
Sorry you did mention VCB :)
gsawan (Author) commented:
Actual data is about 1 TB.
The backup takes 14 hours via Ethernet to an MSL4048 library with LTO-4 tape drives.
We want to reduce it to 6 hours or less.

nappy_d (There are a 1000 ways to skin the technology cat) commented:
Why over Ethernet? If you have fibre connectivity to your SAN, present all of the LUNs to the VCB server to back up the guests. This would be faster.

This is what I do, and I back up 400 GB of guests overnight in under 3 hours.

I have a standalone LTO-3 drive. I just attached a 500 GB USB drive to my server for the VCB.
gsawan (Author) commented:
Ethernet is the current setup; we want to get away from Ethernet and use fibre instead.
We will present all the LUNs to the proxy servers when using FC; that is the setup we will use.
We did some tests in the lab and the maximum rate at which we can export the VMDK files to the proxy server is about 120 GB/h, and we also want to speed this up ----> HOW?
We also want to speed up the read throughput from the VCB mount point (holding tank) to the tape. The maximum throughput we can get is only 110 MB/sec and we want at least 200 MB/sec or more ----> HOW?
What speed do you get when you read from the 500 GB USB drive to the tape?


nappy_d (There are a 1000 ways to skin the technology cat) commented:
I have never performed a read test. Let's assume that, given the specs of USB 2.0, I am getting 480 Mbit/sec less 25% for overhead and bottlenecks.

However, based on the scenario below, it's fairly quick.
  1. My backup starts at 8pm nightly. I use ArcServe 11.5 SP4.
  2. I have set my backup to pause for 90 minutes so that VCB can copy the guests to my 500 GB USB drive.
  3. The backup then begins to write to tape at 9:30pm. I then receive an email notice at 11pm informing me of backup completion.
nappy_d (There are a 1000 ways to skin the technology cat) commented:
Oh, is your Ethernet connectivity on a separate VLAN from the rest of your network? Are you doing this over 100 Mbit or gigabit?

If you could, try teaming NICs together to help improve throughput.
gsawan (Author) commented:
Please don't talk about Ethernet as it is no longer relevant. We did a test with a fibre cable attached directly to the SAN and the VCB proxy server running ArcServe 12.5.
That is the setup we are talking about.
Please refer back to the original post for our question, thanks!
nappy_d (There are a 1000 ways to skin the technology cat) commented:
1) Will adding more disks (identical models) into the RAID 0 increase the throughput considerably?
  • Adding more disks will not, I believe, improve the throughput significantly if the system bus is not able to support it.
2) If yes to 1) above, theoretically, how many more of the same model disks would be needed to get to 200 MB/sec sustained?
3) Will changing the Ultra320 SCSI card to an Ultra320 SCSI hardware RAID controller (hardware RAID versus software RAID) increase throughput considerably?
  • Changing the RAID controller will not make a difference either. U320 is U320, whether using software or hardware RAID.
What is the server you are connecting to? Is it a brand name such as Dell, IBM or HP? If so, what is the model?

Is it a clone utilizing an ASUS or some other brand's motherboard?

You may think my questions are trivial, but I feel you have left out some info here, and I DO have this functioning well, so I just need further info to assist you.




gsawan (Author) commented:
Just to clarify, our current backup strategy has been to use Ethernet. However, we are testing fibre backup solutions now (due to significantly increased data volume in the near future), and the original posting, hardware specs and throughput figures are based upon direct FC backup from our SAN to the VCB proxy server. Neither the throughput to the VCB server nor the throughput to our tape server is presenting any problems (both have more to give if we need it). It is purely the holding tank, which is local to the VCB proxy, that needs higher throughput, so it can read from the SAN and write to the tape at the same time at a combined rate of 200+ MB/sec.

The current SAN data is about 1 TB, which we can successfully back up in around 7~8 hours via FC, i.e.:
export from SAN: 1000 GB / 100 MB/sec = 2 h 46 min (approx)
VCB snapshot job initialisation wait time: 4 min/VM = 1 h 20 min (cannot be avoided)
backup to tape: 1000 GB / 100 MB/sec = 2 h 46 min (approx)
Total = 6 h 52 min

However, the SAN data will soon increase to about 2.5 TB, pushing our backup time to roughly 15.5 hours even on FC. If the holding tank can read @ 100 MB/sec and write @ 100 MB/sec simultaneously, our backup time will be 8.5 ~ 9 hours (inside our 12-hour window).
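
The same arithmetic as a quick sketch (the VM count of 20 is implied by the 4 min/VM and 1 h 20 min figures above, and is assumed to stay the same for the 2.5 TB case):

# Back-of-envelope backup window from the figures quoted above.
def window_hours(data_gb, stage_mb_s, tape_mb_s, vm_count, init_min_per_vm,
                 overlapped=False):
    export_h = data_gb * 1000.0 / stage_mb_s / 3600   # SAN -> holding tank
    tape_h = data_gb * 1000.0 / tape_mb_s / 3600      # holding tank -> tape
    init_h = vm_count * init_min_per_vm / 60.0        # VCB snapshot initialisation
    if overlapped:
        # If the staging disks could read and write at full rate simultaneously,
        # the two transfer phases would largely overlap instead of running in series.
        return max(export_h, tape_h) + init_h
    return export_h + tape_h + init_h

print("1 TB, phases in series:    %.1f h" % window_hours(1000, 100, 100, 20, 4))
print("2.5 TB, phases in series:  %.1f h" % window_hours(2500, 100, 100, 20, 4))
print("2.5 TB, phases overlapped: %.1f h" % window_hours(2500, 100, 100, 20, 4, True))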

As per my original posting.

The proxy server buses are PCI-X (64-bit) running at 133 MHz (theoretical throughput is about 1 GB/sec). All cards are on different buses and all cards are also PCI-X 133 MHz, so neither the buses nor the cards are a bottleneck.

Also, the server is a Dell PowerEdge 2650. The onboard RAID controller has been bypassed for disks 3~5 using a dedicated U320 SCSI card. (The onboard PERC 3/Di controller was limiting throughput to the holding tank to 50 MB/sec, so we eliminated it.)

So from your response above,
1) Bus speed is not the bottleneck, so if you accept that the HDD throughput IS the slowest link, will adding more HDDs to the RAID 0 improve throughput, i.e. more spindles, more heads, more cache = more throughput in a stripe configuration? But has anyone actually shown this works, before I purchase more disks?

3) Yes, Ultra320 is U320, but we are not getting U320 speeds and it is not a bus problem, so based upon the specs of the HDDs provided, should we be able to get better throughput than we have described? Is the software RAID likely to be adding additional overhead that would be eliminated using hardware RAID 0 (even though CPU and memory usage is not high anyway)?

many thanks

nappy_d (There are a 1000 ways to skin the technology cat) commented:
In my situation on another server, my current HP DL360 with 4 TB of DAS in RAID 5 takes 12 hours to back up all data using multiplexing, with an LTO-3 drive and ArcServe 11.5.

Theoretically, you "should" get better throughput if you increase the number of drives in your stripe set.

Are you including verification time (if it is being done) in that backup schedule?

Are you doing a full backup or differential backup nightly?

I ask this because it is a scenario that made me change my backup methodology a few years back when I was exceeding 2TB of data backup.

That being said, I changed to a differential Mon-Thurs and Full backup Fridays.

Now, if you do not get the throughput you desire, I would suggest you consider doing data snapshots to cheap disks and then backing up during the day. That way your production environment can continue while you perform full daily backups.
andyalder (Saggar maker's bottom knocker) commented:
Adding disks ought to speed it up; I moved from 4 x 15K to 6 x 15K and saw some improvement (not with VCB, I was just trying to get an LTO-4 to run at full speed to test it). You might actually be better off replacing the SCSI disks with big, cheap SATA ones since the areal density is higher.
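
As a rough sketch of how the numbers work out if scaling stays close to linear (the per-disk rate and the efficiency penalty for mixing a read stream with a write stream are assumptions, not measurements):

# Rough spindle-count estimate for 200+ MB/sec combined on a RAID 0 stripe set,
# assuming near-linear scaling. The per-disk figure uses the low end of the
# Ultrastar's 72~123 MB/sec rating (inner zones and seek overhead pull the
# average down); the mix efficiency is a guess for overlapping read + write.
import math

per_disk_seq_mb_s = 72
mix_efficiency = 0.6
target_mb_s = 220  # 110 in from the SAN + 110 out to tape

disks_needed = math.ceil(target_mb_s / (per_disk_seq_mb_s * mix_efficiency))
print("roughly %d disks in the stripe set" % disks_needed)  # ~6 with these assumptions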

Multiplexing jobs to a disk pool can cause its own problems, since with several jobs at once the heads have to seek all over the place for the read stage. Here's an experiment to do on the current 3 disks: break the RAID, set them up as 3 separate disks, split the job into three, and see how it affects total performance. It should be about the same, so then you know that if you had 4 disks you could split it into 4 and get a performance improvement.

What's performance like if you skip the staging area and back up direct to tape?
gsawan (Author) commented:
Hi AndyAlder

Approximately what % improvement did you see moving from 4 disks to 6 disks? Was it significant enough to justify the cost of the disks? I do not mind the outlay if the return justifies it.
As for direct SAN to tape, we backed up some data from a test NTFS volume on the SAN to tape at 120+ MB/sec, but VCB backup of a VMFS volume requires a holding tank on the proxy server.

nappy_d:
4 TB in 12 hours, that's 92 MB/sec average to LTO-3, not bad. But if you were forced to put that through a staging volume like VCB, this would be 24 hours.
We are doing full backups each night in line with IT policy guidelines set by our parent company. Verification time has not been included in our backup estimates as it is not mandatory in the policy.
Nightly snapshots and daily backups may be an option if we cannot get the throughput we desire, but this would be less desirable, as current policy requires that the previous night's backups go offsite the next morning.
nappy_d (There are a 1000 ways to skin the technology cat) commented:
Keep in mind, with my backup speed, this is a RAID 5 configuration, not RAID 0.
Something else to consider: what about adding a second LTO-4 drive to your tape library? Create a second backup group in ArcServe, split your backup, and multiplex over two drives. This may also help reduce your backup window and your storage cost.

Technically, your backups "can" go off-site the next day. It could just be a matter of scheduling your pickups with your off-site storage company. Many companies do pickups and drop-offs 24/7. So, if you decide on a solution like this, keep this in consideration.


andyalder (Saggar maker's bottom knocker) commented:
Performance improvement was nearly linear, but I was using an MSA2000 SAN rather than software RAID. Would it improve speed if you did a couple of NTFS/agent backups to tape while doing the staging for VCB?

Are you sure you can read from the SAN at 240 MB/sec? You've only got 120 MB/sec from SAN to tape so far, although I expect the SAN is busy doing other things.


nappy_d (There are a 1000 ways to skin the technology cat) commented:
I think this should be a split of points as we have both provided a means to assist the author.
andyalder (Saggar maker's bottom knocker) commented:
I agree.
gsawan (Author) commented:
I am happy to split the points. Both provided good input.

I will be implementing more disks via a separate HP FC SAN next month to improve the throughput of the staging area.