
Arcserve 11.5 - Scheduling Multiple backup jobs - Order of execution

Using Arcserve 11.5 on W2K3 with Ultrium 2 LTO library.

I have a GFS scheme that does daily differentials with Friday fulls. It backs up 4 NAS boxes plus numerous servers and the odd workstation. I've scheduled 8 jobs which together back up everything needed in approximately 70 hours, which just nudges into Monday morning. The separate jobs mean that if a job fails, it can be rerun in isolation without running another 70 hour job!

The jobs are scheduled at hourly intervals, with the first at 1700 and the last at 2359, and are in a specific order (most critical data first). The daily differential jobs usually take between an hour and an hour and a half, so sometimes a job is scheduled to start whilst another job is in progress, and arcserve logs the fact that the job couldn't start every hour until it is able to begin. On weekdays this isn't a problem. The jobs will run in order in most cases, and there is rarely more than one job waiting to run whilst a job is in progress.

My issue is with the full backups. The first Job takes about 15 hours and after 7 of those have elapsed there are 7 backups waiting to run whilst the first is in progress and this gives me various problems.

1) sometimes, some of the jobs never start all weekend.
2) when the first job finishes the next in the queue is not necessarily the one that runs next and my careful job ordering goes right out of the window.
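
To make the Friday backlog concrete, here is a minimal Python sketch that counts how many jobs are overdue at a given moment. It uses only the start times and the 15 hour duration quoted above; the calendar date and the function name `waiting_jobs` are assumptions for illustration, and it says nothing about how arcserve itself manages its queue.

```python
from datetime import datetime, timedelta

# Start times from the question: hourly from 17:00, with the last job at 23:59.
# The date itself is arbitrary and only there so datetime arithmetic works.
day = datetime(2006, 1, 6)
starts = [day.replace(hour=h) for h in range(17, 24)]
starts.append(day.replace(hour=23, minute=59))

def waiting_jobs(now, running_index=0):
    """Count jobs whose start time has passed while another job holds the drive."""
    return sum(
        1 for i, start in enumerate(starts)
        if i != running_index and start <= now
    )

seven_hours_in = starts[0] + timedelta(hours=7)
first_job_done = starts[0] + timedelta(hours=15)

print(waiting_jobs(seven_hours_in))   # 7 jobs are already overdue at midnight
print(waiting_jobs(first_job_done))   # still 7 when the first job finishes
```

So for the last 8 hours of the first full backup, every other job is simultaneously eligible to start, which is exactly the situation where the start order matters.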

So my question is:
Faced with multiple jobs that have passed their scheduled start time, in what order will arcserve start the waiting jobs and why will it sometimes give up trying to start some jobs altogether?
There is no way to determine which job will get picked first when there are multiple jobs ready to go. In any case, all of the jobs should eventually run.

Set the Job Engine in debug mode and perhaps something will show up in the activity log indicating why those jobs are not running.

If not already installed, install ARCserve 11.5 SP1 and Device Support Update 6, oh and if your NAS is OnStor there is one for that also.

With ARCServe 11 (and pretty much any version since 2000 or perhaps 6.5), you can set scheduling priority for the nodes. This means that you can have a SINGLE job, rather than many, for all of your hosts, and you can set the most important ones to go first. Additionally, since you're using v11.5, you have the option of multiplexing your backups (having more than one client stream data to you at one time, which significantly decreases the backup window and maximizes network and tape drive utilization). LTO drives (or pretty much any tape drive, for that matter) like to run at full speed, with data streaming to them constantly. If your single client hiccups or stutters, and doesn't send data to the drive as fast as the drive wants it, the drive has to stop, rewind, reposition, and then begin writing again. This can add a very significant amount of time to the individual backup job. Having multiple servers sending data at the same time generally resolves this slowdown, which may decrease that 70 hour backup window.

How much data are you backing up nightly, and what kind of networking equipment do you have (100Mb, 1000Mb, etc?)

Source Priority is for multiple targets within the same job and will not affect the order in which multiple jobs in the queue run.
Yes, that was my point.  Combine all of the jobs into one main job, then set the source priority, so that the important systems get backed up first.

If a specific target is missed, just do a one-off backup job after the main job is done, rather than auto-rescheduling it to run the full job again.
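
The single-job approach can be sketched in a few lines of Python. This is only an illustration of the idea (order the sources by priority, then rerun just the misses); the node names, priority numbers and function names are all hypothetical, and none of this is ARCserve's own interface.

```python
# Hypothetical nodes with a source priority (1 = back up first).
nodes = [
    ("nas-1", 1),
    ("workstation-7", 9),
    ("file-server", 2),
    ("nas-2", 1),
]

def run_order(nodes):
    """Order the sources of one job by priority.

    sorted() is stable, so nodes with equal priority keep their listed order.
    """
    return [name for name, _priority in sorted(nodes, key=lambda n: n[1])]

def one_off_targets(order, failed):
    """Targets to put in a one-off job after the main job, in priority order."""
    return [name for name in order if name in failed]

order = run_order(nodes)
print(order)                                    # ['nas-1', 'nas-2', 'file-server', 'workstation-7']
print(one_off_targets(order, {"file-server"}))  # ['file-server']
```

The point of the second function is that a missed target costs one small follow-up job rather than a rerun of the whole set.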
jahboite (Author) commented:
Thank you for your input, folks. Very helpful.

I think you're right that there's no way to choose which job will get picked first when there are multiple jobs that have reached their scheduled start time. I'm still trying to work out if arcserve chooses based on certain criteria, such as the lowest Job No. - I haven't been able to find anything written down so far so I'll collect some more data and see if I can see a pattern.
Thanks also for putting me on to the update 6.

The reason I've got multiple jobs is basically because we've moved from arcserve 2000 and a 100GB/tape native AIT library (and not backing up anywhere near as much as we needed to) to 11.5 and a 200GB/tape native LTO2 library. I'm still getting to grips with the whole thing, and having multiple jobs means more flexibility when things go wrong.
Having said that, I think that I've ironed out most of our issues which basically arose because of the huge amounts of superfluous data, so I'm thinking that your suggestion of 1 job is more practicable than it was at the start. I'm going to try it next full backup!

I'm regularly archiving data as it becomes redundant so a full backup is currently just over 1.3TB in 83 hours!

I'm also very interested in the multiplexing since you suggest the tape library will thank me for it like a dog with two dicks!  I'd kind of dismissed the idea of multiplexing partly because I thought the NIC on the backup server (which due to constraints is also storage for 0.6TB of live data) couldn't cope and partly because I don't currently know enough about it.  But I'm going to make the effort to find out more.
Most of the targets are on 1000Mb NICs and we've got two 1000Mb switches (as well as two 100Mb ones) - I'm going to have to make sure of this and perhaps put all targets through the same switch if they're not already.
The one backup target which has a scsi pipe straight to the tape library is reported as backed up at a rate of more than 1000MB/min. Other targets are lucky to achieve 300MB/min so if the NIC on that machine could handle it, I could possibly see a 45 hour reduction in time.
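
As a rough check on those rates, a minimal Python sketch converting MB/min into hours for the 1.3TB full backup. It assumes decimal units (1TB = 1,000,000MB) and, unrealistically, that every target could sustain the same rate; the function name is made up for the example.

```python
TOTAL_MB = 1_300_000  # 1.3TB full backup, in decimal megabytes (an assumption)

def hours_for(total_mb, rate_mb_per_min):
    """Hours needed to move total_mb at a steady rate given in MB/min."""
    return total_mb / rate_mb_per_min / 60

network_hours = hours_for(TOTAL_MB, 300)    # typical network target
scsi_hours = hours_for(TOTAL_MB, 1000)      # the directly attached target
saved = network_hours - scsi_hours

print(round(network_hours, 1))  # 72.2
print(round(scsi_hours, 1))     # 21.7
print(round(saved, 1))          # 50.6
```

That puts the best-case saving at roughly 50 hours, so an estimate of about 45 hours is in the right region if most targets can be brought close to the faster rate.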

Thanks again for the info and I'll post the results of your suggestions when I've had a go.
Theoretically, gigabit ethernet should max out at 450GB/hr, and assuming only 30% efficiency (which is just about right for ethernet), you should still get 125GB/hr.  Your LTO-2 tape drive can max out (for very large database files, streamed continuously without delay) at about 110GB/hr, so network bandwidth shouldn't be the issue there.  
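
The arithmetic behind those figures, as a small Python sketch. It assumes decimal gigabytes and takes the 30% efficiency and the 110GB/hr drive figure straight from the paragraph above; the function and constant names are invented for the example. Note that 30% of 450GB/hr works out to 135GB/hr, a little above the 125GB/hr quoted, which does not change the conclusion.

```python
GIGABIT_BITS_PER_SEC = 1_000_000_000
LTO2_GB_PER_HR = 110  # drive streaming rate as quoted in the thread

def gb_per_hour(bits_per_sec, efficiency=1.0):
    """Convert a line rate in bits/s to GB/hr, scaled by an efficiency factor."""
    bytes_per_sec = bits_per_sec / 8
    return bytes_per_sec * 3600 / 1e9 * efficiency

line_rate = gb_per_hour(GIGABIT_BITS_PER_SEC)         # 450 GB/hr
at_30_percent = gb_per_hour(GIGABIT_BITS_PER_SEC, 0.30)

print(round(line_rate))        # 450
print(round(at_30_percent))    # 135
print(at_30_percent > LTO2_GB_PER_HR)  # True: the network can outrun the drive
```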

Send as much as you can to it as fast as you can.  Backup servers are meant to be punished and pummelled whenever possible.  :-)

I always tell my network engineers that if they screw anything up, I'll find out by the following morning!  :-)

Let us know how it goes, and if you need any help configuring the job, feel free to ask.  Basically, though, you add all of your clients, set the multiplexing level (defaults to 4, and that's the max, I believe, without purchasing more licenses for it), and then set the source priority, and you should be golden.

Theoretical finish time should be within 11 hours, but real-world situations dictate that it'll probably take you closer to 20-30.  Still considerably better than the 83 it is/was taking.

Good luck!
As for picking from multiple jobs all ready to go in the queue, there is no order. It is totally random, just up to which one it happens to hit next. Having a priority order is already on the suggestion request list.

In general, multiplexing will increase the total throughput. However, finding the best setting takes some experimentation with how many targets are used at the same time; too many and throughput goes down. Also, from what I have seen, multiplexing has its own overhead, which will take up around 10% and sometimes as much as 20% of the space on tape.
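
To see what that overhead could mean in tapes, a minimal Python sketch using the 1.3TB full backup and 200GB native LTO2 tapes mentioned earlier in the thread. It treats the overhead as a share of each tape's native capacity and ignores compression, both of which are simplifying assumptions; the function name is made up.

```python
import math

TOTAL_GB = 1300   # 1.3TB full backup
TAPE_GB = 200     # native LTO2 capacity per tape

def tapes_needed(total_gb, tape_gb, overhead_pct):
    """Tapes required when overhead_pct of each tape is lost to multiplexing."""
    usable_gb = tape_gb * (100 - overhead_pct) / 100
    return math.ceil(total_gb / usable_gb)

for pct in (0, 10, 20):
    print(pct, tapes_needed(TOTAL_GB, TAPE_GB, pct))
# 0 -> 7 tapes, 10 -> 8 tapes, 20 -> 9 tapes
```

Under these assumptions the overhead costs one or two extra tapes per full backup, which is a modest price if the backup window shrinks substantially.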
jahboite (Author) commented:
It does seem random, but the last two weekends, the jobs have started in the same (random) order. So maybe it isn't quite random....
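
One way to test a hunch like this is to record the start order each weekend and compare it against candidate rules. The Python sketch below does that for the "lowest job number first" idea; the job numbers are entirely hypothetical placeholder data, not taken from any real activity log, and the function names are invented.

```python
# Hypothetical job numbers, listed in the order the jobs actually started.
weekend_1 = [12, 15, 13, 18, 14, 17, 16]
weekend_2 = [12, 15, 13, 18, 14, 17, 16]

def same_order(a, b):
    """True when two weekends started their jobs in the identical sequence."""
    return a == b

def follows_job_number(order):
    """True when jobs started in ascending job-number order."""
    return order == sorted(order)

print(same_order(weekend_1, weekend_2))   # True: repeatable, so perhaps not random
print(follows_job_number(weekend_1))      # False: but not simply lowest number first
```

Two matching weekends are weak evidence on their own; a few more samples, ideally after resubmitting the jobs, would show whether the order is tied to something stable.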
jahboite (Author) commented:
This thread seems to have two streams, one of which slowed to a trickle just as another trickle was swelling magnificently!

So to the original question (which technically is two questions):

"Faced with multiple jobs that have passed their scheduled start time, in what order will arcserve start the waiting jobs and why will it sometimes give up trying to start some jobs altogether?"

This stream has run dry because I switched to using a multiplex job in place of the multiple jobs when I opened this thread.  Hence I never managed to get much data to see if a pattern emerged.  I also stopped seeing jobs that decided not to run; maybe because they knew I was on to them or maybe because I was fiddling too much...
So this stream ends and the questions remain unanswered.
But the points will not die.

For there is that other stream:

markthomasrosenecker dropped a bombshell and my eyes widened to the possibilities of multiplexing.  And what a difference it made!
The first week I tested it and was overjoyed to see the 1.3TB rocket on to tape in 31 hours (down from 80 odd!). In fact it was better than that, because all but one of the machines finished inside 24 hours (with only four multiplexing streams at one time) and one machine took just over 30 hours.
During the week after that we ditched Symantec AV and rolled out eTrust which seems to be much less of a resource hog!
The following weekend; 1.3TB in 18 hours!
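
Put as effective throughput, using the run times reported in this thread and assuming 1.3TB means 1300GB, a quick Python sketch (the labels and function name are just for illustration):

```python
TOTAL_GB = 1300  # 1.3TB full backup

# Wall-clock hours for the full backup, as reported in the thread.
runs = {
    "separate jobs": 83,
    "multiplexed, first weekend": 31,
    "multiplexed, after AV change": 18,
}

def effective_rate(total_gb, hours):
    """Average GB/hr achieved over the whole backup window."""
    return total_gb / hours

rates = {label: round(effective_rate(TOTAL_GB, h), 1) for label, h in runs.items()}
speedup = round(runs["separate jobs"] / runs["multiplexed, after AV change"], 1)

print(rates)    # 15.7, 41.9 and 72.2 GB/hr respectively
print(speedup)  # 4.6
```

The final figure of about 72GB/hr is still below the roughly 110GB/hr quoted earlier for an LTO-2 drive streaming flat out, so there may be a little headroom left.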

And that, my friends, is what I consider a right royal result.

Which means markthomasrosenecker is a worthy winner of the points, and thank you sir!
Thankee to all who participated!
