Backup tapes do not write up to their capacity

Hey everyone,

I have a weird issue with a backup library and ArcServe 11.5. Our graphics department works with its own server, and quite often they have to back up their data to tape. On the hardware side, they've got an HP MSL2024 tape library with two LTO3 tape drives, and LTO3 tapes. As backup software, they are using ArcServe 11.5 SP4.

Now, quite often (if not all the time) the amount of data written to a tape is way under the LTO3 capacity of 400GB native (800GB with 2:1 compression). This recent job is an example (log attached): out of the tape's 400GB capacity, only 276GB are actually written before ArcServe requests the next tape in the sequence. I have disabled file estimation so that ArcServe would write up to the real end of the tape, to no avail.

Any ideas? I've been battling with this for quite a while now. I've even sent the library for a checkup and it returned clean. I often perform head cleaning (last one done yesterday).
ArcServeLog.txt
KeterHDAsked:
 
Thomas RushCommented:
The important section from the drive assessment test:
    |    |__        1.8 m/sec. tape speed: Effective capacity: 72.3%   Margin: -51.2%   (1.5/1.5 GB written using 171.5 metres of tape)
    |    |__                  Channel variation: 36.0%   Channel variation margin: -44.0%
    |    |__        2.1 m/sec. tape speed: Effective capacity: 71.6%   Margin: -56.1%   (1.5/1.5 GB written using 172.9 metres of tape)
    |    |__                  Channel variation: 43.4%   Channel variation margin: -73.6%
    |    |__        2.4 m/sec. tape speed: Effective capacity: 76.9%   Margin: -20.9%   (1.2/1.2 GB written using 117.8 metres of tape)
    |    |__                  Channel variation: 35.2%   Channel variation margin: -40.6%
    |    |__        2.7 m/sec. tape speed: Effective capacity: 73.0%   Margin: -46.5%   (1.2/1.2 GB written using 122.2 metres of tape)
    |    |__                  Channel variation: 40.5%   Channel variation margin: -62.0%
    |    |__        3.0 m/sec. tape speed: Effective capacity: 69.9%   Margin: -25.3%   (1.2/1.2 GB written using 126.3 metres of tape)
    |    |__                  Channel variation: 46.7%   Channel variation margin: -86.7%
    |    |__        3.4 m/sec. tape speed: Effective capacity: 67.9%   Margin: -35.4%   (1.2/1.2 GB written using 128.9 metres of tape)
    |    |__                  Channel variation: 49.5%   Channel variation margin: -98.1%
    |    |__        3.7 m/sec. tape speed: Effective capacity: 61.6%   Margin: -33.7%   (1.2/1.2 GB written using 139.2 metres of tape)
    |    |__                  Channel variation: 55.4%   Channel variation margin: -100.0%
    |    |__        4.0 m/sec. tape speed: Effective capacity: 52.5%   Margin: -69.8%   (1.2/1.2 GB written using 159.1 metres of tape)
    |    |__                  Channel variation: 65.8%   Channel variation margin: -100.0%
    |    |__        forward direction: Effective capacity: 47.6%   Margin: -100.0%   (3.9/3.9 GB written using 580.4 metres of tape)
    |    |__                  Channel variation: 109.8%   Channel variation margin: -100.0%
    |    |__        reverse direction: Effective capacity: 81.6%   Margin: 33.0%   (6.2/6.2 GB written using 557.5 metres of tape)
    |    |__                  Channel variation: 6.2%   Channel variation margin: 75.3%
    |    |__ Overall drive margin: -30.7%
    |    |__ Worst-case margin (forward direction): -100.0%
    |    |__ Worst-case channel variation margin (forward direction): -100.0%
    |    |__ The LTO Drive Assessment Test has checked the history and operation of the selected drive, and
    |    |__ problems have been reported.
    |    |__ The drive is no longer recommended for use.

As for storing disks on a shelf -- nobody that knows hard disk technology would recommend this for more than a short period of time.  There is a low-level firmware process in spinning hard drives that periodically reads each sector of a disk and checks for signal strength.  If it's low, the sector is re-written and checked again.  If too low, the data is copied to a reserved area of the disk, and the original sector is remapped to the new location.   You as a user won't ever know that this has happened.

If the drive is sitting without power on a shelf, that process won't happen, and the bits on the disk are free to flip from 1 to 0 (or 2/3, even, possibly :)  ).  You won't ever know until you try to read the disk, and find some or all of your data gone.   I can't tell you if that will be likely to happen in six months, or a year, or five years, or never for a particular disk, because nobody knows.  I don't believe that a single disk vendor publishes a "data lifetime (unpowered)" statistic for their disks, because they don't test it.   Disk is not designed for long-term powered-off storage.  

Tape has huge redundancy built in.   I've heard that you can cut over an inch out of an LTO tape and still recover 100% of the data.  And tape is tested and rated for 20 - 30 years life sitting on a shelf without more than basic climate control (and sometimes not even that -- see this story on 40-year-old NASA tapes that were kept in a garage for over 20 years of that time, yet were read with 100% success: http://www.nasa.gov/topics/moonmars/features/LOIRP/  )
 
KeterHDAuthor Commented:
In the meantime, I ran another backup on the same tape, and now it backed up 283GB...
 
andyalderCommented:
You sent the library off for a checkup? HP's Library and Tape Tools is a free download that will fully test the library, upgrade the firmware, etc. It tests from your own server, so it also checks for hardware problems with your server's controller and SCSI cable, not just the library. Not only that, but it will run a test using your media, and it could easily be your media that is at fault.
 
Gerald ConnollyCommented:
Don't overuse the cleaning tapes; they are very abrasive!

Not an expert on Arcserve, but are you sure your tapes don't already have something on them, a failed backup maybe?
 
KeterHDAuthor Commented:
andyalder: Of course, mate ;) I use LT&T and so far it doesn't seem to show much of an issue. I've upgraded the firmware for both tape drives and the library itself. Since we have a service agreement for the library, I sent it off for an advanced checkup, which returned nothing.

connollyg: I set all backup jobs to overwrite the tapes. Also, I've erased and formatted the tapes - to no avail, unfortunately.
 
andyalderCommented:
Can you get the statistics from a tape using Arcserve or L&TT? They might show a large number of soft write errors, which would mean you won't get the full capacity: when read-after-write verification fails, the drive writes the data again further down the tape rather than going back to rewrite it, since going back would stop it streaming.

I presume there aren't any other multiplexed jobs using the same tape, as they would use space on it and wouldn't appear in this job log.
 
KeterHDAuthor Commented:
andyalder: Will check it out. Do you remember where exactly the setting is? If not, it's okay, I'll find it. I don't think I have the option in AS, but LT&T surely must have it.

And yes, there's only one job running at a time.
 
KeterHDAuthor Commented:
Rather, it should be L&TT.
 
Thomas RushCommented:
This behaviour is often a symptom of a tape head going (gone?) bad.  In essence, when the tape head deteriorates, it writes more and more bad blocks.  Each bad block has to be written again until it is 'right'; each failed write skips the badly written chunk of tape and writes the next sector.   So lots of bad writes mean lots of skipped tape which means way less usable capacity which means... maybe only 283GB on a 400GB tape.
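
To put rough numbers on that, here is a minimal Python sketch. It assumes a deliberately simplified model (an illustration only, not how the drive firmware actually accounts for tape): every failed write wastes one block's worth of tape, so usable capacity is native capacity divided by (1 + wasted blocks per good block). The function names are hypothetical; the 276GB, 283GB and 400GB figures are the ones quoted in this thread.

```python
def usable_fraction(written_gb: float, native_gb: float) -> float:
    """Fraction of the tape's native capacity that actually held data."""
    return written_gb / native_gb


def implied_wasted_blocks(written_gb: float, native_gb: float) -> float:
    """Wasted blocks per good block, under the simplified model
    usable = native / (1 + wasted)."""
    return native_gb / written_gb - 1.0


if __name__ == "__main__":
    # The two job results reported in this thread, on a 400GB native tape.
    for written in (276.0, 283.0):
        frac = usable_fraction(written, 400.0)
        wasted = implied_wasted_blocks(written, 400.0)
        print(f"{written:.0f}GB of 400GB: {frac:.1%} of native capacity, "
              f"roughly {wasted:.2f} wasted blocks per good block")
```

In other words, under this model a tape that stops at around 280GB is losing roughly four blocks of tape for every ten it writes successfully, which is far beyond a normal soft error rate.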

This could also be caused by a dirty head (but you've cleaned it) or bad media.  If this happens with all tapes, it's probably a drive problem.

Library and Tape Tools has both media and drive tests.   I encourage you to run them again to check what it says about the drive in particular.   You can generate a "support ticket" which will help identify the actual problems.

HP also has TapeAssure, another free utility that can monitor tape use in real time, including performance and compression.   It's a slightly more challenging install; you'll also need to install CommandView Tape Library (free) to show the results.   See http://www.hp.com/go/tapeassure
TapeAssure is also really good in complex environments, as it lets you monitor how much each individual tape drive is being used (hey, why is this drive only used for three hours a day, but the other seven are being used 20 hours a day?).

 
KeterHDAuthor Commented:
@SelfGovern: Thanks a lot. In general terms, I am aware of these issues. My main problem right now is how to find the necessary data. For example, where can I see the soft write errors? As far as I can see, ArcServe does not log those, can I see it from within L&TT?
 
Thomas RushCommented:
Run the drive test in L&TT.   It will show you what happens during that test run.

You can also generate a support ticket in L&TT.  That will tell you what's been going on with the drive (such as, "tape head is at 48% of margin.  Replace tape drive.").

If it is the tape drive that needs to be replaced -- LTO-4 will read and write LTO-3 tapes, doubling your capacity per tape, and possibly increasing performance by up to 50% (if your disks can feed the data fast enough).   LTO-5 can read LTO-3 tapes, but cannot write them.   LTO-5 capacity is 1.5TB native (almost 4x your current tapes) and performance is 140MB/sec native (75% faster than LTO-3).
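
For anyone who wants to check those ratios, here is a small Python sketch. The table holds native (uncompressed) figures per generation; the 80MB/sec and 120MB/sec speeds for LTO-3 and LTO-4 are not stated above and are filled in from the published LTO specifications, and the names `LTO_SPECS` and `compare` are hypothetical. The capacity figures apply to media of the matching generation.

```python
# Native (uncompressed) capacity and transfer rate per LTO generation.
LTO_SPECS = {
    3: {"capacity_gb": 400, "speed_mb_s": 80},
    4: {"capacity_gb": 800, "speed_mb_s": 120},
    5: {"capacity_gb": 1500, "speed_mb_s": 140},
}


def compare(new_gen: int, old_gen: int = 3) -> tuple[float, float]:
    """Return (capacity multiple, fractional speed gain) of new_gen over old_gen."""
    new, old = LTO_SPECS[new_gen], LTO_SPECS[old_gen]
    capacity_multiple = new["capacity_gb"] / old["capacity_gb"]
    speed_gain = new["speed_mb_s"] / old["speed_mb_s"] - 1.0
    return capacity_multiple, speed_gain


if __name__ == "__main__":
    for gen in (4, 5):
        cap, gain = compare(gen)
        print(f"LTO-{gen} vs LTO-3: {cap:.2f}x capacity, {gain:.0%} faster")
```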

In addition, both LTO-4 and LTO-5 support hardware encryption at the drive, for no loss of performance or compression.   I don't remember if your version of ARCserve handles encryption, but whether it does or not, the MSL Encryption Kit from HP will work with LTO-4 and LTO-5 drives to encrypt data regardless of the backup application.

Oh -- with some applications, go to a written piece of media, right click and select "properties" to get the write error info, space used, etc., at least in rudimentary form.
 
andyalderCommented:
Umm, I don't think you'll find an LTO4 drive will double the writeable capacity of an LTO3 tape, after all an LTO3 drive may have to read it again.
 
Thomas RushCommented:
Andy, you're correct -- native capacity of LTO-4 tape is twice that of an LTO-3 tape, and sticking an LTO-n tape into an LTO-(n+1) drive doesn't change the tape's capacity.   Sloppy choice of words on my part.

If the LTO-3 is swapped out for an LTO-4, the questioner would be able to read and write to his LTO-3 tapes, and use of LTO-4 tapes would double the capacity (and potentially increase performance).
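
The compatibility rule behind this can be sketched in a few lines of Python. It assumes the usual "read two generations back, write one generation back" rule, which matches what is said in this thread for LTO-3 through LTO-5; it is not guaranteed for later generations. The function names are hypothetical.

```python
def can_read(drive_gen: int, tape_gen: int) -> bool:
    """A drive reads its own generation and the two before it (assumed rule)."""
    return tape_gen >= 1 and drive_gen - 2 <= tape_gen <= drive_gen


def can_write(drive_gen: int, tape_gen: int) -> bool:
    """A drive writes its own generation and the one before it (assumed rule)."""
    return tape_gen >= 1 and drive_gen - 1 <= tape_gen <= drive_gen


if __name__ == "__main__":
    for drive in (4, 5):
        print(f"LTO-{drive} drive with LTO-3 tape: "
              f"read={can_read(drive, 3)}, write={can_write(drive, 3)}")
```

Note that neither function says anything about capacity: an LTO-3 cartridge still holds 400GB native whichever drive writes it.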
 
KeterHDAuthor Commented:
Wait, wait, wait... Maybe it's the bad night's sleep I had, but is your suggestion to swap the LTO3 tape drives for LTO4 tape drives?

Overall, it might not be such a bad idea, although, to be honest, I am thinking of switching the backup method to a hard-drive-based one (for example, buy a small server with disks inside, then buy a few external HDDs and have double backups of everything).
 
KeterHDAuthor Commented:
Above are results of DAT and support tickets for both drives. The results aren't perfect (is it possible some of the errors are due to SCSI?), but I'm having a hard time deciphering all this information.
 
Thomas RushCommented:
One of the main things to look for is "margin".  I can't tell you how they came up with that word, but 100% is factory-new, excellent condition.  Over time the drive degrades until it gets to 0% margin, which represents normal end-of-life.   At some point after 0% margin, you'll start to get an unacceptable error count, and margin continues to go more and more negative.

Your drive 1 shows consistent negative margins, and should be replaced.
Your drive 2 is a bit more interesting; I see one negative margin warning in the support ticket, but other things look good... so this might be one to float by HP.
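
If the reports are hard to scan by eye, a short Python sketch like the one below can pull out the negative margins. It assumes the text layout of the drive assessment excerpt quoted in this thread (lines of the form `<test name>: Effective capacity: N%   Margin: N%`); other L&TT versions may format the report differently, and the names `MARGIN_RE` and `negative_margins` are hypothetical. The sample lines are copied from that excerpt. Lines that only report "Channel variation margin" or "Overall drive margin" are deliberately skipped by this pattern.

```python
import re

# Matches per-test lines such as
#   "|__  1.8 m/sec. tape speed: Effective capacity: 72.3%   Margin: -51.2% ..."
MARGIN_RE = re.compile(
    r"^[\s|_]*(?P<label>[^:]+):.*?\bMargin:\s*(?P<margin>-?\d+(?:\.\d+)?)%"
)


def negative_margins(report: str) -> list[tuple[str, float]]:
    """Return (test name, margin) for every per-test line with a margin below 0%."""
    flagged = []
    for line in report.splitlines():
        match = MARGIN_RE.match(line)
        if match and float(match.group("margin")) < 0:
            flagged.append((match.group("label").strip(),
                            float(match.group("margin"))))
    return flagged


SAMPLE = """\
    |    |__        1.8 m/sec. tape speed: Effective capacity: 72.3%   Margin: -51.2%   (1.5/1.5 GB written using 171.5 metres of tape)
    |    |__                  Channel variation: 36.0%   Channel variation margin: -44.0%
    |    |__        reverse direction: Effective capacity: 81.6%   Margin: 33.0%   (6.2/6.2 GB written using 557.5 metres of tape)
"""

if __name__ == "__main__":
    for label, margin in negative_margins(SAMPLE):
        print(f"{label}: margin {margin}%")
```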

You can run your backups to disk; many people are.   Be aware, though, that disk is a poor choice for long-term storage of backup data, or archival use.   (You don't want to keep spending money on electricity to power the disks for years, and you want your archive offsite where one disaster won't take out everything.)
 
KeterHDAuthor Commented:
@SelfGovern: When I was talking about disks, I meant external disks that would be kept in a safe place. In such a scenario, are they worse than tapes? Does HDD quality deteriorate over time faster than that of tapes?
 
KeterHDAuthor Commented:
Also, I am not entirely sure I can see the negative margins you are talking about. Could you point me to an example?
 
KeterHDAuthor Commented:
That's some excellent insight, thanks a lot, mate.

The company that provides us service for the library claims the issue might be with the SCSI controller. I guess I'll have to haul the library to another set of servers I've got, connect it there and see if it resolves the issue. I'll keep this updated as soon as possible.
 
KeterHDAuthor Commented:
Hey, thanks a lot for the help. Apparently, yes, the issue is with the library. When I connect another, separate backup tape device I've got, it backs up just fine, using the same SCSI cables, power cable and cartridge.
Question has a verified solution.