virgo0880
asked on
TSM Tape drive failure Error
Hi All,
I got the following messages in errpt on my TSM server:
"daemon:notice root: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION 5537AC5F 0217171511 P H rmt26 TAPE DRIVE FAILURE"
"daemon:notice root: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION 5537AC5F 0217171511 P H rmt16 TAPE DRIVE FAILURE"
The problem is that a space reclamation process was running and seems to be using this tape drive. My other migration processes are now waiting for tape drives, because 8 drives are in use for other backups. I tried to cancel the reclamation, but it does not cancel. Kindly let me know what can be done in this situation. The tape library is an IBM L3494.
Thanks
virgo
ASKER
I logged a call with IBM, and the IBM engineer is going to replace the tape drive now because the tape got stuck in it. The problem is that he is only a library engineer and doesn't know how to make the drive online/visible to TSM. Can you tell me the steps for that? Currently "q path" shows that the drive is not online.
Thanks
virgo
I hope your engineer is capable of making the drive known to the library, since that is too complex to explain here.
Once the drive is replaced remove its definition from AIX with "rmdev -dl rmt26" and "rmdev -dl rmt16" (it's obviously an "alt_pathing" device)
It seems that the drive is not in use by TSM, but if it is, you must free it in some way. In the worst case, if all else fails, you will have to restart TSM.
Now run "cfgmgr". Does the new drive show up with the same device name(s) as the old one?
I don't know if your engineer will update the serial number of the new drive to be the same as the one of the old drive.
If you can keep the old number, you'll just have to issue on TSM (dsmadmc):
Upd path servername drivename srctype=server desttype=drive library=libraryname online=yes
and
Upd drive libraryname drivename online=yes
servername, drivename and libraryname are the TSM internal names, not things like hostname or /dev/rmtx!
If you get a new serial number it could well be that you'll have to remove the complete path/drive definition from TSM and recreate it, so TSM will recognize the new serial number.
If you don't know how to do that please let me know. I'll assist you.
wmp
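The sequence above can be sketched as a small dry-run helper. This is not part of the original answer: `plan_drive_recovery` is a hypothetical function that only prints the commands in order so they can be reviewed before anything is run, and the server, library and drive names passed to it are placeholders for your own TSM-internal names.

```shell
#!/bin/sh
# Print (do not run) the recovery commands for a replaced tape drive.
# Usage: plan_drive_recovery SERVER LIBRARY DRIVE RMT_DEVICE...
# SERVER, LIBRARY and DRIVE are TSM-internal names (assumed known);
# each RMT_DEVICE is an AIX device name such as rmt26.
plan_drive_recovery() {
    if [ "$#" -lt 4 ]; then
        echo "usage: plan_drive_recovery SERVER LIBRARY DRIVE RMT_DEVICE..." >&2
        return 1
    fi
    server=$1
    library=$2
    drive=$3
    shift 3

    # Step 1 (AIX): delete the stale device definitions.
    for dev in "$@"; do
        echo "rmdev -dl $dev"
    done

    # Step 2 (AIX): rediscover the replacement drive.
    echo "cfgmgr"

    # Step 3 (TSM, inside dsmadmc): bring the path and the drive back online.
    echo "update path $server $drive srctype=server desttype=drive library=$library online=yes"
    echo "update drive $library $drive online=yes"
}
```

For example, `plan_drive_recovery MYSERVER MYLIB MYDRIVE rmt16 rmt26` prints two rmdev lines, then cfgmgr, then the two TSM commands. Printing instead of executing is a deliberate choice here, since rmdev -dl deletes the device definition.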
ASKER
Yes, he has used the old serial number for the new drive. Also, 4 paths have been showing in Defined state since he removed the drive. Here is the output of lsdev -Cc tape:
snbc108:/# lsdev -Cc tape
lmcp0 Available LAN/TTY Library Management Control Point
rmt0 Available 1Z-08-00-1,0 LVD SCSI Tape Drive
rmt1 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt2 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt3 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt4 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt5 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt6 Defined 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt7 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt8 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt9 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt10 Available 1n-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt11 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt12 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt13 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt14 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt15 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt16 Defined 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt17 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt18 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt19 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt20 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt21 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt22 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt23 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt24 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt25 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt26 Defined 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt27 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt28 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt29 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt30 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt31 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt32 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt33 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt34 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt35 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt36 Defined 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt37 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt38 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt39 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt40 Available 2U-08-02-PRI IBM 3592 Tape Drive (FCP)
Also, can you confirm the order: first I have to remove the paths, then run cfgmgr, and then issue the TSM commands to update the drive, right? Will it affect any backups to tape or migration processes that are currently running? And where can I get the servername, drivename and libraryname details? Is there a command for that?
lsdev does not show paths, but devices. Please avoid misunderstandings!
OK, your lsdev shows four primary devices which are defined, but not present.
Why is that? Which is the failing one?
On TSM issue
Q PATH * failing_tsm_drive
to see the rmt number.
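To pick the Defined entries out of a long lsdev listing, a filter along these lines may help. It is only a sketch: `defined_tapes` is a hypothetical helper name, and it assumes the state is the second whitespace-separated column, as in the listing posted above.

```shell
# Read `lsdev -Cc tape` output on stdin and print the names of devices
# whose state column is "Defined" (known to AIX but not currently available).
defined_tapes() {
    awk '$2 == "Defined" { print $1 }'
}

# On the AIX host this would be used as:
#   lsdev -Cc tape | defined_tapes
```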
ASKER
lsdev is showing 1 PRI and 3 ALT paths (tape devices rmt6, rmt16, rmt26, rmt36) in Defined state, as we can see in the output. The output of the command you gave is:
Source Name   Source Type   Destination Name   Destination Type   On-Line
-----------   -----------   ----------------   ----------------   -------
TSM           SERVER        TDRV6              DRIVE              No
Sorry, the command should have been
Q PATH * TDRV6 F=D
Where do you see ALT paths?
Please verify with
lscfg -vl rmt6
lscfg -vl rmt16
lscfg -vl rmt26
lscfg -vl rmt36
and compare "Serial Number"
If they're identical remove all four devices using "rmdev -dl ...", run cfgmgr, then issue the TSM commands I gave you.
Currently running processes/sessions will not be affected.
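The serial number comparison can be scripted roughly as follows. This is a sketch under assumptions: `serial_of` and `all_same` are hypothetical helpers, and the parsing assumes that lscfg -vl prints the value on a line of the form "Serial Number" followed by filler dots and then the number, so check it against your real output first.

```shell
# Extract the "Serial Number" value from `lscfg -vl <device>` output on stdin.
# Assumed line format: "Serial Number...............<value>"
serial_of() {
    sed -n 's/^[[:space:]]*Serial Number\.*//p'
}

# Succeed only if stdin has at least one line and all lines are identical.
all_same() {
    [ "$(sort -u | wc -l | tr -d ' ')" -eq 1 ]
}

# On the AIX host this would be used as:
#   for d in rmt6 rmt16 rmt26 rmt36; do lscfg -vl "$d" | serial_of; done | all_same \
#       && echo "all four devices report the same serial number"
```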
ASKER
Yes, the drive is showing online now in q path. I followed the steps you gave and they worked fine for me.
Regarding your question: in the lsdev output I can see the keywords PRI or ALT, which I think stand for the primary and alternate paths. You can see them in the output given above.
Is there any other way to find out whether the drive is online or not? I can see that the drive is online using the command you gave. Here's the output:
Source Name: TSM
Source Type: SERVER
Destination Name: TDRV6
Destination Type: DRIVE
Library: L3494B
Node Name:
Device: /dev/rmt16
External Manager:
LUN:
Initiator: 0
Directory:
On-Line: Yes
Last Update by (administrator):
Last Update Date/Time: 02/18/11 04:23:00
So, do you think the tape drive is online now?
Thanks
virgo
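For checking the state from a script, the detailed q path output can be reduced to the two interesting fields. This is a sketch, not something from the thread: `path_status` is a hypothetical helper, it assumes the "Label: value" layout shown in the output above, and the admin ID and password in the usage comment are placeholders. The QUERY DRIVE command also reports an On-Line column and can serve as a second check.

```shell
# Read `QUERY PATH ... F=D` output on stdin and print "<device> <on-line>"
# for every path found in it.
path_status() {
    awk -F': *' '
        { sub(/[ \t]+$/, "") }            # drop trailing whitespace
        /^ *Device:/  { dev = $2 }        # remember the device of this path
        /^ *On-Line:/ { print dev, $2 }   # emit it together with the status
    '
}

# On the TSM server this would be used as (credentials are placeholders):
#   dsmadmc -id=ADMIN -password=SECRET "q path * TDRV6 f=d" | path_status
```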
ASKER
OK
First, you should issue "errpt -a -j 5537AC5F" on the TSM server to get the full error messages.
The error identifier points to some drive or adapter failure. It's not a media problem.
Are there accompanying 0BA49C99 errors (check with "errpt")? If so, the culprit could be the adapter of either the AIX box or (more likely) the drive. A very common cause of such failures is cabling (plugs/pins). Please check!
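The errpt check can be summarised with a small filter. This is a sketch and not part of the original advice: `count_errors` is a hypothetical helper that reads the errpt summary listing and counts entries per resource for the two identifiers mentioned above, assuming the usual column order (identifier first, resource name fifth).

```shell
# Read `errpt` summary output on stdin and print "<count> <identifier> <resource>"
# for the requested error identifiers (default: the two discussed above).
count_errors() {
    ids=${1:-"5537AC5F 0BA49C99"}
    awk -v ids="$ids" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        ($1 in want) { seen[$1 " " $5]++ }
        END { for (k in seen) print seen[k], k }
    ' | LC_ALL=C sort -k2
}

# On the AIX host this would be used as:
#   errpt | count_errors
```

If 0BA49C99 entries show up next to the 5537AC5F ones, that supports checking the adapter and cabling first.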
Next, make sure that the drive is still online.
Use the library's web interface, if you enabled it:
http://library_hostname/srvrroot/en/en-us/wsindex
Select "Monitor Library Manager" on the left, then "Component Availability".
All drives available?
Check Operator Interventions:
Select "Monitor Library Manager" on the left, then "Operator Interventions"
What do you see?
If you didn't enable the web interface you will have to walk to your library to perform the above checks.
Please report back the results!
wmp