Solved

AIX: RMAN failing, AIX / backup tape issue?

Posted on 2013-01-09
Last Modified: 2013-01-16
Hi wmp,

I'm having a problem on a machine where RMAN is failing with the error: "Server Status:  child process killed by signal". The DBAs claim this is either an AIX issue that kills the RMAN process or a backup tape issue. NetBackup is also used.

Is there a way to determine backup tape status on this LPAR?


* rmt0             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
* rmt1             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
* rmt2             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
* rmt3             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
Question by:g0all
 
LVL 1

Author Comment

by:g0all
ID: 38763170
tar cfv /dev/rmt0 file123
tar: /dev/rmt0: There is an input or output error.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 38763635
Hi,

at first sight it looks as if there was no tape mounted.

But you said you're using NetBackup. If the NetBackup server is running and working it has perhaps opened the device for its own processing, so you can't use it for tar.

I'd really suggest using NB tools to query tape status, like "vmoprcmd".
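
A minimal sketch of how one might look at the drives from the AIX side without writing to them, assuming the standard AIX `lsdev` and `fuser` commands are installed. The function name `tape_overview` is made up for illustration, and both commands only read state.

```shell
#!/bin/sh
# Hypothetical helper: read-only look at tape devices from the AIX side.
tape_overview() {
    # lsdev -Cc tape lists the configured tape devices and their state
    # (Available / Defined). Skipped when lsdev is not installed.
    if command -v lsdev >/dev/null 2>&1; then
        lsdev -Cc tape || echo "lsdev failed"
    else
        echo "lsdev not found, skipping device list"
    fi

    # fuser reports which processes (if any) hold a device file open,
    # e.g. a backup process that currently has the drive in use.
    for dev in "$@"; do
        if [ -e "$dev" ] && command -v fuser >/dev/null 2>&1; then
            echo "Processes using $dev:"
            fuser "$dev" 2>&1 || true
        else
            echo "cannot check $dev (missing device or fuser)"
        fi
    done
    return 0
}

# Example: tape_overview /dev/rmt0 /dev/rmt1
```

If `fuser` shows a process on the device, that would fit the theory that something else has the drive open while tar is trying to use it.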
 
LVL 1

Author Comment

by:g0all
ID: 38763727
hi wmp,

Backups suddenly stopped working, apparently without any OS/NB changes.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 38763737
What do you get with "vmoprcmd"?

Something in "errpt"?

Do you have a robotic library? If so, can you run "tldtest" or "robtest"?
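
For the errpt check, a rough sketch of a filter that counts permanent hardware entries per resource. It assumes the usual uncensored `errpt` summary layout (identifier, timestamp, type, class, resource name, description); the function name `perm_hw_errors` is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: count permanent (T = P) hardware (C = H) entries
# per resource, reading the errpt summary report from stdin.
perm_hw_errors() {
    awk '
        # Column 3 is the error type, column 4 the class, column 5 the resource.
        $3 == "P" && $4 == "H" { count[$5]++ }
        END { for (r in count) print r, count[r] }
    ' | sort
}

# Example (AIX only): errpt | perm_hw_errors
# errpt can also filter by itself: errpt -d H -T PERM
```

Anything listed for an rmt device here would be worth a closer look with `errpt -a -N <resource>`.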
 
LVL 1

Author Comment

by:g0all
ID: 38763748
I'll get back in a few hours, currently driving.

vmoprcmd without any parameters?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 38763804
I was afraid you would ask that. I'm a TSM guru and don't know much about NB.

A quick search suggests this

/usr/openv/volmgr/bin/vmoprcmd -d

Add "-h <device_host>" if you're not on the device host.
 
LVL 1

Author Comment

by:g0all
ID: 38763943
No issue, I really appreciate your help :-)

I'm just concerned about not affecting other parts of the OS, like VIO disks or other FC drives.

I'm afraid of messing something up :-)
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 38763958
vmoprcmd -d

doesn't change anything, it's just a status display.
 
LVL 1

Author Comment

by:g0all
ID: 38764155
Hi,

I've pasted the output below:

PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart3   TLD                -                     No       -         0
  1 hcart3   TLD                -                     No       -         0

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment
  0 IBM.ULTRIUM-TD3.044   No       -
  1 IBM.ULTRIUM-TD3.045   No       -





/usr/openv/volmgr/bin/tpconfig -d
Id  DriveName           Type   Residence
      Drive Path                                                       Status
****************************************************************************
0   IBM.ULTRIUM-TD3.044  hcart3 TLD(8)  DRIVE=14
      /dev/rmt2.1                                                      UP
1   IBM.ULTRIUM-TD3.045  hcart3 TLD(8)  DRIVE=15
      /dev/rmt3.1                                                      UP

Currently defined robotics are:
  TLD(8)     robot control host = server077

EMM Server = server111

 
LVL 68

Expert Comment

by:woolmilkporc
ID: 38764342
Looks good so far.

Seems that only rmt2 and rmt3 are defined to NB.
What's with rmt0/1?

Anyway, to dig deeper you should examine the logs in /usr/openv/logs and /usr/openv/netbackup/logs (directories "bptm", "ltid", "robots" et al.) and also "errpt".
Any hints?
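
To see at a glance which AIX devices NB actually uses, here is a small sketch that pairs each drive name with its device path and status. It assumes the `tpconfig -d` layout pasted above (a drive line starting with the numeric Id, followed by a path line); `drive_paths` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: pair each NetBackup drive with its device path and
# status, reading "tpconfig -d" style output from stdin.
drive_paths() {
    awk '
        # Drive header line: starts with the numeric drive Id.
        $1 ~ /^[0-9]+$/ { name = $2; next }
        # Path line: device path followed by its status (e.g. UP).
        $1 ~ /^\/dev\// { print name, $1, $2 }
    '
}

# Example: /usr/openv/volmgr/bin/tpconfig -d | drive_paths
```

For the output above this would list only /dev/rmt2.1 and /dev/rmt3.1, which is why rmt0 and rmt1 look like they are not defined to NB.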
 
LVL 1

Author Comment

by:g0all
ID: 38764451
Well, the logs folder doesn't contain any entries with those names; however, errpt seems to show that both drives failed on different dates:

Date/Time:       Mon Jan  7 ---
Sequence Number: ---
Machine Id:      ---
Node Id:         myhostname
Class:           H
Type:            PERM
Resource Name:   rmt2
Resource Class:  tape
Resource Type:   ost
Location:        U789D.---
VPD:
        Manufacturer................IBM
        Machine Type and Model......ULTRIUM-TD3
        Serial Number...............
        Device Specific.(Z3)........0000

Description
TAPE DRIVE FAILURE

Probable Causes
ADAPTER
TAPE DRIVE

Failure Causes
TAPE DRIVE
ADAPTER

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0800 0---
---------------------------------------------------------------------------
LABEL:          SC_TAPE_ERR4
IDENTIFIER:     ----

Date/Time:       Thu Dec 20 ---
Sequence Number: ---
Machine Id:      ---
Node Id:         myhostname
Class:           H
Type:            PERM
Resource Name:   rmt3
Resource Class:  tape
Resource Type:   ost
Location:        U789D.---
VPD:
        Manufacturer................IBM
        Machine Type and Model......ULTRIUM-TD3
        Serial Number...............
        Device Specific.(Z3)........0000

Description
TAPE DRIVE FAILURE

Probable Causes
ADAPTER
TAPE DRIVE

Failure Causes
TAPE DRIVE
ADAPTER

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0A00 0---




wmp, do you think a server reboot would solve the issue?

I've also pasted the rest of the errpt output:

...

P H rmt2           TAPE DRIVE FAILURE

...
U O RSF            RSF DIAGNOSTIC MESSAGE
P H rmt3           TAPE DRIVE FAILURE

U O RSF            RSF DIAGNOSTIC MESSAGE
U O RSF            RSF DIAGNOSTIC MESSAGE
T H fcs0           ADAPTER ERROR
T H fscsi0         ADAPTER ERROR
T H fcs0           ADAPTER ERROR
fscsi0         ADAPTER ERROR


--- I've censored some of the output because this is a public forum.
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 500 total points
ID: 38764578
OK, besides the names I suggested, there must be some logs. Nothing found in there?
 /usr/openv/logs  /usr/openv/netbackup/logs?
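
A quick way to see whether anything at all was written there lately could look like the sketch below. It uses only POSIX `find`; the function name `recent_logs` and the two-day window in the example are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list files modified within the last N days
# under the given log directories.
recent_logs() {
    # $1 = age in days, remaining arguments = directories to search
    days=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || { echo "no such directory: $dir"; continue; }
        find "$dir" -type f -mtime "-$days" -print
    done
}

# Example: recent_logs 2 /usr/openv/logs /usr/openv/netbackup/logs
```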

Both errpt entries indicate permanent hardware failures, so it could be worth trying to reset the robotics and the drives. The drives should have an attached operator panel offering a "Reset" option. I don't know your tape library, so you will have to look for yourself.

Rebooting the server could also help, at least this will refresh all drivers, kernel extensions and also the NB server itself. I'm not a real friend of rebooting, but in this case ...

If the RMAN failure recurs after the reset/reboot and new messages are written to errpt, I think it will be time to call your tape/library manufacturer's support!
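
To tell new errpt entries from the old ones, one option is to note a timestamp right after the reset and ask errpt only for later records. This is a sketch, assuming the `mmddhhmmyy` start-date format that `errpt -s` expects; `errpt_stamp` is a made-up helper name and rmt2/rmt3 are the resources from the entries above.

```shell
#!/bin/sh
# Hypothetical helper: print the current time in the mmddhhmmyy format
# used by the errpt -s (start date) flag.
errpt_stamp() {
    date +%m%d%H%M%y
}

# Usage sketch (AIX only):
#   stamp=$(errpt_stamp)              # note this right after the reset/reboot
#   ... run the RMAN backup again ...
#   errpt -s "$stamp" -N rmt2,rmt3    # only entries logged since the stamp
```

After a reboot the shell variable is gone, so the stamp would have to be written down or saved to a file.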
 
LVL 1

Author Closing Comment

by:g0all
ID: 38764608
Thanks for the late night support, wmp.

I will recheck and open some other threads if necessary.

Thank you very much.

Have a great evening.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 38764655
It's just 21:48 here, so not really "late night". I think we've had sessions later in the evening, right?

Anyway, cordial thanks for the nice wishes and for the points!

Have a fine evening, too!

wmp
 
LVL 1

Author Comment

by:g0all
ID: 38764688
It's one hour later here, but on some other days it was much later :-)

Thank you!
 
LVL 1

Author Comment

by:g0all
ID: 38766496
wmp, I've reread your suggestions.

"Rebooting the server could also help, at least this will refresh all drivers, kernel extensions and also the NB server itself. "

I cannot reboot the Media Server (NB Server). I can only reboot my LPAR, the one where the backups are failing.

Do you think rebooting the client OS would help?

Thanks!
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 38766697
You can try, but there is little hope if it's really a tape drive issue ...
 
LVL 1

Author Comment

by:g0all
ID: 38782279
wmp,

You were right (as usual). The problem was hardware-related (the library).

_______MANY THANKS________ !!!