
AIX: RMAN failing, AIX or backup tape issue?

Posted on 2013-01-09
Last Modified: 2013-01-16
Hi wmp,

I'm having a problem on a machine where RMAN is failing with the error "Server Status:  child process killed by signal". The DBAs claim this is either an AIX issue that kills the RMAN process or a backup tape issue. NetBackup is also used.

Is there a way to determine backup tape status on this LPAR?


* rmt0             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
* rmt1             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
* rmt2             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
* rmt3             U789D.001.DQD20H7-P1-bla  Other FC SCSI Tape Drive
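A read-only first check from the AIX side is `lsdev -Cc tape`, which lists each tape device with its configuration state (normally Available or Defined). The sketch below assumes that standard output layout (name, state, location, description); the function names are hypothetical helpers, not part of AIX or NetBackup.

```shell
#!/bin/sh
# Read-only tape device check on AIX (sketch; helper names are hypothetical).

# List the tape devices known to AIX together with their state.
list_tapes() {
    lsdev -Cc tape
}

# Read `lsdev -Cc tape` output on stdin and print the name and state of
# every device that is not in the Available state (e.g. Defined).
not_available_tapes() {
    awk 'NF >= 2 && $2 != "Available" { print $1, $2 }'
}
```

Usage: `list_tapes | not_available_tapes` prints nothing when every drive is Available. Available only means the device driver is configured; it does not prove that the drive or the media works, so this complements the errpt and NetBackup checks rather than replacing them.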
Question by: g0all

18 Comments

Author Comment

by:g0all
ID: 38763170
tar cfv /dev/rmt0 file123
tar: /dev/rmt0: There is an input or output error.

Expert Comment

by:woolmilkporc
ID: 38763635
Hi,

At first sight it looks as if no tape is mounted.

But you said you're using NetBackup. If the NetBackup server is running, it has perhaps opened the device for its own processing, so you can't use it for tar.

I'd really suggest using NB tools, such as "vmoprcmd", to query the tape status.

Author Comment

by:g0all
ID: 38763727
hi wmp,

Backups suddenly stopped working, apparently without any OS/NB changes.

Expert Comment

by:woolmilkporc
ID: 38763737
What do you get with "vmoprcmd"?

Something in "errpt"?

Do you have a robotic library? If so, can you run "tldtest" or "robtest"?

Author Comment

by:g0all
ID: 38763748
I'll get back to you in a few hours; I'm currently driving.

vmoprcmd without any parameters?

Expert Comment

by:woolmilkporc
ID: 38763804
I was afraid you would ask that. I'm a TSM guru and don't know much about NB.

A quick search suggests this

/usr/openv/volmgr/bin/vmoprcmd -d

Add "-h <device_host>" if you're not on the device host.
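The query can be wrapped so the "-h" case is handled in one place, with a small filter that picks out drives NetBackup reports as down. This is a sketch: the install path is the default one from this thread, the function names are hypothetical, and it assumes a downed drive shows a Control value containing "DOWN" (such as DOWN-TLD) in the DRIVE STATUS table.

```shell
#!/bin/sh
# Read-only NetBackup drive status query (sketch; helper names are hypothetical).
VMOPRCMD=/usr/openv/volmgr/bin/vmoprcmd

# Show pending requests and drive status. Pass the device host name as $1
# when not running on the device host (adds "-h <device_host>").
nb_drive_status() {
    if [ -n "$1" ]; then
        "$VMOPRCMD" -h "$1" -d
    else
        "$VMOPRCMD" -d
    fi
}

# Read `vmoprcmd -d` output on stdin and print the drive number and Control
# value for rows of the DRIVE STATUS table whose Control column contains
# "DOWN". Rows of the ADDITIONAL DRIVE STATUS table are ignored.
nb_down_drives() {
    awk '
        /ADDITIONAL DRIVE STATUS/ { in_tbl = 0; next }
        /DRIVE STATUS/            { in_tbl = 1; next }
        in_tbl && $1 ~ /^[0-9]+$/ && $3 ~ /DOWN/ { print $1, $3 }
    '
}
```

Usage: `nb_drive_status | nb_down_drives` (or `nb_drive_status <device_host> | nb_down_drives`); empty output means no drive is flagged as down.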

Author Comment

by:g0all
ID: 38763943
No problem, I really appreciate your help :-)

I am just concerned not to affect other OS parts, like VIO disks or other FC drives.

I'm afraid of messing something up :-)

Expert Comment

by:woolmilkporc
ID: 38763958
vmoprcmd -d

doesn't change anything, it's just a status display.

Author Comment

by:g0all
ID: 38764155
Hi,

I've pasted the output below:

PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart3   TLD                -                     No       -         0
  1 hcart3   TLD                -                     No       -         0

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment
  0 IBM.ULTRIUM-TD3.044   No       -
  1 IBM.ULTRIUM-TD3.045   No       -





/usr/openv/volmgr/bin/tpconfig -d
Id  DriveName           Type   Residence
      Drive Path                                                       Status
****************************************************************************
0   IBM.ULTRIUM-TD3.044  hcart3 TLD(8)  DRIVE=14
      /dev/rmt2.1                                                      UP
1   IBM.ULTRIUM-TD3.045  hcart3 TLD(8)  DRIVE=15
      /dev/rmt3.1                                                      UP

Currently defined robotics are:
  TLD(8)     robot control host = server077

EMM Server = server111
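To match the NetBackup drive definitions to AIX rmt devices (which is how one can see that only rmt2 and rmt3 are configured in NB), the tpconfig listing can be condensed to one line per drive path. A minimal sketch, assuming the two-line-per-drive layout shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Summarise `tpconfig -d` output read on stdin (sketch): one line per drive
# path with drive id, drive name, device path and path status.
tp_drive_paths() {
    awk '
        # Drive line, e.g. "0   IBM.ULTRIUM-TD3.044  hcart3 TLD(8)  DRIVE=14"
        $1 ~ /^[0-9]+$/ && NF >= 3 { id = $1; name = $2; next }
        # Path line below it, e.g. "/dev/rmt2.1     UP"
        $1 ~ /^\/dev\// && id != "" { print id, name, $1, $NF }
    '
}
```

Usage: `/usr/openv/volmgr/bin/tpconfig -d | tp_drive_paths | awk '$4 != "UP"'` lists only the paths that are not reported as UP. For the output above, both drives map to /dev/rmt2.1 and /dev/rmt3.1 and are UP.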

Expert Comment

by:woolmilkporc
ID: 38764342
Looks good so far.

It seems that only rmt2 and rmt3 are defined to NB.
What about rmt0/1?

Anyway, to dig deeper you should examine the logs in /usr/openv/logs and /usr/openv/netbackup/logs (directories "bptm", "ltid", "robots" et al.) and also "errpt".
Any hints?
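The errpt part of that check can be narrowed with errpt's own filters: "-d H" selects the hardware error class, "-T PERM" the permanent error type, and "-N" a list of resource names. The tally function is a sketch assuming the standard summary columns (IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION); all function names are hypothetical, and every command only reads the error log.

```shell
#!/bin/sh
# errpt helpers for a tape investigation (sketch; helper names are hypothetical).

# Summary of permanent (-T PERM) hardware (-d H) errors.
perm_hw_errors() {
    errpt -d H -T PERM
}

# Full detail entries for a comma-separated resource list,
# e.g. errpt_detail rmt2,rmt3,fcs0,fscsi0
errpt_detail() {
    errpt -a -N "$1"
}

# Count summary-format errpt lines per resource and type/class (stdin).
# Output lines look like "rmt2 PH 1" (resource, type+class, count).
count_by_resource() {
    awk '$1 != "IDENTIFIER" && NF >= 6 { n[$5 " " $3 $4]++ }
         END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}
```

Usage: `errpt | count_by_resource` gives a quick per-device tally, which makes it easier to spot tape drive failures that coincide with repeated fcs0/fscsi0 adapter errors.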

Author Comment

by:g0all
ID: 38764451
Well, the logs folders don't contain any entries with those names. However, errpt seems to show that both drives failed, on different dates:

Date/Time:       Mon Jan  7 ---
Sequence Number: ---
Machine Id:      ---
Node Id:         myhostname
Class:           H
Type:            PERM
Resource Name:   rmt2
Resource Class:  tape
Resource Type:   ost
Location:        U789D.---
VPD:
        Manufacturer................IBM
        Machine Type and Model......ULTRIUM-TD3
        Serial Number...............
        Device Specific.(Z3)........0000

Description
TAPE DRIVE FAILURE

Probable Causes
ADAPTER
TAPE DRIVE

Failure Causes
TAPE DRIVE
ADAPTER

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0800 0---
---------------------------------------------------------------------------
LABEL:          SC_TAPE_ERR4
IDENTIFIER:     ----

Date/Time:       Thu Dec 20 ---
Sequence Number: ---
Machine Id:      ---
Node Id:         myhostname
Class:           H
Type:            PERM
Resource Name:   rmt3
Resource Class:  tape
Resource Type:   ost
Location:        U789D.---
VPD:
        Manufacturer................IBM
        Machine Type and Model......ULTRIUM-TD3
        Serial Number...............
        Device Specific.(Z3)........0000

Description
TAPE DRIVE FAILURE

Probable Causes
ADAPTER
TAPE DRIVE

Failure Causes
TAPE DRIVE
ADAPTER

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0A00 0---




wmp, do you think a server reboot would solve the issue?

I've also pasted the rest of the errpt output:

...

P H rmt2           TAPE DRIVE FAILURE

...
U O RSF            RSF DIAGNOSTIC MESSAGE
P H rmt3           TAPE DRIVE FAILURE

U O RSF            RSF DIAGNOSTIC MESSAGE
U O RSF            RSF DIAGNOSTIC MESSAGE
T H fcs0           ADAPTER ERROR
T H fscsi0         ADAPTER ERROR
T H fcs0           ADAPTER ERROR
fscsi0         ADAPTER ERROR


(Output marked "---" has been censored because this is a public forum.)

Accepted Solution

by: woolmilkporc (earned 2000 total points)
ID: 38764578
OK, but besides the directory names I suggested, there must be some logs in /usr/openv/logs and /usr/openv/netbackup/logs. Nothing found in there?

Both errpt entries indicate permanent hardware failures, so it could be worth trying to reset the robotics and the drives. The drives should have an attached operator (OP) panel offering a "Reset" option. I don't know your tape library, so you will have to look for yourself.

Rebooting the server could also help, at least this will refresh all drivers, kernel extensions and also the NB server itself. I'm not a real friend of rebooting, but in this case ...

If the RMAN failure recurs after the reset/reboot and new messages are written to errpt, I think it will be time to call your tape/library manufacturer's support!

Author Closing Comment

by:g0all
ID: 38764608
Thanks for the late night support, wmp.

I will recheck and open some other threads if necessary.

Thank you very much.

Have a great evening.

Expert Comment

by:woolmilkporc
ID: 38764655
It's just 21:48 here, so not really "late night". I think we've had sessions that ran later into the evening, right?

Anyway, many thanks for the kind wishes and for the points!

Have a fine evening, too!

wmp

Author Comment

by:g0all
ID: 38764688
It's an hour later here, but on some other days it was much later :-)

Thank you!

Author Comment

by:g0all
ID: 38766496
wmp, I've re-read your suggestions.

"Rebooting the server could also help, at least this will refresh all drivers, kernel extensions and also the NB server itself. "

I cannot reboot the Media Server (NB Server). I can only reboot my LPAR, the one with the failing backups.

Do you think rebooting the client OS would help?

Thanks!

Expert Comment

by:woolmilkporc
ID: 38766697
You can try, but there is little hope if it's really a tape drive issue ...

Author Comment

by:g0all
ID: 38782279
wmp,

You were right (as usual). Problem was related to the hardware part (library).

_______MANY THANKS________ !!!