virgo0880

fcs5 ADAPTER ERROR in AIX 5.3

Hi All,

I am getting the following errors for the fcs5 adapter, which is connected to the tape drives on my AIX 5.3 TSM server. The errors are becoming more frequent. How can I check whether the adapter is bad? I have verified the zoning on the switch side and don't see any issues there, although there are some loss-of-sync incidents on the ports. How should I troubleshoot whether this is a hardware error or a problem on the switch side?

errpt shows the following error:

---------------------------------------------------------------------------
LABEL:          FSCSI_ERR6
IDENTIFIER:     B8FBD189

Date/Time:       Sun Jan  3 05:20:44 CST 2013
Sequence Number: 1337815
Machine Id:      0112179A4C40
Node Id:         servername
Class:           S
Type:            TEMP
Resource Name:   fscsi5

Description
SOFTWARE PROGRAM ERROR

Probable Causes
ADAPTER MICROCODE
SOFTWARE PROGRAM
SOFTWARE DEVICE DRIVER

Failure Causes
ADAPTER MICROCODE
SOFTWARE PROGRAM
SOFTWARE DEVICE DRIVER

        Recommended Actions
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SENSE DATA
0000 0000 0000 00A1 0000 0006 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0062 0413 0000 0000
0062 0D13 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 C02F 0000 1612 0002 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 5005 0763
0240 6024 5005 0763 0200 6024 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FF6 B000
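
The last lines of the sense data contain two 16-digit values beginning with 5005 0763, which have the shape of Fibre Channel world wide names. A minimal sketch for pulling them out of a pasted dump follows. It assumes, and nothing in this thread confirms it, that these values identify the target port the adapter was talking to. `find_wwn_like` is a hypothetical helper name, and the prefix is simply the one visible in this dump.

```shell
# Hypothetical helper: read errpt SENSE DATA words on stdin and print every
# 4-word (16 hex digit) run that starts with the words 5005 0763.
find_wwn_like() {
    # Put one 4-digit word per line, then scan consecutive words in awk.
    tr -s ' ' '\n' | awk '
        NF { w[++n] = $1 }
        END {
            for (i = 1; i + 3 <= n; i++)
                if (w[i] == "5005" && w[i + 1] == "0763")
                    print w[i] w[i + 1] w[i + 2] w[i + 3]
        }'
}

# Usage with the last three lines of the dump above:
find_wwn_like <<'EOF'
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 5005 0763
0240 6024 5005 0763 0200 6024 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FF6 B000
EOF
```

For this dump the helper prints 5005076302406024 and 5005076302006024. Comparing those against the WWNs zoned for the tape drives may help narrow down which drive port is involved.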
Ernie Gronblom

The error does not point to any external problem. It is reported as a software error in the firmware (Licensed Internal Code). The error code indicates that a call is being made to the controller that the controller does not understand. This is likely due to down-level firmware on the controller card.

You need to update the firmware on the card, the driver in the O/S, or both. Start with the card firmware. If that doesn't help, update the driver in case it is sending bad requests. If you still have the issue after both, the only other likely source is the firmware on the switch. You can do the upgrades in whatever order you see fit, but this order should give you the best chance of success with the fewest upgrades.

This does not present as a hardware error, nor does it look like one. There is a small chance that the switch is sending bad packets; you can check this by zoning another port and moving the HBA connection to the new port.

Ernie
Carl Dula
I suggest you upgrade to the latest AIX 5.3 level, which is 5300-12-05-1140, and also update the firmware on the device.
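
A quick way to check the host against that level is to compare it with the string printed by `oslevel -s`. The sketch below is only illustrative: `level_at_least` is a hypothetical helper, and the older level used in the second call is made up for the example.

```shell
# Hypothetical helper: succeed when level $1 is at or above level $2.
# Both arguments use the fixed-width form printed by `oslevel -s`,
# so a plain byte-wise sort orders them correctly.
level_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)
    [ "$lowest" = "$2" ]
}

# On the AIX host (oslevel is only available there):
#   level_at_least "$(oslevel -s)" 5300-12-05-1140 && echo "OS level is current"
level_at_least 5300-12-05-1140 5300-12-05-1140 && echo "at or above"
level_at_least 5300-11-01-0000 5300-12-05-1140 || echo "below"
```

The adapter's own firmware level is a separate check; it can be read from the vital product data shown by `lscfg -vl fcs5`.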
virgo0880

ASKER

I am already at that version. We have opened a hardware case with IBM, and they will be replacing the HBA card. This card is connected to the tape drives as one of the paths. Before changing the card, what is the procedure for unconfiguring and removing it without disturbing other components? Can somebody share that information?

Thanks
Unless you are going to be removing the card hot (without taking the system down), there isn't any unconfiguring on the host that must be done.

If you are doing it hot, you will need to use the 'hot-plug' task in the 'diag' utility (Task selection) which will step you through powering down the slot and removal/replacement of the card.  The IBM FE should be well versed on this procedure.

When you bring the server back up it will use the same device files as the old card.  You will need to rezone the switch because the WWN of the card will be changing, but you need to have the new card to know what the new WWN is to do the rezoning.

After the repair, you can go into the 'diag' utility and select "Log Repair Action" from the Task Selection menu to prevent the errpt entry from triggering further diagnostics on the new controller.
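
Because the rezoning depends on the new card's WWN, it can help to record the address before and after the swap. The sketch below assumes the adapter is fcs5, as in the question, and reads the WWPN from the Network Address field of the `lscfg -vl` output. `wwpn_from_lscfg` is a hypothetical helper, and the address in the sample line is made up.

```shell
# Hypothetical helper: print the value of the "Network Address" field
# from `lscfg -vl <adapter>` output supplied on stdin.
wwpn_from_lscfg() {
    # Strip everything up to and including the dotted leader.
    awk '/Network Address/ { sub(/^.*\.+/, ""); print }'
}

# On the AIX host, before and after the swap:
#   lscfg -vl fcs5 | wwpn_from_lscfg
# Illustrative input line (the address is made up):
printf '        Network Address.............10000000C9ABCDEF\n' | wwpn_from_lscfg
```

Saving both values gives the switch administrator the old entry to remove and the new one to add.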
When I try to unconfigure that device, it throws the following errors:

Command: failed        stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

Method error (/etc/methods/ucfgAtape):
        0514-053 Error returned from sys_config.
Unable to unconfigure device: Device busy

fcnet4 deleted
rmt21 deleted
rmt22 deleted
rmt23 deleted
rmt26 deleted

How can I free up that device so that we can run diagnostics to check whether the card is bad? I also tried rmdev, but I get a device busy error.
The tape drive is working fine; my backup tapes are being read and written properly on all the tape drives. I think the issue is that the HBA I am trying to unconfigure is tied to a tape path, and that's why I am not able to unconfigure it. Is there a way to see which paths are used by this HBA and remove them so that the HBA is freed up?
You shouldn't need the path, because you can simply reference the device and have the O/S determine what is being removed. Try using these commands:
lsdev           (this will show the device paths as well)
rmdev -l rmt123    (this is lowercase L - substitute the rmt123 for the device listed in lsdev)

Please note that the message "rmt123 defined" is a successful message.  This means that the device is still in the Customized Devices definition database so that you can add it back later without reinstalling the drivers.
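
To see which devices sit under a particular adapter, lsdev can also list the children of a parent device with the -p flag. The sketch below assumes the protocol device is fscsi5 (the resource named in the errpt entry) and uses a hypothetical helper, `available_children`, to keep only the children that are still in the Available state. The sample input is illustrative and follows the column layout that lsdev prints.

```shell
# Hypothetical helper: from lsdev output on stdin (name, state, location,
# description), print the names of devices that are still Available.
available_children() {
    awk '$2 == "Available" { print $1 }'
}

# On the AIX host:
#   lsdev -p fscsi5 | available_children
# Each name printed can then be passed to rmdev -l as described above.
# Illustrative input:
available_children <<'EOF'
rmt21 Defined   2M-08-02     IBM 3592 Tape Drive (FCP)
rmt25 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
EOF
```

Any device that stays Available after an rmdev attempt is a candidate for the one still held open by an application.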
I tried rmdev and cfgmgr, but the drive is showing only 2 paths instead of 4. In the output below you can see that the paths rmt24 and rmt34 are missing. I tried taking the drive offline in TSM, running rmdev on all four paths, and running cfgmgr again, but that did not work. This has worked several times before, but not this time. I just rebooted the system and did nothing else.

Output of the lsdev -Cc tape command:

rmt1  Available 1n-08-02     IBM 3592 Tape Drive (FCP)
rmt2  Available 1n-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt3  Available 1n-08-02     IBM 3592 Tape Drive (FCP)
rmt4  Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt5  Available 1n-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt6  Available 1n-08-02     IBM 3592 Tape Drive (FCP)
rmt7  Available 1n-08-02     IBM 3592 Tape Drive (FCP)
rmt8  Available 1n-08-02     IBM 3592 Tape Drive (FCP)
rmt9  Available 1n-08-02     IBM 3592 Tape Drive (FCP)
rmt10 Available 1n-08-02     IBM 3592 Tape Drive (FCP)
rmt11 Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt12 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt13 Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt14 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
rmt15 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt16 Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt17 Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt18 Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt19 Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt20 Available 1A-08-02     IBM 3592 Tape Drive (FCP)
rmt21 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt22 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt23 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt25 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt26 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt27 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt28 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt29 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt30 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt31 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
rmt32 Available 2U-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt33 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
rmt35 Available 2U-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt36 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
rmt37 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
rmt38 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
rmt39 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
rmt40 Available 2U-08-02     IBM 3592 Tape Drive (FCP)
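
Gaps like these are easy to miss in a long listing. A minimal sketch for spotting them, assuming the paths are expected to be numbered contiguously (here rmt1 through rmt40); `missing_rmt` is a hypothetical helper name.

```shell
# Hypothetical helper: print rmt names in the range $1..$2 that do not
# appear in the lsdev listing supplied on stdin.
missing_rmt() {
    awk -v lo="$1" -v hi="$2" '
        /^rmt[0-9]/ { n = $1; sub(/^rmt/, "", n); seen[n + 0] = 1 }
        END { for (i = lo + 0; i <= hi + 0; i++) if (!(i in seen)) print "rmt" i }'
}

# On the AIX host:
#   lsdev -Cc tape | missing_rmt 1 40
# Excerpt from the listing above (rmt23 through rmt26):
missing_rmt 23 26 <<'EOF'
rmt23 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
rmt25 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt26 Available 2M-08-02     IBM 3592 Tape Drive (FCP)
EOF
```

The excerpt prints rmt24. Run against the full listing with the range 1 40, the same helper would report rmt24 and rmt34.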


The problem was that one of the FC ports on the tape drive was bad. After replacing the tape drive, the issue was resolved and the errors were gone. All the paths are also showing OK for this drive.

Giving points

Thanks
virgo