virgo0880
fcs5 ADAPTER ERROR in AIX 5.3
Hi All,
I am getting the following errors for the fcs5 adapter, which is connected to the tape drives on my AIX 5.3 TSM server, and they are becoming more frequent. How can I check whether the adapter is bad? I have verified the zoning on the switch side and see no issues there, but there are some loss-of-sync incidents on the ports. How should I troubleshoot whether this is a hardware error or a problem on the switch side?
errpt shows the following error:
---------------------------------------------------------------------------
LABEL: FSCSI_ERR6
IDENTIFIER: B8FBD189
Date/Time: Sun Jan 3 05:20:44 CST 2013
Sequence Number: 1337815
Machine Id: 0112179A4C40
Node Id: servername
Class: S
Type: TEMP
Resource Name: fscsi5
Description
SOFTWARE PROGRAM ERROR
Probable Causes
ADAPTER MICROCODE
SOFTWARE PROGRAM
SOFTWARE DEVICE DRIVER
Failure Causes
ADAPTER MICROCODE
SOFTWARE PROGRAM
SOFTWARE DEVICE DRIVER
Recommended Actions
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SENSE DATA
0000 0000 0000 00A1 0000 0006 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0062 0413 0000 0000
0062 0D13 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 C02F 0000 1612 0002 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 5005 0763
0240 6024 5005 0763 0200 6024 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FF6 B000
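Since the question is whether these errors are becoming more frequent, a per-day tally of the error log can make the trend visible. The following is a minimal sketch, not a definitive tool: `errors_per_day` is a hypothetical helper name, and it assumes the usual `errpt` summary layout (IDENTIFIER, TIMESTAMP as MMDDhhmmYY, T, C, RESOURCE_NAME, DESCRIPTION).

```shell
# Hypothetical helper: reads errpt summary output on stdin and prints
# "YY-MM-DD count" for the resource named in $1 (for example fscsi5).
# Assumes the summary timestamp format MMDDhhmmYY in column 2.
errors_per_day() {
  awk -v res="$1" '
    $5 == res && length($2) == 10 {
      # Rearrange MMDDhhmmYY into YY-MM-DD so the keys sort by date
      day = substr($2, 9, 2) "-" substr($2, 1, 2) "-" substr($2, 3, 2)
      count[day]++
    }
    END { for (d in count) print d, count[d] }' | LC_ALL=C sort
}
```

On the server it could be fed with something like `errpt -N fscsi5 | errors_per_day fscsi5`. A rising count alone does not say whether the HBA, the switch port, or the tape drive is at fault.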
I suggest you upgrade to the last AIX 5.3 level, which is 5300-12-05-1140, and update the firmware on the device as well.
ASKER
I am already at that version. We have opened a hardware case with IBM and they will be replacing the HBA card. This card is one of the paths to the tape drives. Before the card is changed, what is the procedure for unconfiguring and removing it without disturbing other components? Can somebody share that information?
Thanks
Unless you are going to be removing the card hot (without taking the system down), there isn't any unconfiguring on the host that must be done.
If you are doing it hot, you will need to use the 'hot-plug' task in the 'diag' utility (Task selection) which will step you through powering down the slot and removal/replacement of the card. The IBM FE should be well versed on this procedure.
When you bring the server back up it will use the same device files as the old card. You will need to rezone the switch because the WWN of the card will be changing, but you need to have the new card to know what the new WWN is to do the rezoning.
After the repair, you can go into the 'diag' utility and select "Log Repair Action" from the Task Selection menu to prevent the errpt entry from triggering further diagnostics on the new controller.
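For the rezoning step, the WWN of the new card can be read on the host once it is configured. The sketch below assumes the adapter is still named fcs5 after the swap and that `lscfg -vl fcs5` prints a "Network Address" line, as it normally does for Fibre Channel adapters. `wwpn_from_lscfg` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `lscfg -vl <adapter>` output on stdin and prints
# the "Network Address" value as a colon-separated WWPN.
wwpn_from_lscfg() {
  awk '/Network Address/ {
         v = $0
         sub(/.*Network Address[.]*/, "", v)   # drop the label and dot leader
         gsub(/[^0-9A-Fa-f]/, "", v)           # keep only the hex digits
         print v
       }' | sed 's/../&:/g; s/:$//'
}
```

Usage would be along the lines of `lscfg -vl fcs5 | wwpn_from_lscfg`. The result is what the switch administrator would need for the new zone entry.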
ASKER
When I try to unconfigure that device, it throws the following errors:
Command: failed stdout: yes stderr: no
Before command completion, additional instructions may appear below.
Method error (/etc/methods/ucfgAtape):
0514-053 Error returned from sys_config.
Unable to unconfigure device: Device busy
fcnet4 deleted
rmt21 deleted
rmt22 deleted
rmt23 deleted
rmt26 deleted
How can I free up that device so that we can run diagnostics to check whether the card is bad? I also tried rmdev but got a device busy error.
ASKER
The tape drive is working fine; my backup tapes are being read and written properly on all the tape drives. I think the issue is that the HBA I am trying to unconfigure is tied to a tape path, and that is why I cannot unconfigure it. Is there a way to see which paths use this HBA and remove them, so that the HBA is freed up?
You shouldn't need the path, because you can simply reference the device and have the O/S determine what is being removed. Try using these commands:
lsdev (this will show the device paths as well)
rmdev -l rmt123 (this is lowercase L - substitute the rmt123 for the device listed in lsdev)
Please note that the message "rmt123 defined" is a successful message. This means that the device is still in the Customized Devices definition database so that you can add it back later without reinstalling the drivers.
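To see which rmt devices hang off the failing adapter before removing anything, the children of the protocol device can be listed with `lsdev -p`. The sketch below is a dry run only: `gen_rmdev` is a hypothetical helper that turns that listing into `rmdev -l` commands for review and does not execute them. It assumes fscsi5 (the resource named in the errpt entry) is the parent of the affected paths.

```shell
# Hypothetical helper: reads `lsdev -p <parent>` output on stdin and prints one
# "rmdev -l <name>" line per tape device that is currently Available.
# It only prints the commands; nothing is unconfigured.
gen_rmdev() {
  awk '$1 ~ /^rmt[0-9]+$/ && $2 == "Available" { print "rmdev -l " $1 }'
}
```

For example, `lsdev -p fscsi5 | gen_rmdev` would print the commands to review. A device that an application still holds open will continue to return "Device busy" until it is released.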
ASKER
I tried rmdev and cfgmgr, but the drive is showing only 2 paths instead of 4. In the output below you can see that the paths rmt24 and rmt34 are missing. I tried taking the drive offline in TSM, running rmdev on all four paths, and running cfgmgr again, but that did not work. This has worked several times before, but not this time. The only other thing I did was reboot the system.
Output of lsdev -Cc tape command :
rmt1 Available 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt2 Available 1n-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt3 Available 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt4 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt5 Available 1n-08-02-PRI IBM 3592 Tape Drive (FCP)
rmt6 Available 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt7 Available 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt8 Available 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt9 Available 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt10 Available 1n-08-02 IBM 3592 Tape Drive (FCP)
rmt11 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt12 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt13 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt14 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt15 Available 1A-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt16 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt17 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt18 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt19 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt20 Available 1A-08-02 IBM 3592 Tape Drive (FCP)
rmt21 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt22 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt23 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt25 Available 2M-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt26 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt27 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt28 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt29 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt30 Available 2M-08-02 IBM 3592 Tape Drive (FCP)
rmt31 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt32 Available 2U-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt33 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt35 Available 2U-08-02-ALT IBM 3592 Tape Drive (FCP)
rmt36 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt37 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt38 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt39 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
rmt40 Available 2U-08-02 IBM 3592 Tape Drive (FCP)
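Gaps like the missing rmt24 and rmt34 are easy to overlook in a long listing. As a rough sketch, the two hypothetical helpers below read `lsdev -Cc tape` output on stdin: `missing_rmt` prints the rmtN numbers absent between the lowest and highest device, and `paths_per_location` counts devices per location code. Stripping the -PRI and -ALT suffixes assumes they only mark primary and alternate paths and are not part of the adapter location.

```shell
# Hypothetical helper: print rmtN names missing from the numeric sequence.
missing_rmt() {
  awk '$1 ~ /^rmt[0-9]+$/ {
         n = substr($1, 4) + 0
         seen[n] = 1
         if (min == "" || n < min) min = n
         if (n > max) max = n
       }
       END {
         if (min == "") exit          # no rmt devices in the input
         for (i = min; i <= max; i++)
           if (!(i in seen)) print "rmt" i
       }'
}

# Hypothetical helper: count rmt devices per location code (column 3),
# ignoring a trailing -PRI or -ALT suffix.
paths_per_location() {
  awk '$1 ~ /^rmt[0-9]+$/ {
         loc = $3
         sub(/-(PRI|ALT)$/, "", loc)
         count[loc]++
       }
       END { for (l in count) print l, count[l] }' | LC_ALL=C sort
}
```

Run against the listing above (`lsdev -Cc tape | missing_rmt`), the first helper would report rmt24 and rmt34, matching the two paths noted as missing. A gap in numbering only shows that a device name is absent, not why.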
ASKER
There was a problem with one of the FC ports on the tape drive; it was bad. After the tape drive was replaced, the issue was resolved and the errors were gone. All the paths for this drive are now showing OK as well.
Giving points
Thanks
virgo
You need to update the firmware on the card or the driver in the O/S (or both). If the firmware update doesn't help, update the driver to rule out the driver sending buggy signals, and if you still have the issue, the only other source of this would be the firmware on the switch. You can do the upgrades in whatever order you see fit, but the order I presented should give you the best chance of success with the fewest upgrades.
This does not present as, nor does it look like, a hardware error. There is a small chance that the switch is sending bad packets, and this can be checked by zoning another port and moving the HBA connection to the new port.
Ernie