
Tons of TAPE_ERR4 errors

There are a lot of related errors reported in AIX and TSM. During migration jobs, the following errors appear:
01/06/09 13:49:16 ANR8311E An I/O error occurred while accessing drive
tapedrv3 (/dev/rmt19) for WRITE operation, errno = 78.
(PROCESS: 112)
01/06/09 14:17:28 ANR8311E An I/O error occurred while accessing drive
tapedrv8 (/dev/rmt16) for WRITE operation, errno = 78.
(PROCESS: 172)
01/03/09 05:37:55 ANR8311E An I/O error occurred while accessing drive
tapedrv9 (/dev/rmt15) for READ operation, errno = 5.
(PROCESS: 85)
While the migration takes place, AIX logs the following errors:

LABEL: TAPE_ERR4
IDENTIFIER: 5537AC5F

Date/Time: Tue Jan 6 13:49:16 CST 2009
Sequence Number: 7972
Machine Id: 0006AADBD600
Node Id: duke01
Class: H
Type: PERM
Resource Name: rmt19
Resource Class: tape
Resource Type: LTO
Location: U7311.D20.06042DC-P1-C08-T1-W224108001BC0BEC6-L1000000000000
VPD:
Manufacturer................IBM
Machine Type and Model......ULTRIUM-TD1
Serial Number...............VD3ASV0823BVA01785
Device Specific.(FW)........5AU1

Description
TAPE DRIVE FAILURE

Probable Causes
ADAPTER
TAPE DRIVE

Failure Causes
ADAPTER
TAPE DRIVE

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0000 0A00 0400 0000 0000 0000 0000 0200 0300 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
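Before deciding between "adapter" and "tape drive" (both listed as probable causes above), it can help to see whether the TAPE_ERR4 entries cluster on one drive or spread across all drives. The following is a minimal sketch that tallies entries per resource. It assumes the default `errpt` summary layout (one header line, RESOURCE_NAME in the fifth column), and the function name `count_by_resource` is hypothetical.

```shell
#!/bin/sh
# count_by_resource (hypothetical helper): read `errpt` summary output on
# stdin and print "<resource> <count>" per resource, sorted by name.
# Assumes one header line and RESOURCE_NAME in the 5th column.
count_by_resource() {
  awk 'NR > 1 && NF >= 5 { n[$5]++ } END { for (r in n) print r, n[r] }' | sort
}

# Usage on AIX (-J selects entries by error label):
#   errpt -J TAPE_ERR4 | count_by_resource
```

If every rmt device behind the same fcs adapter shows up, the adapter or fabric path is the more likely suspect. If only one or two drives dominate, the drives themselves are.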

When I was checking the HBA adapters, I noticed that all of them have the same FRU number except for two. Could this be causing the problem? Is this normal?

fcs0 and fcs1 have FRU number 03N5029, while the rest (fcs2 through fcs8) have 10N8620.
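To list the FRU number of every FC adapter in one pass instead of running `lscfg` adapter by adapter, a short loop is enough. This is a sketch, assuming `lsdev -Cc adapter -F name` lists the adapters and that `lscfg -vl` prints the field as `FRU Number.....<value>`. The function names `fru_of` and `list_fc_fru` are hypothetical.

```shell
#!/bin/sh
# fru_of (hypothetical): print the value of the "FRU Number" VPD field
# from `lscfg -vl <adapter>` output read on stdin.
fru_of() {
  sed -n 's/^[[:space:]]*FRU Number\.*//p'
}

# list_fc_fru (hypothetical): one line per fcsN adapter, "<adapter> <FRU>".
list_fc_fru() {
  for a in $(lsdev -Cc adapter -F name | grep '^fcs'); do
    printf '%s %s\n' "$a" "$(lscfg -vl "$a" | fru_of)"
  done
}

# Usage on AIX:
#   list_fc_fru
```

A different FRU number usually indicates a different adapter model, which can matter later because each model may need its own microcode image.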

I would also like some suggestions on tuning the HBAs so they achieve maximum performance on our system.

Please shed some light on it.

AIX 5.3 on a 9133-55A box.

ASKER CERTIFIED SOLUTION

woolmilkporc


techie27

ASKER

How can I check the SAN switches' microcode/firmware level? Is there any command on the AIX side to check it, or is it something the SAN techs will have to provide? Also, do you have a PDF that explains all the error numbers in AIX? Thanks

woolmilkporc

Hi,
regarding the SAN switch, you'll have to use its user interface (web or telnet or ssh ...)
and issue the appropriate query to find out the firmware level.
Which SAN switch do you use? If it's IBM or Brocade, I will probably be able to say more.

The AIX error numbers are defined in the header file /usr/include/sys/errno.h.
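Rather than reading the whole header, you can look up a single errno value such as the 78 and 5 reported by ANR8311E. The following is a minimal sketch, assuming the header uses plain `#define NAME NUMBER` lines. The function name `errno_name` is hypothetical. On AIX this should resolve 78 to ETIMEDOUT and 5 to EIO, which fits the time-out symptoms discussed later in this thread. Numbering differs between operating systems (Linux, for example, uses 110 for ETIMEDOUT), so always check the header on the AIX box itself.

```shell
#!/bin/sh
# errno_name (hypothetical): print the symbolic name(s) #defined to a
# given errno value.
#   $1 = numeric errno
#   $2 = header to search (defaults to the AIX path)
errno_name() {
  awk -v n="$1" '$1 == "#define" && $3 == n { print $2 }' \
    "${2:-/usr/include/sys/errno.h}"
}

# Usage on AIX:
#   errno_name 78
#   errno_name 5
```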

wmp

techie27

ASKER

All of the switches are running at 3.2(3a).

How can I relate the VTL ports and their WWN to the specific adapter defined in AIX?

20:01:08:00:1b:e0:63:f9-S TVTL2_F4 22:01:08:00:1b:c0:63:f9 0xb10008 DANSAN002 fc9/3
20:01:08:00:1b:e0:63:f9-S TVTL2_F3 21:01:08:00:1b:c0:63:f9 0xb10009 DANSAN002 fc9/4
20:41:08:00:1b:e0:be:c6-S TVTL1_F8 25:41:08:00:1b:c0:be:c6 0x7d0000 DANSAN001 fc3/47
20:41:08:00:1b:e0:be:c6-S TVTL1_F7 24:41:08:00:1b:c0:be:c6 0x7d0001 DANSAN001 fc3/48
20:41:08:00:1b:e0:be:c6-S TVTL1_F4 22:41:08:00:1b:c0:be:c6 0xb10000 DANSAN002 fc3/47
20:41:08:00:1b:e0:be:c6-S TVTL1_F3 21:41:08:00:1b:c0:be:c6 0xb10001 DANSAN002 fc3/48

Thanks

woolmilkporc

Which command produced the above output?

techie27

ASKER

I did not use any command; the info was provided by the SAN techs.
Thanks

woolmilkporc

You could compare the WWNs in the above list (first and third columns)
with the WWNs of your FC adapters, obtained via

lscfg -vl fcs[n] | grep Z8
or
lscfg -vl fcs[n] | grep Address

There might be a relation, but I'm afraid there isn't.
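The SAN team's list writes WWNs as lowercase, colon-separated byte pairs, while `lscfg` prints an adapter's Network Address (normally its port WWN on IBM FC adapters) as one uppercase hex string. This makes a direct comparison awkward. The following sketch converts the `lscfg` form into the SAN notation. It assumes the field appears as `Network Address.....<hex>`, and the function name is hypothetical.

```shell
#!/bin/sh
# wwpn_from_lscfg (hypothetical): read `lscfg -vl fcsN` output on stdin,
# take the "Network Address" field and print it as lowercase
# colon-separated byte pairs, the notation used in the SAN team's list.
wwpn_from_lscfg() {
  sed -n 's/^[[:space:]]*Network Address\.*\([0-9A-Fa-f]*\).*$/\1/p' |
    tr 'A-F' 'a-f' |
    sed 's/\(..\)/\1:/g; s/:$//'
}

# Usage on AIX:
#   lscfg -vl fcs0 | wwpn_from_lscfg
```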

As I'm not familiar with your VTL, I guess there is nothing more I can say.
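The WWNs in the SAN list belong to the VTL ports, so they are not expected to match the host adapters. The errpt entry above hints at another way to relate them. The location code of rmt19 contains the segment `-W224108001BC0BEC6`, and 22:41:08:00:1b:c0:be:c6 appears as the third-column WWN of the TVTL1_F4 row. On this system, the `-W` part of a tape device's location code therefore appears to identify the VTL target port. The part before it (`...-P1-C08-T1`) typically identifies the host adapter port, which you can match against the location codes of the fcs devices in `lscfg` output. The following is a minimal sketch that extracts the `-W` segment. The function name is hypothetical, and it assumes the location code has a single `-W<hex>` segment.

```shell
#!/bin/sh
# target_wwpn (hypothetical): extract the "-W<hex>" segment from an AIX
# FC location code and print it as lowercase colon-separated byte pairs.
target_wwpn() {
  printf '%s\n' "$1" |
    sed -n 's/.*-W\([0-9A-Fa-f]*\).*/\1/p' |
    tr 'A-F' 'a-f' |
    sed 's/\(..\)/\1:/g; s/:$//'
}

# Usage with the rmt19 location code from the errpt entry above:
#   target_wwpn 'U7311.D20.06042DC-P1-C08-T1-W224108001BC0BEC6-L1000000000000'
```

For that location code, the function prints 22:41:08:00:1b:c0:be:c6, the TVTL1_F4 port.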

wmp

techie27

ASKER

How can I know for sure that updating the microcode/firmware on the adapters to the latest level will fix the time-out errors?

The release notes for the new microcode level don't mention anything about it.

"Adapter timeouts could be related to microcode. So check the microcode levels of the FC adapters by using 'lsmcode -d fcs[n]'.
Afaik the latest mcode is 271304 (1.50x1). If your mcode is older, you should install the newest one.
Please look here for instructions and download - "

Thanks

woolmilkporc

As I wrote: "... could be related ..."

You can't know anything for sure.

As I wrote, too: "... If you still encounter the above errors, please contact IBM support to fix it..."

Firmware is just one possibility, and I swear, the first thing IBM will tell you is: "Update your adapters to the latest firmware level and we'll see."
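Before opening a call with IBM, it can be useful to collect the current microcode level of every FC adapter at once (nine on this box). The following sketch loops over them. It assumes `lsmcode -cd <device>` prints the level without entering the interactive menus, and the function name is hypothetical. Because this box has adapters with two different FRU numbers, the two groups may need different microcode images.

```shell
#!/bin/sh
# list_fc_mcode (hypothetical): print the microcode level of every fcsN
# adapter so the values can be compared with the latest published level.
# Assumes `lsmcode -cd <device>` gives non-interactive output.
list_fc_mcode() {
  for a in $(lsdev -Cc adapter -F name | grep '^fcs'); do
    lsmcode -cd "$a"
  done
}

# Usage on AIX:
#   list_fc_mcode
```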

wmp

woolmilkporc

Hi,
why wouldn't you recommend accepting my answer http:#a23323541?
It's a valid and comprehensive answer to the original question.
A lot of follow-up questions were asked afterwards, which I answered as precisely as I could, at least in my opinion. Maybe techie27 was not quite satisfied with my answers to these follow-ups, but as I said, the original question has been answered correctly!
wmp