AIX errpt shows DISK errors

We are seeing I/O errors on the database, and when running
errpt -a
we get:


LABEL:          SC_DISK_ERR4
IDENTIFIER:     DCB47997

Date/Time:       Wed Jul  8 18:04:47 2015
Sequence Number: 2410
Machine Id:      00C854A04C00
Node Id:         XXXXXXXX
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   hdisk88
Resource Class:  disk
Resource Type:   2145
Location:        U9179.MHD.10854A0-V47-C47-T1-W500507680110B7E3-L82000000000000

VPD:            
        Manufacturer................IBM    
        Machine Type and Model......2145            
        ROS Level and ID............0000
        Device Specific.(Z0)........0000063268181002
        Device Specific.(Z1)........020060c
        Serial Number...............60050768018305CFC000000000000BFA

Description
DISK OPERATION ERROR

Probable Causes
MEDIA
DASD DEVICE

User Causes
MEDIA DEFECTIVE

        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
MEDIA
DISK DRIVE

        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
           8
SENSE DATA
0A00 2800 0D0F 7800 0002 0004 0000 0000 0000 0000 0000 0000 0200 0200 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 00CC 7264 0009 F300 0000 0000 0000 0000 0000 0000 0000 0003 0000
0000 0034 001D
---------------------------------------------------------------------------
LABEL:          FCP_ERR4
IDENTIFIER:     4B436A3D

Date/Time:       Wed Jul  8 18:04:46 2015
Sequence Number: 2409
Machine Id:      00C854A04C00
Node Id:         XXXXXXXX
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   fscsi2
Resource Class:  driver
Resource Type:   efscsi
Location:        U9179.MHD.10854A0-V47-C247-T1


Description
LINK ERROR

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES



LABEL:          SC_DISK_ERR4
IDENTIFIER:     DCB47997

Date/Time:       Wed Jul  8 13:58:11 2015
Sequence Number: 2407
Machine Id:      00C854A04C00
Node Id:         XXXXXXXX
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   hdisk4
Resource Class:  disk
Resource Type:   2145
Location:        U9179.MHD.10854A0-V47-C47-T1-W500507680110B9D7-L68000000000000

VPD:            
        Manufacturer................IBM    
        Machine Type and Model......2145            
        ROS Level and ID............0000
        Device Specific.(Z0)........0000063268181002
        Device Specific.(Z1)........020060c
        Serial Number...............60050768018305CFC000000000000BE0

Description
DISK OPERATION ERROR

Probable Causes
MEDIA
DASD DEVICE

User Causes
MEDIA DEFECTIVE

        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
MEDIA
DISK DRIVE

        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
PATH ID
           4
Location:        U9179.MHD.10854A0-V47-C247-T1


Description
LINK ERROR

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES



What should we do here?
it-rex Asked:

madunix Commented:
There is a bad-block issue on the media. Change the media; it is always recommended to replace the disk when you see this kind of error repeatedly.
it-rex (Author) Commented:
The Unix team said there is nothing wrong with the disk (though I do not really trust them).
How can we confirm?
How can we check the AIX disks for errors?
David Johnson, CD, MVP (Owner) Commented:
You already know that you have defective media. Replace the drive; issue closed. You don't appear to have physical access to the machine, so you can't check the disk using software that analyses the disk. This is not a 'software error' but a hardware error.
it-rex (Author) Commented:
This is the issue...
We need to prove to management that we have faulty media.
The AIX team is arguing that we do not.
If there is a command we can ask them to run, and its output shows bad blocks/sectors, that will end the argument.
David Johnson, CD, MVP (Owner) Commented:
Isn't the errpt -a output enough?
it-rex (Author) Commented:
For me it is.
And the word of experts like yourself is too, but our environment is too big and too complicated, with lots of red-tape politics.
madunix Commented:
errpt reports real errors; none of them is 'fake'. However, you could run diag to check the disk.

Run diag:
-----------

smitty diag
Current Shell Diagnostics
Enter
Select Advanced Diagnostics Routines
Select Problem Determination
Select the disk at fault (or everything listed)
Let it run the check and report whether the disk is faulty or not.
carlmd Commented:
Another comment here: the error is shown as TEMP, which typically means the system has already done bad block relocation. If you call IBM service for this, they typically will NOT replace the disk unless the error is PERM.

Yes, you have a hard disk error, but this type of thing is "normal", and that's why the system does block relocation.

If the errors continue to occur, then by all means recommend a call to IBM service for replacement, and they will then agree. But you must have more than a few errors.
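
To put numbers behind "more than a few errors", the detailed report can be tallied per resource, label, and type. The following is a minimal Python sketch, not an AIX tool: it assumes the output of errpt -a has been saved as text, and the names parse_entries and tally are made up for illustration. The sample holds only the LABEL, Type, and Resource Name lines of the three complete entries pasted in the question.

```python
import re
from collections import Counter

# Header fields as they appear in the `errpt -a` output pasted in the question.
# Anchoring at line start keeps "Resource Type:" from matching "Type:".
FIELD_RE = re.compile(r"^(LABEL|Type|Resource Name):\s+(\S+)", re.MULTILINE)


def parse_entries(report):
    """Split detailed errpt text into entries; each LABEL line starts a new entry."""
    entries = []
    for field, value in FIELD_RE.findall(report):
        if field == "LABEL":
            entries.append({"LABEL": value})
        elif entries:
            entries[-1][field] = value
    return entries


def tally(entries):
    """Count entries per (resource name, error label, error type)."""
    return Counter(
        (e.get("Resource Name", "?"), e["LABEL"], e.get("Type", "?"))
        for e in entries
    )


# Assumption: a trimmed copy of the report from the question, headers only.
SAMPLE = """\
LABEL:          SC_DISK_ERR4
Type:            TEMP
Resource Name:   hdisk88
LABEL:          FCP_ERR4
Type:            TEMP
Resource Name:   fscsi2
LABEL:          SC_DISK_ERR4
Type:            TEMP
Resource Name:   hdisk4
"""

if __name__ == "__main__":
    counts = tally(parse_entries(SAMPLE))
    for (resource, label, err_type), count in sorted(counts.items()):
        print(f"{resource:10} {label:14} {err_type:5} {count}")
```

A disk that keeps accumulating TEMP entries, or that logs any PERM entry, would stand out in such a tally; a handful of single TEMP entries spread over many disks would not.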
madunix Commented:
Call your local IBM Support and open a case.
gheist Commented:
If you don't have paid-up IBM support, you must remove the broken drive from the machine.
it-rex (Author) Commented:
Here is the problem:
These errors are random, only show up under high database load, and name a new hdisk every time.
We suspect adapter queue-depth (QD) latency that may cause the I/O submit to fail.
Does this make sense?

We get these random errors on random disks, different ones each time, and only under high load.
Does this really mean bad media?
gheist Commented:
Yes, the disk is bad. I have had such disks at times: you can read and write them end to end and IBM diags pass, but errors appear under load.
it-rex (Author) Commented:
@gheist
Are there any command-line tools to check the disks or expose the bad ones?
madunix Commented:
Run diag, as mentioned above.
gheist Commented:
diag is for new hardware and regular tests. Now you have write errors. IBM will exchange the disk if you ask (and have paid HW support).
carlmd Commented:
Given that you report the errors occurring on a number of random disks when the system is under heavy load, I doubt that IBM is going to come in and replace all of them.

I would contact support and report the problem as just that: disk errors on random disks under heavy load. IMHO it appears to point to a controller, or backplane, or something other than the disks themselves. Let IBM troubleshoot the problem, or help you to do so. After all, that's why you pay them the big bucks!
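
When reporting this to support, it may help to show which disk errors were logged next to a link error on an adapter. The report in the question already hints at this: FCP_ERR4 on fscsi2 is stamped one second before SC_DISK_ERR4 on hdisk88. Below is a rough Python sketch of that check. The function names and the 5-second window are assumptions chosen for illustration, and the events are copied by hand from the three complete entries in the question. The hdisk4 entry in the paste is followed by what looks like a truncated link-error entry with no timestamp, so an empty result for hdisk4 only reflects the incomplete paste.

```python
from datetime import datetime, timedelta

# (Date/Time, LABEL, Resource Name) from the three complete entries in the question.
EVENTS = [
    ("Wed Jul  8 18:04:47 2015", "SC_DISK_ERR4", "hdisk88"),
    ("Wed Jul  8 18:04:46 2015", "FCP_ERR4", "fscsi2"),
    ("Wed Jul  8 13:58:11 2015", "SC_DISK_ERR4", "hdisk4"),
]


def parse_time(text):
    """Parse an errpt Date/Time value; collapse the padding before one-digit days."""
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def link_errors_near(events, window=timedelta(seconds=5)):
    """Map each disk error to the link-error resources logged within `window` of it.

    The window size is an arbitrary assumption, not an AIX default.
    """
    parsed = [(parse_time(t), label, res) for t, label, res in events]
    links = [(t, res) for t, label, res in parsed if label.startswith("FCP_")]
    result = {}
    for t, label, res in parsed:
        if label.startswith("SC_DISK_"):
            near = {r for lt, r in links if abs(t - lt) <= window}
            result[(res, t)] = sorted(near)
    return result


if __name__ == "__main__":
    for (disk, when), adapters in link_errors_near(EVENTS).items():
        print(disk, when.isoformat(), adapters or "no link error in window")
```

If most disk errors turn out to sit next to a link error on the same adapter, that would support the adapter or fabric theory over bad media on many separate disks.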