jjmekkattil


Event 259 AMD RAID API

Every so often I would get the following error:

The description for Event ID 259 from source AMD RAID API cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

Task 20 timeout on port 3 target 1 at LBA 0x0a0245c00 (Length 0x44)

the message resource is present but the message is not found in the string/message table


I've looked everywhere and I can't seem to find a fix for this. It's an annoying error: when it pops up, my computer more or less freezes for 3-4 seconds and then carries on with what it was doing.
gheist

This means that a command to a disk inside the RAID array timed out, which in turn means that the disk either has a loose cable or is going bad.
It is common for drivers, even HCL-listed ones, to ship without an event log message package. That is nothing to worry about: just read the text included with the event and use the manufacturer's diagnostic tools to check the disk.
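Reading the text mostly comes down to pulling the port, target and LBA out of the message. Here is a minimal Python sketch, assuming the message always has the shape quoted in the question. The function name, the field names and the 512-byte sector size are my own assumptions, not anything documented by AMD:

```python
import re

# Pattern for the one message format quoted in the question; other
# AMD RAID API messages may well look different.
EVENT_RE = re.compile(
    r"Task (?P<task>\d+) timeout on port (?P<port>\d+) target (?P<target>\d+)"
    r" at LBA (?P<lba>0x[0-9a-fA-F]+) \(Length (?P<length>0x[0-9a-fA-F]+)\)"
)


def parse_timeout_event(text, sector_bytes=512):
    """Return the fields of a timeout message as integers, or None if no match."""
    m = EVENT_RE.search(text)
    if m is None:
        return None
    lba = int(m.group("lba"), 16)
    return {
        "task": int(m.group("task")),
        "port": int(m.group("port")),
        "target": int(m.group("target")),
        "lba": lba,
        "length": int(m.group("length"), 16),
        # Assumes 512-byte logical sectors; the event does not state the size.
        "byte_offset": lba * sector_bytes,
    }


event = parse_timeout_event(
    "Task 20 timeout on port 3 target 1 at LBA 0x0a0245c00 (Length 0x44)"
)
print(event)
```

The port and target numbers are what tell you which physical disk to point the diagnostic tool at first.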
David
The error means the HDD spent more than 3 seconds in deep recovery on an unrecoverable read at the indicated block. You're probably using desktop disks instead of enterprise disks, and desktop disks will take more than 3 seconds for error recovery.

The fix is to use enterprise-class storage that will either stop deep recovery within 2 seconds so the block can be reconstructed from parity, or to ...  well. That is it.  This is why enterprise disks cost more money than the cheap desktop drives.
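To make the timing argument concrete, here is a rough Python sketch. The 3-second limit and the 2-second enterprise cap are the figures from this post; the 10-second desktop figure and the function name are made up purely for illustration:

```python
def controller_sees_timeout(drive_recovery_s, controller_limit_s=3.0):
    """True if the drive is still retrying when the controller gives up."""
    return drive_recovery_s > controller_limit_s


# Illustrative numbers only: an enterprise drive capped at 2 s, and a
# desktop drive that keeps retrying for a hypothetical 10 s.
drives = [
    ("enterprise, recovery capped at 2 s", 2.0),
    ("desktop, hypothetical 10 s recovery", 10.0),
]
for label, recovery_s in drives:
    outcome = "timeout logged" if controller_sees_timeout(recovery_s) else "no timeout"
    print(f"{label}: {outcome}")
```

In the capped case the drive reports the read error in time, so the controller can rebuild the block from parity instead of logging a timeout.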
AMD RaidXpert comes on high-end desktop motherboards... The usual cause of the timeout is that the drives are set to spin down by themselves and then don't respond to commands.
You need to make sure the drives have all automatic spin-down turned off and that the RAID manages power.
The latest driver is here; installing the full package may help you get properly decoded errors:
http://support.amd.com/en-us/download/chipset?os=Windows%208.1%20-%2064
Drives won't spin down unless there is no activity.  Figure the odds that the disk hasn't had a single I/O request for 5 minutes, minimum, while you were using the PC.  It just isn't possible.

This is deep recovery due to a read error.
Some drives are not compatible with some disk power-saving measures, such as SATA link power management, so they turn off until the bus is reset. I saw this years ago, though no doubt modern disks are not much better.
Yes, with a clear indication of a read error, I'd concentrate on investigating the read error and not the lack of an event log decoder.
jjmekkattil

ASKER

So at minimum, changing the power settings on the hard drive so it never goes to sleep won't fix anything, correct?
First you need to check drive SMART status, namely SMART error log.
Yes, it is free
Try to use common sense when reading the attributes. Depending on the manufacturer, failure approaches from different sides...
Yes, this thing is getting annoying. It's popping up every few seconds...
Try the SMART utility you found. If the sector is bad it should appear in self-test logs and in drive error log.
OK, so I finally got something to work right, and here is the information:
Capture.PNG
Is it just one disk and one sector?
Do you think you can manage warranty replacement?

I would suggest getting the SMART details and running the SMART internal self-test (with the drive connected to another computer?)
If it reports the same sector, then OK, the drive is bad and unusable and you need a new one in its place.
Also, the drive's internal self-test is more likely to flip the status from good to pre-failure...
What doesn't make sense is that the SMART status for the drive says healthy.
That's why I am suggesting you run the drive's internal self-test (an Ubuntu or Knoppix USB stick has a tool for it).
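On a Linux live stick the self-test is normally started with smartctl from the smartmontools package. Here is a small Python sketch that only builds the command lines, so nothing touches a disk until you choose to run one. `/dev/sdX` is a placeholder for the real device, and the helper names are my own:

```python
import subprocess


def smart_commands(device):
    """Build smartctl command lines for a self-test and the logs it fills."""
    return {
        "health": ["smartctl", "-H", device],                # overall SMART verdict
        "start_long_test": ["smartctl", "-t", "long", device],  # extended self-test
        "selftest_log": ["smartctl", "-l", "selftest", device],  # self-test results
        "error_log": ["smartctl", "-l", "error", device],    # drive error log
    }


def run(cmd):
    """Run one command and return its text output (needs root and smartmontools)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


commands = smart_commands("/dev/sdX")  # placeholder device name
for name, cmd in commands.items():
    print(name, "->", " ".join(cmd))

# On a live system, replace /dev/sdX with the real device and uncomment:
# print(run(commands["start_long_test"]))
```

The long test runs inside the drive and can take hours. Read the self-test log afterwards and compare any failing LBA with the one in the event.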
The detailed report says all 3 errors are on the same drive and sector.  If the other error logs indicate the same drive but different sector(s), then you should just replace it.  Bad blocks (alone) do NOT trigger a SMART predictive failure alert. There are tens of thousands of spare blocks.  SMART uses an algorithm to determine if a drive is in DEGRADED condition and failure is imminent.

But long term, as I wrote before, this drive is unacceptable for RAID5 and 24x7x365 use to begin with. It is not server class and does not have non-volatile settings to control and define error recovery limits.  In fact, this crappy drive DOES have a way to program them, but the settings are volatile.

As for it going to sleep: NOT possible.  The error log proves it. 2 retries on the same disk at the same sector a few seconds apart.  If the drive had gone to sleep, then errors would come in multiples of 1 (but the error would be different anyway; it would be a media timeout).
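The same-sector versus different-sector rule of thumb can be sketched in a few lines of Python. The helper names and advice strings are made up, and the LBA is assumed to be the one from the event quoted in the question:

```python
from collections import defaultdict


def sectors_per_drive(events):
    """Map (port, target) to the set of distinct failing LBAs on that drive."""
    seen = defaultdict(set)
    for port, target, lba in events:
        seen[(port, target)].add(lba)
    return dict(seen)


def advice(events):
    """Apply the rule of thumb above to a list of (port, target, lba) events."""
    result = {}
    for drive, lbas in sectors_per_drive(events).items():
        # More than one failing sector on the same drive: just replace it.
        result[drive] = "replace" if len(lbas) > 1 else "same sector, keep watching"
    return result


# Three logged errors, all on port 3 target 1 at the same LBA.
logged = [(3, 1, 0x0a0245c00)] * 3
print(advice(logged))
```

Feed it the entries from the other error logs as well. As soon as a second LBA shows up for the same port and target, the answer flips to replace.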
Sometimes with crappy drives a full rewrite triggers the relocation and the drive works a little longer.
Though I suggest running full drive diagnostics (smartctl -t long) to try to flip the SMART status to bad and go for a warranty replacement. That is probably not possible through the RAID controller.
A bad block won't trip SMART.
Relocation failure (like out of spare sectors) certainly will...
True, but the RAID controller would have killed the drive from the array LONG before it got to that point. It would kill it when there were ONLY maybe 20,000 spare sectors available.
ASKER CERTIFIED SOLUTION
gheist
This solution is only available to members.
So I finally was able to get it to semi-work (meaning the error no longer shows up).  I was able to run a few tools on the RAID, one for redundancy and the other for synchronization. To make a long story short, the tool found some bad sectors and "fixed" them, but I will be replacing the drive fairly soon.
"Fixed" usually means rewriting a sector with zeroes (or with data from the RAID checksum) to inhibit relocation.