Has anyone experienced firmware issues with WD VelociRaptor drives when running Linux?

pparker900
Folks,

We seem to be victims of a very serious firmware issue on WD1500BLFS (and similar) drives, but cannot find a solution for it that is specific to Linux (the problem and details follow). Contacting WD has been fruitless for me and many others. My goal is to update the many, many drives that I have with new drive firmware from WD using their tool, but I have yet to find either the firmware or a tool that I can use on a Linux system.

My question is:  Has anyone else at E/E seen this issue and do you know of a fix?

Here's the problem:

Ticking time bomb error in Western Digital’s 2.5” 10K RPM SATA VelociRaptor Product
In our testing of Western Digital’s latest 10K RPM SATA product, the 2.5 inch VelociRaptor, we have discovered a very serious issue that everyone needs to be aware of. We found that after running continuously for almost 50 days, the drive will throw Time Limited Error Recovery (TLER) errors. These errors will cause a RAID volume to fail, possibly causing a loss of data. We have been able to confirm this issue with Western Digital’s Support. The issue occurs when TLER is enabled (it is normally enabled on all Western Digital RAID Edition products and is also enabled on the VelociRaptor [7 seconds for Read and Write commands]). See below for more information about TLER.

Details of the failure

When the continuous power-on time hits 49.7 days, an internal time keeper in the drive’s firmware wraps. When this time keeper wraps, any active Read, Write, and Flush commands will prematurely hit the TLER timeout. Because the time keeper wraps at the same moment on all of the VelociRaptors installed in a system together (all powered on at the same time), any RAID volumes on these drives will fail. In our testing, our system RAID volume failed and we were not able to recover because all of the VelociRaptors failed together. Data is not lost, but the RAID controller will think that all of the drives failed because of the incorrect TLER timeout. All RAID configurations, including RAID 5 and RAID 6, will fail because multiple drives fail at once. If a system with VelociRaptor drives stays powered on, it will fail again every 49.7 days.
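The 49.7-day figure matches the point at which a 32-bit counter of milliseconds overflows. The post does not say how the firmware keeps time, so the counter width, the tick rate, and both check functions below are assumptions. This is only a sketch of how a wrapped timer could make an in-flight command look as if it had exceeded the 7-second limit.

```python
# Assumption: the firmware timer is an unsigned 32-bit millisecond counter.
# The post only states that the time keeper wraps at 49.7 days.
COUNTER_BITS = 32
WRAP_MS = 2 ** COUNTER_BITS           # 4,294,967,296 ms
MS_PER_DAY = 24 * 60 * 60 * 1000      # 86,400,000 ms
TLER_LIMIT_MS = 7000                  # 7 seconds, as quoted in the post

wrap_days = WRAP_MS / MS_PER_DAY      # about 49.71 days


def buggy_timed_out(start_ms: int, now_ms: int) -> bool:
    """Hypothetical faulty check: the deadline wraps, the comparison ignores it."""
    deadline = (start_ms + TLER_LIMIT_MS) % WRAP_MS
    return now_ms >= deadline


def wrap_safe_timed_out(start_ms: int, now_ms: int) -> bool:
    """Wrap-safe check: measure elapsed time modulo the counter range."""
    elapsed = (now_ms - start_ms) % WRAP_MS
    return elapsed >= TLER_LIMIT_MS


if __name__ == "__main__":
    print(f"counter wraps after {wrap_days:.2f} days")
    # A command issued 1 second before the wrap, checked 100 ms later.
    start = WRAP_MS - 1000
    now = WRAP_MS - 900
    print("buggy check says timed out:", buggy_timed_out(start, now))
    print("wrap-safe check says timed out:", wrap_safe_timed_out(start, now))
```

In this model, only commands that are active close to the wrap are affected, which is consistent with the description above of active Read, Write, and Flush commands timing out prematurely.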

We have also been told by Western Digital that a short-term workaround is to power cycle any systems with VelociRaptor drives every 30-45 days, so that the internal firmware time keeper never reaches the wrap at 50 days. A simple reset or restart doesn’t work. The system must be completely powered off to reset the internal firmware time keeper. Western Digital is also currently working on a fix and should have something soon to resolve the issue. We have been reassured by Western Digital that this issue only exists on the VelociRaptor and not on their other products, including their RAID Edition product.
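For anyone tracking many machines, the 30-45 day window can be turned into a simple check. The sketch below is not a WD tool; the function name and status strings are made up for illustration, and it assumes you record the time of the last full power-off yourself, since a restart does not reset the drive's timer and OS uptime is therefore not a reliable stand-in.

```python
from datetime import datetime, timedelta

# Figures taken from the post: wrap at 49.7 days, power cycle every 30-45 days.
WRAP_AFTER = timedelta(days=49.7)
WINDOW_START = timedelta(days=30)
WINDOW_END = timedelta(days=45)


def power_cycle_status(last_cold_start: datetime, now: datetime) -> str:
    """Classify a system by time since its last full power-off (hypothetical helper)."""
    powered_on = now - last_cold_start
    if powered_on >= WRAP_AFTER:
        return "past wrap point"
    if powered_on >= WINDOW_END:
        return "urgent"
    if powered_on >= WINDOW_START:
        return "due"
    return "ok"


if __name__ == "__main__":
    # Example date only; substitute the recorded cold-start time of each system.
    cold_start = datetime(2009, 1, 1)
    for days in (10, 35, 47, 50):
        check_time = cold_start + timedelta(days=days)
        print(days, "days:", power_cycle_status(cold_start, check_time))
```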

Background information about TLER

TLER, or Time Limited Error Recovery, was created to help SATA and IDE drives work with RAID controllers when drive errors occur. Normally, SATA and IDE drives will try extensive error recovery procedures to recover data when there is an error. Sometimes these procedures will take multiple seconds per sector of data (512 bytes). If the errors affect an area larger than a single sector, these recovery procedures may take 10, 20, 30 seconds or longer. Most RAID controllers will not give a drive that much time to recover from an error. The RAID controller will typically drop or error the drive and put the RAID volume into recovery, using parity information from the other drives (RAID 5 or RAID 6 configurations) or mirror drive information (RAID 1) to recover. Windows in a non-RAID configuration will also error if the drive takes too much time trying to recover. This could cause a Blue Screen of Death or other issues. TLER helps manage error recovery times to allow the host or RAID controller to find alternate ways of dealing with drive errors. TLER has helped SATA and IDE drives compete with SCSI and other enterprise-class drives in enterprise applications. Western Digital recommends using TLER for any RAID configuration, and it can also be used in normal desktop applications. A utility is also available from Western Digital to enable, disable, or change the TLER settings.
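The interaction between the drive's recovery time and the controller's patience can be modelled in a few lines. This is a toy model, not how any particular controller works: the 8-second controller timeout and the function names are assumptions chosen for illustration, while the 7-second TLER limit comes from the text above.

```python
from typing import Optional, Tuple


def drive_response(recovery_needed_s: float,
                   tler_limit_s: Optional[float]) -> Tuple[float, bool]:
    """Return (seconds until the drive answers, whether it recovered the data)."""
    if tler_limit_s is None or recovery_needed_s <= tler_limit_s:
        # Without TLER the drive keeps trying for as long as recovery takes.
        return recovery_needed_s, True
    # With TLER the drive gives up at the limit and reports an error.
    return tler_limit_s, False


def controller_outcome(recovery_needed_s: float,
                       tler_limit_s: Optional[float],
                       controller_timeout_s: float = 8.0) -> str:
    """Toy RAID controller; the 8 s default timeout is a hypothetical value."""
    answered_after, recovered = drive_response(recovery_needed_s, tler_limit_s)
    if answered_after > controller_timeout_s:
        return "drive dropped from array"
    if recovered:
        return "data returned by drive"
    return "error reported in time; controller recovers from parity or mirror"


if __name__ == "__main__":
    print(controller_outcome(30.0, tler_limit_s=7.0))   # TLER enabled
    print(controller_outcome(30.0, tler_limit_s=None))  # TLER disabled
```

With the limit enabled, a 30-second recovery is cut short and the controller can fall back on redundancy; with it disabled, the same recovery outlasts the controller's timeout and the drive is dropped.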
President
Top Expert 2010
Commented:
Well, any volume OEM is certainly aware of this, but non-disclosure issues prevent them from commenting on it publicly.

They had the problem three years or so ago with the RE2 family (related issues bit Seagate a few years ago as well). Anyway, the solution was to disable TLER, and they supplied a program to OEMs that turned it off. The program should work with your particular model, but USE AT YOUR OWN RISK.

(To use it, extract it to a bootable MS-DOS floppy, boot from that floppy, and enter TLER-OFF.)

Now, I am not allowed to give you the program due to intellectual property constraints. So just google "WDTLER.ZIP", and you will find places where you can download it.

Note that this should fix the problem because, in theory, if TLER is disabled, there won't be any errors to cause the problem. But I have not tried it.
