We seem to be victims of a very serious firmware issue on WD1500BLFS (and similar) drives, but cannot find a solution for it that is specific to Linux (the problem and details follow). Contacting WD has been fruitless for me and many others. My goal is to update the many, many drives that I have with new drive firmware from WD using their tool, but I have yet to find either the firmware or the tool in a form that I can use on a Linux system.
My question is: Has anyone else at E/E seen this issue and do you know of a fix?
Here's the problem:
Ticking time bomb error in Western Digital's 2.5" 10K RPM SATA VelociRaptor Product
In our testing of Western Digital's latest 10K RPM SATA product, the 2.5 inch VelociRaptor, we have discovered a very serious issue that everyone needs to be aware of. We found that after running continuously for almost 50 days the drive will throw Time Limited Error Recovery (TLER) errors. These errors will cause a RAID volume to fail, possibly causing a loss of data. We have been able to confirm this issue with Western Digital's Support. This issue occurs when TLER is enabled (it is normally enabled on all Western Digital RAID Edition products and is also enabled on the VelociRaptor [7 seconds for Read and Write commands]). See below for more information about TLER.
Details of the failure
When the continuous power-on time hits 49.7 days, an internal timekeeper in the drive's firmware wraps. When this timekeeper wraps, any active Read, Write, and Flush commands will prematurely hit the TLER timeout. Because this timekeeper wraps at the same time on all of the VelociRaptors installed in a system together (all powered on at the same time), any RAID volumes on these drives will fail. In our testing, our system RAID volume failed and we were not able to recover because all of the VelociRaptors failed together. Data is not lost, but the RAID controller will think that all of the drives failed because of the incorrect TLER timeout. All RAID configurations, including RAID 5 and RAID 6, will fail because multiple drives fail at once. If a system with VelociRaptor drives stays powered on, it will fail again every 49.7 days.
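For what it's worth, 49.7 days is exactly what you get if the timekeeper is a 32-bit counter ticking once per millisecond. WD has not confirmed that; it is an assumption, and the "buggy" comparison below is only one hypothetical way a wrap could produce a premature timeout, not the actual firmware logic. A minimal Python sketch of the arithmetic:

```python
# Assumptions (not confirmed by WD): 32-bit counter, 1 ms per tick.
COUNTER_BITS = 32
WRAP = 2 ** COUNTER_BITS              # counter values run 0 .. WRAP-1
MS_PER_DAY = 24 * 60 * 60 * 1000


def days_until_wrap():
    """Days of continuous power-on time before the counter wraps."""
    return WRAP / MS_PER_DAY


def buggy_timed_out(start_ms, now_ms, limit_ms):
    """Hypothetical bug: the deadline wraps to a small value, so a
    command started just before the wrap looks expired immediately."""
    deadline = (start_ms + limit_ms) % WRAP
    return now_ms >= deadline


def safe_timed_out(start_ms, now_ms, limit_ms):
    """Wrap-safe check: compare elapsed time using modular subtraction."""
    return (now_ms - start_ms) % WRAP >= limit_ms


if __name__ == "__main__":
    print(round(days_until_wrap(), 2))   # days until wrap
    start = WRAP - 3000                  # command issued 3 s before the wrap
    now = WRAP - 2000                    # only 1 s has elapsed
    print(buggy_timed_out(start, now, 7000))  # 7 s limit, as in the report
    print(safe_timed_out(start, now, 7000))
```

With a 7 second limit, the hypothetical buggy check reports a timeout after only 1 second, while the modular check does not.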
We have also been told by Western Digital that a short-term workaround is to power cycle any systems with VelociRaptor drives every 30-45 days to keep the internal firmware timekeeper from wrapping at 50 days. A simple reset or restart doesn't work. The system must be completely powered off to reset the internal firmware timekeeper. Western Digital is also currently working on a fix and should have something soon to resolve the issue. We have been reassured by Western Digital that this issue only exists on the VelociRaptor and not on their other products, including their RAID Edition product.
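Until a fix is available, the power-cycle workaround can at least be scheduled. The sketch below is only an illustration: `power_cycle_window` is a hypothetical helper, the wrap period assumes a 32-bit millisecond counter (not confirmed by WD), and the date you feed it has to be the last full power-off/power-on, since a warm reboot does not reset the drive's timekeeper and system uptime alone is therefore not a reliable input.

```python
from datetime import datetime, timedelta

# Assumption: the counter wraps after 2**32 ms (about 49.7 days).
WRAP_PERIOD = timedelta(milliseconds=2 ** 32)


def power_cycle_window(last_cold_power_on, earliest_days=30, latest_days=45):
    """Return (earliest, latest, wrap) datetimes for the next power cycle.

    earliest/latest follow the 30-45 day window WD suggested; wrap is the
    moment the timekeeper would roll over if the system stays powered on.
    """
    earliest = last_cold_power_on + timedelta(days=earliest_days)
    latest = last_cold_power_on + timedelta(days=latest_days)
    wrap = last_cold_power_on + WRAP_PERIOD
    return earliest, latest, wrap


if __name__ == "__main__":
    # Example date only; substitute the real last cold power-on.
    earliest, latest, wrap = power_cycle_window(datetime(2009, 1, 1))
    print("power cycle between", earliest, "and", latest)
    print("timekeeper would wrap at", wrap)
```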
Background information about TLER
TLER, or Time Limited Error Recovery, was created to help SATA and IDE drives work with RAID controllers when drive errors occur. Normally a SATA or IDE drive will run extensive error recovery procedures to try to recover data when there is an error. Sometimes these procedures will take multiple seconds per sector of data (512 bytes). If the errors affect an area larger than a single sector, these recovery procedures may take 10, 20, 30 seconds or longer. Most RAID controllers will not give a drive that much time to recover from an error. The RAID controller will typically drop or error the drive and cause the RAID volume to go into recovery, using parity information from another drive (RAID 5 or RAID 6 configurations) or mirror drive information (RAID 1) to recover. Windows in a non-RAID configuration will also report an error if the drive takes too much time trying to recover. This could cause a Blue Screen of Death or other issues. TLER helps manage error recovery times to allow the host or RAID controller to find alternate ways of dealing with drive errors. TLER has helped SATA and IDE drives compete with SCSI and other enterprise class drives in enterprise applications. Western Digital recommends using TLER for any RAID configuration, and it can also be used in normal desktop applications. A utility is also available from Western Digital to enable/disable or change the TLER settings.
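On Linux, the closest equivalent I know of to WD's TLER utility is smartmontools: `smartctl -l scterc` reads the drive's SCT Error Recovery Control timers, and `smartctl -l scterc,READ,WRITE` sets them, in units of 100 ms (so 70 means 7 seconds). Whether the VelociRaptor honours this, and whether changing it avoids the 49.7 day failure, is not something I have verified; treat the sketch below as a starting point only. `scterc_command` is a hypothetical helper that just builds the command line, and `/dev/sda` is a placeholder device.

```python
def scterc_command(device, read_ds=None, write_ds=None):
    """Build a smartctl argv to query or set SCT ERC timers.

    read_ds / write_ds are in units of 100 ms (70 -> 7 seconds).
    With both omitted, the command only queries the current values.
    """
    if read_ds is None and write_ds is None:
        return ["smartctl", "-l", "scterc", device]
    if read_ds is None or write_ds is None:
        raise ValueError("give both read and write limits, or neither")
    return ["smartctl", "-l", "scterc,%d,%d" % (read_ds, write_ds), device]


if __name__ == "__main__":
    # Print the commands instead of running them; run as root with
    # subprocess.run(...) once you are sure of the device name.
    print(" ".join(scterc_command("/dev/sda")))
    print(" ".join(scterc_command("/dev/sda", 70, 70)))
```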