Swapping Controller Boards on Failed Disk Drives

Dr. Klahn, Principal Software Engineer
CERTIFIED EXPERT
With his trusty 026 keypunch and ASR33 TTY, Dr. Klahn is building a PC of extraordinary magnitude!
We look at whether swapping a controller board on a failed hard drive is likely to solve the problem.

Several times a month we see a question along the lines of ...


"My hard drive failed.  If I swap the bad controller board for a new one, will the drive work again?"


Let's look at the subject: Failed Drives vs. Failed Controller Boards


The presumption is that because the controller board is complicated and the head-disk assembly (HDA) is relatively simple, the problem must be in the controller.  Admittedly, much of the time this is true.  But HDAs do fail from time to time: tiny wires break, ceramic heads chip on landing, and hot air causes heads to spall microscopically.  When the HDA is at fault, the only way to get data off that disk is to send it out for professional disk recovery.


How do we know when the HDA is at fault?  

That's a problem with modern drives.  The diagnostic used to be simple:  If the controller board was swapped and the drive still didn't work, then it was an HDA issue and not the controller's fault.  Unfortunately that's no longer true, and the reason why it's no longer true is what destroyed controller "swappability".


Some Background

Up through about 2008, any particular model of hard drive had uniform, identical controller boards.  It was a "one size fits all" situation where the manufacturer would slap the next controller PCB onto the next HDA on the production line, and it would work due to generous design that didn't come near the limits of the technology.  In those days, indeed, swapping a controller was diagnostic and might also solve the problem.


But drives started pushing the terabyte limit, and it was becoming more and more difficult to fit more data on the platters.  Magnetic media had nearly reached its limits, the platters were still the same size, and only so many platters could fit into an HDA while still leaving room for heads and head actuators.  Further, the industry was already using variable bit zoning, so there was no "legitimate" way to get more data on the platters with the technology that existed.


So what happened?

Then an engineer came up with -- depending on how you view data reliability -- a brilliant idea, or an amazingly dumb one.  That idea is called shingled recording.


Previously, for very good reasons, data tracks were kept well apart on the platters so that there would be no interference from adjacent magnetic domains when reading, or corruption of adjacent tracks when writing.  Shingled recording throws that safety net overboard, and says "Write it as close as you can possibly get away with even though it'll corrupt adjacent tracks, and let the error correction deal with the corruption on read-back."


The Crux!  The Unique HDA...

That digression was in fact necessary - here's why.  To make shingled recording work, each individual controller board must be precisely tuned to each individual HDA.  Head currents, step rates, stabilization time, variations between heads on read and write, individual platter magnetic permeability and any number of other parameters must be tuned in the controller for this one specific, individual, unique HDA in order for it to meet the specifications.  There's no longer any safety net of generous specs.


So now each specific, individual, unique HDA has a specific, unique, individual controller board tuned to make that HDA work properly.  The Industrial Revolution, which made mass production possible, has been undone -- what should be interchangeable parts, now aren't.


A straight-across swap won't work, part II

This means that swapping a controller from working drive A onto failed drive B won't make drive B work, even if it's not the HDA that's the problem.  That's not intolerable in itself, but putting a controller tuned for a different drive onto a failed drive could make the situation worse because when Windows mounts a drive, it writes to it ... just a little ... the so-called "dirty bit."  If that write were to use bad HDA parameters, it could blow out the drive's partition table or worse.


To prevent that from happening, drive manufacturers have a fail-safe mechanism.  At power-up, the controller checks the contents of its on-board drive parameter chip against identification data written in a private section of the drive.  If the controller can't read that identification data or it doesn't match the settings in the controller, the controller won't operate with that drive.  
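
The power-up check described above can be sketched in a few lines of Python.  This is only an illustration of the logic, not any manufacturer's firmware: the class names, the field names and the idea of a single ID string are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParameterChip:
    """Hypothetical stand-in for the controller's on-board parameter chip."""
    hda_id: str  # identity of the HDA this controller was tuned for


@dataclass(frozen=True)
class HeadDiskAssembly:
    """Hypothetical stand-in for an HDA and its private identification area."""
    private_id: Optional[str]  # None models an identification area that can't be read


def controller_will_operate(chip: ParameterChip, hda: HeadDiskAssembly) -> bool:
    """Return True only if the identification data is readable and matches."""
    if hda.private_id is None:
        # The controller can't read the identification data: refuse to operate.
        return False
    # The data was read: operate only if it matches the controller's settings.
    return chip.hda_id == hda.private_id


original = ParameterChip(hda_id="HDA-105268")  # made-up identifiers
donor = ParameterChip(hda_id="HDA-990001")
drive = HeadDiskAssembly(private_id="HDA-105268")

print(controller_will_operate(original, drive))  # True: the matched pair
print(controller_will_operate(donor, drive))     # False: straight-across swap
```

The point of the sketch is that the donor board is refused before it ever gets a chance to write to the drive with the wrong parameters.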


So just swapping controllers straight across on a modern drive won't work.


You want to try it anyway?

There are any number of vendors on fleabay who will gladly sell you a controller board for whatever drive you may happen to have.  That's because there's still a very narrow window where a controller swap could work.  But it's not simple, and the honest sellers mention three important things:


  • Hard drive failures are NOT always caused by PCB failure. There's no guarantee a failed drive will be repaired by replacing the controller.

  • Swapping the controller, by itself, will NOT solve the problem.  The parameter chip from the old controller must be moved onto the new controller.

  • Moving the parameter chip requires specialized, expensive surface-mount de- / re-soldering tools you probably don't have and no local repair shop is likely to have.


However, the sellers don't mention something equally important.  Each controller board is as unique as each HDA due to the individual electronic components used to build it.  The correct tuning for HDA 105268 using controller 26302 could be incorrect tuning for the same HDA using supposedly identical controller 26303 - even though both controllers were manufactured in the same plant, using the same components, within a few seconds of each other.



So.  We have now seen enough of the situation that we can say:


  • If the drive failed due to HDA issues, swapping the controller won't solve the problem.

  • There's no way to know if the HDA is the problem without sending it out for professional diagnosis.

  • If the failed drive was built before 2008, swapping the controller board might solve the problem.

  • If the failed drive was built after 2008, the controller is probably tuned to the HDA.  If it was built after 2012, it definitely is tuned to the HDA.  Swapping the controller won't solve the problem unless the parameter chip is moved to the replacement controller.  This means sending both old controller and replacement controller out for professional rework, at additional cost.
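
Those rules of thumb can be written down as a small Python function.  It is a rough sketch only: the function name and the wording of the results are invented for the example, the years are the approximate cut-offs given above, and the best answer it can ever give is "might work" because there's no way to rule out an HDA failure beforehand.

```python
def swap_outlook(build_year: int, chip_transferred: bool) -> str:
    """Rough outlook for a controller swap, following the rules of thumb above.

    build_year       -- year the failed drive was built
    chip_transferred -- True if the parameter chip from the old controller
                        was moved onto the replacement controller
    """
    if build_year < 2008:
        # Older drives used uniform, interchangeable controller boards.
        return "plain swap might work"
    if chip_transferred:
        # The replacement board now carries the original tuning data.
        return "swap might work"
    if build_year > 2012:
        # The controller is definitely tuned to the HDA.
        return "plain swap will not work"
    # 2008 through 2012: the controller is probably tuned to the HDA.
    return "plain swap probably will not work"


for year, moved in [(2005, False), (2010, False), (2015, False), (2015, True)]:
    print(year, moved, "->", swap_outlook(year, moved))
```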


An identical controller usually sells for around $30, plus shipping.  Sending both the original and replacement controllers out to have the parameter chip moved will run another $30, possibly more, plus shipping both ways.  That makes the total around $100, which essentially allows you to roll the dice and see if your drive comes up sevens or snake-eyes.
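
For anyone who wants to check the arithmetic, here it is in Python.  The $30 board, the $30 rework and the roughly $100 total are the figures above; the shipping amount is not a quoted price, it is simply what is left over once the other two are subtracted.

```python
BOARD_USD = 30    # replacement controller board
REWORK_USD = 30   # moving the parameter chip ("possibly more")
TOTAL_USD = 100   # approximate all-in figure


def implied_shipping(total: int, board: int, rework: int) -> int:
    """Shipping cost implied by the total once board and rework are paid for."""
    return total - board - rework


shipping = implied_shipping(TOTAL_USD, BOARD_USD, REWORK_USD)
print(f"Board ${BOARD_USD} + rework ${REWORK_USD} + shipping ${shipping} = ${TOTAL_USD}")
```

In other words, about $40 of the total goes to shipping: the board to you, both boards out for rework, and both boards back.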


For most people, unless a drive contains irreplaceable data that was not backed up elsewhere, it makes more sense to buy a new drive with that $100.



A parting note:

If a drive comes up sevens and data can be retrieved, then retrieve the data and recycle the drive.  It now has a controller that can read the drive but the parameters are suited to the original controller, not this one.  Correct writes are not guaranteed.  Never use it again for anything, not even "unimportant" data and certainly not for backups!

