Openfiler - Cannot initialize disk

My environment consists of:

1 Openfiler v2.3 server (clean install - no upgrades performed to bring up-to-date)
1 Dell T105 server w/ free ESXi 4 installed
Windows 7 Ultimate virtual machine
1 iSCSI Target for Windows backup (200GB)
1 iSCSI Target for file storage (1.5TB - external WD My Book) (*Yes, I know this is large.  It is used for media storage.)


I have two separate iSCSI target IQNs for two different initiators.  I tried to connect the Windows 7 virtual machine to the Openfiler target.  Initially, it connected fine and I was able to initialize the disk (via Disk Management) and assign a drive letter to begin using it.  All was fine.
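(For reference, iSCSI qualified names follow the `iqn.yyyy-mm.reverse-domain[:identifier]` shape defined in RFC 3720. A quick structural sanity check in Python — an illustrative sketch, not something from the original post:)

```python
import re

# Loose IQN shape per RFC 3720:
#   iqn.<year>-<month>.<reversed-domain>[:<target-specific-id>]
IQN_RE = re.compile(
    r"^iqn\.\d{4}-(0[1-9]|1[0-2])"   # date the naming authority took the domain
    r"\.[a-z0-9][a-z0-9.-]*"         # reversed domain name
    r"(:.+)?$"                       # optional target-specific identifier
)

def is_valid_iqn(name: str) -> bool:
    """Structural check only -- it does not verify the domain is real."""
    return bool(IQN_RE.fullmatch(name.lower()))
```

A name like `iqn.2010-03.com.example:storage.media` passes; a bare label like `media-target-1` does not.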

One day the drive hosting the media (1.5TB) was power-cycled by the surge protector it was plugged into.  Now the Windows 7 VM will not show the drive letter for the iSCSI target.  The drive (in Disk Management) reads "unallocated".  When I attempt to initialize the disk (the only option available to me), I get a "Data Error - Cyclic Redundancy Check" message and it doesn't allow me to initialize it.  The disk still shows its actual size/space remaining, so I believe the data is not lost.  Is there a way to get this disk recognized?  And what is causing this issue?  This is the second time it has happened.  Luckily, I have a backup, but I would still like to prevent it from happening in the future.  Thanks for everyone's help!

dletheCommented:
Looks like a catastrophic drive failure.  Showing the drive space remaining does not involve performing full media checks.  Go to the WD site and download/run their diagnostics.  Be prepared for the worst, however.
diallo0024Author Commented:
Thanks for the reply, dlethe!  After doing some more digging late last night, I discovered a couple of things:

1.  The actual version of openfiler is version 2.3 (with all updates and patches as of 3/1/2010).
2.  The drive still works fine when plugged into a Windows 7 machine (physical, not virtual machine)

I'm not exactly sure, but I think the drive is still good (relatively new...purchased this year 2010).  I may have to rebuild openfiler (which is what I did to resolve the issue last time).  I was just trying to avoid this step.  

Whereas, I believe the problem to reside in openfiler, I'm open to suggestions on resolving/preventing this issue from reoccurring.  

Thanks again for your reply!  Any other assistance is greatly appreciated.
Davis McCarnOwnerCommented:
It's not supposed to happen; but, about four times each year, I see a drive that was rudely shut down in the midst of a sector write.  When this happens, the checksum (CRC) for that sector is wrong, resulting in a CRC error.
The first thing to do is to confirm that as the problem.  Go get the free version of HDTune, install it, run it, and select the WD.  Choose the Health tab and inspect the Reallocated Sector value.  If it is anything but zero, there may be more serious things going on.  Next, run the error scan and expect it to take several hours.  You should get at least one red box.  If you get more than two or three, there is something more serious going on; but if it is a very low number, post back with the result and we'll roll up our sleeves and fix it.

diallo0024Author Commented:
Thanks for the suggestions, DavisMcCarn.  I will try what you've recommended and post the results when the run is complete.

By the way...   Is this problem that you see (about 4 times each year) related to the drive, openfiler, or windows?
dletheCommented:
I disagree with Davis' assumption that the reallocated sector count is an indicator of "more serious things going on".  The parameter indicates that the disk drive had a bad block, and that the block has been reallocated.  This is a normal thing for a disk drive to do.  Modern high-density disk drives have tens of thousands of spare sectors, and are designed to do this.

RAID 1/5/6 controllers reallocate sectors on a regular basis; this is a design point.  When you use Windows to format/initialize a disk drive, it checks for bad sectors and tells the disk to reallocate unreadable ones.  This is all a natural thing to do.

You should keep an eye on this value, and if it jumps from 4 to 400 in a month, then try to get a warranty replacement; but if you have a dozen or so, it is nothing of concern, as it represents a vanishingly small fraction of your data.
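(The monitoring advice above can be sketched in a few lines of Python. This assumes Linux with smartmontools; the parsing targets the usual tabular output of `smartctl -A` and is illustrative, not part of the original thread. The jump threshold is a judgment call, not a vendor-specified value:)

```python
def parse_smart_attrs(smartctl_output: str) -> dict:
    """Pull raw values of the attributes discussed above out of
    `smartctl -A` style output (ID# NAME FLAG VALUE WORST THRESH
    TYPE UPDATED WHEN_FAILED RAW_VALUE)."""
    wanted = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] in wanted:
            values[fields[1]] = int(fields[9])
    return values

def worth_replacing(prev_realloc: int, curr_realloc: int) -> bool:
    """Per the advice above: a dozen reallocations is routine; a large
    jump between two checks is what warrants a warranty swap."""
    return curr_realloc - prev_realloc > 100  # hypothetical threshold
```

Feeding it the `smartctl -A /dev/sdX` text periodically and comparing successive `Reallocated_Sector_Ct` values captures the "4 to 400 in a month" rule of thumb.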

A probable reason why DavisMcCarn sees disks shut down in the midst of a sector write is that he is using consumer-class disks rather than enterprise drives.  One of the reasons why enterprise disks cost 3x more money than consumer-class is that their error-recovery algorithms and ECC hardware/reserved-block topology are tweaked to ensure 5-10x faster recovery of bad blocks.  If a disk can't remap fast enough, then some RAID controllers will think the disk is unreliable and shut it down.

Davis McCarnOwnerCommented:
These days, there ought to be enough stored power in the drive to finish the write operation prior to shutting down.  That is an engineering problem that could only be fixed by the drive manufacturers.
Having the power rudely go out, though, is at least forgivable.
Half of those I see with this problem occur when Windoze spontaneously reboots or freezes, and those are unforgivable.  The drive should ignore any and all commands while it is writing so it can finish the block/sector.
Davis McCarnOwnerCommented:
What I have done for 34 years now is fix what was brought to me and data recovery started in the late 70's on cassette tapes or floppies.  Too many of the drives I see had started screaming about SMART failures weeks before they finally fell off the cliff.
dletheCommented:
It is just not practical to do this, Davis.  Neither battery circuitry nor an electrolytic capacitor can provide enough power to do this without busting out of the physical footprint of the disk drive.  (Well, a specially designed battery that is flat and covers the top of the HDD may be possible, but then you would have to change the batteries every few years and pay a big premium.)  That is why people buy a UPS.

But you are still missing the point.  The drive is being kicked off because, in the eyes of the controller, the disk is unreliable.  The pending write is the one that timed out due to an unacceptably long sector rewrite.  If you have SCSI/SAS/FC disks, then this is tunable via a mode page editor.  SATA/ATA disks generally don't provide such configurability.  That is why RAID manufacturers spend big bucks certifying disks.  This is also why the tier 1/2 NAS/SAN appliance vendors often have special firmware built for their disk drives: they hardcode such settings.
(FYI - I worked for 20+ years for RAID manufacturers, designing firmware, configurators, etc., so I speak from experience here.)
As for SMART errors, we are not discussing SMART; that is something different.  Reallocated sector counts are part of the vendor/product-specific algorithm that can trigger a S.M.A.R.T. alert.  In and of itself, a sector error will not ALWAYS trigger one.  I have an NDA with Seagate, and had one with WD, so believe me, I know the internals of some of the algorithms, and I cannot get specific due to NDA.

But I certainly agree with you: if you get a S.M.A.R.T. error, then the prudent thing to do is replace the drive.  But if the reallocated sector count increases, that will not necessarily trigger a S.M.A.R.T. alert.
diallo0024Author Commented: I ran the HDTune software.  Interestingly, I didn't get any data to appear on the Health tab for any of my external drives.  And I did a quick scan for red blocks.  The "slow" scan is still running.  Looking at the conversations taking place, should I be approaching this differently?
Davis McCarnOwnerCommented:
No; the first order of business is still ruling out a hardware error, and yes, HDTune often has issues retrieving the SMART data on some drive subsystems.
Let it finish the long error scan and, if it comes up clean, we'll want to look at some partition recovery/repair utilities.  Luckily, NTFS keeps a duplicate of its boot sector at the end of the volume, so that path is often pretty painless.
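(On that NTFS point: the backup copy of the boot sector sits in the volume's last sector, which is why recovery tools can often restore a damaged primary painlessly. A minimal recognizer, as an illustrative sketch using the published NTFS boot-sector layout — not code from this thread:)

```python
def looks_like_ntfs_boot_sector(sector: bytes) -> bool:
    """An NTFS boot sector carries the OEM ID 'NTFS    ' at byte
    offset 3 and the 0x55AA end-of-sector signature in bytes 510-511."""
    return (
        len(sector) == 512
        and sector[3:11] == b"NTFS    "
        and sector[510:512] == b"\x55\xaa"
    )

# Recovery tools compare the volume's first sector with the backup in
# its last sector; if the primary fails this check but the backup
# passes, copying the backup over the primary usually restores access.
```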
dletheCommented:
Unless you are at least using HDTune Pro, you won't get much in the area of diagnostics.  What you fail to realize is that S.M.A.R.T. tells you whether the disk is in a degraded condition.  It will NOT pick up bad blocks that have not had a read attempt.

Reallocated sector counts are only applicable to reallocated sectors, not bad sectors which you could very well have.

Again, I have been writing diagnostics and have NDAs with drive manufacturers.  There is a great deal of difference between using diagnostics and writing them, and being able to talk to Seagate engineers and have one-on-one discussions about such things.

Your diagnostic program needs to invoke ATA_OP_READ_VERIFY or ATA_OP_READ_VERIFY_EXT (opcode 0x40 or 0x42, depending on the ATA level of compliance and block count).  From what I saw reading the specs of HDTune and HDTune Pro, it doesn't use this opcode.  HDTune is a nice, cute package, but professionals don't use it.  HDTune Pro is better, but still very much a consumer product.
Davis McCarnOwnerCommented:
The usefulness of HDTune is that it will not write (in the free version) and will let you know if there are any bad blocks, even when they are CRC errors created by a partial write.
The point is, running any repair utility on a drive with a chunk of sequential error blocks (which usually results from a physical shock) or a large number of errors is, 9 times out of 10, going to finish off any hope of recovering the data.
If, on the other hand, no read errors are found, we can be fairly confident that fixing the error which is causing his drive to state it is not formatted most probably won't make matters worse.
Or, if only one or two errors are found, we can either force a rewrite of those sectors or see what normal repair utilities will do.
Yes, there are numerous other utilities for examining hard disk drives; but, they are a digression from our goal.
Seagate and WD drives, BTW, build a list of pending reallocations and don't actually perform the operation until that block is next written.
diallo0024Author Commented: I ran HDTune (free version), as suggested.  While that was running, I began researching Openfiler 2.3 and any issues with iSCSI targets disconnecting.  To my surprise, there were others with the exact same issue I have.  Something about when Openfiler is rebooted, the iSCSI targets seem to "get lost".  There was a fix for this located here:

Although I have implemented the suggested change, I have not rebooted the system to verify that it works.  I will do that sometime this evening.

Thanks anyways for all your help, guys.  

Experts Exchange Solution brought to you by
