purejamie

OpenFiler lost drives, important work missing!

Hello all,

I have been using OpenFiler without problems for the last couple of years. I have four 1 TB drives in a software RAID 5 setup, exported as an iSCSI target which I connect to from a Windows Server 2008 box.

This morning when logging in, the drive was not accessible from Windows Explorer. I removed the target from the iSCSI Initiator and re-added it. The drive then appears in Disk Management, but when I try to bring it online Windows gives me an error message asking if I want to format the disk, as it is blank. The drive's file system appears as "RAW".

OpenFiler is still up and reports no problems with any of the disks, and all the SMART checks pass as well. Is there anything I can do to reverse this situation? I am pretty desperate, as it holds a lot of work from the last few weeks which has not been backed up. I would be extremely grateful for any help anyone could offer.

Thanks,
Jamie
David

You removed and re-added? EXACTLY how did you do this? Step by step. You may have destroyed the partitioning and/or filesystem headers. This is mostly recoverable, but it will be a long, painful process that will also require 3 TB worth of scratch disks and most likely more than a week of your time.
purejamie (ASKER)

Hi there, thanks for the response.

I may not have chosen my words well in the OP. When I couldn't see the disks any longer, I:

* Went into the iSCSI Initiator in Windows
* On the Targets tab, disconnected from the target by selecting the target and pressing the Disconnect button
* On the Discovery tab, removed the target portal by selecting the target address and selecting Remove
* Then did a Quick Connect on the IP address of the target again

Hope this helps,
Jamie.

I've used other SAN/RAID devices, but not iSCSI and not an OpenFiler, so I'm not exactly sure what you mean by the Initiator and target.  I'm assuming you didn't delete the config of the drives on the RAID interface, but you just stopped the volume from displaying itself to Windows.

It's a shame when you have all the fancy RAID and such, which can protect from a physical disk failure, but not from a data error like this.  Sounds like the file tables just got a small corruption somewhere.
Spinrite (http://www.grc.com/spinrite.htm) is a well-regarded HDD repair/maintenance utility which you can burn to CD or USB stick. You'd need to shut down the OpenFiler and pull out all 4 drives. Run Spinrite on all of them (by attaching them directly to other computers), then put them all back in and turn it back on. I'd suggest running Spinrite on level 2 to start with.
I know the website is a little cheesy, but the guy is a genius and has been an expert in HDDs, security and programming for decades. He has a great security podcast too, on the popular TWiT network.
Running Spinrite is, and always will be, a major mistake with a software-based or hardware-based RAID controller. Doing so almost guarantees data destruction. The reason is that when it recovers a block of data, no feedback is possible to the controller. Controllers maintain tables of known bad blocks and use the XOR parity to reconstruct the data. Spinrite has no idea whether or not the controller was doing something special. What if block #X on 2 disks is bad, but one or both are recovered? Think about the data corruption scenarios.
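To make the XOR parity point concrete, here is a rough Python sketch of one stripe on a 4-disk RAID 5 (three data blocks plus one parity block). The block contents and the helper name `xor_blocks` are made up for illustration, and this is not how OpenFiler or any particular controller is implemented. It only shows the general idea: a lost block can be rebuilt from the survivors plus parity, but if something rewrites a block behind the array's back, the parity goes stale and a later rebuild quietly returns wrong data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks and the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk 1 is lost: XOR of the surviving blocks and parity gives its block back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# An external tool rewrites part of disk 1 without the array knowing,
# so the parity on disk is not updated.
altered = [data[0], b"BBBX", data[2]]
stripe_consistent = xor_blocks(altered) == parity  # the stripe no longer adds up

# If disk 2 fails afterwards, the rebuild uses the stale parity
# and produces a block that differs from what was originally stored.
bad_rebuild = xor_blocks([altered[0], altered[1], parity])
assert bad_rebuild != data[2]
```

The second half is the scenario to worry about: nothing reports an error during the rebuild, the data is simply wrong.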

ASKER CERTIFIED SOLUTION
Andrew Hancock (VMware vExpert PRO / EE Fellow/British Beekeeper)
This content is only available to members.
hanccocka,

The old ones are the best! A chkdsk actually fixed the problem. It found NTFS straight away, then fixed a load of index entries. Works a treat!

Thanks a million, much appreciated.

Jamie.
Yep, people forget about chkdsk!

Make sure you get a backup of the LUN, and try to establish why it went offline and got corrupted.

All the best

Oh yes, backup kicked off. I'll take a hunt through the logs tomorrow.

Thanks again.