I am working with a brand new client. The client has 2x DNS 343 enclosures. We will call them 127 and 49 based on their IP addresses. 127 was the main drive, and it was supposed to be mirroring to 49 on a regular basis. It wasn’t. 127 and 49 both had 4x 2TB Samsung HD204UI disks in a RAID 5 array.
Users began noticing data missing or corrupted. The drive normally holds approximately 3TB of data. A check showed less than 2TB visible in Windows Explorer. Over the next hour this dropped further, and it was down to 165GB when I first checked it on site.
I shut down the DNS 343, let it rest and then started it back up. All data was visible and accessible. I checked the drives using the web administration Disk Diagnostic interface. All passed. I went home for the evening.
The next morning, data was missing again. I came back over and observed the same behavior: the amount of visible data kept shrinking. Temperatures in the web interface read 90-100 degrees. After reviewing the boards and consulting with a partner, we settled on the following course of action. (DLink tech support wanted us to update the firmware first, but I was leery of doing so, because in my experience that can cause its own problems.)
First, I suspected a bad/wonky drive. I powered down the machine, removed and labeled disk 1 and tried to boot. No data. I then powered off, removed and labeled disk 2 and tried again. No data – and so on until I removed drive 4. With drive 4 gone, drives 1-3 functioned well. All data was visible and the response seemed snappy. As a side note, SMART testing had not detected any issues with drive 4.
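For context on why the array could still show all the data with one disk pulled: RAID 5 keeps one parity block per stripe, so any single missing block can be recomputed from the rest. Below is a minimal Python sketch of that idea, assuming plain XOR parity over one stripe. The block contents and the `xor_blocks` and `rebuild_missing` helpers are hypothetical illustrations, not the DNS 343's actual on-disk layout or firmware behavior.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe, missing):
    """Recompute the block at index `missing` from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)


# One stripe on a hypothetical 4-disk RAID 5: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Pull any one disk; its block can be rebuilt from the other three.
for pulled in range(len(stripe)):
    assert rebuild_missing(stripe, pulled) == stripe[pulled]
```

The same arithmetic is why a second failure is fatal: with two blocks of a stripe gone, the XOR of the survivors no longer determines either one.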
Users had been instructed to make a list of prioritized data they needed. We immediately began to offload data onto a backup. About 90 minutes later, the data started to act funny again. The owner noted the fans on the DNS 343 were not spinning. I powered off the machine. Then we powered it on again, saw the data, and took a last few emergency files off. I then backed up the settings.
At this point, I went and bought three WD 2TB hard drives and cloned the three working Samsung drives onto them using two dual-bay external docking stations with a direct clone function. After that was done, I went to the other DNS 343 enclosure (49), where the last backup was about a year old. I backed up its data, powered down, removed the drives, performed a factory reset, and updated the firmware on this known-good machine. I then restarted and uploaded the settings from the other enclosure. I then powered down, inserted the numbered Samsung drives in the correct order and voila! Everything looked good.
We then began pulling data off and I went home again. When I checked remotely two hours later, the data was wonky again and copy operations could not continue (unable to access file on network and invalid copy handle errors). I am now back at the office. Of my three ‘good’ drives, two are reading as failed in Disk Diagnostic (note: HD Tune Pro quick scans show no bad sectors). As a side note, the fans on my known-good enclosure were not spinning when I arrived.
I am currently assembling all the data I was able to get off in the three attempts into one aggregate copy. I believe I have 60% of the mission-critical data. I hate RAID 5. I am a RAID 1 guy all the way.
1. Each of the RAID 5 drives, when placed individually into an enclosure and read with ext2read, shows a /dev/sdc4 partition with what looks like the array info and no other data. Is that expected? For comparison, a RAID 1 disk in ext2 formatting shows /dev/sdc4 with similar files plus a separate /dev/sdc2 section with all the data (this was just a test disk, since I am not familiar with ext2read).
2. Should the fans not be spinning all the time?
3. Why did this go bad in the other enclosure which had been working just fine before factory reset and introduction of these drives?
4. Most importantly, what should I do now? Thoughts?
I am unfamiliar with RAID 5 in a NAS of this type. I would appreciate step-by-step instructions, even for items as basic as how to rebuild the array, if that is recommended (so far I have focused on backing up before rebuilding).
Thank you in advance for your expertise.