• Status: Solved

RAID 5 Container #1 Dead, any hope for Data recovery?

Dear Experts Exchange -
Our school has a Dell PowerEdge 2500 which has major problems after a weekend power outage. Students and Faculty need data from this server, can you help?

Dell PowerEdge 2500
5 disk drives, RAID 5
2 containers: #0 4GB with OS, Windows 2000 Server and #1 131GB named H:\ for file storage
Perc 3 - adaptec

Monday am
I arrived to find the server in an off state.
Turned on and received "no boot device available", with 4 of 5 disks status lights slowly going from green to amber.
Called Dell, container information was OK, found both containers.
After scrubbing both containers was able to install a parallel OS and access data on H drive.
Tried to setup a backup to tape drive.

Tuesday am
I arrived and server was in an on state with errors.
Three files were giving a corrupt or unreadable error message.
Since the backup didn't work, I started to copy data to other various locations on the network.
Received a message stating "H:\ is not accessible. The file or directory is corrupted and unreadable."
I thought I could let it continue to try to copy data off...
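
For anyone in the same spot, the file-by-file copy described above could be scripted roughly as follows. This is only a minimal sketch, not what was actually run on this server: the function name `salvage_copy` and its `copy_fn` parameter are hypothetical, and it assumes Python is available on the machine doing the copying. It walks the source tree, copies what it can, and records every file or folder that raises an error instead of stopping at the first one.

```python
import os
import shutil


def salvage_copy(src_root, dst_root, copy_fn=shutil.copy2):
    """Copy every file under src_root into dst_root, skipping unreadable ones.

    Returns (copied, failed): copied is a list of source paths,
    failed is a list of (path, error message) tuples.
    All names here are hypothetical, for illustration only.
    """
    copied, failed = [], []

    def on_walk_error(exc):
        # A directory that cannot be listed is recorded, not fatal.
        failed.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            try:
                copy_fn(src, os.path.join(target_dir, name))
                copied.append(src)
            except OSError as exc:
                # Corrupt or unreadable file: note it and keep going.
                failed.append((src, str(exc)))
    return copied, failed
```

Called as `copied, failed = salvage_copy("H:\\", r"\\otherserver\share")` (both paths are placeholders), the `failed` list gives a record of which files still need to come from a backup or a recovery service. It does not make it any safer to keep reading from a failing array.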

Wednesday am
I arrived and the server could not be seen on the domain and was unresponsive, black screen.
Hard power off, back on and message "Array controller monitor failed".
Several reboots, 50% of the time it will find the array controller, 50% not.
When it does find the array controller, the drives it sees vary; sometimes drives 0,2,3 show up, other times drives 0,1,4...
Status is container #0 critical, container #1 critical
Called Dell, reseated drives, accepted configuration changes for the array controller, Dell ordering/shipping parts to rebuild the scsi chain.
Decided to update the BIOS and flash the ESM. Container #0 OK, container #1 critical.

Thursday am
Boot up, array controller monitor OK, container #0 OK, container #1 critical. Able to access OS, try to access H:\ and receive message "H:\ is not accessible. The file or directory is corrupted and unreadable". Now, in the container configuration, container #1 status is DEAD.
Still awaiting a technician to replace parts ordered by Dell.

If hardware replacement doesn't help, is there any hope of recovering data?
I've called a few data recovery companies, any suggestions?

thank you very much.



Asked by: kimzmn
 
kimzmn (Author) Commented:
Thursday mid-day update
Dell requested I run DSET and send the file to them.
On boot, container #0 ok and container #1 scrubbing
Running Elite hard drive diagnostics, all 5 drives passed
Now drive H:\ does not even show up in Windows Explorer
Ctrl+A on bootup and in Manage Containers, container #0 status OK and all 5 drives show. Container #1 status Dead and only 3 of 5 drives show as members.

Yikes...help please
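
Why a container with only 3 of 5 members shows as Dead: RAID 5 keeps one parity block per stripe, so it can rebuild one missing member but not two. The sketch below shows that arithmetic in Python. It is a simplified illustration and not the PERC/Adaptec on-disk layout, which rotates parity across the drives and uses its own stripe size and metadata. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Rebuild one stripe of a RAID 5 style set.

    stripe: list of equal-length bytes objects (data blocks plus one
    parity block), with None in place of any member that is missing.
    Returns a complete stripe, or raises ValueError if more than one
    member is missing.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one member missing: stripe cannot be rebuilt")
    rebuilt = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks equals the missing block.
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Five members: four data blocks plus their parity.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
full_stripe = data + [xor_blocks(data)]

one_lost = list(full_stripe)
one_lost[1] = None  # one member gone: still recoverable
recovered = rebuild_stripe(one_lost)
```

With two entries set to `None`, as in the 3-of-5 state above, `rebuild_stripe` raises `ValueError`. Recovery then depends on getting at least one of the dropped drives recognised as a member again with its data intact, which is why the order and identity of the drives matter so much.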
 
tmenasco Commented:
Do you have another server of the same model in the data center?

If so, mark the order of these drives and put them in the other machine in the exact same orientation and see what happens. This will tell you whether it is the drives or the server.

I utilize this as a troubleshooting routine regularly to help diagnose, but have not tried it on a Dell, just HP and Compaq. HP and Compaq store the array configuration data on the drives themselves and it can be copied to a floppy for backup purposes.

Is the array controller a PCI card or onboard? If it is PCI, try another slot.

Good Luck...
 
kimzmn (Author) Commented:
I do have another Dell PowerEdge 2500 across the street and I believe I am understanding your recommendation.
Yes, the array information is supposed to be on the header of each drive as well as on the key card.
I'll look into extracting the array configuration data to a floppy.
There is no other available PCI slot that I can see.

I will admit, I am really concerned about taking the drives over to another box and fear then having two servers unavailable.

Anyone else have further ideas?   Should I allow Check Disk to run?

Thank you

 - Kim
 
tmenasco Commented:
Trying the disks in another server will not damage the other server, but could help in troubleshooting to determine if it is the disks or the server that is at fault.

You might also check and verify that the SCSI bus is correctly terminated or make sure a terminator has not been removed or loosened by accident.

If the array controller is a PCI card, at least remove it, clean the contacts and replace it. You could maybe swap it with another PCI card if all of the slots are full. I would much prefer to have a NIC not working if the slot is the issue.

Good Luck...
 
kimzmn (Author) Commented:
A Dell service technician arrived yesterday and replaced the motherboard, scsi cable, backplane, array key and memory.
Both containers were found, scrubbed, and Ctrl+A container mgmt show both containers status OK and all 5 drives available.
The containers show #0 as 4GB and #1 as 131GB.
Booted into Windows, did not let Check Disk run, and trying to access H:\ gives the following message "H:\ is not accessible. The file or directory is corrupted and unreadable".   On viewing the properties of H:\ it reads 0 space available and 0 space used.

Oh no....what's my next step? Sounds like I need to label the disks and bring the disks over to the other PowerEdge 2500....which of course is a production server and I'll have to wait until the weekend.
 
tmenasco Commented:
I would try to get my hands on a copy of Server Magic by PowerQuest.

Since all of the server side hardware is new, it could be an issue with the zero sector of the partition. I have used Server Magic and Partition Magic before to repair this kind of thing.

I just went to the PowerQuest.com site to find out that Symantec bought them last year and Server Magic for Windows is no longer listed. You might ask around and see if one of your IT buddies has a copy. It works wonders and I am not sure why it is no longer offered.

Are there any diagnostic utilities in the array controller that could possibly identify and correct the error?

Good Luck...
 
kimzmn (Author) Commented:
No luck in finding a copy of Server Magic for Windows.
I ended up sending the drives to DriveSavers.
Thank you for your help
 
tmenasco Commented:
What is that going to cost you?

Is Dell footing the bill since it was their array controller that caused the problems?
 
kimzmn (Author) Commented:
Very Very Expensive.
Dell will not foot the bill.
 
kimzmn (Author) Commented:
You won't believe this!!!

DriveSavers is still working to recover data. I signed up for 2-3 business day service and it is now going on day 6!

Plus, the server has failed again!

I bought 5 brand new hard drives to begin rebuilding the server so it will be ready for the data.
Dell walked me through step by step:
initializing the drives
creating a container
waiting for them to scrub
installing openmanage server
installing windows 2000 OS
and as I was waiting on hold for 80 minutes to find out what my next steps are...
I reboot the server, and it does not come back up.
"A disk read error occurred, press Ctrl+Alt+Del to restart"
2 of the 5 drives are giving an amber status light
I rebooted and did a Ctrl+A
The container status is DEAD

ARGHHHHHH....

Dell will not ship me a new server.
On top of the 5 parts they replaced a week ago, and the new drives, now they insist on sending
9 more parts.
Another backplane, another raid-key and this time power supply parts.

Why can't they just send me a new box completely?
 
kimzmn (Author) Commented:
Dell replaced several other parts the next day; the power distribution, motherboard, backplane, cable assembly, raid key, scsi cable...
I believe it was the replacement of the power distribution and cables which were key in fixing the server. I believe this because all the other parts were replaced before and the server failed a second time.
With a rebuilt server I am now able to copy data back onto it.

DriveSavers did take longer than the 2-3 days I signed up for, however, I can't complain too loud because they sent back what we believe to be greater than 90% valid usable data.
 
kimzmn (Author) Commented:
Thank you tmenasco for your dialog on this issue. I accepted your answer as a thank you for your participation even though I chose not to use your solutions.
 
tmenasco Commented:
Thanks. Sorry I couldn't solve your problem. But it looks like since it was hardware, Dell was the only one who could.

You don't happen to have an old raised floor? If so, go to www.nwfusion.com and look up "zinc whiskers"; I think the DocFinder number is 4461.

Good luck...

Tom...