Solved

RAID 1 Failed

Posted on 2010-09-02
Medium Priority
1,272 Views
Last Modified: 2016-12-08
First, a warning: I am a non-expert working for a small company with no real IT staff.

We have two servers running Red Hat Linux ES3, using RAID 1 on an Adaptec 2120S controller.  The second server is set up to mirror parts of the first using rsync.

One morning, the 2nd server became unresponsive, and was listing the following errors on the console:

-------------------------------------------------------------------------------------------------------
SCSI disk error: host 2 channel 0 id 0 lun 0 return code = 8000002
Info fld=0x0, Current sd08:02: sense key hardware error
Additional sense indicates internal target failures
    I/O error: dev08:02, sector 39040
-------------------------------------------------------------------------------------------------------

This repeated with many different sector numbers.  I could not copy the text, so this is a hand-written example.  After these errors had continued for a while, I saw the following messages, also repeated multiple times:

-------------------------------------------------------------------------------------------------------

EXT3-fs error (device sd(8,2)) in ext3_orphan_add: IOfailure
...
EXT3-fs error (device sd(8,2)) in ext3_reserve_inode_write: IOfailure
...
EXT3-fs error (device sd(8,2)) in ext3_get_inode_bc: unable to read inode block
-------------------------------------------------------------------------------------------------------

I eventually powered down the machine and re-started.  After the controller POST, I was presented with the message:
Following Arrays have Missing or Rebuilding or Degraded Members and are Degraded
Array#1-RAID-1    136.7GB    FAILED

I went into the Array Management Utility, and tried the "Verify Disk Media" option on both disks, which did result in remapping a few bad sectors, but I still got the same failed array message after reboot.  Since both disks were visible, and could be verified, I didn't assume that I needed a replacement.  Does that assumption seem correct?

Eventually I tried just pressing return to "accept the current configuration."  However, I then got an error "no boot filename received," and was asked for the system disk.  I tried using the Red Hat disk in rescue mode, but that only resulted in "no linux partitions found."

The data on this server is not critical, as this is the mirror, but it seems to me that I should be able to recover it.  I have zero training for this kind of work, unfortunately, and researching this topic has led to a variety of information that doesn't help me very much.  I may have a workable image of the server to restore from, but it is years old, and not my first option.  Does anyone have any suggestions?  

My first goal is to identify what went wrong, if possible, as the disks and controller appear functional to me.  If I can get the server back up without having to resort to re-installing/restoring, that would be great.
Question by:horvack
9 Comments
 
LVL 5

Expert Comment

by:zzx999
ID: 33590818
It's not clear from your hand-copied error messages, but:

Your RAID might already have been running in degraded mode after an earlier disk failure that you missed. When the second disk then went offline, the array failed.

You could try removing one disk and booting; if that fails, try the other disk. If the server boots from one of them, you can then add a new disk to the array and rebuild it.
 

Author Comment

by:horvack
ID: 33591269

Sorry about the error messages, the screen was scrolling quickly, and non-responsive, so I tried to copy them accurately, but may have failed.  Given that I haven't been able to boot since, I don't suppose there's any way I could access the system log.

I did try removing one disk, rebooting, and then again with the other disk.  Neither booted successfully, and I got the same "no boot filename received" message.

I have since tried using Ctrl+R to "enable/recover" the array in the RAID array management utility.  It currently lists the array as rebuilding at 1%, and is going quite slowly.  This may be an all-night process, so I'll update the status when it is finished.
 
LVL 5

Accepted Solution

by:
zzx999 earned 1500 total points
ID: 33591508
From the error log it would be useful to know how many disks failed. The important information is in lines like this:
>host 2 channel 0 id 0 lun 0.... whether they name just this disk or others too.
For now you can wait for the result of the rebuild. If it succeeds and lets you boot (do a backup first!!!), add an additional disk and select it as a spare. Then degrade the RAID (remove the failed disk). It will rebuild again.
But if both disks had failures, it may not rebuild.

In a critical situation (which is not your case, as you can just install a new server or copy from the other one) you would have to try to reconstruct the RAID from both disks by reading each disk in full and combining the data. But that's expensive work, and it requires special knowledge, software, experience and time (or just money :)  http://www.ontrackdatarecovery.com/ ).
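
Once the system log can be read again, the "just this disk or others too" question could be answered by tallying the SCSI addresses that appear in the error lines. The following is only a minimal sketch: the pattern is modeled on the hand-copied console line in the question (the real kernel wording may differ), and the function name `count_failing_devices` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the hand-copied console line from the question.
# The exact kernel message format is an assumption and may differ.
SCSI_ERROR = re.compile(
    r"SCSI disk error\s*:\s*host (\d+) channel (\d+) id (\d+) lun (\d+)"
)


def count_failing_devices(log_text):
    """Count SCSI disk errors per (host, channel, id, lun) address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SCSI_ERROR.search(line)
        if match:
            address = tuple(int(group) for group in match.groups())
            counts[address] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "SCSI disk error: host 2 channel 0 id 0 lun 0 return code = 8000002\n"
        "    I/O error: dev08:02, sector 39040\n"
    )
    for address, errors in count_failing_devices(sample).items():
        print("host %d channel %d id %d lun %d:" % address, errors, "error(s)")
```

If more than one address shows up, more than one device reported errors. Note that behind a hardware RAID controller the operating system usually sees the array as a single logical device, so one address here may not distinguish the two physical disks.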

 

Author Comment

by:horvack
ID: 33596640
OK, still in rebuild/verify, and only at 16%!  Wow, I guess this is going to be an all-weekend process...

Where would I look to find the error log that you are referring to?  If you mean the messages printed during POST, then both disks are listed, and they are both listed in the array management utility as well.  Or did you mean IF the rebuild works and I can boot from the array again?

Thanks for the suggestions, your point about doing an immediate back-up is well taken.   :)
 
LVL 5

Expert Comment

by:zzx999
ID: 33596685
Well, both. But if you saw that both disks were listed as having errors, then I doubt the reconstruction will be successful...
 

Author Comment

by:horvack
ID: 33597311
No, I didn't see any errors, just that both disks were recognized during POST, and can be accessed in the array management utility.  That's why I thought both disks were still good.
 
LVL 5

Expert Comment

by:zzx999
ID: 33597547
POST recognition does not mean that the disks are good. They may have many unreadable sectors, which can only be detected by a disk check.
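
To illustrate what a disk check does that POST does not, here is a minimal sketch of a read scan: it reads a device or file from start to end and records the offsets where the read fails. It is not how the Adaptec "Verify Disk Media" option works internally, and the function name `find_unreadable_chunks` and the chunk size are assumptions chosen for illustration.

```python
import os
import sys


def find_unreadable_chunks(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return byte offsets of chunks that fail to read."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                data = os.read(fd, min(chunk_size, size - offset))
            except OSError:
                # The read failed: note the offset and skip past this chunk.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:
                break  # unexpected end of data
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets


if __name__ == "__main__":
    # Usage (hypothetical device path): python scan.py /dev/sda
    for bad in find_unreadable_chunks(sys.argv[1]):
        print("unreadable chunk at byte offset", bad)
```

A scan like this is read-only, but it only sees what the operating system exposes. Behind a hardware RAID controller that is normally the logical array rather than each physical disk, and recently read data may be served from cache, so the controller's own verify function remains the more direct test.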
 

Author Comment

by:horvack
ID: 33618490
Well, after a long weekend of rebuilding/verifying, the server is back up.  Now to figure out what the problem was (and perform backup!)...

Thanks for your help.
 

Author Closing Comment

by:horvack
ID: 33618517
Didn't actually come up with the process that fixed the problem,  but was helpful in determining options with limited information.