
Disk array failure

Hallonstedt asked (last modified 2012-08-26):
I seem to have lost my file system and I miss it!

A while back mdadm e-mailed me that a disk had failed in my raid-5 device. Seconds later another email arrived that another disk had failed. With two out of six disks malfunctioning, obviously the array failed.

I ran some diagnostics on the hard drives and looked through the log files, and the only conclusion I could come to was that the disk controller had temporarily failed somehow (the two failed disks were on a separate controller).

I decided to reboot the system and see if it would all fix itself (I have a lot of confidence in mdadm and XFS!). Unfortunately, the array did not assemble. I then decided to start the array manually by issuing:
mdadm --create /dev/md1 --level=5 --chunk=128 --raid-devices=6 /dev/sd[abcd]3 /dev/sd[ef]1



Device started. State: Clean, Degraded. The last drive (one of the failed) was listed as Failed, Spare. I removed it and re-added it. Array synced and told me it was clean.

When I tried to mount the file system, I was informed that there was no file system on the array. Syslog reported "XFS: bad magic number".
xfs_check: /dev/md1 is not a valid XFS filesystem (unexpected SB magic number 0x494e81a4)
xfs_repair fails to find a superblock and keeps searching for a secondary with no apparent luck.

Not even xfs_irepair works; it tells me: xfs_db: /dev/md1 is not a valid XFS filesystem (unexpected SB magic number 0x494e81a4).

I realise that most of you still reading this are about to tell me to give up and restore from backup, and I would agree if it weren't for the fact that I have no backup. Lots of valued media (photos etc.) are gone, and I want to make sure there is no other option before I permanently ruin the data still left on the disks.

The only hope I have left is an XFS header in block 0 of /dev/sdb3. It looks like this:
# dd if=/dev/sdb3 bs=512 count=1 2> /dev/null | hexdump -C
00000000  58 46 53 42 00 00 10 00  00 00 00 00 35 0d 19 e0  |XFSB........5...|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  9d 5c cb dd 05 70 43 ca  9b 64 da eb 53 ef fd 58  |.\...pC..d..S..X|
00000030  00 00 00 00 10 00 00 07  00 00 00 00 00 00 02 00  |................|
00000040  00 00 00 00 00 00 02 01  00 00 00 00 00 00 02 02  |................|
00000050  00 00 00 60 00 fe a5 60  00 00 00 36 00 00 00 00  |...`...`...6....|
00000060  00 00 80 00 bd b4 10 00  01 00 00 10 72 61 69 64  |............raid|
00000070  00 00 00 00 00 00 00 00  0c 0c 08 04 18 00 00 05  |................|
00000080  00 00 00 00 00 04 d0 00  00 00 00 00 00 00 0a ab  |................|
00000090  00 00 00 00 0c 37 22 92  00 00 00 00 00 00 00 00  |.....7".........|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 02  00 00 00 20 00 00 00 60  |........... ...`|
000000c0  00 0c 10 00 00 00 10 00  00 00 00 08 00 00 00 08  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
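
For anyone who wants to read that header rather than eyeball it, here is a minimal sketch that decodes a few fields from the first 0x60 bytes of the dump above. It assumes the usual big-endian XFS superblock layout (magic at offset 0, block size at 4, data block count at 8, AG size at 84, AG count at 88); those offsets are my assumption, not something the dump itself states, and the helper name `decode_superblock` is made up for illustration.

```python
import struct

# First 0x60 bytes of /dev/sdb3, copied from the hexdump above.
SB_HEX = """
58 46 53 42 00 00 10 00 00 00 00 00 35 0d 19 e0
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9d 5c cb dd 05 70 43 ca 9b 64 da eb 53 ef fd 58
00 00 00 00 10 00 00 07 00 00 00 00 00 00 02 00
00 00 00 00 00 00 02 01 00 00 00 00 00 00 02 02
00 00 00 60 00 fe a5 60 00 00 00 36 00 00 00 00
"""


def decode_superblock(raw: bytes) -> dict:
    """Decode a handful of XFS superblock fields (offsets are assumed)."""
    # Assumed layout: 4-byte magic, u32 block size, u64 data block count.
    magic, blocksize, dblocks = struct.unpack_from(">4sIQ", raw, 0)
    # Assumed layout: u32 blocks per allocation group, u32 group count.
    agblocks, agcount = struct.unpack_from(">II", raw, 84)
    return {
        "magic": magic,
        "blocksize": blocksize,
        "dblocks": dblocks,
        "agblocks": agblocks,
        "agcount": agcount,
        "size_kib": dblocks * blocksize // 1024,
    }


sb = decode_superblock(bytes.fromhex("".join(SB_HEX.split())))
for name, value in sb.items():
    print(f"{name:10} {value}")
```

Under those assumptions the header describes a file system with 4096-byte blocks and 890051040 data blocks, that is 3560204160 KiB.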



I realised that mdadm had assembled the disks in a different order and that sdb3 was now disk number 1 rather than 0. I stopped the device and issued another command:
mdadm --create /dev/md1 --assume-clean --level=5 --chunk=128 --raid-devices=6 /dev/sdb3 /dev/sda3 /dev/sdd3 /dev/sdc3 /dev/sdf1 /dev/sde1
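
The reasoning behind putting sdb3 first can be sketched in a few lines. This assumes the left-symmetric RAID-5 layout (the layout mdadm -D reports for this array) places parity for stripe s on device (n - 1) - (s mod n) and starts the data chunks on the device after it; that formula is my understanding of the layout, and the function names are hypothetical.

```python
def parity_device(stripe: int, n_devices: int) -> int:
    """Device index holding parity for a stripe (assumed left-symmetric)."""
    return (n_devices - 1) - (stripe % n_devices)


def data_device(logical_chunk: int, n_devices: int) -> tuple[int, int]:
    """Map a logical data chunk to (stripe, device index)."""
    data_per_stripe = n_devices - 1
    stripe, index = divmod(logical_chunk, data_per_stripe)
    # Data chunks start on the device right after parity and wrap around.
    device = (parity_device(stripe, n_devices) + 1 + index) % n_devices
    return stripe, device


# Device order used in the second --create command above.
ORDER = ["sdb3", "sda3", "sdd3", "sdc3", "sdf1", "sde1"]

stripe, device = data_device(0, len(ORDER))
print(f"logical chunk 0 -> stripe {stripe}, device {device} ({ORDER[device]})")
print(f"parity for stripe 0 -> device {parity_device(0, len(ORDER))}")
```

If the layout assumption holds, the first chunk of the array lives on device 0, so the disk carrying the XFS header has to be listed first.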



Now the array looks like this:
freeport:~# mdadm -D /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Fri Aug 10 22:03:45 2012
     Raid Level : raid5
     Array Size : 3560198400 (3395.27 GiB 3645.64 GB)
  Used Dev Size : 712039680 (679.05 GiB 729.13 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Fri Aug 10 22:03:45 2012
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : freeport:1  (local to host freeport)
           UUID : cdbb881a:52ab1c71:72a71ce4:d8e4f1dc
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3
       1       8        3        1      active sync   /dev/sda3
       2       8       51        2      active sync   /dev/sdd3
       3       8       35        3      active sync   /dev/sdc3
       4       8       81        4      active sync   /dev/sdf1
       5       8       65        5      active sync   /dev/sde1



which is the same as before the incident.
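
One further check that might be worth making is whether the sizes agree. The sketch below uses only numbers already shown: the Array Size and Used Dev Size from mdadm -D, and the block size and block count read from bytes 4-15 of the XFS header on /dev/sdb3. Reading those bytes as a big-endian u32 block size and u64 block count is an assumption on my part.

```python
KIB = 1024

# Values from the mdadm -D output above (both reported in KiB).
raid_devices = 6
used_dev_size_kib = 712039680
array_size_kib = 3560198400

# RAID-5 keeps one device's worth of parity, so capacity is (n - 1) devices.
expected_array_kib = (raid_devices - 1) * used_dev_size_kib

# Values from the XFS header on /dev/sdb3 (field positions are assumed).
sb_blocksize = 0x00001000   # bytes 4-7:  00 00 10 00
sb_dblocks = 0x350D19E0     # bytes 8-15: 00 00 00 00 35 0d 19 e0
fs_size_kib = sb_dblocks * sb_blocksize // KIB

shortfall_kib = fs_size_kib - array_size_kib

print(f"array size from mdadm : {array_size_kib} KiB")
print(f"(n - 1) * dev size    : {expected_array_kib} KiB")
print(f"file system size      : {fs_size_kib} KiB")
print(f"file system - array   : {shortfall_kib} KiB")
```

If the header is read correctly, the file system it describes is 5760 KiB (1152 KiB per data device) larger than the array as it is now reported, so the device order may match the original while the geometry does not.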

Unfortunately, no file system is detected on the device despite the modification. Block 0 on the md device shows:
00000000  49 4e 81 a4 02 02 00 00  00 00 03 e8 00 00 03 e8  |IN..............|
00000010  00 00 00 01 00 00 00 00  00 00 00 00 00 00 00 04  |................|
00000020  4b f5 19 64 01 30 31 a7  4b 11 70 f3 00 00 00 00  |K..d.01.K.p.....|
00000030  4c 0c d0 1c 05 b9 81 7f  00 00 00 00 00 00 0e c4  |L...............|
00000040  00 00 00 00 00 00 00 01  00 00 00 00 00 00 00 01  |................|
00000050  00 00 00 02 00 00 00 00  00 00 00 00 f5 97 70 f0  |..............p.|
00000060  ff ff ff ff 00 00 00 00  00 00 00 00 00 04 00 00  |................|
00000070  cb 80 00 01 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  49 4e 81 a4 02 02 00 00  00 00 03 e8 00 00 03 e8  |IN..............|
00000110  00 00 00 01 00 00 00 00  00 00 00 00 00 00 00 04  |................|
00000120  4b f5 19 63 35 e0 f7 c0  4b 11 70 f3 00 00 00 00  |K..c5...K.p.....|
00000130  4c 0c d0 1c 05 b9 81 7f  00 00 00 00 00 00 8f 13  |L...............|
00000140  00 00 00 00 00 00 00 09  00 00 00 00 00 00 00 01  |................|
00000150  00 00 00 02 00 00 00 00  00 00 00 00 f5 97 70 f0  |..............p.|
00000160  ff ff ff ff 00 00 00 00  00 00 00 00 00 04 20 88  |.............. .|
00000170  3a 40 00 09 00 00 00 00  00 00 00 00 00 00 00 00  |:@..............|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|



while /dev/sdb3 still has the XFS header.
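
The bytes at the start of the md device can also be decoded. The sketch below assumes they follow the XFS on-disk inode header (a 2-byte magic "IN", then mode, version, format, an old link count, uid, gid and link count, all big-endian); that layout is my assumption and is not confirmed by anything in the output above.

```python
import stat
import struct

# First 20 bytes of block 0 on /dev/md1, copied from the dump above.
HEAD_HEX = "49 4e 81 a4 02 02 00 00 00 00 03 e8 00 00 03 e8 00 00 00 01"
raw = bytes.fromhex(HEAD_HEX.replace(" ", ""))

# The "unexpected SB magic number" is simply the first four bytes.
reported_magic = int.from_bytes(raw[:4], "big")

# Assumed inode header layout: magic, mode, version, format,
# old link count, uid, gid, link count.
magic, mode, version, fmt, onlink, uid, gid, nlink = struct.unpack_from(
    ">2sHBBHIII", raw, 0
)

print(f"reported magic : {reported_magic:#010x}")
print(f"inode magic    : {magic!r}")
print(f"mode           : {stat.filemode(mode)}")
print(f"uid/gid/nlink  : {uid}/{gid}/{nlink}")
```

Read this way, the block holds what looks like an ordinary file inode (mode -rw-r--r--, uid and gid 1000), and the same pattern repeats 0x100 bytes later. That would suggest XFS metadata is present on the array but not at the offset where the superblock is expected.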

If I run a simple dd on the md device and pipe the output through strings, I see text that belongs to documents I used to own, but I am not sure that actually means anything.

This is where I ran out of ideas. I have most likely done something wrong along the line. All I ask is that the collective wisdom that is this forum help me with either additional tests or ideas, or inform me that the data is gone: cut your losses and move on!

Many thanks in advance,

  Mats