Solved

RAID 1 unexplained file system error on CentOS 5.6

Posted on 2011-09-19
Last Modified: 2012-05-12
I have 2 identical servers. I took a perfectly working pair of drives from one, put them into the other server, and got the following error:
Checking file system: / contains a file system with errors, check forced.
It reaches about 60% and then:
Extended attribute block 19334147 has reference count 1024 and should be 992
UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY
and it drops me to the Control-D shell!

Can the system time cause such an error? The clocks on the two servers are different!
If I put the drives back into the original server I get a similar error condition!

Any ideas as to how to repair above unexplained error?
Question by:shaunwingin
66 Comments
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558553
Seems there's been a defective filesystem already on the originating box.

Why don't you run an fsck manually against the filesystem in question ( /  apparently).

The system time is most probably unrelated to this issue.

wmp
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558566
... run an fsck against / by touching a file /forcefsck and rebooting afterwards,

or run shutdown -rF now

wmp
0
 

Author Comment

by:shaunwingin
ID: 36558576
I don't believe the file system was already defective on the originating box, as it didn't show any errors and booted fine.
It did mention an error about the file system time being in the future and corrected something....
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558588
Try pressing the letter F when it says:

"run fsck manually ..."
0
 

Author Comment

by:shaunwingin
ID: 36558604
Ok. Will try,

A question of curiosity: If I were to swop the drives around accidentally and put them in slot 2 and 1 instead of 1 and 2 would this cause an error?
0
 

Author Comment

by:shaunwingin
ID: 36558647
F does nothing as it drops to prompt:
Give root password for maintenance.
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558653
And? Did you give the root password?

0
 

Author Comment

by:shaunwingin
ID: 36558655
Btw another question of curiosity:
If I remove one of the RAID drives it gives this error:
error pdc: wrong # of devices in RAID set pdc_bbfcbhgdy 1/2 on dev/sda

Can the system still manage to boot with only one RAID drive? It's RAID 1, so it should?
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558668
It should boot, but you will have to get around the error:

Press a key early in the boot sequence to pull up the grub boot menu, add the keyword "nodmraid" to the kernel command line and see if it boots.
0
 

Author Comment

by:shaunwingin
ID: 36558685
I gave the root password. It gives a # prompt (Repair file system).

Should I do this from # prompt?
"... run an fsck against / by touching a file /forcefsck and rebooting afterwards,"
What are exact commands?

What does "nodmraid" do?
0
 

Author Comment

by:shaunwingin
ID: 36558691
Btw at root prompt I ran:
shutdown -rF now
It reboots but with errors that READ ONLY FILE SYSTEM
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558695
Try

/sbin/fsck

nodmraid: boot option that tells the startup scripts not to activate BIOS ("fake") RAID sets through dmraid.

dmraid: discovers and activates BIOS ("fake") software RAID sets.

0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558710
Try

mount -o remount,rw / on "#(Repair ...)" prompt.

0
 

Author Comment

by:shaunwingin
ID: 36558873
"mount -o remount,rw / on "#(Repair ...)" prompt."

Gave errors: Journal has aborted....
remount read-only

With nodmraid I still saw this in the boot sequence:
dmraid45....so I'm not sure if it took effect....?
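
One way to confirm whether the option actually reached the kernel is to look at /proc/cmdline after booting. A minimal sketch, where the helper name has_boot_option is made up for illustration:

```shell
#!/bin/sh
# has_boot_option OPTION [CMDLINE]  (hypothetical helper)
# Succeeds if OPTION appears as a whole word on the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline.
has_boot_option() {
    opt=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $opt "*) return 0 ;;
    esac
    return 1
}
```

For example, `has_boot_option nodmraid && echo "nodmraid was passed"`. This only shows that the option was on the command line, not whether the initrd honoured it.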


0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558915
This is a filesystem/journal mismatch and a bit hard to repair.

You could try this:

1) tune2fs -O ^has_journal /dev/hdxxx
with /dev/hdxxx being the underlying device of the FS in question.

2) e2fsck /dev/hdxxx

3) tune2fs -j /dev/hdxxx

4) mount /dev/hdxxx /
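
Steps 1 to 3 can be sketched as a small wrapper that, by default, only prints the commands so they can be reviewed first. The function name rebuild_journal and the DRY_RUN variable are made up for illustration, and the device must not be mounted when the commands are really run:

```shell
#!/bin/sh
# rebuild_journal DEVICE  (hypothetical helper)
# Prints the commands for steps 1-3 above, or executes them when DRY_RUN=0.
# The caller must make sure DEVICE is unmounted; this function does not check.
rebuild_journal() {
    dev=$1
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
    }
    # 1) drop the (possibly corrupt) journal
    run tune2fs -O '^has_journal' "$dev" || return $?
    # 2) check the file system; e2fsck exits below 4 when it found
    #    nothing or managed to fix what it found
    run e2fsck "$dev"
    rc=$?
    [ "$rc" -lt 4 ] || return "$rc"
    # 3) create a fresh journal
    run tune2fs -j "$dev"
}
```

Calling `rebuild_journal /dev/hdxxx` with the real device name prints the three commands; setting DRY_RUN=0 runs them.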

 
0
 

Author Comment

by:shaunwingin
ID: 36558971
say what is hdxxx

This is fdisk -l
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       30175   242276265   83  Linux
/dev/sda3           30176       30272      779152   82  Linux swap/Solaris
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558994
Rather issue "mount" and look for

/dev/sda[x] on /

then use this /dev/sda[x].

According to the fdisk output it should be /dev/sda1

wmp
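
Picking the device for / out of the mount output can be done with a short awk filter. A minimal sketch, assuming the usual "DEVICE on MOUNTPOINT type FSTYPE (options)" line format; the function name root_device is made up for illustration:

```shell
#!/bin/sh
# root_device  (hypothetical helper)
# Reads `mount` output on stdin and prints the device mounted on /.
root_device() {
    awk '$2 == "on" && $3 == "/" { print $1; exit }'
}
```

Usage: `mount | root_device`.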

0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36558999
I assume you'll have to boot from some rescue media first since it seems we're talking about the active boot partition.
0
 

Author Comment

by:shaunwingin
ID: 36559009
mount returns:
/dev/mapper/pdc_bbfcfbhgdgp2 on / type -ext3(rw) and then a whole lot of other things below and ends with warning /etc/mtab no readable....

any ideas?
0
 

Author Comment

by:shaunwingin
ID: 36559026
"I assume you'll have to boot from some rescue media first since it seems we're talking about the active boot partition."
Any ideas?
0
 

Author Comment

by:shaunwingin
ID: 36559048
This all seems very round about to repair something that was working perfectly and just broke when inserting into an identical server....?
Is there not something else that I can try?
Also because its a RAID partition - one needs to surely run the repair on the RAID partition....
Alternatively - if I removed one RAID drive and repaired it and then rebuilt the 2nd drive - would this make more sense?
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36559074
Your root filesystem seems to be mounted r/w and OK.

Which problematic filesystem are we actually talking about, please?

Seems that your original Q was misleading: " Checking file system / contains a file system..."

0
 

Author Comment

by:shaunwingin
ID: 36559143
Just to clarify there is a file system error and this is the message:
"Checking file system / contains a file system with error check forced."
However the cause was simply to move the drives from one identical server to another and back.
The error exists in both servers now.
It was working perfectly in the 1st server.
Something is causing the OS to think there is an issue....
I'm looking for an easy way to solve this...?
0
 

Author Comment

by:shaunwingin
ID: 36559185
Can you perhaps ask some of the other experts if they have any idea as to what is causing this behavior.
This is not the 1st disk pair that I've had the identical issues with...
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36559204
Rather delete this Q and ask a new one, perhaps posting some more detailed output with it.

I'll abstain from the new question, let's hear what the other experts say ...

Good luck!
0
 

Author Comment

by:shaunwingin
ID: 36559275
Tx. What output do U suggest?
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 36559309
Your new Q looks quite OK.

Don't forget to delete this one, to get your points back.

wmp
0
 

Author Comment

by:shaunwingin
ID: 36559762
I've requested that this question be deleted for the following reason:

No solution found as yet
0
 

Author Comment

by:shaunwingin
ID: 36559750
Please update the Question with this at the bottom of it:
Can the system time cause such an error. They are different!
If I put the drives back into the original server I get a similar error condition!

Just to clarify there is a file system error and this is the message:
"Checking file system / contains a file system with error check forced."
However the cause was simply to move the drives from one identical server to another and back.
The error exists in both servers now.
It was working perfectly in the 1st server.
Something is causing the OS to think there is an issue....
I'm looking for an easy way to solve this...?


SOME SYSTEM INFO:
This is fdisk -l
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       30175   242276265   83  Linux
/dev/sda3           30176       30272      779152   82  Linux swap/Solaris

mount returns:
/dev/mapper/pdc_bbfcfbhgdgp2 on / type -ext3(rw) and then a whole lot of other things below and ends with warning /etc/mtab no readable....
0
 
LVL 87

Expert Comment

by:rindi
ID: 36559753
I wouldn't disable the Array when running the fsck on it, otherwise you would have to rebuild the array again if fsck repairs anything.

I'm not sure whether you have already done that now. Can you verify you haven't yet run fsck with the RAID disabled?
0
 

Author Comment

by:shaunwingin
ID: 36559763
Please don't delete the Question then...but ask other experts for comment pls and update the question as above..
0
 

Author Comment

by:shaunwingin
ID: 36559776
Hi Rindi. I haven't run fsck with the RAID disabled. The array is still in place.
0
 
LVL 87

Expert Comment

by:rindi
ID: 36559862
To me it looks like the system isn't using RAID (unless it is hardware RAID and not Software RAID). What does mount output?

The reason I think you aren't using RAID is because you should get something like /dev/md1 etc and not /dev/sda1 etc. sda would be a single drive.

Are you using hardware RAID? then it would show a single drive, like /dev/sda....?
0
 

Author Comment

by:shaunwingin
ID: 36559885
It's a RAID configured in the system BIOS - it's not an add-on card.

mount returns:
/dev/mapper/pdc_bbfcfbhgdgp2 on / type -ext3(rw) and then a whole lot of other things below and ends with warning /etc/mtab no readable....
Also see:
If I remove one of the RAID drives it gives this error:
error pdc: wrong # of devices in RAID set pdc_bbfcbhgdy 1/2 on dev/sda
0
 

Author Comment

by:shaunwingin
ID: 36559888
Server is HP Proliant Micro Server using AMD Raid chipset.
0
 
LVL 87

Accepted Solution

by:
rindi earned 250 total points
ID: 36559929
Boot into single user mode and then run fsck on /dev/mapper/pdc_bbfcfbhgdgp2 -y
0
 

Author Comment

by:shaunwingin
ID: 36559953
How do I boot in single user mode?
0
 
LVL 87

Expert Comment

by:rindi
ID: 36560083
I just checked and don't think you need that, but use the shutdown -rF now command that woolmilkporc posted in his early post, and then try the fsck I mentioned earlier.
0
 

Author Comment

by:shaunwingin
ID: 36560203
See what I did above:
Btw at root prompt I ran:
shutdown -rF now
It reboots but with errors that READ ONLY FILE SYSTEM
0
 

Author Comment

by:shaunwingin
ID: 36560612
Anyone with any suggestions please?
0
 
LVL 87

Expert Comment

by:rindi
ID: 36561064
It is supposed to be a read only file-system when you run fsck (at least when you repair things). The reason is that root is mounted. If it were mounted and not set as read only, fsck can cause havoc. fsck will write the changes to the file-system, while the OS itself can't because it is read only.

Usually you would run the fsck from a boot CD to make sure the file-system you want to repair isn't mounted, but the boot CD's don't usually recognize software raid just like that, so it is easier to do it from the installed OS while the file-system is ro.
0
 

Author Comment

by:shaunwingin
ID: 36562291
So what do U suggest I now do?
0
 
LVL 87

Expert Comment

by:rindi
ID: 36562313
Run fsck /dev/mapper/pdc_bbfcfbhgdgp2 -y

when it has booted into the read only file-system.
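
After the run it helps to look at fsck's exit status, which is a bit mask whose values are listed in the fsck man page. A minimal sketch for decoding it; the function name describe_fsck_status is made up for illustration:

```shell
#!/bin/sh
# describe_fsck_status STATUS  (hypothetical helper)
# fsck's exit status is a sum of flags; print one line per flag that is set.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ] && echo "file system errors corrected"
    [ $((status & 2)) -ne 0 ] && echo "system should be rebooted"
    [ $((status & 4)) -ne 0 ] && echo "file system errors left uncorrected"
    [ $((status & 8)) -ne 0 ] && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "checking canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}
```

Usage: `fsck /dev/mapper/pdc_bbfcfbhgdgp2 -y; describe_fsck_status $?`. A status with the 4 flag set means the repair did not finish and another pass is needed.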
0
 

Author Comment

by:shaunwingin
ID: 36562924
I ran fsck /dev/mapper/pdc_bbfcfbhgdgp2 -y. It said the file system still has errors and that I should reboot. I rebooted and got a whole host of new errors.
I ran fsck /dev/mapper/pdc_bbfcfbhgdgp1 -y, and this runs very quickly, as it also appears under the mount.
I ran fsck /dev/mapper/pdc_bbfcfbhgdgp2 -y again and rebooted, but there are still file system errors.
Any ideas?
0
 

Author Comment

by:shaunwingin
ID: 36562996
When it reboots it gives error about read only and when it boots it also give readonly errors
0
 
LVL 118
ID: 36563655
the microserver uses a fake raid, its not a hardware raid card.

are you using the drivers supplied by HP?
0
 
LVL 20

Expert Comment

by:Daniel McAllister
ID: 36563812
OK... take a step back.... It seems to me you're making an assumption that the array was fine when you removed it in the first place... just because it would boot then doesn't mean that there weren't errors then! That's one reason why there is a date & mount-count meter on the filesystem -- so that a fsck is done at least once every xxx days or yyy mounts, as errors can sometimes "sneak up on you"...
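
The mount-count meter can be read from the superblock with tune2fs -l, which prints "Mount count" and "Maximum mount count" lines. A minimal sketch that compares the two; the function name check_due_by_mounts is made up for illustration, and it ignores the separate time-based interval:

```shell
#!/bin/sh
# check_due_by_mounts  (hypothetical helper)
# Reads `tune2fs -l DEVICE` output on stdin and reports whether the mount
# count has reached the maximum mount count. A maximum of 0 or -1 means
# mount-count checking is switched off.
check_due_by_mounts() {
    awk -F': *' '
        $1 == "Mount count"         { count = $2 + 0 }
        $1 == "Maximum mount count" { max = $2 + 0 }
        END {
            if (max > 0 && count >= max) print "check due"
            else print "check not due"
        }'
}
```

Usage: `tune2fs -l /dev/mapper/pdc_bbfcfbhgdgp2 | check_due_by_mounts`.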

So ask yourself:

Q: WHY do you use RAID-1?
A: Because my data is stored identically on two hard drives so that if one fails, all of my data is safe on the other

Q: What has happened?
A: I'm getting seemingly random errors on my RAID 1 Array

Q: How could that be?
A: In RAID 1, the disk READ can be assigned to either drive... assume 1 drive is good, the other bad, you'll get random failures whenever the RAID controller (hard or soft) selects the bad drive to be the source (read) drive.

Q: How do I fix it?
A: Test the drives independently -- e.g.: BREAK THE MIRROR (physically remove one drive) and run FSCK on each drive separately (without the other drive in the array). DO NOT boot the system LIVE onto either one, unless you are sure that nothing important will happen (or get stored to it) while it is up. (NOTE: You may as well admit it at this point, with all the steps you've already taken -- at this point you're in disaster recovery mode, not reboot and you're up mode!)

You should probably also look at the output of smartmontools (smartctl -H <device>)
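
To compare the two drives quickly, the verdict line can be pulled out of the smartctl -H output. A minimal sketch, assuming ATA/SATA drives (which report a "SMART overall-health self-assessment test result" line); the function name smart_health is made up for illustration:

```shell
#!/bin/sh
# smart_health  (hypothetical helper)
# Reads `smartctl -H DEVICE` output on stdin and prints only the verdict,
# e.g. PASSED. Prints nothing if the expected line is missing.
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}
```

Usage, assuming the two members show up as /dev/sda and /dev/sdb: `for d in /dev/sda /dev/sdb; do echo "$d: $(smartctl -H "$d" | smart_health)"; done`. A PASSED verdict only means the drive does not predict its own failure; it says nothing about file system consistency.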

My guess is that one will pass fsck (and/or smart), the other not so much... NOTE: The one that passes may have some minor errors -- but they'll PALE in comparison to the other one!

Once you know which drive is the good one, get a new 2nd drive and synch your data to it... THEN you should be back in business!

I do hope this helps....

Dan
IT4SOHO
0
 
LVL 118
ID: 36563842
Are you sure it's running RAID 1 under CentOS 5.6? There do not appear to be any Linux SATA RAID drivers published by HP for the MicroServer on their website, and I know that this controller appears as two independent disks under VMware ESX/ESXi (no RAID).

Windows 2008/2008 R2 have a SATA RAID driver, for AMD Ready RAID function.
0
 
LVL 76

Expert Comment

by:arnold
ID: 36564029
what is the hardware that you are using: server make/model
RAID hardware or are you using software raid?
If hardware, did you get an alert during the bootup that informed you that the RAID volume is "seen" as incorrect and whether you want to adjust it or accept it? Did you accept it?  In hardware raid, usually you have to purge the configuration on the controller and then insert the drives and get the controller to read in the RAID configuration from the DISKS.


your fdisk -l reports that a single drive is seen, which suggests that the volume is based on hardware RAID.  If you can get into the RAID controller, you may see the RAID volume reflected in degraded mode.
0
 
LVL 118
ID: 36564053
I believe he is using a HP ProLiant MicroServer which uses a software fake raid SB700 SATA Controller in RAID mode.

see here

http://h20000.www2.hp.com/bizsupport/TechSupport/Home.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=4248009&lang=en&cc=us
0
 
LVL 76

Expert Comment

by:arnold
ID: 36564376
With that information, the drives can not be simply moved from one system to the next and have everything work.
Moving both drives at the same time leaves no fall back options since both disks are then marked.
shutting down the original system and moving one of the RAID 1 drives is a way to retain the option of functional resources if the move does not work.

I believe the raid configuration had to be cleared prior to attempting to boot the system using the moved drives.

0
 
LVL 87

Expert Comment

by:rindi
ID: 36567193
I don't think he's using the controller's RAID, but rather CentOS software RAID.

What does cat /proc/mdstat say?
0
 
LVL 76

Expert Comment

by:arnold
ID: 36567913
fdisk -l in http:#a36558971 suggests that the OS only sees a single drive /dev/sda.
The Type of partition is also 83 but with software raid should likely be fd.
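
The partition-type check can be automated by filtering the fdisk -l listing for Id fd (Linux raid autodetect). A minimal sketch; the function name md_partitions is made up for illustration:

```shell
#!/bin/sh
# md_partitions  (hypothetical helper)
# Reads `fdisk -l` output on stdin and prints every partition whose Id
# column is fd (Linux raid autodetect).
md_partitions() {
    awk '/^\/dev\// {
        # a "*" in the Boot column shifts the remaining columns by one
        id = ($2 == "*") ? $6 : $5
        if (id == "fd") print $1
    }'
}
```

Usage: `fdisk -l | md_partitions`. For the listing posted in this thread it prints nothing, since every partition is type 83 or 82.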
0
 

Author Comment

by:shaunwingin
ID: 36568879
This is what I'm using:

I believe he is using a HP ProLiant MicroServer which uses a software fake raid SB700 SATA Controller in RAID mode.

see here

http://h20000.www2.hp.com/bizsupport/TechSupport/Home.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=4248009&lang=en&cc=us

Any ideas how to recover the disk?
0
 
LVL 87

Expert Comment

by:rindi
ID: 36568978
Run the cat /proc/mdstat command I posted; it should help us find out whether you are using software RAID or the controller's RAID.
0
 

Author Comment

by:shaunwingin
ID: 36569291
cat /proc/mdstat
If I run it from the repair console it returns
personalities:
unused devices:none....?
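
An empty Personalities line with no mdN entries means the kernel md driver has no arrays assembled. A minimal sketch that lists any active md arrays from that file; the function name md_arrays is made up for illustration:

```shell
#!/bin/sh
# md_arrays  (hypothetical helper)
# Reads the contents of /proc/mdstat on stdin and prints the name of each
# md array it lists (lines of the form "md0 : active raid1 ...").
md_arrays() {
    awk '/^md[0-9]+ *:/ { print $1 }'
}
```

Usage: `md_arrays < /proc/mdstat`. No output matches what is shown above, which points away from Linux md software RAID.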
0
 
LVL 87

Expert Comment

by:rindi
ID: 36569329
Then it does look like you are using the controller RAID (there is actually a driver for redhat for it, and CentOS is a redhat clone, so you would use that same driver on CentOS).

Boot the server into the RAID config utility and check what status you get there. If possible put the HD's in a PC or server that doesn't have RAID, then test them using the HD manufacturer's diagnostic utility.
0
 

Author Comment

by:shaunwingin
ID: 36569377
Boot tools shows all ok...?
0
 
LVL 76

Expert Comment

by:arnold
ID: 36570405
What steps did you follow to transfer the disks from one server to the other?
Did you get an error prompt during the initial boot after the disks were transferred to the other system?
0
 

Author Comment

by:shaunwingin
ID: 36570447
Just pulled and inserted them.
No error
0
 
LVL 76

Expert Comment

by:arnold
ID: 36570631
when you moved the drives from one server to the next, what steps did you take to perform the transfer?
0
 
LVL 20

Assisted Solution

by:Daniel McAllister
Daniel McAllister earned 250 total points
ID: 36571117
OK, one more thing... if you've broken the RAID set and run a test on each drive and they both passed, then something has caused them to get out of synch, even though the "dirty bit" (a setting that would show the RAID driver/controller that they're not a matched pair anymore) is not indicating so....

So run tests to determine which of the 2 drives is closest to what you want, and re-establish a RAID-1 array with that drive being a synch-master for the initial build (use another drive, or the old pair -- just re-execute the synch).

Dan
IT4SOHO
0
 
LVL 76

Expert Comment

by:arnold
ID: 36571354
My suggestion would be to remove one of the drives to see whether the system will operate without errors while the RAID will be broken.

I think this is the test route it4soho is suggesting.
I'm not sure how you will determine which of the two drives is out-of-sync.
0
 

Author Closing Comment

by:shaunwingin
ID: 36598272
I think a re-install is needed. Tx
0
