RAID 1 unexplained file system error Centos5.6

I have 2 identical servers. I took a perfectly working pair of drives from one and put them into the other server, and got the following error:
Checking file system / contains a file system with error check forced.
It reaches about 60% and then:
Extended attribute block 19334147 has reference count 1024 and should be 992
Unexpected inconsistency
run fsck MANUALLY
and it drops me to the Control-D shell!

Can the system time cause such an error? The clocks on the two servers are different!
If I put the drives back into the original server I get a similar error condition!

Any ideas as to how to repair the above unexplained error?
0
Asked by: shaunwingin
 
woolmilkporcCommented:
Seems there's been a defective filesystem already on the originating box.

Why don't you run an fsck manually against the filesystem in question ( /  apparently).

The system time is most probably unrelated to this issue.

wmp
0
 
woolmilkporcCommented:
... run an fsck against / by touching a file /forcefsck and rebooting afterwards,

or run shutdown -rF now

wmp
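
The two options above can be sketched as a small dry-run script. This is a minimal sketch, assuming a SysV-init system such as CentOS 5 where the boot scripts honour /forcefsck; the `run` helper, the `DRY_RUN` variable and the function names are illustrative and not part of any tool. Note that creating /forcefsck only works if / is currently writable.

```shell
#!/bin/sh
# Dry-run sketch: with DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Option 1: create the flag file the boot scripts look for, then reboot.
force_fsck_with_flag_file() {
    run touch /forcefsck && run shutdown -r now
}

# Option 2: let shutdown request the forced check itself (-F).
force_fsck_with_shutdown() {
    run shutdown -rF now
}
```

Calling `force_fsck_with_shutdown` prints the command it would run; setting `DRY_RUN=0` as root would actually reboot the machine.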
0
 
shaunwinginAuthor Commented:
I don't believe there was defective file system from originating box as it didn't show any errors and booted fine.
It did mention an error about file system time being in the future and corrected something....
0
 
woolmilkporcCommented:
Try pressing the letter F when it says:

"run fsck manually ..."
0
 
shaunwinginAuthor Commented:
Ok. Will try,

A question of curiosity: if I were to swap the drives around accidentally and put them in slots 2 and 1 instead of 1 and 2, would this cause an error?
0
 
shaunwinginAuthor Commented:
F does nothing as it drops to prompt:
Give root password for maintenance.
0
 
woolmilkporcCommented:
And? Did you give the root password?

0
 
shaunwinginAuthor Commented:
Btw another question of curiosity:
If I remove one of the RAID drives it gives this error:
error pdc: wrong # of devices in RAID set pdc_bbfcbhgdy 1/2 on dev/sda

Can the system still manage to boot with only one RAID drive? It's RAID 1, so it should?
0
 
woolmilkporcCommented:
It should boot, but you will have to get around the error:

Press a key early in the boot sequence to pull up the grub boot menu, add the keyword "nodmraid" to the kernel command line and see if it boots.
0
 
shaunwinginAuthor Commented:
I gave the root password. It gives a # prompt (Repair file system).

Should I do this from # prompt?
"... run an fsck against / by touching a file /forcefsck and rebooting afterwards,"
What are exact commands?

What does "nodmraid" do?
0
 
shaunwinginAuthor Commented:
Btw at root prompt I ran:
shutdown -rF now
It reboots but with errors that READ ONLY FILE SYSTEM
0
 
woolmilkporcCommented:
Try

/sbin/fsck

nodmraid: tells the boot scripts not to activate dmraid, i.e. BIOS ("fake") software RAID sets.

dmraid: discovers and activates BIOS ("fake") software RAID sets.

0
 
woolmilkporcCommented:
Try

mount -o remount,rw / on "#(Repair ...)" prompt.

0
 
shaunwinginAuthor Commented:
"mount -o remount,rw / on "#(Repair ...)" prompt."

It gave errors: Journal has aborted....
remount read-only

With nodmraid I still saw this in the boot sequence:
dmraid45.... so I'm not sure whether it took effect?


0
 
woolmilkporcCommented:
This is a filesystem/journal mismatch and a bit hard to repair.

You could try this:

1) tune2fs -O ^has_journal /dev/hdxxx
with /dev/hdxxx being the underlying device of the FS in question.

2) e2fsck /dev/hdxxx

3) tune2fs -j /dev/hdxxx

4) mount /dev/hdxxx /
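
The first three steps above can be sketched as a dry-run script. This is only a sketch under assumptions: the `run` helper, `DRY_RUN` and `rebuild_journal` are illustrative names, the device argument is a placeholder you must supply, and the file system should not be mounted read-write while this runs (for example, work from rescue media).

```shell
#!/bin/sh
# Dry-run sketch: with DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Drop the ext3 journal, check the file system, then recreate the journal.
rebuild_journal() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: rebuild_journal DEVICE" >&2
        return 2
    fi
    run tune2fs -O '^has_journal' "$dev" || return 1   # step 1: remove journal
    run e2fsck "$dev"                                  # step 2: check / repair
    status=$?
    # e2fsck exit codes below 4 mean "clean" or "errors were corrected".
    [ "$status" -lt 4 ] || return "$status"
    run tune2fs -j "$dev"                              # step 3: new journal
}
```

For example, `rebuild_journal /dev/mapper/pdc_bbfcfbhgdgp2` prints the three commands without touching the disk.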

 
0
 
shaunwinginAuthor Commented:
What is hdxxx?

This is fdisk -l
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       30175   242276265   83  Linux
/dev/sda3           30176       30272      779152   82  Linux swap/Solaris
0
 
woolmilkporcCommented:
Rather issue "mount" and look for

/dev/sda[x] on /

then use this /dev/sda[x].

According to the fdisk output it should be /dev/sda1

wmp
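
Picking the device mounted on / out of the "mount" output can be sketched with a small filter. This is a minimal sketch; `root_device` is an illustrative name, and it assumes the usual "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" line format.

```shell
#!/bin/sh
# Read `mount` output on stdin and print the device mounted on /.
root_device() {
    awk '$2 == "on" && $3 == "/" { print $1 }'
}
```

Usage would be `mount | root_device`; the printed device is the one to pass to fsck or tune2fs.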

0
 
woolmilkporcCommented:
I assume you'll have to boot from some rescue media first since it seems we're talking about the active boot partition.
0
 
shaunwinginAuthor Commented:
mount returns:
/dev/mapper/pdc_bbfcfbhgdgp2 on / type ext3 (rw), followed by a whole lot of other entries, and it ends with a warning that /etc/mtab is not readable....

any ideas?
0
 
shaunwinginAuthor Commented:
"I assume you'll have to boot from some rescue media first since it seems we're talking about the active boot partition."
Any ideas?
0
 
shaunwinginAuthor Commented:
This all seems a very roundabout way to repair something that was working perfectly and just broke when inserted into an identical server....?
Is there not something else that I can try?
Also, because it's a RAID partition, surely one needs to run the repair on the RAID partition....
Alternatively, if I removed one RAID drive, repaired it and then rebuilt the 2nd drive, would this make more sense?
0
 
woolmilkporcCommented:
Your root filesystem seems to be mounted r/w and OK.

Which problematic filesystem are we actually talking about, please?

Seems that your original Q was misleading: " Checking file system / contains a file system..."

0
 
shaunwinginAuthor Commented:
Just to clarify there is a file system error and this is the message:
"Checking file system / contains a file system with error check forced."
However the cause was simply to move the drives from one identical server to another and back.
The error exists in both servers now.
It was working perfectly in the 1st server.
Something is causing the OS to think there is an issue....
I'm looking for an easy way to solve this...?
0
 
shaunwinginAuthor Commented:
Can you perhaps ask some of the other experts if they have any idea as to what is causing this behavior.
This is not the 1st disk pair that I've had the identical issues with...
0
 
woolmilkporcCommented:
Rather delete this Q and ask a new one, perhaps posting some more detailed output with it.

I'll abstain from the new question, let's hear what the other experts say ...

Good luck!
0
 
shaunwinginAuthor Commented:
Tx. What output do U suggest?
0
 
woolmilkporcCommented:
Your new Q looks quite OK.

Don't forget to delete this one, to get your points back.

wmp
0
 
shaunwinginAuthor Commented:
I've requested that this question be deleted for the following reason:

No solution found as yet
0
 
shaunwinginAuthor Commented:
Please update the Question with this at the bottom of it:
Can the system time cause such an error. They are different!
If I put the drives back into the original server I get a similar error condition!

Just to clarify there is a file system error and this is the message:
"Checking file system / contains a file system with error check forced."
However the cause was simply to move the drives from one identical server to another and back.
The error exists in both servers now.
It was working perfectly in the 1st server.
Something is causing the OS to think there is an issue....
I'm looking for an easy way to solve this...?


SOME SYSTEM INFO:
This is fdisk -l
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       30175   242276265   83  Linux
/dev/sda3           30176       30272      779152   82  Linux swap/Solaris

mount returns:
/dev/mapper/pdc_bbfcfbhgdgp2 on / type ext3 (rw), followed by a whole lot of other entries, and it ends with a warning that /etc/mtab is not readable....
0
 
rindiCommented:
I wouldn't disable the Array when running the fsck on it, otherwise you would have to rebuild the array again if fsck repairs anything.

I'm not sure whether you have already done that now. Can you verify you haven't yet run fsck with the RAID disabled?
0
 
shaunwinginAuthor Commented:
Please don't delete the question then, but please ask other experts for comment and update the question as above.
0
 
shaunwinginAuthor Commented:
Hi Rindi. I haven't run fsck with the RAID disabled. The array is still in place.
0
 
rindiCommented:
To me it looks like the system isn't using RAID (unless it is hardware RAID and not Software RAID). What does mount output?

The reason I think you aren't using RAID is because you should get something like /dev/md1 etc and not /dev/sda1 etc. sda would be a single drive.

Are you using hardware RAID? then it would show a single drive, like /dev/sda....?
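
The naming rule of thumb above can be sketched as a simple classifier. This is a heuristic only, based on the device name: `raid_kind` is an illustrative name, and the `pdc_`, `isw_` and `nvidia_` prefixes are the dmraid set-name prefixes I am aware of, not an exhaustive list.

```shell
#!/bin/sh
# Guess what kind of block device a name refers to, from the name alone.
raid_kind() {
    case "$1" in
        /dev/md*)
            echo "md software RAID" ;;
        /dev/mapper/pdc_*|/dev/mapper/isw_*|/dev/mapper/nvidia_*)
            echo "dmraid BIOS (fake) RAID" ;;
        /dev/mapper/*)
            echo "device-mapper (LVM or other)" ;;
        /dev/sd*|/dev/hd*)
            echo "plain disk or hardware RAID volume" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For instance, `raid_kind /dev/md1` reports md software RAID, while a name under /dev/mapper starting with `pdc_` points to a dmraid set.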
0
 
shaunwinginAuthor Commented:
It's a RAID configured in the system BIOS; it's not an add-on card.

mount returns:
/dev/mapper/pdc_bbfcfbhgdgp2 on / type ext3 (rw), followed by a whole lot of other entries, and it ends with a warning that /etc/mtab is not readable....
Also see:
If I remove one of the RAID drives it gives this error:
error pdc: wrong # of devices in RAID set pdc_bbfcbhgdy 1/2 on dev/sda
0
 
shaunwinginAuthor Commented:
Server is HP Proliant Micro Server using AMD Raid chipset.
0
 
rindiCommented:
Boot into single user mode and then run fsck on /dev/mapper/pdc_bbfcfbhgdgp2 -y
0
 
shaunwinginAuthor Commented:
How do I boot in single user mode?
0
 
rindiCommented:
I just checked and don't think you need that, but use the shutdown -rF now command that woolmilkporc posted in his early post, and then try the fsck I mentioned earlier.
0
 
shaunwinginAuthor Commented:
See what I did above:
Btw at root prompt I ran:
shutdown -rF now
It reboots but with errors that READ ONLY FILE SYSTEM
0
 
shaunwinginAuthor Commented:
Anyone with any suggestions please?
0
 
rindiCommented:
It is supposed to be a read only file-system when you run fsck (at least when you repair things). The reason is that root is mounted. If it were mounted and not set as read only, fsck can cause havoc. fsck will write the changes to the file-system, while the OS itself can't because it is read only.

Usually you would run the fsck from a boot CD to make sure the file-system you want to repair isn't mounted, but boot CDs don't usually recognize software RAID just like that, so it is easier to do it from the installed OS while the file-system is ro.
0
 
shaunwinginAuthor Commented:
So what do U suggest I now do?
0
 
rindiCommented:
Run fsck /dev/mapper/pdc_bbfcfbhgdgp2 -y

when it has booted into the read only file-system.
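
When fsck finishes, its exit status says whether a reboot or another pass is needed. The sketch below decodes that status; the bit values are the ones documented in fsck(8), while `explain_fsck_status` is an illustrative name of my own.

```shell
#!/bin/sh
# Decode an fsck exit status (a bit mask, see fsck(8)) into readable lines.
explain_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for bit in 1 2 4 8 16 32 128; do
        # Skip bits that are not set in the status.
        [ $(( status & bit )) -ne 0 ] || continue
        case "$bit" in
            1)   echo "file system errors corrected" ;;
            2)   echo "system should be rebooted" ;;
            4)   echo "file system errors left uncorrected" ;;
            8)   echo "operational error" ;;
            16)  echo "usage or syntax error" ;;
            32)  echo "checking canceled by user request" ;;
            128) echo "shared-library error" ;;
        esac
    done
    return 0
}
```

A possible use is to run the fsck command above and then call `explain_fsck_status $?`; a status with bit 4 set means another pass or a deeper repair is still needed.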
0
 
shaunwinginAuthor Commented:
I ran fsck /dev/mapper/pdc_bbfcfbhgdgp2 -y and it said the file system still has errors and to reboot. I rebooted and got a whole host of new errors.
Ran fsck /dev/mapper/pdc_bbfcfbhgdgp1 -y and this runs very quickly, as it also appears under mount.
Ran fsck /dev/mapper/pdc_bbfcfbhgdgp2 -y again and rebooted, but there are still file system errors.
Any ideas?
0
 
shaunwinginAuthor Commented:
When it reboots it gives an error about read-only, and when it boots it also gives read-only errors.
0
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
The MicroServer uses fake RAID; it's not a hardware RAID card.

Are you using the drivers supplied by HP?
0
 
Daniel McAllisterPresident, IT4SOHO, LLCCommented:
OK... take a step back.... It seems to me you're making an assumption that the array was fine when you removed it in the first place... just because it would boot then doesn't mean that there weren't errors then! That's one reason why there is a date & mount-count meter on the filesystem -- so that an fsck is done at least once every xxx days or yyy mounts, as errors can sometimes "sneak up on you"...

So ask yourself:

Q: WHY do you use RAID-1?
A: Because my data is stored identically on two hard drives so that if one fails, all of my data is safe on the other

Q: What has happened?
A: I'm getting seemingly random errors on my RAID 1 Array

Q: How could that be?
A: In RAID 1, the disk READ can be assigned to either drive... assume 1 drive is good, the other bad, you'll get random failures whenever the RAID controller (hard or soft) selects the bad drive to be the source (read) drive.

Q: How do I fix it?
A: Test the drives independently -- e.g.: BREAK THE MIRROR (physically remove one drive) and run FSCK on each drive separately (without the other drive in the array). DO NOT boot the system LIVE onto either one, unless you are sure that nothing important will happen (or get stored to it) while it is up. (NOTE: You may as well admit it at this point, with all the steps you've already taken -- at this point you're in disaster recovery mode, not reboot and you're up mode!)

You should probably also look at the output of SMARTtools (smartctl -H <device>)
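
As a rough sketch of that check, the overall verdict can be pulled out of the smartctl output with a small filter. `smart_health` is an illustrative name, the device names in the usage note are assumptions, and the pattern matches the result line that smartctl prints for ATA/SATA drives.

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin and print only the overall verdict.
smart_health() {
    awk -F': *' '/overall-health self-assessment test result/ { print $2 }'
}
```

Usage would be along the lines of `smartctl -H /dev/sda | smart_health`, repeated for the second drive (assumed here to be /dev/sdb); anything other than PASSED deserves a closer look.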

My guess is that one will pass fsck (and/or smart), the other not so much... NOTE: The one that passes may have some minor errors -- but they'll PALE in comparison to the other one!

Once you know which drive is the good one, get a new 2nd drive and synch your data to it... THEN you should be back in business!

I do hope this helps....

Dan
IT4SOHO
0
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
Are you sure it's running RAID 1 under CentOS 5.6? There do not appear to be any Linux SATA RAID drivers published by HP for the MicroServer on their website, and I know that this controller appears as two independent disks under VMware ESX/ESXi (no RAID).

Windows 2008/2008 R2 have a SATA RAID driver, for AMD Ready RAID function.
0
 
arnoldCommented:
what is the hardware that you are using: server make/model
RAID hardware or are you using software raid?
If hardware, did you get an alert during the bootup that informed you that the RAID volume is "seen" as incorrect and whether you want to adjust it or accept it? Did you accept it?  In hardware raid, usually you have to purge the configuration on the controller and then insert the drives and get the controller to read in the RAID configuration from the DISKS.


Your fdisk -l reports that a single drive is seen, which suggests that the volume is based on hardware RAID. If you can get into the RAID controller, you may see the RAID volume reflected in degraded mode.
0
 
Andrew Hancock (VMware vExpert / EE MVE^2)VMware and Virtualization ConsultantCommented:
I believe he is using a HP ProLiant MicroServer which uses a software fake raid SB700 SATA Controller in RAID mode.

see here

http://h20000.www2.hp.com/bizsupport/TechSupport/Home.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=4248009&lang=en&cc=us
0
 
arnoldCommented:
With that information, the drives can not be simply moved from one system to the next and have everything work.
Moving both drives at the same time leaves no fall back options since both disks are then marked.
shutting down the original system and moving one of the RAID 1 drives is a way to retain the option of functional resources if the move does not work.

I believe the raid configuration had to be cleared prior to attempting to boot the system using the moved drives.

0
 
rindiCommented:
I don't think he's using the controller's RAID, but rather CentOS software RAID.

What does cat /proc/mdstat say?
0
 
arnoldCommented:
The fdisk -l output in comment #a36558971 suggests that the OS only sees a single drive, /dev/sda.
The partition type is also 83, whereas with software RAID it would likely be fd.
0
 
shaunwinginAuthor Commented:
This is what I'm using:

I believe he is using a HP ProLiant MicroServer which uses a software fake raid SB700 SATA Controller in RAID mode.

see here

http://h20000.www2.hp.com/bizsupport/TechSupport/Home.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=4248009&lang=en&cc=us

Any ideas how to recover the disk?
0
 
rindiCommented:
Run the cat /proc/mdstat command I posted; it should help us find out whether you are using software RAID or the controller's RAID.
0
 
shaunwinginAuthor Commented:
cat /proc/mdstat
If I run it from the repair console it returns
personalities:
unused devices:none....?
0
 
rindiCommented:
Then it does look like you are using the controller RAID (there is actually a driver for redhat for it, and CentOS is a redhat clone, so you would use that same driver on CentOS).

Boot the server into the RAID config utility and check what status you get there. If possible put the HD's in a PC or server that doesn't have RAID, then test them using the HD manufacturer's diagnostic utility.
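
The /proc/mdstat check can be sketched as a filter that lists active md arrays. `md_arrays` is an illustrative name; the sketch assumes the usual layout in which each array line starts with its name followed by a colon.

```shell
#!/bin/sh
# Read /proc/mdstat on stdin and print the name of each md array found.
md_arrays() {
    awk '/^md[^ ]* *:/ { print $1 }'
}
```

Usage would be `md_arrays < /proc/mdstat`. Empty output, as with the listing posted above, means no md software RAID array is active.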
0
 
shaunwinginAuthor Commented:
Boot tools shows all ok...?
0
 
arnoldCommented:
What steps did you follow to transfer the disks from one server to the other?
Did you get an error prompt during the initial boot after the disks were transferred to the other system?
0
 
shaunwinginAuthor Commented:
Just pulled and inserted them.
No error
0
 
arnoldCommented:
when you moved the drives from one server to the next, what steps did you take to perform the transfer?
0
 
Daniel McAllisterPresident, IT4SOHO, LLCCommented:
OK, one more thing... if you've broken the RAID set and run a test on each drive and they both passed, then something has caused them to get out of synch, even though the "dirty bit" (a setting that would show the RAID driver/controller that they're not a matched pair anymore) is not indicating so....

So run tests to determine which of the 2 drives is closest to what you want, and re-establish a RAID-1 array with that drive being a synch-master for the initial build (use another drive, or the old pair -- just re-execute the synch).

Dan
IT4SOHO
0
 
arnoldCommented:
My suggestion would be to remove one of the drives to see whether the system will operate without errors while the RAID will be broken.

I think this is the test route it4soho is suggesting.
I'm not sure how you will determine which of the two drives is out-of-sync.
0
 
shaunwinginAuthor Commented:
I think a re-install is needed. Tx
0
