
EXT3 maximal mount count reached (CentOS)

Hi
I have one CentOS server (kernel 2.6) running the cPanel web-hosting software.
It has been running well for the last couple of months. Recently, however, I have been getting reports that the filesystem suddenly locks down and becomes read-only. As a result, file uploads, website session handling, etc. stop working until I reboot the server, after which the filesystem is normal again.

But eventually it goes back into read-only mode.
Is there any way to fix this?
I forced a filesystem check at boot and also ran fsck remotely. Though the result came back clean, the filesystem still goes back to read-only eventually.

Any insight would be great!

I checked /var/log/dmesg and got this:
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on dm-0, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev sda1, type ext3), uses xattr
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3 FS on sdc1, internal journal
ext3_orphan_cleanup: deleting unreferenced inode 2244609
EXT3-fs: sdc1: 1 orphan inode deleted
 
 
---------------------
mount table
 
root@cpanel [/var/log]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw,usrquota)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sdc1 on /home type ext3 (rw,usrquota)
/dev/sdb1 on /home2 type ext3 (rw,usrquota)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
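
To confirm which filesystem has actually gone read-only, it can help to read the kernel's own view in /proc/mounts. The `mount` listing on systems of this era is typically taken from /etc/mtab, which may still say rw after the kernel has remounted a filesystem read-only on error. A minimal sketch, assuming input in /proc/mounts format; the function name `readonly_mounts` is made up for illustration:

```shell
# Print "device mountpoint" for every ext3 filesystem whose mount options
# contain the exact token "ro". Input: lines in /proc/mounts format on stdin
# (device, mountpoint, fstype, options, dump, pass).
readonly_mounts() {
    awk '$3 == "ext3" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }'
}
```

Typical use would be `readonly_mounts < /proc/mounts`; no output means no ext3 filesystem is currently read-only.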


ASKER CERTIFIED SOLUTION

Julian Parker

This solution is only available to members.

valleytech

ASKER

Wow... how can you tell it's /dev/sdc1 causing the problem?
I thought /dev/sda1 was the problematic one.

Julian Parker

From the bottom of your log:

EXT3 FS on sdc1, internal journal
ext3_orphan_cleanup: deleting unreferenced inode 2244609
EXT3-fs: sdc1: 1 orphan inode deleted
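
The same check can be scripted. This is a rough sketch that assumes the orphan-cleanup message has the form shown in the log above; the function name `devices_with_orphans` is hypothetical:

```shell
# Print each device named in a line of the form
#   "EXT3-fs: <device>: <N> orphan inode(s) deleted"
# Input: kernel log text on stdin.
devices_with_orphans() {
    awk '/EXT3-fs: [^ ]+: [0-9]+ orphan inode/ {
        for (i = 1; i < NF; i++)
            if ($i == "EXT3-fs:") {
                dev = $(i + 1)
                sub(/:$/, "", dev)   # strip the trailing colon from "sdc1:"
                print dev
                break
            }
    }' | sort -u
}
```

For example, `devices_with_orphans < /var/log/dmesg`. Orphan inodes cleaned up at mount time mean the filesystem was not unmounted cleanly, so this points at the affected device rather than proving the disk is failing.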


valleytech

ASKER

Ah! I totally didn't see that part; I was focusing on /dev/sda1.
I'm running fsck /dev/sdc1 -y
Along the way, I received a list of messages like:
Multiply-claimed block(s) in inode 459888: 943551 943552 943553 943554 943555 943556 943557
Illegal block number passed to ext2fs_test_block_bitmap #33554432 for multiply claimed block map

It has been sitting there for 15 minutes already.
Sounds like a bad sign? A bad disk in this case?

many thanks!

Julian Parker

Could be on its way out... possibly one leg out the door, if you know what I mean...

If you haven't got a backup, you should get one ASAP.

If you have smartctl installed, run the smartctl command I posted above.

I just did a search on the error and found this: http://www.linuxquestions.org/questions/linux-newbie-8/filesystem-errors-617286/

valleytech

ASKER

Agreed!
Luckily I have an NFS partition to copy stuff over to.
I also ran smartctl -t long /dev/sdc

root@cpanel [~]# smartctl -a /dev/sdc
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
Device: IBM      IC35L073UCDY10-0 Version: S27F
Serial number: E6VSWWBC
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Mon Nov 16 15:46:28 2009 PST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
 
Current Drive Temperature:     35 C
Drive Trip Temperature:        85 C
Manufactured in week 39 of year 2003
Recommended maximum start stop count:  10000 times
Current start stop count:      199 times
Elements in grown defect list: 0
 
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      17608.736           0
write:         0        0         0         9          9      12313.029           0
verify:        0        0         0         0          0          0.002           0
 
Non-medium error count:        0
 
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Self test in progress ...   -   41899                 - [-   -    -]
# 2  Background long   Completed                   -       0                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]
 
Long (extended) Self Test duration: 4700 seconds [78.3 minutes]


Julian Parker

OK, I was hoping to see a bit more information (see example below), but I guess not all disks report the same info.

Anyway, I guess you're now looking at the right disk. Once you've copied all the data off, you could delete the partition table and re-partition/reformat the drive to see if it clears the problem. If it were me, I'd be ordering a replacement.

SMART Attributes Data Structure revision number: 32
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   192   191   063    Pre-fail  Always       -       12358
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       51
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   245   219   187    Pre-fail  Always       -       46238
  9 Power_On_Hours          0x0032   178   178   000    Old_age   Always       -       26238
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       82
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   052   044   000    Old_age   Always       -       48 (Lifetime Min/Max 21/49)
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   043   253   000    Old_age   Always       -       48
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       2936
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       1
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
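
For a table like this one, the raw values most often checked first are the reallocated, pending, and offline-uncorrectable sector counts. A small sketch that pulls just those rows, assuming the ten-column ATA layout shown above (SCSI disks do not report this table); the function name `sector_health` is made up:

```shell
# Print "ATTRIBUTE_NAME RAW_VALUE" for the three sector-related attributes.
# Input: an ATA attribute table as printed by smartctl, on stdin.
# Column 2 is ATTRIBUTE_NAME and column 10 is RAW_VALUE.
sector_health() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

It could be fed with something like `smartctl -A /dev/sdX | sector_health`, where /dev/sdX is a placeholder for the disk in question. Non-zero raw values there are usually taken as a reason to plan a replacement.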


valleytech

ASKER

What command are you using to get that information?
smartctl -t long?

thanks!

Julian Parker

In your case: smartctl -a /dev/sdc
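
For a SCSI disk the output is shorter, so a quick filter over the few summary lines may be enough. A sketch, assuming the field labels match the SCSI output posted earlier in this thread; the function name `scsi_smart_summary` is hypothetical:

```shell
# Keep only the summary lines from SCSI-style smartctl output:
# overall health, grown defect list size, and non-medium error count.
# Input: smartctl output on stdin.
scsi_smart_summary() {
    grep -E '^(SMART Health Status|Elements in grown defect list|Non-medium error count):'
}
```

Usage would be `smartctl -a /dev/sdc | scsi_smart_summary`. A health status of OK with an empty grown defect list does not rule out a failing disk, so the filesystem errors still count.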

Julian Parker

At least you can see your disk is 6 years old... :-)

Julian Parker

I'm afraid I have to go to my pit (it's a bit late). I hope you manage to back up the partition. I'll check back again later, but I'm sure someone else will pick up the Q while I'm offline.

valleytech

ASKER

many thanks!!!

valleytech

ASKER

I ended up replacing the disk ;)
Thanks again!