Solved

attempt to access beyond end of device

Posted on 2008-10-22
Medium Priority
Last Modified: 2013-12-06
Below is a list of what we have been experiencing over and over again. I am not sure whether it is a bad hard drive or something with the 3ware RAID card.

First, a file about 6.2 petabytes in size shows up on the filesystem. This is impossible because the RAID is only 182 GB.

Second, our backup server tries to back up this 6.2 PB file. It keeps trying until someone finally stops it; I guess it would keep going until the backup server's drive is full.

Third, the log shows
Oct 22 04:03:02 fs kernel: attempt to access beyond end of device
Oct 22 04:03:02 fs kernel: sda2: rw=0, want=6736695736, limit=386427510

This makes sense, because the file is not really there.
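The kernel's `want` and `limit` values are counted in 512-byte sectors, so the log line can be converted to bytes to see how far past the end of sda2 the request went. A minimal shell sketch; `sectors_to_bytes` and `overshoot_report` are illustrative names, not existing tools:

```shell
# Convert a sector count from the kernel's "attempt to access beyond end
# of device" message to bytes. The kernel reports these values in 512-byte
# units, whatever the drive's physical sector size.
# bash and dash use 64-bit integers for $(( )), which is enough here.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Summarize a "want=..., limit=..." pair (both in sectors).
# want is where the request ended; limit is the size of the device.
overshoot_report() {
    want=$1
    limit=$2
    printf '%-16s %s bytes\n' 'device size:' "$(sectors_to_bytes "$limit")"
    printf '%-16s %s bytes\n' 'request end:' "$(sectors_to_bytes "$want")"
    printf '%-16s %s bytes\n' 'overshoot:' "$(sectors_to_bytes "$((want - limit))")"
}

# Values from the log line: sda2: rw=0, want=6736695736, limit=386427510
overshoot_report 6736695736 386427510
```

With these numbers sda2 works out to 197,850,885,120 bytes (about 184 GiB, which fits on the 186.31 GB mirror), while the request ended about 3.45 TB into the device. A block address that far out of range is consistent with corrupt filesystem metadata rather than real data.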
Finally, the filesystem switches to read-only mode and we have to reboot the server. On reboot, fsck always reports that the drive contains errors, then goes through and fixes a lot of inode problems.
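Since the switch to read-only mode is the point where the server needs attention, a monitoring job could watch for it. A minimal sketch, assuming a Linux `/proc/mounts`-style file (device, mount point, type, options, ...); `is_readonly` is a hypothetical helper name:

```shell
# Exit status: 0 if MOUNTPOINT is mounted read-only, 1 if read-write,
# 2 if it is not listed. The second argument defaults to /proc/mounts.
# Note: /proc/mounts escapes spaces in paths as \040, so this simple
# comparison only works for mount points without spaces.
is_readonly() {
    mp=$1
    mounts=${2:-/proc/mounts}
    awk -v mp="$mp" '
        $2 == mp {
            # A later entry for the same mount point overrides earlier ones.
            seen = 1
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") ro = 1
        }
        END {
            if (!seen) exit 2
            if (ro) exit 0
            exit 1
        }
    ' "$mounts"
}
```

For example, a cron job could run `is_readonly /home && echo "/home is read-only"` and mail the output; `/home` here is only an example mount point.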

Any ideas why this 6.2 PB file keeps being created and how we could stop it?
This is a RAID 1 of two 186.31 GB WD drives with an ext3 filesystem.
Question by:clintonm9
5 Comments
 
LVL 23

Accepted Solution

by:
Mysidia earned 1000 total points
ID: 22783082
I would not rule out hardware: this could be caused by a problem with the controller or (theoretically) a drive.
However, a failing drive would most likely cause the controller to fail that drive
and mark the array degraded, rather than silently corrupt the filesystem.

It sounds like an inconsistent filesystem to me.
fsck is most likely not able to fix all of the problems.

Corrupt metadata on an ext3 filesystem is an insidious problem to fix.
It is best avoided, where possible, by running a modern kernel
in which old bugs have been fixed.

Ext3 is journaled, but it is not perfect, especially if your hardware implements
write caching and the write cache is not battery-backed.

This type of corruption is possible in a simple power failure situation.

It can also be caused by a software (OS) bug, or a controller/hard drive issue.
Unless you start testing your hardware and looking through 'dmesg' for errors,
there is no way to tell.
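Looking through the kernel log, as suggested above, can be scripted. A minimal sketch that greps for the messages quoted in this thread; `scan_kernel_log` is a hypothetical helper, and the pattern list is an assumption, not a complete catalogue of disk or controller errors:

```shell
# Print kernel log lines matching the error messages seen in this thread.
# 'I/O error' also covers "journal commit I/O error"; 'EXT3-fs error' is
# the prefix ext3 uses when it detects on-disk inconsistencies.
# Pass "-" as the file name to read from stdin.
scan_kernel_log() {
    grep -E -i \
        -e 'attempt to access beyond end of device' \
        -e 'I/O error' \
        -e 'EXT3-fs error' \
        "$1"
}
```

For example, `dmesg | scan_kernel_log -`, or point it at `/var/log/messages`. For the drives themselves, smartmontools can read SMART data through a 3ware controller, e.g. `smartctl -a -d 3ware,0 /dev/twa0` for port 0 on a 9000-series card (older cards use `/dev/twe0`).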




Clearly a backup should be made of all files.

If possible, swap both the controller and the drives for spares, and
test the suspect controller and drives on a test system.


I think the ext3 filesystems should be re-created from scratch.

Then restore the backed-up files onto the fresh ext3 filesystems.

This is really the only way to ensure there are not unknown errors in
filesystem metadata.


*Cloning a filesystem with a tool like 'dd' copies it block for block, so if the source filesystem has metadata corruption, so will the copy.
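For the restore, a file-level copy such as a `tar` pipe lets the new ext3 filesystem build fresh metadata, unlike a block-level `dd` clone. A minimal sketch; `copy_tree` is a hypothetical helper, and in practice the destination would be the mount point of the freshly created filesystem (for example after running `mkfs.ext3` on the partition):

```shell
# Copy a directory tree file by file. tar reads each file through the
# source filesystem, and the destination filesystem allocates its own
# inodes and block maps, so the source's metadata structures are not
# carried over (file contents that are already corrupt are copied as-is).
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # -p on extraction keeps permissions; run as root to keep ownership too.
    (cd "$src" && tar -cf - .) | (cd "$dst" && tar -xpf -)
}
```

Files already known to report impossible sizes should be excluded or dealt with first; otherwise tar may keep reading them, much as the backup server did.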



 

Author Comment

by:clintonm9
ID: 22785530
This is a production system and would have to be done in the middle of the night.

I just ran the following command; you can see how many files are very big. You can also see the error that appeared while it was running.

find /home -type f -size +5000000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
/home/websites/mysalonsite/htdocs/data/accounts/xclusivetan159691/pageviews/day/2008/06.19.Services: 6.3E
/home/websites/mysalonsite/htdocs/data/accounts/EliteTan84463/pageviews/day/2006/02.11.About: 6.4E
/home/websites/fsordering/weblogs/error_log: 11G
/home/websites/fsordering/htdocs/images/2728836/products_sample_2.gif: 13E
find: /home/websites/fsordering/htdocs/images/products/s30478L.jpg: No such file or directory

Message from syslogd@ at Thu Oct 23 07:51:42 2008 ...
fsordering kernel: journal commit I/O error
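The same scan can be made independent of `ls` formatting (the `$9` field breaks on filenames with spaces) by asking GNU find for raw byte sizes and flagging files that claim to be larger than the partition holding them. A sparse file can legitimately exceed the partition size, so each hit still needs a look, but petabyte or exabyte sizes like the ones above cannot be genuine on ext3. A minimal sketch; `flag_impossible_sizes` is a hypothetical helper, and the example assumes GNU find (for `-printf`) and util-linux `blockdev`:

```shell
# Read "SIZE PATH" lines on stdin and print those whose SIZE (in bytes)
# exceeds LIMIT (in bytes). awk compares as floating point, which is
# precise enough to tell gigabytes from petabytes. The whole line is
# printed, so paths containing spaces survive intact.
flag_impossible_sizes() {
    awk -v limit="$1" '($1 + 0) > (limit + 0) { print }'
}

# Example (run as root; assumes /home lives on /dev/sda2):
#   find /home -xdev -type f -printf '%s %p\n' |
#       flag_impossible_sizes "$(blockdev --getsize64 /dev/sda2)"
```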

 

Author Closing Comment

by:clintonm9
ID: 31508759
The solution will fix the problem, but it will be a big job.
 
LVL 23

Expert Comment

by:Mysidia
ID: 22800919
I/O errors of this nature are serious. Some file data corruption may already have occurred, even if it hasn't been noticed yet. I suggest you lengthen the retention period for that server's old backups, and keep its backups current (or get as complete a new one as possible) until the server rebuild can be performed.

Weigh this against the risk of the server going down due to trying to take additional backups.

It just depends on which is more important in your situation: having up-to-date copies of the data in case the corruption or drive/controller problem spreads further and causes data loss.

Or maximizing the uptime.

That is, is it OK to lose a few days' worth of data on this server in exchange for less downtime?

If this were just a DHCP server, uptime would be more important, and losing a few days of data would be irrelevant.

On the other hand, if this is a file server that holds users' home directories,
losing a few days of data could be costly.

It might also be a good idea to pull out a contingency plan, such as temporarily offloading the production server's function to a server normally used for testing.




 

Author Comment

by:clintonm9
ID: 22801440
Thanks for the info. The last two nights I got a good backup. The big files seem to have stopped showing up for now, though I bet they will come back soon. Thanks for everything!