

attempt to access beyond end of device

Posted on 2008-10-22
Medium Priority
Last Modified: 2013-12-06
The list below describes what we have been experiencing over and over again. I am not sure if it is a bad hard drive or a problem with the 3ware RAID card.

The first thing that happens is a file shows up on the file system that is about 6.2 petabytes. This is impossible because the RAID is only 182 GB.

Second, our backup server tries to back up this 6.2 PB file. It continuously tries to back it up until someone finally has to stop it; I guess it would keep going until the backup server's drive is full.

Third, the log shows
Oct 22 04:03:02 fs kernel: attempt to access beyond end of device
Oct 22 04:03:02 fs kernel: sda2: rw=0, want=6736695736, limit=386427510

This makes perfect sense because the file is not really there.
The last thing that happens is the file system finally switches over to read-only mode. We then have to reboot the server. On reboot, fsck always says the drive contains errors; it then goes through and fixes a lot of inode problems.
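As a sanity check on the numbers in that kernel message: `want` and `limit` are offsets in 512-byte sectors, so `limit * 512` should match the size of sda2, while `want * 512` is the byte offset the kernel was asked to read. Multiplying them out:

```shell
# "limit" is the partition size in 512-byte sectors; 197850885120 bytes
# is about 184 GiB, consistent with the ~182 GB array described above
echo $((386427510 * 512))

# "want" is the sector requested -- 3449188216832 bytes, roughly 3.1 TiB,
# far past the end of the device
echo $((6736695736 * 512))
```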

Any ideas on why the big 6.2 PB file would keep being created and how we could stop it?
This is a RAID 1 with two 186.31 GB WD drives. This is an ext3 filesystem.
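For reference, the switch to read-only mode described above is ext3's configured response to metadata errors (the `errors=` behavior stored in the superblock). A quick way to see it, assuming the filesystem is on /dev/sda2:

```shell
# Show the on-error behaviour (continue, remount-ro, or panic)
# and the current filesystem state flags
tune2fs -l /dev/sda2 | grep -iE 'errors|state'
```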
Question by:clintonm9

Accepted Solution

Mysidia earned 1000 total points
ID: 22783082
Not to rule out hardware issues; it could be caused by a problem with the controller, or (theoretically) the drive. But most likely a drive failure would result in the controller failing the drive and marking the array degraded.

It sounds like an inconsistent filesystem to me.
FSCK is most likely not able to fix all the problems.
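If you do want to give fsck its best shot first, run it forced and with the filesystem unmounted, from a rescue environment (a sketch; the device name is an assumption):

```shell
# With the filesystem unmounted:
umount /dev/sda2
# Force a full check even if the filesystem claims to be clean,
# answering "yes" to all repair prompts
e2fsck -f -y /dev/sda2
```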

An Ext3 filesystem with corrupt metadata is an insidious problem to fix.
Insidious corruption is best avoided, where possible, by ensuring your kernel
is modern, with old bugs addressed.

Ext3 is journaled, but it is not perfect -- especially if your hardware implements
write caching and the write cache is not battery-backed.

This type of corruption is possible in a simple power failure situation.

It can also be caused by a software (OS) bug, or a controller/hard drive issue.
Unless you start testing your hardware and looking through 'dmesg' for errors,
there is no way to tell.
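A starting point for that checking (smartctl comes from smartmontools; the 3ware device path and port numbers below are assumptions to adjust for your setup -- older 7xxx/8xxx cards typically appear as /dev/twe0, 9xxx-series as /dev/twa0):

```shell
# Recent kernel complaints about the disk subsystem
dmesg | egrep -i 'error|fail|sda' | tail -20

# SMART health of the physical drives behind the 3ware controller
smartctl -H -d 3ware,0 /dev/twe0
smartctl -H -d 3ware,1 /dev/twe0
```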

Clearly a backup should be made of all files.

If possible swap both controller and drives with spares, and
test the possibly bad controllers and drives on a test system.

I think fresh EXT3 filesystems should be re-created.

Then load the backup files onto the fresh EXT3 filesystems.

This is really the only way to ensure there are not unknown errors in
filesystem metadata.

*Cloning a filesystem with a tool like 'dd' copies it, but if the source filesystem had metadata corruption, so will the copy.
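A minimal sketch of that rebuild, assuming the array appears as /dev/sda2 mounted at /home and the backup is a tar archive (device and paths are assumptions; adjust for your system):

```shell
umount /home
mkfs.ext3 /dev/sda2                 # destroys all data on sda2 -- verify backups first
mount /dev/sda2 /home
tar -xpf /backup/home.tar -C /home  # restore files, preserving permissions
```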


Author Comment

ID: 22785530
This is a production system and would have to be done in the middle of the night.

I just ran the following command, and you can see how many of the files are impossibly big. You can also see the error that appeared while this was running.

find /home -type f -size +5000000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
/home/websites/mysalonsite/htdocs/data/accounts/xclusivetan159691/pageviews/day/2008/06.19.Services: 6.3E
/home/websites/mysalonsite/htdocs/data/accounts/EliteTan84463/pageviews/day/2006/02.11.About: 6.4E
/home/websites/fsordering/weblogs/error_log: 11G
/home/websites/fsordering/htdocs/images/2728836/products_sample_2.gif: 13E
find: /home/websites/fsordering/htdocs/images/products/s30478L.jpg: No such file or directory

Message from syslogd@ at Thu Oct 23 07:51:42 2008 ...
fsordering kernel: journal commit I/O error
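For reference, GNU find can print the size itself, which avoids parsing `ls` output through awk (a minor variant of the command above):

```shell
# Path and apparent size in bytes, straight from find (GNU find's -printf)
find /home -type f -size +5000000k -printf '%p: %s bytes\n'
```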


Author Closing Comment

ID: 31508759
The solution will fix the problem, but it will be a big job.

Expert Comment

ID: 22800919
I/O errors of this nature are serious; file data corruption may already have occurred to some extent (even if it hasn't been noticed yet). I suggest you also lengthen the amount of time you retain old backups for that server, and be sure to update backups, or get as complete a new one as possible, until the server rebuild can be performed.

Weigh this against the risk of the server going down due to trying to take additional backups.

It just depends on which is more important in your situation: having up-to-date copies of the data in case the corruption or drive/controller problem creeps further and causes loss of information, or maximizing the uptime.

That is: is it OK to lose a few days' worth of data on this server, in exchange for the benefit of less downtime?

If this were just a DHCP server, uptime would be more important, and a few days of lost data would be irrelevant.

On the other hand, if this is a file server that holds users' home directories, the loss of a few days' data could be costly.

And it might be a good idea to pull out a contingency plan, like temporarily offloading the production server's function to a server normally used for testing.


Author Comment

ID: 22801440
Thanks for the info. The last two nights I got a good backup. It seems like the big files are not showing up right now. I bet they come back soon, though. Thanks for everything!
