mgokman

asked on

File system full, lost files

OS: Sun Solaris
Application: Oracle 7.3.4
Background info:
I'm running Oracle in archivelog mode which means that Oracle's log files containing roll forward data are automatically archived as they are filled up. I underestimated the expected amount of archived log data required for the new application. As a result, the file system containing archives became 100% full.
Problem:
When this happened I had 30 recently created archive files. Their names and sizes were listed by ls -l. df -k showed 100%.
I deleted some old files and got this file system down to about 90%. Then we had to reboot the machine (for a different reason). When it came up... all 30 Oracle archive log files were gone! df -k showed much more free space. I'm sure no one deleted them.
What happened?
alexhudghton

The files must have been deleted. Do you have a delete script among your system startup files?

Possible scenario: shut down the database, shut down the machine, take a full backup on the way back up, delete the archive logs (risky, because they may not be on tape!), delete core files etc., then go back to multi-user.
mgokman

ASKER

I thought I was clear enough: no one deleted those files.
My backup scripts never delete files, only rename them. My startup scripts don't even touch those files. I have a special cron job to delete these files. This cron runs at 11:00 PM and deletes yesterday's files only.
My problem occurred at noon. Also, I disabled this cron long before the reboot.
So the files just mysteriously disappeared.
Sorry.
Did you check the system date?
Before this reboot, when was the last time the system was rebooted?
How about this: someone added a new file system for your archive logs, and it was mounted on reboot of the server. This file system is mounted on the same directory as your archive log destination. You 'lost' the 30 files because they are still in the underlying mount point directory and will not be seen until you unmount the file system. That would also explain why you have so much space now.
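One way to test the mount-point theory is to check whether the archive directory is itself a mount point, and whether it sits on a different device from its parent. The following is a minimal sketch using only the Python standard library; the function name `describe_mount` is made up for illustration, and the real archive destination path would have to be substituted for `/`.

```python
import os

def describe_mount(path):
    """Report whether `path` is a mount point and which devices are involved.

    If a new file system was mounted over the archive directory, the
    directory's device ID will differ from its parent's device ID.
    """
    st = os.stat(path)
    parent = os.stat(os.path.join(path, os.pardir))
    return {
        "is_mount": os.path.ismount(path),
        "device": st.st_dev,
        "parent_device": parent.st_dev,
    }

if __name__ == "__main__":
    # "/" is used only because it always exists; replace it with the
    # archive log destination (a hypothetical path on your system).
    print(describe_mount("/"))
```

If `is_mount` comes back true for the archive directory, unmounting that file system (with the database down) would reveal whatever files are stored underneath it.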
mgokman

ASKER

I appreciate your efforts, I know it sounds weird. I forgot to mention that there were also some older archive files in that directory. Those ones were still there, but the new ones were not. The system was not rebooted for at least a week before this reboot.
I was wondering if there could be a problem with inodes due to the 100% full status of the file system.
We also experienced a different problem in another file system: deleting a file didn't adjust the available space in it until the reboot. So I was wondering if our file systems are not properly defined.
Thanks.

Well, the other problem I can understand: a process can hold a file open (and consequently the space taken by it) even after the file has been deleted. Stop the process or reboot and, hey presto, the space reappears.
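The deleted-but-still-open behaviour is easy to reproduce. Here is a small sketch, assuming a POSIX system (the helper name `unlinked_file_still_readable` is invented for this example): it writes a temporary file, unlinks it while the descriptor is still open, and shows that the data remains readable even though the file no longer has a directory entry.

```python
import os
import tempfile

def unlinked_file_still_readable(payload):
    """Unlink a file while it is open and read it back through the descriptor.

    Returns (link_count_after_unlink, data_read_back). On POSIX systems the
    blocks are only released when the last open descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                   # the name is gone from the directory
        nlink = os.fstat(fd).st_nlink     # no directory entries point at it now
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))  # the contents are still on disk
        return nlink, data
    finally:
        os.close(fd)                      # only now is the space freed

if __name__ == "__main__":
    print(unlinked_file_still_readable(b"archive log data"))
```

This is why df can keep reporting a file system as full after a large file has been removed: some process still has it open.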

As for the original question: if the older files were still there and the new ones were not, then I go back to my original post. Someone or something has deleted them. At least until you get a better answer :-)

Alex
mgokman

ASKER

Thanks again. I will wait for anyone to figure out my puzzle. If I don't get any answer, I will assume that someone accidentally deleted them, though I can't imagine who or how.
You will get the points if I don't get a better answer.
ASKER CERTIFIED SOLUTION
alextr

This is not a peculiarity; it is normal behaviour. File systems have some space reserved for use by the root user (usually 10% of the total size). Any user can write files until the file system shows 100% full, and root can still write using the extra 10%. This figure can be reduced (using tunefs on Sequent systems, for example), but letting it get below 5% causes performance problems.

So if you have a 100% full file system, you can gain space by reducing the amount reserved for root. This is not advised, but it may get you out of a hole.
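The reserved space can be seen in the numbers that `statvfs` reports: `f_bfree` counts all free blocks, while `f_bavail` counts only those available to non-root users. The sketch below works under stated assumptions: the function names are invented, and the percentage formula (used divided by used plus available, rounded up) is modelled on how common df implementations behave rather than taken from the Solaris source.

```python
import os

def df_style_usage(f_blocks, f_bfree, f_bavail):
    """Compute used blocks, root-reserved blocks and a df-like capacity figure."""
    used = f_blocks - f_bfree
    reserved = f_bfree - f_bavail          # free blocks only root may use
    denom = used + f_bavail                # what ordinary users can ever have
    # Round up, so a nearly full file system reads as 100% (assumed rule).
    capacity = 0 if denom == 0 else (used * 100 + denom - 1) // denom
    return {"used": used, "reserved": reserved, "capacity_pct": capacity}

def usage_for_path(path):
    """Apply the same arithmetic to a mounted file system (POSIX only)."""
    vfs = os.statvfs(path)
    return df_style_usage(vfs.f_blocks, vfs.f_bfree, vfs.f_bavail)

if __name__ == "__main__":
    # Illustrative numbers: 1000 blocks, 100 still free, all of them reserved.
    print(df_style_usage(1000, 100, 0))
    print(usage_for_path("/"))
```

With the illustrative numbers, ordinary users see the file system as 100% full even though a tenth of the blocks are still free for root.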
Basically what I think happened is that the file system was in a corrupt state, so it wasn't reporting the correct amount of free space.  You deleted a large number of files.
Somehow the OS knew that there was available space, and it allowed you to perform a write transaction.  However, due to the filesystem corruption, the data in the write buffer could not be written to disk as would usually happen when you did a soft shutdown.  

All I can say is don't write to full filesystems.  If you ever get a filesystem in a state like that, you should do this:
1)  Back up all files on the corrupt filesystem to a non-corrupt file system, or tape.
2)  Unmount the corrupt filesystem
3)  Run fsck and repair the corruption.
4)  Mount the file system again
5)  Free up space on the file system by deleting files
6)  Restore any files that got lost because they didn't get written out of the buffer.

How does that sound...
mgokman

ASKER

I don't know if alextr's answer really answers my question, but it kind of confirms my wild guess about the strange things that can happen when a file system is 100% full. I really wanted a real explanation, but it looks like I won't get it.
I wish I could also give points to alexhudghton, but because I can't I want to give him many thanks.