Avatar of betsywr
betsywr

asked on

Using lseek on a compressed file

I am trying to save space on my client's Unix box by compressing several files (each is at least 1 GB before compression).  My problem is that I have a C program that uses these files, one at a time, as input.  This program uses 'lseek' to find a particular location within the file, but this will present a problem once the files are compressed.  Is lseek able to operate on a compressed file?  If so, which method (compress, gzip, pack, etc.) is most compatible?
Avatar of snifong
snifong

>>Is lseek able to operate on a compressed file?

Yes.

>>If so, which method (compress, gzip, pack, etc...) is most compatible?

I am not sure what you mean by "most compatible".  See next statement.

When you open the file in binary mode you are essentially just looking at bytes.  lseek is not going to know that the file is compressed or anything about the data that the file descriptor is pointing to.  It is going to be up to you to tell it where to seek to.
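To illustrate the point (the file name and offset below are arbitrary): lseek just moves the descriptor's byte position, and read hands back whatever bytes live there. On a compressed file the call still succeeds, but those bytes are compressed stream data, not your original records.

/* Minimal sketch: lseek operates on raw bytes only. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);   /* arbitrary file name */
    if (fd < 0) { perror("open"); return 1; }

    off_t pos = lseek(fd, 4096, SEEK_SET); /* 4096 bytes from the start */
    if (pos == (off_t)-1) { perror("lseek"); close(fd); return 1; }

    /* If the file is compressed, buf now holds compressed stream
     * bytes, not the record that used to live at offset 4096. */
    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf));
    printf("read %zd bytes at offset %lld\n", n, (long long)pos);

    close(fd);
    return 0;
}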
Avatar of ozo
How does the program determine which location to lseek to in the uncompressed file?
A seek into a compressed file would generally not be particularly meaningful.
Can you uncompress before seeking?
Avatar of betsywr
betsywr

ASKER

The program determines the location to lseek to by querying a database table. All three lseek parameters (file name, offset, and origin) are fields in that table. I am not able to change the information that gets populated in the table, so I'm trying to figure out a way (without running this program through a Unix shell script, which would be a major performance hit) to utilize this data and still save space.  Any ideas?
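To make that concrete, the call pattern is roughly the following (the row struct and function name are made up for illustration; only the lseek usage mirrors the real program):

#include <fcntl.h>
#include <unistd.h>

struct seek_row {           /* hypothetical shape of one table row */
    const char *filename;
    off_t       offset;
    int         origin;     /* SEEK_SET, SEEK_CUR, or SEEK_END */
};

/* Opens the named file and positions it per the row; returns a file
 * descriptor ready for read(), or -1 on failure.  The stored offset
 * is only meaningful while the file keeps its original byte layout,
 * which is exactly what compression destroys. */
int open_at_row(const struct seek_row *row)
{
    int fd = open(row->filename, O_RDONLY);
    if (fd < 0)
        return -1;
    if (lseek(fd, row->offset, row->origin) == (off_t)-1) {
        close(fd);
        return -1;
    }
    return fd;
}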

By 'most compatible', I mean easiest to work with (i.e., which method of compression would produce a file that is the least problematic for lseek to operate on).
Since you are compressing "several" files, I would suggest that the easiest thing to do is to uncompress one of them when the program starts to run (or just before you want to seek). The others will remain compressed & if your program's run time is short, you will only have one uncompressed for a short while.

To do what you want, the only way that I can see is to get hold of the freeware source of the compression code (such things are available) & tweak it. Thus, when compressing, you can store the file position of certain key data & use that to lseek afterwards. However, you would then need to uncompress the data & since it's not the whole file which you're uncompressing, just a piece, you *must* use the same algorithm to uncompress as to compress.
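A rough sketch of that idea, assuming zlib (link with -lz); the file names and chunk size below are made up. Each fixed-size chunk of the input is compressed independently, and its location in the compressed output is recorded in a side index, so a later seek to uncompressed offset X only needs to inflate chunk X / CHUNK:

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define CHUNK 65536           /* uncompressed bytes per chunk */

struct idx_entry {
    long   where;             /* chunk's start in the compressed file */
    uLongf plen;              /* chunk's compressed length */
};

int compress_with_index(const char *in_path, const char *out_path,
                        const char *idx_path)
{
    FILE *in  = fopen(in_path,  "rb");
    FILE *out = fopen(out_path, "wb");
    FILE *idx = fopen(idx_path, "wb");
    if (!in || !out || !idx) return -1;

    uLong bound = compressBound(CHUNK);     /* worst-case chunk size */
    unsigned char *raw    = malloc(CHUNK);
    unsigned char *packed = malloc(bound);

    size_t n;
    while ((n = fread(raw, 1, CHUNK, in)) > 0) {
        struct idx_entry e;
        e.plen  = bound;
        e.where = ftell(out);
        if (compress(packed, &e.plen, raw, n) != Z_OK)
            return -1;                      /* error handling abbreviated */

        fwrite(&e, sizeof(e), 1, idx);      /* index entry first... */
        fwrite(packed, 1, e.plen, out);     /* ...then the chunk */
    }

    free(raw); free(packed);
    fclose(in); fclose(out); fclose(idx);
    return 0;
}

To read, you would look up entry offset / CHUNK in the index, inflate just that one chunk, and use offset % CHUNK as the position within it.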

I *strongly* recommend working on only whole files.

Avatar of betsywr

ASKER

Uncompressing only one file at a time sounds like a great idea.  However, this is an online customer care system that I'm working with, and there are hundreds of customer service reps hitting these files at any given time.  Therefore, it wouldn't save me any space, given that all of the files could be uncompressed at the same time.  Not to mention that I would have major contention problems if someone tried to uncompress a file that was already uncompressed and in use by another customer service rep.  Thanks for the suggestion though.

I'm really hoping that someone will know of a form of compression that wouldn't require uncompressing the file to do an lseek on it.  

If this is at ALL possible, please let me know.
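(To make that hope concrete, here is roughly what the read side of the chunk-index idea above might look like - zlib again, all names made up. It still inflates one chunk per lookup, so it isn't quite the "no uncompression at all" ideal.)

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define CHUNK 65536           /* must match the compressor's chunk size */

struct idx_entry { long where; uLongf plen; };

/* Fills chunk_out with the uncompressed chunk containing `offset`.
 * The caller then uses offset % CHUNK as the position within it. */
int read_at(const char *gz_path, const char *idx_path,
            long offset, unsigned char chunk_out[CHUNK])
{
    long chunk_no = offset / CHUNK;

    /* Seek directly to this chunk's index entry. */
    FILE *idx = fopen(idx_path, "rb");
    if (!idx) return -1;
    struct idx_entry e;
    fseek(idx, chunk_no * (long)sizeof(e), SEEK_SET);
    if (fread(&e, sizeof(e), 1, idx) != 1) { fclose(idx); return -1; }
    fclose(idx);

    /* Read just that chunk's compressed bytes... */
    FILE *gz = fopen(gz_path, "rb");
    if (!gz) return -1;
    unsigned char *packed = malloc(e.plen);
    fseek(gz, e.where, SEEK_SET);
    fread(packed, 1, e.plen, gz);
    fclose(gz);

    /* ...and inflate only that chunk. */
    uLongf outlen = CHUNK;
    int rc = uncompress(chunk_out, &outlen, packed, e.plen);
    free(packed);
    return rc == Z_OK ? 0 : -1;
}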
ASKER CERTIFIED SOLUTION
Avatar of graham_k
graham_k

Avatar of betsywr

ASKER

gzip could help, but it still doesn't solve my problem of having to perform an lseek WITHOUT uncompressing the files first.  I think that my co-workers and I have a possible solution, but it is quite messy.  I am most in favor of simply purchasing more disk space!
Hmm, on the one hand - thanks for the points - they pushed me over the 30k mark & I am now eligible for another stripe on my T-shirt. On the other hand, that wasn't really an answer. I guess that you are like me - I ask questions in order to bounce ideas off of others, then end up implementing my own original idea, but awarding points to everyone who participated in the discussion.

Gzip may not directly help, but tweaking the code so that, when compressing, you store the offsets to important records in a second, index file might help. I think, however, that you have the correct idea - storage is cheap (but apparently not as cheap as your boss <g>) - buy more hard drives.

best wishes,

Graham