Using lseek on a compressed file

I am trying to save space on my client's unix box by compressing several files (each is at least 1 gig before compression). My problem is that I have a C program that uses these files, one at a time, as input. This program uses 'lseek' to find a particular location within the file, but this will present a problem once the files are compressed. Is lseek able to operate on a compressed file? If so, which method (compress, gzip, pack, etc...) is most compatible?

snifong

>>Is lseek able to operate on a compressed file?

Yes.

>>If so, which method (compress, gzip, pack, etc...) is most compatible?

I am not sure what you mean by "most compatible". See next statement.

When you open the file in binary mode you are essentially just looking at bytes. lseek is not going to know that the file is compressed, or anything else about the data that the file descriptor is pointing to. It is going to be up to you to tell it where to seek to.
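A minimal sketch of that point, in Python only for brevity (`os.lseek` wraps the same system call the C program would use). The record layout, the file names, and the `read_at` and `compare_seek` helpers are made up for illustration. It seeks to the same offset in a plain file and in a gzip-compressed copy of it: the seek succeeds both times, but only the plain file yields the record that the offset was meant to address.

```python
import gzip
import os
import tempfile


def read_at(path, offset, length):
    """Seek to a byte offset with lseek and read up to `length` raw bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)


def compare_seek(offset=13 * 7692, length=13):
    # Hypothetical data: 10,000 fixed-width records of 13 bytes each.
    data = b"".join(b"record %05d\n" % i for i in range(10000))
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "data.txt")
        packed = os.path.join(tmp, "data.txt.gz")
        with open(plain, "wb") as f:
            f.write(data)
        with gzip.open(packed, "wb") as f:
            f.write(data)
        # Same offset, same call: the plain file returns the record, the
        # compressed file returns whatever compressed bytes sit there
        # (or nothing, if the offset is past its end).
        return read_at(plain, offset, length), read_at(packed, offset, length)
```

So lseek "works" on a compressed file in the sense that it does not fail, but offsets recorded against the uncompressed data no longer point at anything useful.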

ozo

How does the program determine which location to lseek to in the uncompressed file?
A seek into a compressed file would generally not be particularly meaningful.
Can you uncompress before seeking?
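One way to "uncompress before seeking" without writing an uncompressed copy to disk is to let a gzip reader emulate the seek. The sketch below uses Python's standard `gzip` module; zlib's C API offers a comparable `gzseek` call. The helper names and the sample data are assumptions for illustration. The offset is expressed in uncompressed bytes, which is what an existing offset table would contain, but the reader has to decompress and discard everything before that offset, so the cost grows with the offset. As far as I know, neither reader supports seeking relative to the end of the file.

```python
import gzip
import os
import tempfile


def read_uncompressed_at(gz_path, offset, length):
    """Read `length` bytes at an offset measured in the UNCOMPRESSED data."""
    with gzip.open(gz_path, "rb") as f:
        # The reader decompresses forward from the start of the file
        # until it reaches `offset`; nothing is written to disk.
        f.seek(offset)
        return f.read(length)


def gzip_seek_demo():
    # Hypothetical data: 10,000 fixed-width records of 13 bytes each.
    data = b"".join(b"record %05d\n" % i for i in range(10000))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt.gz")
        with gzip.open(path, "wb") as f:
            f.write(data)
        return read_uncompressed_at(path, 13 * 7692, 13)
```

For files of a gigabyte or more, a lookup near the end means decompressing almost the whole file in memory, so this trades disk space for CPU time on every request.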

betsywr

ASKER

The program determines the location to lseek by querying a database table. All three lseek parameters (file name, offset and origin) are fields in that table. I am not able to change the information that gets populated in the table, so I'm trying to figure out a way (without running this program through a unix shell script - major performance hit) to utilize this data and still save space. Any ideas?

By 'most compatible', I mean easiest to work with (i.e. which method of compressing would produce a file that is the least problematic for the lseek function to operate on).

graham_k

Since you are compressing "several" files, I would suggest that the easiest thing to do is to uncompress one of them when the program starts to run (or just before you want to seek). The others will remain compressed, and if your program's run time is short, you will only have one uncompressed for a short while.

To do what you want to, the only way that I can see is to get hold of the freeware source of the compression code (such things are available) & tweak it. Thus, when compressing, you can store the file position of certain key data & use that to lseek afterwards. However, you would then need to uncompress the data, & since it's not the whole file which you're uncompressing, just a bit, you *must* use the same algorithm to uncompress as to compress.
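A rough sketch of that idea, using Python's `zlib` rather than a modified compressor, so this is a substitute technique and not the tweak described above. The data is compressed in independent fixed-size blocks, and the position of each compressed block is recorded while compressing. The block size, the function names, and the in-memory `blob` are assumptions for illustration; with a real file, the slice `blob[pos:pos + size]` would become an lseek to `pos` followed by a read of `size` bytes.

```python
import zlib

CHUNK = 64 * 1024  # uncompressed bytes per block (an arbitrary choice)


def compress_with_index(data, chunk=CHUNK):
    """Compress `data` in independent blocks.

    Returns (blob, index), where index[i] is the (offset, size) of
    block i inside the compressed blob.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), chunk):
        packed = zlib.compress(data[start:start + chunk])
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def read_range(blob, index, offset, length, chunk=CHUNK):
    """Return data[offset:offset + length], decompressing only the blocks needed."""
    out = bytearray()
    end = offset + length
    block = offset // chunk  # block that holds the first requested byte
    while offset < end and block < len(index):
        pos, size = index[block]
        plain = zlib.decompress(blob[pos:pos + size])
        lo = offset - block * chunk  # position of `offset` inside this block
        piece = plain[lo:lo + (end - offset)]
        if not piece:
            break  # offset lies past the end of the data
        out += piece
        offset += len(piece)
        block += 1
    return bytes(out)
```

The existing offsets (measured in uncompressed bytes) can then be used unchanged: the block number is simply `offset // chunk`. Smaller blocks mean less work per lookup but a worse compression ratio.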

I *strongly* recommend working on only whole files.

betsywr

ASKER

Uncompressing only one file at a time sounds like a great idea. However, this is an online customer care system that I'm working with, and there are hundreds of customer service reps hitting these files at any given time. Therefore, it wouldn't save me any space given that all of the files could be uncompressed at the same time. Not to mention that I would have major contention problems if someone was trying to uncompress a file which is already uncompressed and being used by another customer service rep. Thanks for the suggestion though.

I'm really hoping that someone will know of a form of compression that wouldn't require uncompressing the file to do an lseek on it.

If this is at ALL possible, please let me know.

ASKER CERTIFIED SOLUTION

graham_k

betsywr

ASKER

gzip could help, but it still doesn't help my problem of having to perform an lseek WITHOUT uncompressing the files first. I think that my co-workers and I have a possible solution, but it is quite messy. I am most in favor of simply purchasing more disk space!

graham_k

hmm, on the one hand - thanks for the points - they pushed me over the 30k mark & I am now eligible for another stripe on my T-shirt. Otoh, that wasn't really an answer. I guess that you are like me - I ask questions in order to bounce ideas off of others, then end up implementing my own original idea, but awarding points to everyone who participated in the discussion.

Gzip may not directly help, but tweaking the code so that when compressing you store the offsets to important records in a second, index file might help. I think however that you have the correct idea - storage is cheap (but apparently not as cheap as your boss <g>) - buy more hard drives.
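For what it's worth, one possible shape for such a second index file: fixed-width binary entries, so that entry N can itself be fetched with a single lseek. The entry layout (two 64-bit integers: compressed offset and compressed size), the `data.idx` name, and the helper functions are all hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical entry layout: (compressed offset, compressed size),
# two little-endian unsigned 64-bit integers, 16 bytes per entry.
ENTRY = struct.Struct("<QQ")


def write_index(path, entries):
    """Write one fixed-width entry per indexed record or block."""
    with open(path, "wb") as f:
        for pos, size in entries:
            f.write(ENTRY.pack(pos, size))


def lookup(path, n):
    """Fetch entry number `n` with a single lseek into the index file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, n * ENTRY.size, os.SEEK_SET)
        return ENTRY.unpack(os.read(fd, ENTRY.size))
    finally:
        os.close(fd)


def index_roundtrip(entries, n):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.idx")
        write_index(path, entries)
        return lookup(path, n)
```

The index stays tiny compared with the data files, and since it is only read after it has been built, many readers can share it without contention.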

best wishes,

Graham