Use Linux libarchive to create a relative path zip file

On our RHEL6 server, our admin installed libarchive 2.8.3.
http://www.libarchive.org/
http://www.libarchive.org/downloads/ to get the older 2.8.3

I am asked to write a C++ program using libarchive to produce a very limited Linux zip look-alike without spawning.
The original zip command in a customer provided script looks something like:
$ zip bar.zip folder1/* folder2/* folder3/*

We don't need all the features of zip. In fact, they don't care if we even deal with compression. All that is necessary is that an archive of files is created and someone can use unzip to recreate the folder/file structure. (Since most of the files are already compressed, compression would actually waste time, but no biggie if the result uses compression as a default - whatever is the easiest route to take.)

The resultant zip file will have relative locations, not full absolute paths.
Does anyone have a simple example that does just this: creating a zip file and filling it so that the result looks like the output of the above zip command line?

I see examples on http://www.libarchive.org/, which covers full capabilities of libarchive - mostly being able to autodetect existing archives and handling them. We have no need to read archives.

I was hoping that someone has a quick solution to create a zip file. The group received from the customer a script with the zip command, and told us to use their class that spawned the script using system(). That caused a crash when integrated into our program. The first work-around was to use posix_spawn but that has problems also. Basically, we learned that our program must not spawn anything due to program constraints.

Any simple program that illustrates, say, inserting three files with relative file paths into the created zip file would be appreciated. I know how to handle directory recursion if that becomes a requirement.

Thanks.
phoffricAsked:

sarabandeCommented:
>> I see examples on http://www.libarchive.org/

the sample 'A Basic Write Example' does pretty much what you want.

you would call the function like

const char * szzipfile = "sample.zip";
const char * szfilepaths[] = { "./FolderA/File1.dat",
         "./FolderA/File2.dat",
         "./FolderB/File1.dat",
         "./FolderB/File2.dat",
         "./FolderB/File3.dat",
         "./FolderB/File4.dat",
         "./FolderC/File1.dat",
         0
};
chdir("/tmp/somefolder");  // change to top folder (needs <unistd.h>; affects every thread in the process)
write_archive(szzipfile, szfilepaths);

the function currently creates a tar archive. calling archive_write_set_format_zip instead of archive_write_set_format_pax_restricted should solve this.

Sara
phoffricAuthor Commented:
I looked at that site again, and did not find 'A Basic Write Example' on that page.

The reason why the customer wrote the script with a zip command in it was to avoid programming the chdir() function since it changed the directory of all the other threads.
phoffricAuthor Commented:
Just came out of a meeting discussing this. The systems engineer just informed us that the zip file produced has to be compatible with an older zip 2.0 version to properly handle store and deflate. I have no idea why older versions of the customer's utilities have problems with later versions of zip.

I'll just try to get a prototype and hand it to our test team. If it works there, it still may not work for the customer if this libarchive produces a later version of zip.
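
One way to see what a produced file actually claims is to inspect the first local file header. In the ZIP format an entry starts with the signature "PK\003\004", followed by a 2-byte little-endian "version needed to extract" field (10 means 1.0, 20 means 2.0), then 2 bytes of flags, then a 2-byte compression method (0 = stored, 8 = deflated). The sketch below reads just those fields; the struct and function names are made up for illustration, and it only looks at the first entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdint.h>

// Hypothetical result type: fields taken from a ZIP local file header.
struct ZipEntryInfo {
    bool     valid;           // true if the data starts with "PK\003\004"
    uint16_t version_needed;  // 10 = zip 1.0, 20 = zip 2.0
    uint16_t method;          // 0 = stored, 8 = deflated
};

// Read a little-endian 16-bit value.
static uint16_t le16(const unsigned char *p)
{
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Parse the first 10 bytes of a zip file image (hypothetical helper).
ZipEntryInfo parse_local_header(const unsigned char *buf, std::size_t len)
{
    assert(buf != NULL);
    ZipEntryInfo info = { false, 0, 0 };
    if (len < 10)
        return info;
    if (buf[0] != 'P' || buf[1] != 'K' || buf[2] != 3 || buf[3] != 4)
        return info;
    info.valid = true;
    info.version_needed = le16(buf + 4);  // offset 4: version needed to extract
    info.method = le16(buf + 8);          // offset 8: compression method (offset 6 holds flags)
    return info;
}

// Convenience wrapper: read the header of the first entry from a file on disk.
ZipEntryInfo read_first_entry_info(const char *zip_path)
{
    ZipEntryInfo info = { false, 0, 0 };
    unsigned char buf[10];
    std::FILE *f = std::fopen(zip_path, "rb");
    if (f == NULL)
        return info;
    std::size_t n = std::fread(buf, 1, sizeof buf, f);
    std::fclose(f);
    return parse_local_header(buf, n);
}
```

Running read_first_entry_info() on a file from the zip utility and on one from the prototype gives a quick side-by-side comparison before anything goes to the customer.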
phoffricAuthor Commented:
Is there a way to modify your code without the chdir()?
Thanks.
phoffricAuthor Commented:
Ok, I drilled down and found 'A Basic Write Example'. Thanks.
sarabandeCommented:
>> I looked at that site again, and did not find 'A Basic Write Example' on that page.
there is a link at the site after 'Examples' pointing to https://github.com/libarchive/libarchive/wiki/Examples

>> The reason why the customer wrote the script with a zip command in it was to avoid programming the chdir() function since it changed the directory of all the other threads.

then writing the archive should be moved to a separate process, for example by forking. alternatively, you can get the current working directory with getcwd() and use a mutex or critical section around all file creations (in all threads) while the directory is changed.

>> the zip file produced has to be compatible with an older zip 2.0 version
libarchive can handle all common archive formats as far as i have read. the following table lists all supported format names and the corresponding lib function:

static const
struct { const char *name; int (*setter)(struct archive *); } names[] =
{
	{ "7zip",	archive_write_set_format_7zip },
	{ "ar",		archive_write_set_format_ar_bsd },
	{ "arbsd",	archive_write_set_format_ar_bsd },
	{ "argnu",	archive_write_set_format_ar_svr4 },
	{ "arsvr4",	archive_write_set_format_ar_svr4 },
	{ "bsdtar",	archive_write_set_format_pax_restricted },
	{ "cd9660",	archive_write_set_format_iso9660 },
	{ "cpio",	archive_write_set_format_cpio },
	{ "gnutar",	archive_write_set_format_gnutar },
	{ "iso",	archive_write_set_format_iso9660 },
	{ "iso9660",	archive_write_set_format_iso9660 },
	{ "mtree",	archive_write_set_format_mtree },
	{ "mtree-classic",	archive_write_set_format_mtree_classic },
	{ "newc",	archive_write_set_format_cpio_newc },
	{ "odc",	archive_write_set_format_cpio },
	{ "oldtar",	archive_write_set_format_v7tar },
	{ "pax",	archive_write_set_format_pax },
	{ "paxr",	archive_write_set_format_pax_restricted },
	{ "posix",	archive_write_set_format_pax },
	{ "rpax",	archive_write_set_format_pax_restricted },
	{ "shar",	archive_write_set_format_shar },
	{ "shardump",	archive_write_set_format_shar_dump },
	{ "ustar",	archive_write_set_format_ustar },
	{ "v7tar",	archive_write_set_format_v7tar },
	{ "v7",		archive_write_set_format_v7tar },
	{ "xar",	archive_write_set_format_xar },
	{ "zip",	archive_write_set_format_zip },
	{ NULL,		NULL }
};

Sara
phoffricAuthor Commented:
>> writing the archive should be moved to a separate process, for example by forking.
   Can't use fork as that spawns the new process. (See OP.)

>> you can get the current working directory by getcwd() and use a mutex or critical section for all file creations (all threads) while the directory was changed.
   Can't do that as many threads are running and we must not stop all threads in an HPC environment. Most of the threads work independently of each other and have file I/O in them. This is legacy code, and they won't be willing to change that.

What has to be done is to be able to write the zip file out without using chdir() and without spawning. Are you saying that this is not possible? If so, then I will fall back to my other work-around (which they rejected), which was to have a small program (or script) start up with the huge program, and have a simple file-based IPC to control the zip file creation using the Linux zip utility. But they are very much against this approach, and insist on using libarchive.

Thanks for your guidance.
phoffricAuthor Commented:
>> { "zip",      archive_write_set_format_zip },
Yeah, I just wish there was something about the zip version number. But we won't worry about that for now. Instead we'll just write a zip file and hope that it works. If not, there is still that work-around I just described.

Thanks.
sarabandeCommented:
spawning a script is a different matter from spawning a self-written C program.

if you don't want to spawn, you could run the new program as a daemon and pass requests to it through any point-to-point (p2p) channel.

but actually, you don't need the chdir: you can pass absolute paths to the function and strip each filename down to a relative path after you have retrieved the file status using the stat function.

Sara
sarabandeCommented:
>> I just wish there was something about the zip version number.
before i found the names table i read a lot of remarks regarding old archive formats. look under "Formats" at the website.

Sara
phoffricAuthor Commented:
If we spawn, we crash, even if we use posix_spawn (which really works OK with single-threaded programs).

>> new program as a daemon
Told we are not allowed to do that. As I said, I already suggested a separate small program starting up with the large program and using a simple file-based IPC to kick off zip. That will be our fall-back if needed.

I'll see if the example in the link works. Thanks.
sarabandeCommented:
if you do

const char * szzipfile = "sample.zip";
const char * szfilepaths[] = { "/tmp/somefolder/FolderA/File1.dat",
         "/tmp/somefolder/FolderA/File2.dat",
         "/tmp/somefolder/FolderB/File1.dat",
         "/tmp/somefolder/FolderB/File2.dat",
         "/tmp/somefolder/FolderB/File3.dat",
         "/tmp/somefolder/FolderB/File4.dat",
         "/tmp/somefolder/FolderC/File1.dat",
         0
};

write_archive(szzipfile, szfilepaths, "/tmp/somefolder");

and

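void write_archive(const char * outname, const char ** filename, const char * rootfolder)
{
      ...
     size_t rootlen = strlen(rootfolder);   // needs <string.h>
     while (*filename) {
    stat(*filename, &st);          // the absolute path is used to access the file
    entry = archive_entry_new(); // Note 2
    // make the entry name relative to rootfolder without modifying the input string
    // (writing through *filename would modify a string literal, which is undefined behavior)
    const char * relpath = *filename;
    if (strncmp(relpath, rootfolder, rootlen) == 0 && relpath[rootlen] == '/') {
          relpath += rootlen + 1;  // skip "rootfolder/", leaving e.g. "FolderA/File1.dat"
    }
    archive_entry_set_pathname(entry, relpath);
    ...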

the files were checked using absolute paths but the archive entries are relative paths.

Sara
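
The prefix-stripping step can also be pulled out into a small standalone helper that is easy to test on its own. This is only a sketch: the name relative_to_root is made up, it returns a std::string rather than adjusting a pointer, and it produces names like "FolderA/File1.dat" (no leading "./"), which is what `zip bar.zip folder1/*` would store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: return `path` relative to `root` when `path` lies
// under `root`; otherwise return `path` unchanged. Inputs are not modified.
std::string relative_to_root(const char *path, const char *root)
{
    assert(path != NULL && root != NULL);
    std::size_t rootlen = std::strlen(root);
    // Ignore trailing slashes so "/tmp/x" and "/tmp/x/" behave the same.
    while (rootlen > 0 && root[rootlen - 1] == '/')
        --rootlen;
    // Require a '/' right after the prefix so that "/tmp/x" does not
    // match "/tmp/xyz/file".
    if (std::strncmp(path, root, rootlen) == 0 && path[rootlen] == '/') {
        const char *rel = path + rootlen;
        while (*rel == '/')
            ++rel;              // skip the separator(s)
        return std::string(rel);
    }
    return std::string(path);
}
```

The result's c_str() can then be handed to archive_entry_set_pathname(), while the original absolute path is still used for stat() and for reading the file.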
phoffricAuthor Commented:
>> the function currently creates a tar archive. calling archive_write_set_format_zip instead of archive_write_set_format_pax_restricted should solve this

archive_write_set_format_gzip is not defined, but
archive_write_set_format_zip is defined.

But it did not produce a zip formatted file. The zip utility produced a zip file starting with PK (as in pkzip, probably), but the basic program just started out with a filename (with relative path).
phoffricAuthor Commented:
I am using version 2.8.3 of libarchive. The latest version, 3.1.2, is not compatible with our Red Hat version (so I am told).
phoffricAuthor Commented:
Ooops, I accidentally left in the pax LOC.
Now it works for one file in the archive.
I'll test with a few folders and see how it compares with zip utility.
Thanks for your help.
Appreciate it.
phoffricAuthor Commented:
Thanks for the sample code. Looks like I was able to create a good zip file.

Tomorrow, I am going to ask the team whether multiple zips can occur at the same time. Our calling code is thread safe; but I don't know anything about libarchive. Do you know whether libarchive is thread safe? If necessary, I will have to add mutex around our zip creation function. Thanks again.
sarabandeCommented:
>> Do you know whether libarchive is thread safe?
i don't know, but there is no reason to assume it is not. thread safety could be violated by static data used in the library or by data shared with other threads. i can't see any advantage in using static data (buffers) in the library and haven't seen that in the last 30 years apart from 'historical' exceptions (for example strtok or the single-threaded libc). the only shared data could be the data you pass as arguments. if you create all of that on the heap, or do not call into libarchive from different threads, it would be thread-safe from your side as well.

Sara
phoffricAuthor Commented:
Arrrrg, I just came across that open() was not thread safe, so may have to use openat. What more gotchas will I find!
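
For what it's worth, here is roughly how openat() could stand in for chdir(): the root folder is opened once as a directory descriptor and each relative path is resolved against it, so the process-wide working directory is never touched. This is a sketch under assumptions: the helper name open_under_root is made up, and it assumes openat() and O_DIRECTORY are available on the target system.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: open `relpath` for reading, resolved against the
// directory `rootfolder`, without changing the current working directory.
// On success returns a file descriptor and fills `*st`; returns -1 on failure.
int open_under_root(const char *rootfolder, const char *relpath, struct stat *st)
{
    assert(rootfolder != NULL && relpath != NULL && st != NULL);
    int dirfd = open(rootfolder, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;
    int fd = openat(dirfd, relpath, O_RDONLY);  // relpath is looked up under dirfd
    close(dirfd);                               // fd stays valid after this
    if (fd < 0)
        return -1;
    if (fstat(fd, st) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The same relpath string can serve as the archive entry name, and the returned descriptor can be read with read() to feed archive_write_data(). When many files share one root, keeping the directory descriptor open instead of reopening it per file would save a system call each time.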
sarabandeCommented:
>> open() was not thread safe
are you using open for opening files? why not use fopen?

open() for other streams should not be done in multiple threads, since streaming to a shared stream is not thread-safe at all. you would always use a stream exclusively, protecting it with a mutex.

Sara
phoffricAuthor Commented:
The open function is the underlying primitive for the fopen and freopen functions, that create streams.
http://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html

Well, that is for the GNU C Library. I may look into what our Intel C library has to say about this.
phoffricAuthor Commented:
Identifying that sample program and subsequent discussion was very helpful.
Thanks much!