phoffric

rhel6 C system() call to zip to archive has problems

Someone at work described this problem on rhel 6, in C. They ran the system() function to run zip to archive a tree. But it returned an error. They said that system() forks copying the huge number of memory pages. Because of the large amount of memory used, there is not enough memory to accommodate the fork.
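Before choosing a workaround, it can help to confirm that the failure really is the fork step and not the zip command itself. The sketch below is a hypothetical helper (`run_and_diagnose` is not from the customer's module) that separates the cases `system()` can report: a return of -1 means no child could be created, and `errno` then says why (ENOMEM would fit the description above).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command with system() and report how
 * it ended. Returns the command's exit code (0-255), -1 if no child
 * process could be created (for example fork failing with ENOMEM), or
 * -2 if the command was killed by a signal. */
int run_and_diagnose(const char *command)
{
    errno = 0;
    int status = system(command);

    if (status == -1) {
        /* The fork (or wait) itself failed; errno holds the reason. */
        fprintf(stderr, "system() could not run a child: %s\n",
                strerror(errno));
        return -1;
    }
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        if (code == 127)
            fprintf(stderr, "exit code 127: the shell may not have been "
                            "able to execute the command\n");
        return code;
    }
    if (WIFSIGNALED(status))
        fprintf(stderr, "command killed by signal %d\n", WTERMSIG(status));
    return -2;
}
```

Logging the `errno` text at the point of failure gives the group something concrete to compare against the "not enough memory to fork" explanation.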

They also looked into vfork(), but I am not sure what the problem was when they used it.

Is there a minimum resource method to get the zip program to run?

I suggested integrating open source zip code and zipping up the files without needing a system() call. But that process takes many months to get through the legal department. And they needed this done yesterday.

I also suggested having another small separate program running that would handle the system() call; but they are against that, as they spent years consolidating multiple programs into a single multi-threaded process to enhance performance and ease configuration management control.

After some research, they learned that posix_spawn() worked. But when using the TotalView debugger to resolve other issues, TotalView crashes when the posix_spawn() call is reached. That is unacceptable. (TotalView debugger is used because it handles C, C++, and Fortran with multiple thread control very well - up to this latest system() call addition.)
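For readers who have not used it, a minimal sketch of the posix_spawn approach mentioned above follows. The helper name `spawn_and_wait` is hypothetical, and the sketch does nothing about the TotalView crash; it only shows the call pattern that replaces `system()`.

```c
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start argv[0] (looked up on PATH) with
 * posix_spawnp() and wait for it to finish. Returns the exit code,
 * or -1 if spawning or waiting failed or the child died abnormally. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int rc = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (rc != 0) {
        /* posix_spawnp returns an error number instead of setting errno. */
        fprintf(stderr, "posix_spawnp: %s\n", strerror(rc));
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call for this thread's case might pass something like `{"zip", "-r", "-0", "out.zip", "tree", NULL}`, where `out.zip` and `tree` are placeholder names; `-r` recurses and `-0` stores without compression.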

BTW, our process does not use swap space due to performance issues.

All I can do is to forward your suggestions to the group working this problem and see if they will buy into this.

At this point, the lead is looking into using Valgrind to see if there is some new memory corruption that may be causing the crash.

If they reject the suggestions you make, I may not be able to judge the quality of the solutions offered. (Sorry in advance about that, if that occurs.)

Regards,
Paul
arnold

Not sure what it is you need to complete. What are you planning on passing to the external ZIP command?
Is calling a shell script that will run the zip/compress an option?

I think I understand what you are after: you have a daemon/application written in C that needs to compress a "tree", meaning a directory tree?
With what credentials/rights is the C app running, and could that be what prevents it?

Is the compression process a requirement for the C app or can that be managed through shell scripts, etc.? Is the compressed archive needed by the C app?
ASKER CERTIFIED SOLUTION
evilrix
phoffric

ASKER

>> What are you planning on passing to the external ZIP command? Is calling a shell script that will run the zip/compress an option?
The customer gave us a C module that has a system() call to run their script which executes zip to archive our tree structure containing many folders/files.

I was told that the problem with system() or fork/exec is that another set of page tables is reserved for the spawned process. Since the parent program uses almost all available server memory, there is no more memory available, as swap space is not an option.

I thought a simple work-around was to have a small program run the script; and it would be kicked off by some simple IPC from the main program. But this group that I joined recently is against having more than one program. (On Monday when I get back, I will talk to the lead to see if I can convince him of the simple work-around.)

>> Most open source libraries have standard licenses. Does your legal team account for this?
Yes. I explained the time constraint in the OP. (As you would expect legal to do, they thoroughly check the licenses to be in full compliance with the law. I doubt that it takes legal much time to do the actual work, but their queue is probably huge given that their input comes from very many divisions. So, it takes about 3 months to get approval.)

>> With what credentials/rights is the C app running, and could that be what prevents it?
Well, they tell me that posix_spawn() functionally does the job, but they just have problems with the TotalView debugger crashing when the posix_spawn() call is executed. I was told that the problem with the system() call is that the fork results in copying the page tables.

>>  I'd use libarchive to do this programmatically
This was the other suggestion I made to them (noted in the OP), but legal then gets involved.

I had overheard them mentioning vfork, but I think they had problems with it (I'll ask on Monday what the problem was.) I looked at vfork just now, and it appears that it should have solved the memory issue.
vfork() is a special case of clone(2).  It is used to create new processes without copying the page tables of the parent process
http://man7.org/linux/man-pages/man2/vfork.2.html
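A minimal sketch of the vfork-then-exec pattern described in that man page is shown here, assuming a hypothetical helper named `vfork_and_wait`. Because the child borrows the parent's memory until it calls exec, the child branch must do nothing except exec or `_exit`. Whether this behaves well inside a large multi-threaded process is not something this sketch verifies.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up on PATH) via vfork() and
 * execvp(), then wait for it. Returns the exit code, or -1 on failure
 * to create or wait for the child. An exit code of 127 means the exec
 * itself failed in the child. */
int vfork_and_wait(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: shares the parent's address space until exec.
         * Only exec or _exit is safe here; never return or call exit(). */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: resumes once the child has called exec or _exit. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```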

If you know of any lib archive that comes with rhel 6.3 that can create a .zip archive (compression is not necessary), I will see whether it is installed on our development system on Monday. (If not, there is another procedure to try to get it installed.)
>> If you know of any lib archive that comes with rhel 6.3 that can create a .zip archive
In most cases, if you only link to the dynamic and not the static library you don't actually have to include any licence details in your own code. I don't have access to rhel but I'd be very surprised if there wasn't a library you can use. The problem is if you link to the dynamic library your code will only work with that version of the library.

Either way, it's definitely best to do this using a library rather than invoking an external process. Using vfork may work but my concern would be that it's not robust. Also, having code with burnt in execution strings (especially if they use environment variables) can be (*is*) unsafe - especially if running elevated. All it needs is someone to modify the binary in such a way as to execute a different command and, BAM, you've got yourself an exploit waiting to happen.

From the "system()" man page: Do not use system() from a program with set-user-ID or set-group-ID privileges, because strange values for some environment variables might be used to subvert system integrity.
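If an external command has to be run anyway, one way to reduce the exposure the man page warns about is to skip the shell's PATH lookup and the inherited environment. The sketch below assumes a hypothetical helper `spawn_with_clean_env`; the PATH and LC_ALL values in it are illustrative assumptions, not recommendations for any particular system.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: run the program at an absolute path with a small
 * fixed environment, so inherited variables cannot influence it.
 * Returns the exit code, or -1 on failure. */
int spawn_with_clean_env(const char *abs_path, char *const argv[])
{
    /* Assumed values: adjust to what the target program really needs. */
    static char *const clean_env[] = {
        "PATH=/usr/bin:/bin",
        "LC_ALL=C",
        NULL
    };

    pid_t pid;
    /* posix_spawn (not posix_spawnp) does no PATH search. */
    if (posix_spawn(&pid, abs_path, NULL, NULL, argv, clean_env) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This does not address a tampered binary, which is the other risk raised above; it only narrows what the environment can change.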
SOLUTION
meaning the option can be or likely is unsafe.
Unfortunately, programs designed for, or first used on, a system where security is not a consideration often find their way into an environment where those unneeded security features are now impacting and creating a vulnerability that will be exposed.
libarchive appears to be in redhat distributions since there are references to it here:
https://bugzilla.redhat.com/show_bug.cgi?id=1169770
I also read that libarchive provides headers archive.h and archive_entry.h, possibly located in /opt/local/include/, so I'll look for them.

>> where those unneeded security features are now impacting and creating a vulnerability that will be exposed.
Good point.
I found libarchive on our system but without header files. They all want to use it, so managers put in a high priority request to get the latest libarchive 3.1.2 installed.

We have to produce .zip files (uncompressed) given a folder tree. The entire folder tree is to be inserted into the zip file.

We have never used libarchive. Could you please provide us with a C example of how to create a zip file given an input of a folder name? (If C is not available, then we will have to go with C++03, and I'll help translate to C.)
SOLUTION
>> use yum install libarchive-devel
Good to know. Maybe the admins did not do that originally, hence the missing headers.

>> libarchive using tar files not zip files
Hope I am misunderstanding you. We are required to create zip files on the folder tree. Are you saying that we cannot do that? http://www.libarchive.org/ says:

The bsdtar and bsdcpio command-line utilities are feature- and performance-competitive with other tar and cpio implementations:
•Writes tar, pax, cpio, zip, xar, ar, ISO, mtree, and shar archives.
Looking at this more closely, it appears that this reference to writing a zip file is just a feature of the bsdtar and bsdcpio command-line utilities; and from the above discussion, we are unable to run a script from within our program.

As noted in the OP I had recommended a small single-threaded process that could execute the script, or come to think of it, just a script that was started with the main program; but when this libarchive option was mentioned, they all wanted to do it this way due to some complexities in starting up and especially shutting down the system.

But if you are saying that we cannot easily use a libarchive api to create a zip file, then I will have to report that the only way to solve this problem is to define another process (either a small program that runs a script or just running a script directly).

>> option where you can use a compressing tool gzip
Files are already compressed, so the archiving will not use compression.

Thanks.
>> Github has a code example, are you using C or C++?
I was just told that this piece will be a C++ module.
I wrote:
... and from the above discussion, we are unable to run a script from within our program.
I no longer have an edit button; I meant to change this to:
... It is not clear that we can create zip files unless we are using a command-line utility.
I am a bit unclear on your requirement to create an archive under the zip suffix and without any compression.
Tar is an archive that is uncompressed.

I reread your latest comment, which deals with the task being an archive of compressed files. While a compressed archive will most likely not achieve any compression, compression padding will likely be attempted as part of that archive process.

Have a look at github.org/libarchive  there are code examples, Double check the libarchive that comes on RHEL 6 which will  be maintained through vendor updates whether FormatZIp .........
>> to create an archive under the zip suffix and without any compression.
We are required to have an uncompressed archive having a zip internal structure.

>> Tar is an archive that is uncompressed.
True, but tar does not have a zip internal structure. If it did, then we could tar, and then rename to give the archive a .zip suffix.

>> Have a look at github.org/libarchive  there are code examples
Thanks, will do.

>>  Double check the libarchive that comes on RHEL 6 which will  be maintained through vendor updates whether FormatZIp .........
Could you please clarify this a bit. Not sure what to do here. Thanks.
FormatZIP is the structure/scheme you would need to code to get zip archive.
>> While a compressed archive will most likely not achieve any compression, compression padding will likely be attempted as part of that archive process.

The no compression requirement comes from the customer. My understanding is that most zip utilities offer a zero-compression (store-only) option. We have a script from the customer that will tell us exactly how they want the zip file produced. We just need to be able to replicate this script action by using a function call that does not fork.
>> FormatZIP is the structure/scheme you would need to code to get zip archive.
Ok, thanks. I did see something like that while scanning archive.h in version 3.1.2, which is what I am trying to get installed.
It might be simpler/straight forward to get the RedHat Vendor included libarchive using yum install versus trying to get the newer libarchive from source ........

Hopefully, it works out for you.
We got libarchive 2.8.3 installed with headers. I looked at the man pages and do not understand the differences between archive_write_disk and archive_write.
WRITING ENTRIES TO DISK
The archive_write_disk(3) API allows you to write archive_entry(3) objects to disk using the same API used by archive_write(3).  The archive_write_disk(3) API is used internally by archive_read_extract(); using it directly can provide greater control over how entries get written to disk.  This API also makes it possible to share code between archive-to-archive copy and archive-to-disk extraction operations.
Any clear clues as to the distinctions? Thanks.
I am unclear what you are asking
When creating an archive, you have to decide whether your code will allow adding files to an existing archive (read the existing archive, add another file or files, and close it) or will create a new archive every time, whether you will be extracting/removing files from an archive, etc.
The variations of the tasks will dictate which method/mechanism you will need to use for the specified scenario.

archive_write_disk has the disk as the end destination: it writes entries out as files and directories on disk, as extraction does.
archive_write builds an archive, and gives the programmer the option of whether the stream of data ends up in a file on the disk or is being passed to ........
IMHO, it might be .... to use the github.org/libarchive code example to create a single file archive.
Thanks. If you are referring to the minitar example, I built it. I am unclear how to run it.
I would like to be able to create a zip file of a folder's contents (with relative paths), or even given just two files (and then if I have to handle directory recursion, I'll do that later).
The three examples in github.org/libarchive code examples, minitar, tarfilter, and untar, all appear to read an archive. So, I am not sure how to run any of them with a folder specification, and get a zip archive having relative paths of the folder contents.

BTW - although I was told that there was no compression, now that I looked at the customer's original script, it is just using the zip command line: something like $ zip folder1/* folder2/* folder3/*  (although there may have been a -r option for recursion).
There is one example that deals with writing an archive:

tar.gz

I do not believe you can provide a directory set; you have to add one file at a time, i.e. list the directories of interest to you and add each file to the archive you are building.
Thanks for confirming that there is no minimal-resource way to get zip to run. Thanks for suggesting libarchive as a programmatic alternative, and thanks for helping me get the admins to know how to get the header files in place. Since I recommended this alternative approach, they gave the task to me. Looking into details further, I learned that the module is C++, not C, as originally stated.

Thanks again for helping me get on the right track.
I wrapped libarchive into our system, and the delivery has been running smoothly for a year. Thanks again!