Solved

How to check when zip function completed?

Posted on 2003-12-08
19
411 Views
Last Modified: 2008-03-10
Hi,
I have a perl script to zip a folder:
-------------------------------------
tar -cf folder.tar folder;
gzip -f folder.tar;
-------------------------------------
After that I remove the folder and all the files in it. However, when there are a lot of files in the folder, gzip needs some time to finish, but the script continues to the following steps and removes the files and the folder. This causes gzip to compress an empty folder.
How can I make sure that gzip is complete before removing the files and the folder?
Please help.
Thanks.

Best Regards,
Ivonne
0
Comment
Question by:ivonne1094
19 Comments
 
LVL 24

Expert Comment

by:shivsa
ID: 9901317
$check = system "gzip -f folder.tar";
if ($check == 0) {
    print "zip is complete \n";
}
0
 

Author Comment

by:ivonne1094
ID: 9901350
Hi shivsa,
Thanks....but what if the gzip process is still running? How should I change my code so that gzip finishes before the folder is removed?
Thanks.
0
 
LVL 24

Expert Comment

by:shivsa
ID: 9901408
You can do a further check:

$check = `ps -aef | grep gzip`;
if ( $check =~ /folder.tar/ ) {
   print " gzip still running \n";
}
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9901678
I have the impression there is something more going on that you are not telling us. Normally, whether executed from a perl script or from a shell script or from a command line, these commands run to completion before the next command is started or before returning control to the caller. What environment are you running on (or what code are you actually using?) that would permit sequential commands to overlap in time?
0
 

Author Comment

by:ivonne1094
ID: 9902076
Hi jmcg,
The perl script is running in Unix environment.
Please refer to the perl script below:
-----------------------------------------
system "tar -cf folder.tar folder";
system "gzip -f folder.tar";

system "rm folder/*";
rmdir "folder";
------------------------------------------
When the folder is very big, gzip needs some time to compress it. However, the result is an invalid zip file because the files in the folder were removed before the zip process finished.
Now I have added "sleep 60" to the perl script to give the zip process one minute.
-----------------------------------------
system "tar -cf folder.tar folder";
system "gzip -f folder.tar";

sleep 60;

system "rm folder/*";
rmdir "folder";
------------------------------------------
But this is not a good solution for waiting for the zip process.
Maybe I shouldn't write the gzip and the file removal this way; please let me know if there is a better solution.
Thanks.

0
 
LVL 20

Expert Comment

by:jmcg
ID: 9904180
This isn't yet making sense to me.

The tar process creates 'folder.tar' from the contents of 'folder'. It does not exit until it has finished. But there has to be sufficient disk space on the device to hold both copies, since the tar file takes up just as much space as the files that go into it.

The gzip command operates on the folder.tar file, which is not affected by any later attempts to remove the contents of folder. But it again requires more space to write the gzipped file.

Could it be that you are running out of space? If you are not checking for errors along the way, that might account for having an incomplete or invalid zip file at the end.
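
For what it's worth, a minimal sketch of what checking each step could look like (a sketch only, reusing the placeholder folder name from your earlier snippet):
---
system("tar -cf folder.tar folder") == 0
    or die "tar failed with exit status " . ($? >> 8) . "\n";

system("gzip -f folder.tar") == 0
    or die "gzip failed with exit status " . ($? >> 8) . "\n";

# Only remove the originals once both steps above have succeeded.
system "rm folder/*";
rmdir "folder";
---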

0
 
LVL 3

Expert Comment

by:merphle
ID: 9906415
Instead of

---
system "tar -cf folder.tar folder";
system "gzip -f folder.tar";
sleep 60;
system "rm folder/*";
rmdir folder;
---

try using:

---
system "tar -cf - folder | gzip -9 > folder.tar.gz && rm -rf folder";
---

That should tar the folder and pipe the output to stdout, which is then read by gzip and compressed into folder.tar.gz. If that whole operation succeeds, rm -rf removes folder and all of its subfolders and files.
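
One caveat if you drive this from Perl and check the result yourself: "system" gives you the shell's status, and for a pipeline the shell normally reports the status of the last command (gzip), so a failure in tar itself may go unnoticed. A minimal sketch, keeping the removal in Perl instead of in the shell:
---
if (system("tar -cf - folder | gzip -9 > folder.tar.gz") == 0) {
    # Shell reported success, so remove the original folder.
    system "rm -rf folder";
} else {
    print "archive step failed, leaving folder in place\n";
}
---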
0
 
LVL 3

Expert Comment

by:merphle
ID: 9906429
Note that you will not have the intermediary folder.tar taking up space during the tar/gzip process. Also note that the -9 option to gzip will cause it to take a longer time to process, but the resulting gz file will almost certainly be smaller. You can safely remove -9 if you need the speed (as opposed to the disk space savings).
0
 

Author Comment

by:ivonne1094
ID: 9918514
Hi merphle,
I have tried the command:
---
system "tar -cf - folder | gzip -9 > folder.tar.gz && rm -rf folder";
---
but the folder does not seem to be compressed properly and I cannot open the zip file.

I found that the perl script terminated without waiting for the gzip process to finish, because the "system" command seems to execute the process while the perl script continues running.

Is there any other function to run a process so that the rest of the perl script waits for the process to finish?

Please advise. Thanks a lot.
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9918597
Perl's "system" command on UNIX should not (and _does not_ on any system I've encountered) return before the subprocess exits. Do you have a 'gzip' script that is executed in place of the normal gzip command?

Please give us some supporting evidence for your claims. What happens if you leave out the 'rm' step? Do you then get a working gzip file? How have you determined that gzip continues to run after the "system" call has returned? When you say "cannot open the zip file", what are you using to try to open it? Are there any error messages?
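
If you want to convince yourself of this, a throwaway one-off test (not part of your script) is enough:
---
# "sleep 5" should take about five seconds to return, showing that
# system() waits for the child process to finish before continuing.
$start = time;
system "sleep 5";
print "system returned after ", time - $start, " seconds\n";
---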

0
 

Author Comment

by:ivonne1094
ID: 9918627
Hi,
Below is the original complete perl script:
------------------------------
#!/usr/local/bin/perl
#script 'create_dir.pl'
#

   $WEBSERVER_NAME="pngedms2";
   $LOCALTIME=localtime;

   open(LOGFILE,">>auto_zip.log") or die "Can't find the auto_zip.log file, Please check the auto zip log file in current dir";
   print LOGFILE "************************************* \n";
   print LOGFILE "$LOCALTIME \n\n";

$ENV{ORACLE_BASE} = "/oracle/app/oracle";
$ENV{ORACLE_SID} = "ppap";
$ENV{ORACLE_HOME} = "/oracle/app/oracle/product/8.1.7";

$ctxdir = "/oracle/app/dl25/web/cgi/eppap";

#chop($WEBSERVER_NAME=`uname -n`);

   if($ENV{'REQUEST_METHOD'} eq "GET" ) {
        $_ = $ENV{'QUERY_STRING'};
        ($one,$two) = split /&/,$_,2;
        ($temp1,$ctg_id )= split /=/,$one,2;
        ($temp2,$user_id)= split /=/,$two,2;

   print "Content-type: text/html\n";
   print "Refresh:1;  URL=http://$WEBSERVER_NAME/eppap/plsql/amd_zip.page_waiting?v_ctg_id=$ctg_id\n\n";

#########################################################
#eval 'use Oraperl; 1;' || die $@ if $] >= 5;
use Oraperl;
#$ENV{'ORACLE_SID'};
$local_db = $ENV{'ORACLE_SID'};
$lda = &ora_login('','dl@eppap','dlbest') || die $ora_errstr;
$csr = &ora_open($lda,"select f.oid, f.upper_file_name
                 from cr_files f
                 where f.it_ctg_id = $ctg_id");

#($oid, $file_name) = &ora_fetch($csr);
 
#########################################################

            $_ = $ctg_id;
            chdir zip;
            mkdir $_;

    print LOGFILE "Folder id: $ctg_id \n";
    print LOGFILE "User: $user_id \n";
    print LOGFILE "Documents: \n";

    while (($oid, $file_name) = &ora_fetch($csr))
    {
        system "$ctxdir/ctx_getfile $file_name";
        print LOGFILE " $file_name \n";
    }

    system "tar -cf $_.tar $_ ";
    system "gzip -f $_.tar";

    # Remove files and folder
    system "rm $_/*";
    rmdir $_;

    print "<HTML>";
    print "<body bgcolor=linen>";
    print "<Strong>";
    print "Please wait while system download and zip the files...";

   }

 close(LOGFILE);

------------------------------
When the command " system "$ctxdir/ctx_getfile $file_name "; " still running (not all the files have been extracted to file system), the perl script already generate the refresh URL :
-----------------------------
print "Refresh:1;  URL=http://$WEBSERVER_NAME/eppap/plsql/amd_zip.page_waiting?v_ctg_id=$ctg_id\n\n";
-----------------------------
and the log file is closed.
So I suspect that "system" hands the process off to run outside the perl script, and that the perl script terminates without waiting for the program started by "system".
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9918886
Yes, that refresh line is printed by the script BEFORE it calls out to the "$ctxdir/ctx_getfile" command.

Is there anything interesting in the LOGFILE? Is there anything interesting in the web server log files?

Since it is running as a CGI, are there limits imposed by your httpd daemon on how long a CGI script is allowed to run?
0
 

Author Comment

by:ivonne1094
ID: 9918911
Although the refresh line comes first, we have had no problem with this in other scripts.
The problem only comes from the system "..." calls.
The logfile records the files and folder that have been compressed, and I found that the logfile is written and the page refreshes before all the files have been extracted to the file system.
The cgi timeout has been set to 30 minutes.
Is there any command that can replace "system" and wait for the process to finish?
Please help. Thanks a lot.
0
 
LVL 20

Expert Comment

by:jmcg
ID: 9920268
You hold all the high cards in this guessing game. Your model for how "system" works and my model for how it works are apparently completely at odds. I cannot force you to accept my model, that is, that "system" only returns when the command has completed. I must therefore guess at the evidence you've seen that causes you to conclude that "system" returns before the command is completed.

So far, you have been extremely stingy in sharing data that would help us explain the discrepancy. Unless you are more forthcoming, I am unable to help you.

The LOGFILE is created and the refresh line is sent BEFORE any "system" call is made. It is completely consistent that you should see them and also see that the file extraction is going on. On UNIX systems, a file and its contents are observable as soon as the file is opened. You do not have to wait, as on some systems, for the file to be closed.

Please post the contents of the logfile, if it's not too large. Please verify how the amount of free space on the disk in question compares to the total size of the folder contents.

Is it a downloaded ZIP file that is unusable? Do you have access to check the gzipped archive that is on the web server?


0
 
LVL 20

Expert Comment

by:jmcg
ID: 9920323
One more question. Is the refresh URL that you've given

print "Refresh:1;  URL=http://$WEBSERVER_NAME/eppap/plsql/amd_zip.page_waiting?v_ctg_id=$ctg_id\n\n";

pointing back to this same script? I'm wondering if you are repeatedly doing the extract/archive/zip/rm process with successive refreshes.



0
 

Author Comment

by:ivonne1094
ID: 9924830

Sorry if I didn't explain my problem in detail. I am not very familiar with perl scripts and ran into problems when using the script above, so I need help to solve them.

After a few more tests I found out more: if I move the refresh URL to the bottom, the program seems to wait for the zip process to finish. The refresh URL just opens a page to inform the user to download the zip file; it does not extract and zip the folder again.

I thought "system" caused the problem because of some testing of the code below:
--------------------------------
while (($oid, $file_name) = &ora_fetch($csr))
    {
        system "$ctxdir/ctx_getfile $file_name";
        print LOGFILE " $file_name \n";
    }

    system "tar -cf $_.tar $_ ";
    system "gzip -f $_.tar";

    # Remove files and folder
    system "rm $_/*";
    rmdir $_;

----------------------------------
while the program was still running the gzip process, some of the files had already been removed by  system "rm $_/*";  which caused gzip to produce invalid zip files (empty files), so I thought "system" caused the problem.

The log file only contains information about the zip time, folder id, and file names:
*************************************
Thu Dec 11 00:49:21 2003

Folder id: 423
User: DELPHI
Documents:
 DELPHI ACN 2080 WAIVER PAGE.DOC
 ACN CHANGE NOTICE 12_15_00.DOC
 DELPHI QUALIFIED LETTER LEVEL1.AMDDOC.DOC
 09386968 DL400B90EI PSW ACN 2080.PDF
 ACN NUMBER 2080.TXT
 09386968 4_12_02 APPRVL.PDF
 2077 DELPHI ACN 2080.PDF
 09386968_A DELPHI.PDF
*************************************
Thu Dec 11 00:55:56 2003
 
Folder id: 422
User: DELPHI
Documents:
 09386968 4_12_02 APPRVL.PDF
 DELPHI ACN 2040 WAIVER PAGE.DOC
 DELPHI QUALIFIED LETTER LEVEL1.AMDDOC.DOC
 ACN 2040 SECTIONR LETTER DL400B90EI.DOC
 PSW_00_LEVEL1 0936968 ACN 2040.XLS
 2040X_LETTER.DOC
 ACN 2040 QUAL SUMMARY.PDF
*************************************

0
 
LVL 20

Expert Comment

by:jmcg
ID: 9925038
In the original script, the screen directing the user to download the zip file would appear after a second or two, regardless of the state of the script on the server continuing to first get the files, then archive them, then zip them. Now, with the refresh URL moved to after the "system" calls, the various file operations should be completed before that screen appears. (How long does it typically take now for that screen to appear?)  Do you still need to prevent the user from resending the original request that started the extraction while the files are still being worked on?
0
 

Author Comment

by:ivonne1094
ID: 9926805
The time before the page refreshes to the URL depends on the size and total number of files to be compressed. If the files are small, the time needed is short.
Yes, I would like to know how to prevent the user from starting the extraction again while the program is still running. I really appreciate your time and help.
Thanks a lot.
0
 
LVL 20

Accepted Solution

by:
jmcg earned 100 total points
ID: 9932572
You may need to create a lockfile using the value of $ctg_id. This can be removed at the time you remove the folder contents. By checking for the existence of this file before beginning the extraction/archive process, you can avoid overlapping requests that would interfere with each other.
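
A minimal sketch of that idea, assuming the script still chdirs into the zip directory first (the lockfile name is just a placeholder):
---
$lockfile = "$ctg_id.lock";

if (-e $lockfile) {
    # A previous request for this folder is still being processed.
    print "Folder $ctg_id is already being prepared, please wait.\n";
    exit;
}

open(LOCK, ">$lockfile") or die "cannot create $lockfile: $!";
close(LOCK);

# ... extract the files, tar and gzip the folder here ...

# Remove the lockfile together with the folder contents.
unlink $lockfile;
---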
0
