Solved

Linux Shell script or Perl script to gunzip & rezip using zip with highest compression

Posted on 2010-11-22
Last Modified: 2012-05-10
I'm running into a disk space issue on an archiving filesystem and noticed that
many of the *.gz files there were not compressed with "gzip -9", which gives
the highest compression.

So I need a Shell or Perl script that takes a directory as its first parameter,
scans it for *.gz files, and for each one gunzips it, zips the resulting file
with the highest compression, and only then moves on to the next *.gz file.
This has to be done one file at a time, because gunzipping all the files
together would fill up the disk.

I'm on RHES 4.6 & the zip version is :

# zip -v
Copyright (C) 1990-1999 Info-ZIP
Type 'zip "-L"' for software license.
This is Zip 2.3 (November 29th 1999), by Info-ZIP.


The help output for zip is below:

# zip
Copyright (C) 1990-1999 Info-ZIP
Type 'zip "-L"' for software license.
Zip 2.3 (November 29th 1999). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete files)
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation/print version info
  -c   add one-line comments        -z   add zipfile comment
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity       -X   eXclude eXtra file attributes
  -y   store symbolic links as the link instead of the referenced file
  -R   PKZIP recursion (see manual)
  -e   encrypt                      -n   don't compress these suffixes
Question by:sunhux
LVL 17

Accepted Solution

by:
sweetfa2 earned 500 total points
#!/bin/bash
if [ $# -ne 1 ];
then
    echo "Invalid arguments"
    exit
fi
for file in `find $1 -maxdepth 1 -type f -name "*.gz" -print`
do
  echo $file
  gunzip $file
  zip -9 ${file%%.*}
done


Author Comment

by:sunhux


I think "gzip -9 filename" is supported but not "zip -9 filename".
Kindly help me get the right syntax; I've been googling with no luck.


 # zip -9 ftp_get.log.20090413
        zip warning: missing end signature--probably not a zip file (did you
        zip warning: remember to use binary mode when you transferred it?)

zip error: Zip file structure invalid (ftp_get.log.20090413)

Author Comment

by:sunhux

When I run "man zip", I get the following on my platform:


# man zip
Formatting page, please wait...
ZIP(1L)                                                               ZIP(1L)

NAME
       zip,  zipcloak,  zipnote,  zipsplit  -  package and compress (archive)
       files

SYNOPSIS
       zip   [-aABcdDeEfFghjklLmoqrRSTuvVwXyz!@$]   [-b path]   [-n suffixes]
       [-t mmddyyyy] [-tt mmddyyyy] [ zipfile [ file1 file2 ...]] [-xi list]

       zipcloak [-dhL] [-b path] zipfile

       zipnote [-hwL] [-b path] zipfile

       zipsplit [-hiLpst] [-n size] [-b path] zipfile

DESCRIPTION
       zip  is a compression and file packaging utility for Unix, VMS, MSDOS,
       OS/2, Windows NT, Minix, Atari and Macintosh, Amiga and Acorn RISC OS.

       It is analogous to a combination of the UNIX commands tar(1) and
       compress(1) and is compatible with PKZIP (Phil Katz's ZIP for MSDOS
       systems).
LVL 26

Expert Comment

by:wilcoxon
Very odd.  Your original post indicates that zip accepts -# to indicate compression level.  The man page is also for info-zip which supports -# (-9 for best).

Author Comment

by:sunhux

Assuming I want to zip xx.log, the syntax is
  zip -9 xx.log.zip xx.log
  rm -f xx.log

You need to remove xx.log as zip will not housekeep it, unlike gzip

Can you rewrite the script to cater for the correct syntax as well as removing the
gunzipped log file?
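
For illustration, the per-file sequence described above (gunzip, zip -9, then remove the uncompressed file) could be wrapped in one function. This is only a sketch: rezip_one is a hypothetical name, and it assumes that the -T (test zipfile integrity) and -j (junk directory names) options from the help output quoted earlier behave as documented and that unzip is installed, since as far as I know zip -T relies on it.

```shell
#!/bin/bash
# Hypothetical helper: convert one .gz file to a .zip at maximum compression.
rezip_one() {
    local gz="$1"
    local base="${gz%.gz}"        # name of the uncompressed file
    gunzip "$gz" || return 1      # replaces foo.gz with foo
    # -9 best compression, -j store the file without its directory path,
    # -T test the new archive before accepting it, -q quiet
    if zip -9 -j -q -T "$base.zip" "$base"; then
        rm -f "$base"             # zip keeps its input, so remove it ourselves
    else
        echo "zip failed for $base; uncompressed file kept" >&2
        return 1
    fi
}
```

It would be called as rezip_one /path/to/xx.log.gz, which should leave only /path/to/xx.log.zip behind. If zip fails, the uncompressed file is kept so nothing is lost.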

Author Comment

by:sunhux

Sorry to add 2 more requirements. In the statement
   `find $1 -maxdepth 1 -type f -name "*.gz" -print`
can you find only *.gz files that are more than 1MB in size? I've found
there's not much saving for small files, and some of the *.gz files were
already compressed with "gzip -9", which already gives a high compression ratio.
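
A minimal sketch of such a size filter, assuming GNU find: the -size test takes a unit suffix, and +1024k selects files larger than 1024 one-kilobyte blocks, i.e. larger than 1 MB. The c suffix counts bytes instead, so the byte equivalent would be +1048576c. list_large_gz is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: list *.gz files directly inside directory $1
# (no recursion) that are larger than 1 MB.
list_large_gz() {
    # +1024k = more than 1024 blocks of 1 KB; find rounds sizes up to whole blocks
    find "$1" -maxdepth 1 -type f -name "*.gz" -size +1024k -print
}
```

For example, list_large_gz /archive/logs prints one matching path per line, which can then be fed to the conversion loop.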
LVL 17

Assisted Solution

by:sweetfa2
sweetfa2 earned 500 total points
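#!/bin/bash
# Re-compress each *.gz file larger than 1 MB in the given directory as a
# .zip with maximum compression, one file at a time.
if [ $# -ne 1 ];
then
    echo "Usage: $0 <directory>" >&2
    exit 1
fi
# -size +1024k: more than 1024 one-kilobyte blocks, i.e. larger than 1 MB
find "$1" -maxdepth 1 -type f -name "*.gz" -size +1024k -print |
while IFS= read -r file
do
  base="${file%.gz}"    # strip only the trailing .gz
  echo "$file"
  # Remove the uncompressed copy only if both gunzip and zip succeeded
  gunzip "$file" && zip -9 "$base.zip" "$base" && rm -f "$base"
done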
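
A note on stripping the suffix in bash: ${file%%.*} removes the longest suffix matching ".*", which means everything from the first dot onward, while ${file%.gz} removes only a trailing ".gz". For log names with several dots the difference matters. A small sketch, using an example name modelled on the log file shown earlier in this thread (the logs/ directory is made up):

```shell
#!/bin/bash
file="logs/ftp_get.log.20090413.gz"   # example name, directory is hypothetical
short="${file%%.*}"   # longest suffix matching ".*" removed
exact="${file%.gz}"   # only the trailing ".gz" removed
echo "$short"         # prints logs/ftp_get
echo "$exact"         # prints logs/ftp_get.log.20090413
```

The second form gives the name that gunzip actually leaves on disk, so it is the safer one to pass to zip and rm.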


Author Comment

by:sunhux

Is there any way to check whether an existing *.gz file was already compressed with "gzip -9"?
If this can be checked before doing 'gunzip', that would be perfect.
If it's not possible, that's OK; just let me know and I'll close this thread and award points.
LVL 17

Expert Comment

by:sweetfa2
Not that I am aware of
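
One possible hint, offered as a sketch rather than a guaranteed check: the gzip file format (RFC 1952) has an "extra flags" (XFL) byte at offset 8 of the header, where 2 means the compressor used its maximum compression setting and 4 means its fastest. As far as I know GNU gzip sets it to 2 only for "gzip -9", but other tools are not obliged to set it at all, so treat a value of 2 as "probably already -9" and anything else as "unknown". gz_xfl and is_max_compressed are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: read the XFL byte (offset 8) of a gzip header, per RFC 1952.
gz_xfl() {
    # -j8 skips 8 bytes, -N1 reads one byte, -tu1 prints it as unsigned decimal
    od -An -tu1 -j8 -N1 "$1" | tr -d ' \n'
}

# Succeeds when the header carries the "maximum compression" hint (XFL = 2).
is_max_compressed() {
    [ "$(gz_xfl "$1")" = "2" ]
}
```

In the loop this could be used as is_max_compressed "$file" && continue, so that files which look like they were already compressed with -9 are skipped without being gunzipped.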

Author Closing Comment

by:sunhux
Ok