Solved

Linux Shell script or Perl script to gunzip & rezip using zip with highest compression

Posted on 2010-11-22
Last Modified: 2012-05-10
I'm running into a disk space issue on an archiving filesystem and noticed that
many of the *.gz files there were not compressed with "gzip -9", so they are not
at the highest compression level.

I need a shell or Perl script that takes a directory as its first parameter,
scans it for *.gz files and, for each one, gunzips it and then zips the
resulting file with the highest compression before moving on to the next file.
The files must be processed one at a time, because gunzipping all of them
together would fill up the disk.

I'm on RHES 4.6 & the zip version is :

# zip -v
Copyright (C) 1990-1999 Info-ZIP
Type 'zip "-L"' for software license.
This is Zip 2.3 (November 29th 1999), by Info-ZIP.


Help for the zip is as below:

# zip
Copyright (C) 1990-1999 Info-ZIP
Type 'zip "-L"' for software license.
Zip 2.3 (November 29th 1999). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete files)
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation/print version info
  -c   add one-line comments        -z   add zipfile comment
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -A   adjust self-extracting exe   -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity       -X   eXclude eXtra file attributes
  -y   store symbolic links as the link instead of the referenced file
  -R   PKZIP recursion (see manual)
  -e   encrypt                      -n   don't compress these suffixes
Question by:sunhux
10 Comments
 
LVL 17

Accepted Solution

by:
sweetfa2 earned 500 total points
ID: 34193916
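#!/bin/bash
# Usage: script.sh <directory>
if [ $# -ne 1 ]; then
    echo "Usage: $0 <directory>" >&2
    exit 1
fi
# Process one .gz file at a time so only one uncompressed copy exists at once.
find "$1" -maxdepth 1 -type f -name "*.gz" -print |
while IFS= read -r file
do
  echo "$file"
  gunzip "$file"
  # ${file%.gz} strips only the trailing .gz (%%.* would cut at the first dot).
  # Note: zip expects the archive name before the input file, so this call
  # fails as written; the follow-up comments discuss the correct syntax.
  zip -9 "${file%.gz}"
done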


Author Comment

by:sunhux
ID: 34194233


It looks like "gzip -9 filename" is supported but "zip -9 filename" is not.
Kindly help me get the right syntax; I've been googling with no luck:


 # zip -9 ftp_get.log.20090413
        zip warning: missing end signature--probably not a zip file (did you
        zip warning: remember to use binary mode when you transferred it?)

zip error: Zip file structure invalid (ftp_get.log.20090413)

Author Comment

by:sunhux
ID: 34194242

When I do "man zip", it gave the following on my platform:


# man zip
Formatting page, please wait...
ZIP(1L)                                                               ZIP(1L)

NAME
       zip,  zipcloak,  zipnote,  zipsplit  -  package and compress (archive)
       files

SYNOPSIS
       zip   [-aABcdDeEfFghjklLmoqrRSTuvVwXyz!@$]   [-b path]   [-n suffixes]
       [-t mmddyyyy] [-tt mmddyyyy] [ zipfile [ file1 file2 ...]] [-xi list]

       zipcloak [-dhL] [-b path] zipfile

       zipnote [-hwL] [-b path] zipfile

       zipsplit [-hiLpst] [-n size] [-b path] zipfile

DESCRIPTION
       zip  is a compression and file packaging utility for Unix, VMS, MSDOS,
       OS/2, Windows NT, Minix, Atari and Macintosh, Amiga and Acorn RISC OS.

       It  is analogous to a combination of the UNIX commands tar(1) and
       compress(1) and is compatible with PKZIP (Phil Katz's ZIP for MSDOS
       systems).
LVL 26

Expert Comment

by:wilcoxon
ID: 34194296
Very odd.  Your original post indicates that zip accepts -# to indicate compression level.  The man page is also for info-zip which supports -# (-9 for best).

Author Comment

by:sunhux
ID: 34195152

Assuming I want to zip xx.log, the syntax is
  zip -9 xx.log.zip xx.log
  rm -f xx.log

xx.log has to be removed explicitly because zip, unlike gzip, leaves the
original file in place after compressing it.

Can you rewrite the script to use the correct syntax and also remove the
gunzipped log file?
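
As an aside, the zip help output quoted in the question lists -m ("move into zipfile (delete files)"), which would make the separate rm step unnecessary. A minimal sketch, assuming that option behaves on Zip 2.3 as its help text describes; the function name zip_and_remove is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: compress one file into <file>.zip at the highest level.
# -m asks zip to delete the input file once it has been added to the archive,
# -q keeps the output quiet.
zip_and_remove() {
  zip -9 -m -q "$1.zip" "$1"
}
```

Calling zip_and_remove xx.log would then leave only xx.log.zip behind if zip succeeds.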

Author Comment

by:sunhux
ID: 34195211

Sorry to add to the requirements. In the statement
   `find $1 -maxdepth 1 -type f -name "*.gz" -print`
can you match only *.gz files that are larger than 1 MB? I've found that
small files yield little saving, and some of the *.gz files were already
compressed with "gzip -9", which already gives a high compression ratio.
LVL 17

Assisted Solution

by:sweetfa2
sweetfa2 earned 500 total points
ID: 34195688
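#!/bin/bash
# Usage: script.sh <directory>
if [ $# -ne 1 ]; then
    echo "Usage: $0 <directory>" >&2
    exit 1
fi
# Only .gz files larger than 1 MB (1048576 bytes), handled one at a time.
find "$1" -maxdepth 1 -type f -name "*.gz" -size +1048576c -print |
while IFS= read -r file
do
  base="${file%.gz}"    # strip only the trailing .gz
  echo "$file"
  # Remove the uncompressed copy only if gunzip and zip both succeeded.
  gunzip "$file" && zip -9 "$base.zip" "$base" && rm -f "$base"
done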
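
If the files may stay in gzip format rather than becoming zip archives, a different approach avoids writing the uncompressed data to disk at all: pipe gunzip straight into gzip -9. This is a sketch of that alternative, not the accepted answer's method. The function name recompress_gz and the ".tmp" suffix are made up for illustration, and it assumes a bash new enough to support "set -o pipefail".

```shell
#!/bin/bash
# Hypothetical helper: recompress one .gz file in place at level 9.
# The body runs in a subshell ( ... ) so pipefail does not leak to the caller.
recompress_gz() (
  set -o pipefail            # a gunzip failure must fail the whole pipeline
  src=$1
  tmp="$src.tmp"
  if gunzip -c "$src" | gzip -9 > "$tmp" && gzip -t < "$tmp"; then
    touch -r "$src" "$tmp"   # keep the original modification time
    mv -f "$tmp" "$src"      # replace the original only after verification
  else
    rm -f "$tmp"             # leave the original untouched on any error
    exit 1                   # exits the subshell, i.e. the function fails
  fi
)
```

Inside the find loop this would be called as recompress_gz "$file". At most one extra compressed copy exists at any time, and the original is replaced only after the new file passes gzip -t.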


Author Comment

by:sunhux
ID: 34196559

Is there any way to check if the existing *.gz file is already gzipped with "gzip -9 ..."
compression ratio? If this can be checked before doing 'gunzip', that will be perfect.
If this is not possible, it's ok, just let me know & I'll close this thread & award points
LVL 17

Expert Comment

by:sweetfa2
ID: 34202033
Not that I am aware of
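
For what it's worth, the gzip header itself carries a hint. Per RFC 1952, byte 8 of a gzip member is the XFL field, where 2 means the compressor used its maximum-compression setting and 4 means its fastest one; GNU gzip sets it to 2 for -9. The sketch below reads that byte with od. The function names gz_xfl and is_max_compressed are made up for illustration, and the flag is only as reliable as the tool that wrote the file, so treat it as a hint rather than proof.

```shell
#!/bin/bash
# Print the XFL byte (offset 8 of the gzip header) as a decimal number.
gz_xfl() {
  od -An -tu1 -j8 -N1 "$1" | tr -d ' \n'
}

# Succeed when the header claims maximum compression (XFL = 2).
is_max_compressed() {
  [ "$(gz_xfl "$1")" = "2" ]
}
```

In the loop, a line such as is_max_compressed "$file" && continue, placed before the gunzip step, would skip files that already claim maximum compression without decompressing them.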

Author Closing Comment

by:sunhux
ID: 34222743
Ok