avoorheis

Create multiple zip files

I have about 5k files that I want to zip into files containing 500 files each. I see that zipping programs can span an archive across multiple files, but that's not exactly what I want to do. I haven't seen any zipping program that can do that. Would I have to create some kind of batch file to do it?

thanks
alan
Sam654

SOLUTION
Gastone Canali
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Bill Prew

Here's how I would approach it with a BAT script.  This will read all files in c:\temp and add them to zip files in c:\save, named files-n.zip, where n is a number identifying one ZIP file of 500 files.  Edit the two SETs (BaseDir and BaseZip) near the top as needed.

@echo off
setlocal EnableDelayedExpansion
set BaseDir=c:\temp
set BaseZip=c:\save\files-
set MaxFiles=500
set FileNumber=0
for %%A in ("%BaseDir%\*.*") do (
  set /A FileNumber += 1
  set /A ZipNumber=!FileNumber! / %MaxFiles%
  "c:\Program Files\7-Zip\7z.exe" a -tzip "%BaseZip%!ZipNumber!.zip" "%%A"
)

~bp
avoorheis

If you run your batch file from within the same folder where your files are located, you run the risk of adding the batch file itself to an archive along with your other files. Adding the following check will prevent that from happening:

   if not "%%~fa"=="%~f0" (
   :
   )

The revised code below shows where this fits into your batch file.


@echo off
setlocal enabledelayedexpansion
set filecount=1

for %%a in ("c:\yourfiles\*.*") do (
   if not "%%~fa"=="%~f0" (
      set /a archivenumber=filecount/500
      set /a filecount+=1
      "c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-!archivenumber!.zip" "%%a"
   )
)
 
Reviewing Paul's post, it looks like my code will come up one file short in the first ZIP file, though the rest will have 500 files in them.  This is easy to correct, as shown below.  I think there is a bug in his code too, though, which my changed code fixes by having the first file reach the divide as 0 rather than 1.  Paul, check me on this: we need the 500th file to present to the divide statement as 499 rather than 500, which my changed code below should do.

Other than that, the only difference I can see in his code is the removal of the variables used for the source folder and the base of the zip filename. While this does reduce the number of lines in the script, I prefer to separate these near the top of the script for ease of change by the poster, and easier future maintenance.  This technique can also increase readability when used properly.

@echo off
setlocal EnableDelayedExpansion
set BaseDir=c:\temp
set BaseZip=c:\save\files-
set MaxFiles=500
set FileNumber=-1
for %%A in ("%BaseDir%\*.*") do (
  set /A FileNumber += 1
  set /A ZipNumber=!FileNumber! / %MaxFiles%
  "c:\Program Files\7-Zip\7z.exe" a -tzip "%BaseZip%!ZipNumber!.zip" "%%A"
)

~bp
paultomasi

==> You'll find my code is in good order and therefore billprew's comment  "I think there is a bug
==> in his code though" is not warranted.

Below is a test output of your code, and as you can see it only adds 499 files to the first ZIP file.  I removed some of the uninteresting "duplicate lines" to show the boundary conditions.

"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-0.zip" "c:\yourfiles\file0001.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-0.zip" "c:\yourfiles\file0002.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-0.zip" "c:\yourfiles\file0003.txt"
. . .
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-0.zip" "c:\yourfiles\file0497.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-0.zip" "c:\yourfiles\file0498.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-0.zip" "c:\yourfiles\file0499.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-1.zip" "c:\yourfiles\file0500.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-1.zip" "c:\yourfiles\file0501.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-1.zip" "c:\yourfiles\file0502.txt"
. . .
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-1.zip" "c:\yourfiles\file0997.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-1.zip" "c:\yourfiles\file0998.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-1.zip" "c:\yourfiles\file0999.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-2.zip" "c:\yourfiles\file1000.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-2.zip" "c:\yourfiles\file1001.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-2.zip" "c:\yourfiles\file1002.txt"
. . .
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-2.zip" "c:\yourfiles\file1497.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-2.zip" "c:\yourfiles\file1498.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-2.zip" "c:\yourfiles\file1499.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-3.zip" "c:\yourfiles\file1500.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-3.zip" "c:\yourfiles\file1501.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-3.zip" "c:\yourfiles\file1502.txt"
. . .
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-3.zip" "c:\yourfiles\file1997.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-3.zip" "c:\yourfiles\file1998.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-3.zip" "c:\yourfiles\file1999.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-4.zip" "c:\yourfiles\file2000.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-4.zip" "c:\yourfiles\file2001.txt"
"c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-4.zip" "c:\yourfiles\file2002.txt"

~bp
Yep, that should be:

   set filecount=0

The (revised) code is given below.




@echo off
setlocal enabledelayedexpansion
set filecount=0

for %%a in ("c:\yourfiles\*.*") do (
   set /a archivenumber=filecount/500
   set /a filecount+=1
   "c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-!archivenumber!.zip" "%%a"
)




or,




@echo off
setlocal enabledelayedexpansion
set filecount=0

for %%a in ("c:\yourfiles\*.*") do (
   if not "%%~fa"=="%~f0" (
      set /a archivenumber=filecount/500
      set /a filecount+=1
      "c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-!archivenumber!.zip" "%%a"
   )
)

If this happens again, I'm gonna stick to writing articles only!

I thought that perhaps an independent and unbiased test result might be useful.

Tested on a folder containing 2501 text files, and run from the root of the C:\ drive using 7-Zip version 4.65, the following batch file packed 499 of the files into "test_0.zip". Thereafter each of the subsequent zip files had 500 files packed into it, with the obvious exception of the final "test_5.zip", which contained 2 files:

@echo off
setlocal enabledelayedexpansion
set filecount=1

for %%a in ("c:\ZipSource\*.*") do (
   if not "%%~fa"=="%~f0" (
      set /a archivenumber=filecount/500
      set /a filecount+=1
      "c:\program files\7-zip\7z" a -tzip "c:\ZipDest\test_!archivenumber!.zip" "%%a"
   )
)

By comparison, the following batch file, executed in the same way and on the same files, packed 500 files into each of the first 5 zip files and packed the last solitary text file into the final zip file on its own:

@echo off
setlocal EnableDelayedExpansion
set BaseDir=c:\ZipSource
set BaseZip=c:\ZipDest\test_
set MaxFiles=500
set FileNumber=-1

for %%A in ("%BaseDir%\*.*") do (
    set /A FileNumber += 1
    set /A ZipNumber=!FileNumber! / %MaxFiles%
    "c:\Program Files\7-Zip\7z.exe" a -tzip "%BaseZip%!ZipNumber!.zip" "%%A"
)

The text files used had the naming convention "File_1.txt" to "File_2501.txt" and each contained a different single line of text in the format "Test File 1" through to "Test File 2501".  The C:\ drive is a Windows XP NTFS volume and the batch file extension was *.cmd.

As expected, the processing slowed down very noticeably as each successive file was packed into the increasingly larger first zip file, speeded up dramatically again as it created the second zip file, slowed down again, etc, etc. Both batch files took exactly the same length of time to process the files.

Conclusion:
The second batch file above (Bill Prew's revised version) works exactly as intended, i.e. 500 files per zip file. The first batch file above (Paul's) packs only 499 files into the first zip file but thereafter packs 500 files into each of the remaining zip files.

I am not taking sides, just observing.
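
For anyone who wants to check the counts above without zipping anything, here is a rough sketch in Python (not batch) that only mimics the integer division the two scripts perform. The function name `archive_counts` and its `start` parameter are made up for illustration: `start=1` stands in for a counter that reaches the division as 1 for the first file, and `start=0` for one that reaches it as 0.

```python
from collections import Counter

def archive_counts(total_files, per_zip=500, start=0):
    """Count how many files land in each archive number.

    `start` is the counter value the first file presents to the
    integer division (hypothetical parameter, for illustration only).
    """
    counts = Counter()
    for i in range(total_files):
        archive_number = (start + i) // per_zip  # like: set /a n=count/500
        counts[archive_number] += 1
    # Return the counts ordered by archive number
    return [counts[n] for n in sorted(counts)]

print(archive_counts(2501, start=1))  # first file seen as 1
print(archive_counts(2501, start=0))  # first file seen as 0
```

With a counter that starts at 1, the 500th file already divides to 1 and so opens the second archive, which is why the first archive ends up one file short.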
Aha, in the time it took me to test the batch files, the thread has meanwhile reached the same conclusion. I should have waited, but the question was quite an interesting one and I was intrigued to see the differences in the code ;-)
Now considering Bill's comments there, I notice that 7zip has an @listfile option, which I don't have time to test at the mo. but I assume takes a list of files to add... In that case it would logically be faster to build each ZIP by using a loop to write the first 500 filenames to a file, passing that file to 7zip as @listfile, then doing the same for the next 500, etc.

Collecting the names 500 at a time could, I guess, be done most easily with a FOR loop as now.

Not getting involved in this one but interested in the results :-)

Steve
Spooky
Now that is the strangest coincidence in thought processes.
I had a similar thought to dragon-it's that the use of a listfile could speed things up.  It takes a little extra work, so in my initial approach I took the slightly easier, but more brute force approach of adding each file to the zip one at a time.

I have now written a version of the script which does the zipping in batches of 500, and the performance benefits are substantial.  In a rough test here, my original approach took about 3 minutes to run for 2,003 files, while the new approach took under 20 seconds.

@echo off
setlocal EnableDelayedExpansion

REM Define constants for script execution
set BaseDir=c:\temp\EE26673600\yourfiles
set BaseZip=c:\temp\EE26673600\files-
set TempFile=%TEMP%\%~n0.ctl
set ZipPgm=c:\Program Files\7-Zip\7z.exe
set MaxFiles=500

REM Initialize variables for looping below
set ZipNumber=0
set FileNumber=0

REM Make sure the work file for zipping doesn't exist
if exist "%TempFile%" del "%TempFile%"

REM Loop through all files, process in batches of 500
for %%A in ("%BaseDir%\*.*") do (
  REM Add this file name to the work file of a group of files to zip
  set /A FileNumber += 1
  echo %%A>>"%TempFile%"
  REM See if this is a multiple of 500, if so need to zip these files now
  set /A Remainder = !FileNumber! %% %MaxFiles%
  if !Remainder! EQU 0 (
    REM Increment sequence number in zip file name, zip these files, clear work file
    set /A ZipNumber += 1
    "%ZipPgm%" a -tzip "%BaseZip%!ZipNumber!.zip" @"%TempFile%" > NUL
    del "%TempFile%"
  )
)

REM If any files left to be zipped (less than 500) handle last group now
if exist "%TempFile%" (
  set /A ZipNumber += 1
  "%ZipPgm%" a -tzip "%BaseZip%!ZipNumber!.zip" @"%TempFile%" > NUL
  del "%TempFile%"
)

~bp
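
The batching idea in the script above can also be sketched outside of batch. The Python below is only an illustration under assumptions: `chunked` and `build_commands` are hypothetical helper names, the list-file names are invented, and nothing is executed. It only shows how the file names split into groups and which arguments each single 7z call would receive (the `a -tzip archive @listfile` form is the one used in the script above).

```python
def chunked(names, size=500):
    """Yield successive groups of at most `size` names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]

def build_commands(names, base_zip, zip_pgm="7z", size=500):
    """Return one (list_file_name, names_in_group, argv) tuple per group.

    Nothing is run here; the batch-N.lst naming is an assumption
    made for this sketch only.
    """
    plans = []
    for number, group in enumerate(chunked(names, size), start=1):
        list_file = f"batch-{number}.lst"
        argv = [zip_pgm, "a", "-tzip", f"{base_zip}{number}.zip", f"@{list_file}"]
        plans.append((list_file, group, argv))
    return plans

names = [f"file{n:04d}.txt" for n in range(1, 2004)]  # 2,003 dummy names
plans = build_commands(names, "files-")
print(len(plans), [len(group) for _, group, _ in plans])
```

The point of the design is that the archiver is started once per group rather than once per file, so 2,003 files need 5 launches instead of 2,003.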
Paul

I didn't overlook anything. Bill Prew acknowledged your keen observation and corrected his code while you criticised him for using verbose coding and went on to suggest an alternative that reiterated the same error.

I'm not being critical of mistakes made by either of you, because my first attempt (which was practically the same as Bill Prew's, and therefore made no sense posting) fell right into the same trap. I only realised where it was failing after Bill Prew provided the reason in his comment 34331877 containing his corrected code.

What I AM being critical of is your pig-headed insistence that only your batch file was correct. This you were doing 6 hours after Bill Prew pointed out your reiteration of the same flaw that his had initially suffered from. Your comment 34332943 verged on being - no, it actually was - insulting, but in acknowledging your own slip-up you did so with some flippancy and no hint of apology.

It sure would be nice to see some closure, as you have stated, but you seem to have completely missed the fact that Gastone Canali's batch file worked as requested without necessitating any revisions.

- Gastone's batch file does seem to take a little longer to process the files, but speed hasn't been given as a criterion.
- It exhibits a similar quirk (maybe limited to WinXP) to the other working solutions with regard to the order in which target files are packed into the zip files (i.e. the odd sort order where the files are numerically named), but that was never specified as being of concern.
- It generates the default *.7z archive, but that's easily changed by specifying the file type in the command options and output file name rather than leaving 7-Zip to default.

So, perhaps instead of bulldozing your opinions and seeking "closure" as soon as YOU are happy that YOUR suggestion works, you should patiently await the decision of avoorheis, who will no doubt take into strong consideration that Gastone Canali's one and only comment (34327337) provided the first working solution straight out of the box.

Bill
Excellent job Bill. I had just reached a clunky "temp file list" result myself but yours is much more fluent and effective.
avoorheis (ASKER)

sorry for my tardiness in responses (had been on vacation prior to posting, so, am in the process of catching up and then caught a cold). I did try and use bp's code and it worked for what I needed. I'll review the other comments/suggestions soon and post again.
I chose to try bp's code first because it was shorter than Gastone's.

I am hesitant to close the thread, however, since I enjoy seeing different approaches and I always learn from them (I'm a relative novice at batch files)....not to mention the entertainment value of some of the comments.
==> avoorheis

Glad to hear you have gotten some useful, and educational, info from these answers.  Like you I have also gotten some entertainment from them as well.

~bp
avoorheis

1) Just out of curiosity, what filetypes are you compressing (what are their extension names)?

2) Are the files in a single folder or are there sub-folders also containing files?

3) Can you specify a full pathname of the folder where your files are located?

4) Can you specify a full pathname of the folder where you want the compressed files to be created?

5) Once the files are compressed, do you want the original files deleted or  processed in some other way?

6) Will you be placing the batch file inside the folder where your files are and running it from there, or will you be running it from another location?

7) Can you confirm you are using 7z.exe to compress your files and that it is located in the C:\Program Files\7-Zip\ folder.

Finally,  you stated "I chose to try bp's code first because it was shorter than Gastone's" however, you did not comment on my code which is shorter still. Please see below.


@echo off
setlocal enabledelayedexpansion
set filecount=0
for %%a in ("c:\yourfiles\*.*") do (
   set /a archivenumber=filecount/500
   set /a filecount+=1
   "c:\program files\7-zip\7z" a -tzip "c:\yourarchives\filename-!archivenumber!.zip" "%%a"
)



NOTE: You may refer to my notes in my previous comment ID 34331765 (above) if you need assistance in getting it to work on your computer.
file types = pdf (however, this seems like it would be a useful routine, so, would be good to be independent of file type)
single folder
c:\reports
c:\reports (or c:\reports\zip would work too)
no, leave original files as is
no, batch file will be in another folder
have 7zip, would be great to figure out how to use winzip, since that is our company preferred program.
haven't had a chance to try your code yet, Paul. But, certainly will. I'm also interested in the idea of using lists, but, not sure I want to maintain them. Maybe a pre-function that makes a list(s) of the files in the folder, then, uses that list(s).

One other thing, does anyone know if there is a silent mode for 7zip? I looked, but, didn't find it. Seems that having it show all the stuff it's doing could be slowing things down.
avoorheis

Apologies. Here's a few more questions...

1) Is there a specific name or naming convention you would like to use for naming your compressed files?

Examples:

   Reports-001.zip
   Reports-002.zip
   etc...

   PDF-Report-1.zip
   PDF-Report-2.zip
   etc...

   Or please specify your own...


2) Are you likely to add more PDF files to the c:\reports\ folder on later dates and if so, will they also  need to be added to the existing compressed files?


3) Because you are not deleting the PDF files from c:\reports\ after processing them, will any new PDF files added to the folder at a later date overwrite an existing PDF file, or will they never be named the same as an existing PDF file?


4) If you intend to run the batch file routinely (after adding additional PDF files to c:\reports\) should these new PDF files be added to the existing compressed files or do you intend to delete all existing compressed files and re-create them again thereby also compressing the newly added PDF files?


5)  Because you have a high volume of PDF files, can you give an example of one of their names or the naming convention used to avoid duplication?
avoorheis - the listfiles option was creating them dynamically - it is just a quicker way to do it, i.e. instead of calling 7zip 500 times saying to add a new file to the zip it creates a text file of the 500 names and then calls 7zip once with the 500 names and says to zip them in one go...

Going back into my observing hole...
Paul, hope you're not trying to get too specific, because, I think this sort of batch file would be useful to at least a few others and I'm sure I might use it in the future for other groups of files.
1. naming convention of files is aaa-aaaa_mm-yy.pdf, where the a's are alpha characters of just about any length, but typically 3-6 in the first part and 3-15 in the second part (there can also be more parts, i.e. more than 2 "-"), and mm is the 2 digit month and yy is the 2 digit year (typically last month's date).
As far as the compressed files, I think the option to change it is good, but, in this particular case it will usually be like TestReport_mm-yy_n.zip where mm and yy will be the current month/year and n will be the number of the zip file.
2.no
3.old reports deleted, new reports added (new reports will have last month's date (mm-yy)).
4.will be run monthly, starting from scratch
5.see 1.
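
A small sketch of the archive naming described in point 1, in Python for illustration only. `archive_name` is a hypothetical helper, and the `TestReport` prefix is simply the example given above; the month and year come from whatever date is passed in.

```python
from datetime import date

def archive_name(when, number, prefix="TestReport"):
    """Build a name like TestReport_mm-yy_n.zip (two-digit month and year)."""
    return f"{prefix}_{when:%m-%y}_{number}.zip"

print(archive_name(date(2010, 12, 14), 1))
```

Taking the date as a parameter, rather than reading the clock inside the function, keeps the naming independent of the system's date display format.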
==> avoorheis

==> One other thing, does anyone know if there is a silent mode for 7zip? I looked, but, didn't find it.
==> Seems that having it show all the stuff it's doing could be slowing things down.

I don't think there is a 7z switch for "quiet mode", but notice I added "> NUL" to the command lines in my last examples.  That will prevent the output of the 7z command from being displayed (or logged to a file) and speed things up a bit.

If you want specific enhancements to the specific code I have posted (or questions about it) by all means let me know.  I'll hold off on further posts unless you want something more from me.  I enjoyed working on this problem with you.

~bp
'>nul' tacked onto the end of 7zip's command line will silence the output!
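
For comparison, the same idea of discarding a program's console output looks like this when the command is launched from Python instead of a batch file. This is a sketch: `run_quietly` is a hypothetical helper, and the demo runs the Python interpreter itself as a stand-in child process, so no 7-Zip installation is assumed.

```python
import subprocess
import sys

def run_quietly(argv):
    """Run a command with its standard output discarded (like '> NUL')."""
    completed = subprocess.run(argv, stdout=subprocess.DEVNULL)
    return completed.returncode

# Stand-in child process that prints a lot; its output is thrown away.
code = run_quietly([sys.executable, "-c", "print('noisy output' * 1000)"])
print(code)
```

Only standard output is discarded here, so error messages written to standard error would still appear, which matches what a plain `> NUL` does.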
 
avoorheis

What is your system's DATE format when you enter the following command:

   date /t

(what does it show?)

Tues 12/14/2010
Hi avoorheis

>>> "would be great to figure out how to use winzip, since that is our company preferred program." <<<

WinZip command line support add-on is free for those with a paid licence and WinZip version 12.1 or later:
http://www.winzip.com/prodpagecl.htm

Download the installer:
http://www.winzip.com/downcl.htm
http://download.winzip.com/wzcline32.exe
If you don't intend to install it for now, just extract it to its own folder using WinZip and you will find the help file "WZCLINE.CHM".
Otherwise just install it to gain Command Line Support for your current WinZip installation.

The help file has the full usage syntax and some very good examples. Just be aware that there are upper and lowercase options of the same letter that do different things, and that it is good practice to enclose file names and paths containing spaces with double quotes. The general usage and many of the options are much the same as 7-Zip, but WinZip has a lot more options available and is more flexible.

One useful WinZip command line option is  -v  which allows you to list files in a zipped file without extracting the archive or having to open it in the WinZip interface.  Useful for testing results. The equivalent 7-Zip command is the letter L.

Another useful WinZip option is    -@listfile   (note the leading - symbol). Used this way, the command more or less does a test run and creates a listing of all the files that WOULD HAVE BEEN zipped if you had left out the  -@listfile  option.  Note the difference between   -@listfile   and   @listfile.  The former CREATES a listing without actually zipping up the files, whereas the latter causes WinZip to USE a named text based file as its list of files to zip up.

Hope this helps.
Bill
just tried bp's batch that creates a temp list, ran fast, worked fine.
Just for kicks, I tried his first one again, sending the output to the nul device...much slower than the list version. I'll close this up soon, have to run to a Dr. app. now.
Let me know if you want more help than "other Bill" provided on usage of WinZip instead of 7Zip.  As he described it's not too hard to use a different archive program.

~bp
Thank you Alan.