bitstreaminc

Zip up a collection of files into multiple 20mb (close enough works) parts and each individual part is self contained.

Zip or any other software
Each individual piece can be extracted separately, without needing any of the other pieces.
Paul MacDonald

7-Zip (and I believe WinZip before it) will let you specify an archive size, splitting up the larger archive into smaller ones.

The self-contained bit depends on what you mean, but if you're expecting each smaller archive to be its own ZIP file, I'm not sure that's possible. You're asking the ZIP program to "fit" your files into archives of certain sizes, and I believe that's beyond what these two do. In other words, you'd have to have all the smaller archives together in order to open the whole archive. To achieve what you're looking for, I think you'd have to have a way to "pre-bundle" your data into suitable chunks.
Bill Prew

7-Zip calls these "volumes", and you can create them from the GUI or from the command-line program.

»bp
ASKER

Thanks for the replies, I am aware of the split volume functionality.
I need each zip file to be self contained and not incremental pieces that rely on each other for unzipping.
It doesn't have to be zip, it can be any other piece of software and zip may not even be able to do this.
Thank you for the replies.
Maybe it's possible to split files in increments?
For example if you have 100 files and you would like to zip them into 4 zip packages of 25.

Could you zip up the first 25, then the next 25 (another zip file) and so on, into self containing zips without manual intervention?
4 zip files, each with 25 files inside, and each one does not depend on the other for unzipping.
> a collection of files

How will you specify the source files — individual file names? a folder? a folder and its subfolders? multiple folders not sharing the same root folder?

Do you want a CLI, GUI, or both?

Regards, Joe

Update: Just saw your last comment. It's doable both ways — by either size, such as 20MB as you mentioned in the original question, or by count, as you mentioned in your most recent post.
Specify folder (a sub folder might be needed in some instances.)


Cli or gui is fine, whatever automates the process best.

Thanks
What naming convention do you want for the multiple output ZIP files?

Where do you want the output ZIP files stored — source root folder? specifiable output folder? fixed location, such as AppData or Desktop?
name convention = Name I provide + 001, 002, 003 etc is fine.
output path = path I provide .


Thank you
OK, here are my proposed specs for such a program (enhanced/generalized a bit beyond your comments):

• Allow specification of a maximum size (in MB) for each output archive file and/or a maximum number of input files to be stored in each output archive file (default is no maximum, i.e., all input files will be stored in a single archive file).

• Archive all files in a specified source folder, with an option to include subfolders (default is root folder only). Create multiple archive files, as needed, to meet the maximum size and/or count specification discussed above. Each output archive file must be independent, i.e., capable of being opened by itself without any requirement for the presence of the other archive files.

• Provide an option to specify the first component in the archive file names (default is the file name of the source folder).

• Provide an option to use a date/time stamp, with hyphens for separators (-YYYYMMDD_HHmmss-), as the second component in the archive file names (this guarantees unique names, since the time includes seconds).

• Use an N-digit number (where N is specifiable, with a default of 3), preceded by a hyphen, as the final component in the archive file names (e.g., -001, -002, -003, etc.).

• Allow specification of a destination folder where all archive files will be created.

• Use 7-Zip as the archiving tool.

• Provide choices for archive type: 7z, BZIP2, GZIP, TAR, WIM, XZ, ZIP (default is ZIP).

• Provide both a Command Line Interface (CLI) and Graphical User Interface (GUI).

Let me know if you think I missed anything, or got anything wrong. Regards, Joe
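
The naming rules in these specs (first component, optional date/time stamp, N-digit counter) could be sketched roughly as follows. This is only an illustration in Python: the function name `archive_name` and its parameters are hypothetical and not part of any program in this thread.

```python
from datetime import datetime


def archive_name(base, index, digits=3, stamp=None, ext="zip"):
    """Build a name such as 'Scans-001.zip' or 'Scans-20240131_093000-0012.zip'.

    base   -- first component (the specs default to the source folder name)
    index  -- running number of the archive, starting at 1
    digits -- width of the zero-padded counter (the specs default to 3)
    stamp  -- optional datetime used as the middle component
    """
    parts = [base]
    if stamp is not None:
        # YYYYMMDD_HHmmss, joined to its neighbours with hyphens
        parts.append(stamp.strftime("%Y%m%d_%H%M%S"))
    parts.append(f"{index:0{digits}d}")
    return "-".join(parts) + "." + ext


print(archive_name("Scans", 1))
print(archive_name("Scans", 12, digits=4, stamp=datetime(2024, 1, 31, 9, 30, 0)))
```

The two calls print `Scans-001.zip` and `Scans-20240131_093000-0012.zip`.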
Nailed it!
Here's a starting point for you if you want to take a BAT approach, and zip based on number of files rather than a size threshold.  Adjust folders and settings near top.

@echo off
setlocal EnableDelayedExpansion

rem Define folders and options
set BaseDir=B:\EE\EE29081915\Files
set ZipDir=B:\EE\EE29081915\Zips
set ZipExe=C:\_pf\7-Zip\7z.exe
set IncludeSubDirs=N
set MaxFiles=10


rem Slightly different loops if subfolders included, process each folder
if "%IncludeSubDirs%" EQU "Y" (
    for /r "%BaseDir%" %%D in (.) do (
        call :ProcessDir "%%~dpnxD"
    )
) else (
    for /d %%D in ("%BaseDir%\*.*") do (
        call :ProcessDir "%%~D"
    )
)

exit /b

:ProcessDir [directory-path]
    rem Initialize variables
    set ZipNum=0
    set FileNum=0
    set FileList=

    rem Process all files in folder
    for %%F in ("%~1\*.*") do (
        rem Add to list
        set /a FileNum+=1
        set FileList=!FileList! "%%~F"

        rem When we hit the max, archive this group of files
        if !FileNum! EQU %MaxFiles% (
            set /a ZipNum+=1
            "%ZipExe%" a "%ZipDir%\%~nx1-!ZipNum!.zip" !FileList! -tzip 1>NUL
            set FileNum=0
            set FileList=
        )
    )

    rem See if we have any files waiting to be zipped
    if !FileNum! GTR 0 (
        set /a ZipNum+=1
        "%ZipExe%" a "%ZipDir%\%~nx1-!ZipNum!.zip" !FileList! -tzip 1>NUL
    )

    exit /b


»bp
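
For comparison, the same count-based idea can be sketched in Python with the standard `zipfile` module instead of 7-Zip. This is not the batch script above, just a minimal illustration: the function name `zip_in_batches`, its parameters, and the `-001` style suffix are assumptions. It only looks at files directly inside the source folder.

```python
import zipfile
from pathlib import Path


def zip_in_batches(src_dir, out_dir, base_name, max_files=25):
    """Write independent ZIP files holding at most max_files files each.

    Every archive is a complete ZIP on its own, so any one of them can be
    extracted without the others. Returns the list of archives created.
    """
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for start in range(0, len(files), max_files):
        number = start // max_files + 1
        target = out_dir / f"{base_name}-{number:03d}.zip"
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in files[start:start + max_files]:
                # Store only the file name, not the full source path
                zf.write(path, arcname=path.name)
        created.append(target)
    return created
```

With 100 files and `max_files=25` this would produce four archives of 25 files each.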
The batch file works excellently when grouping by number of files.
Would it be possible to limit each zip by number of bytes instead of by number of files?
I considered number of bytes and then decided it was too much work for too little reward.  Here was my thinking.

There are two sizes that could be used: either the size of the files before zipping, or the size of the ZIP after compression is applied.  

The latter of these would be hard to impossible to do because you don't know the compressed size of the group of files until you ZIP them, you can't predict the size they will take in the ZIP.  So if we said make the ZIP files all 20MB or less, there's no way to know if a group of files will ZIP under that limit or not.  So it would be a massive trial and error to find combinations of files that ZIPped to the desired size.

In the other case, looking at sizes before ZIPping, that's almost as bad, unless you accept huge variances and underfilled ZIP files.  Assuming the input files to be ZIPped are not all the same size, then working through them one at a time we would keep tallying a cumulative size, and when that hits the threshold, ZIP them up.  But you could have a few very small files first, all good, and then hit a 50MB file.  We check the tally, see it would put us over the threshold, and ZIP the small files, ending up with a very small ZIP file.  Or we include the last file that put us over, and this ZIP could be many times larger than the others.  Didn't feel like it would yield great results.
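
To make the running-tally behaviour concrete, here is a small Python sketch that works on plain size numbers. It implements the first variant described (close the group before the file that would push it over). The function name `group_in_order` is made up for this illustration.

```python
def group_in_order(sizes, limit):
    """Group sizes in the order given, closing a group whenever the next
    file would push the running total past limit.

    A single file larger than limit still gets a group of its own.
    """
    groups, current, total = [], [], 0
    for size in sizes:
        if current and total + size > limit:
            groups.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        groups.append(current)
    return groups


# Sizes in MB: a few small files, then a 50 MB file, with a 20 MB limit
print(group_in_order([1, 2, 1, 50, 8, 9], 20))
```

This prints `[[1, 2, 1], [50], [8, 9]]`: a 4 MB group, an oversized 50 MB group, and a 17 MB group, which is the uneven result described above.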

In theory you could take all the files at once, consider all their sizes and try to optimize the groupings to fall just under the max, but that would be a bit complex, and I wouldn't go near that in a BAT script.
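
One common heuristic for this kind of grouping is "first-fit decreasing": sort the files from largest to smallest and put each one into the first group that still has room. The Python sketch below is an assumption about how that could look, not something used in this thread, and it is a heuristic rather than a guaranteed optimum. The name `first_fit_decreasing` is illustrative.

```python
def first_fit_decreasing(sizes, limit):
    """Pack sizes into as few groups as this heuristic finds, none over limit.

    A file larger than limit ends up alone in its own group.
    """
    bins = []  # each bin tracks its remaining room and the sizes it holds
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if size <= b["free"]:
                b["items"].append(size)
                b["free"] -= size
                break
        else:
            # No existing group has room, so open a new one
            bins.append({"free": limit - size, "items": [size]})
    return [b["items"] for b in bins]


# Sizes in MB with a 50 MB limit
print(first_fit_decreasing([30, 30, 30, 5, 5, 5], 50))
```

This prints `[[30, 5, 5, 5], [30], [30]]`, i.e. three groups.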

Hope that helps you understand my thinking, not saying it's "right", just where I ended up and so I figured something was better than nothing and took a shot at the number of files approach.  Understand it may not meet your need.  Speaking of which, what is the motivation for this chopping up anyway?


»bp
Compression isn't needed at all, so the file size tally can be done at the beginning.
I'm wondering if there is something that can be done where it can sort by file size?

Sorting from largest to smallest and then just zip 50mb increments?

Is that something that can be done without a lot of work?
No one file can be larger than 50MB?

And say you have 3 files of 30MB, and 3 files of 5MB.  You are okay with each of the 30MB files taking a separate ZIP, and the three 5MB files taking another ZIP?  In theory they could have fit in a total of three ZIP files, whereas this would use four.

So, should the zip command skip compression entirely?  That typically runs faster, but naturally the ZIP files could be larger.


»bp
Scan the folder in order of largest to smallest file size.
Say the first 3 files are 30MB each and the next 3 are 5MB each.
Then the first 3 zips each hold one 30MB file and the next zip holds the three 5MB files. That is fine.

That would work without compression.
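
A rough Python sketch of this approach (sort largest to smallest, fill each archive up to a byte limit, store without compression) is shown below. It is an illustration only and not the solution posted in this thread. The function name `zip_by_size` and its parameters are assumptions. Note that in this sketch small files may share an archive with a large one when they still fit, so the 30/30/30/5/5/5 example would end up in three archives rather than four.

```python
import zipfile
from pathlib import Path


def zip_by_size(src_dir, out_dir, base_name, limit_bytes):
    """Sort files largest first and write uncompressed, independent ZIPs.

    The limit applies to the summed input file sizes. Each stored archive
    is slightly larger than that sum because of ZIP header overhead.
    """
    files = sorted(
        (p for p in Path(src_dir).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_size,
        reverse=True,
    )
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    groups, current, total = [], [], 0
    for path in files:
        size = path.stat().st_size
        if current and total + size > limit_bytes:
            groups.append(current)
            current, total = [], 0
        current.append(path)
        total += size
    if current:
        groups.append(current)

    created = []
    for number, group in enumerate(groups, start=1):
        target = out_dir / f"{base_name}-{number:03d}.zip"
        # ZIP_STORED means no compression, so input sizes predict output size
        with zipfile.ZipFile(target, "w", zipfile.ZIP_STORED) as zf:
            for path in group:
                zf.write(path, arcname=path.name)
        created.append(target)
    return created
```

A call such as `zip_by_size(r"C:\data\scans", r"C:\data\zips", "scans", 50 * 1024 * 1024)` would use a 50 MB limit. The paths here are placeholders.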
ASKER CERTIFIED SOLUTION
Bill Prew

This is a winner,  thank you for all your help!
Great job on the batch files, it would have taken me days to write something half baked.
Once again, thank you for your time and great work!
Nailed it!
Good to know that we have the specs nailed down — and I see that Bill has come up with another one of his superb batch scripts! So I'll put the specs for the full program on the back-burner. Regards, Joe
Welcome, glad that was useful.


»bp
One issue I ran into.
It started failing on some directories.
So it would create files test01.zip, test02.zip, test04.zip
It would skip a number and the archives are missing half the files of a directory of 300 files.
So out of 300 files it would only zip 150 of them.
Could there be some issue with a directory that has over a few hundred files?
Well, the only time it increments the zip number is right before the actual zipping, so it seems like if the number bumped up it should have zipped.  The only other possibility is that the zip is failing for some reason.  Try a test without the 1>NUL on the end (as below) and see if it says anything interesting.  You will get a lot of output, but it will include the zip command output, so look at those for possible problems.

set /a ZipNum+=1
"%ZipExe%" a "%ZipDir%\%~nx1-!ZipNum!.zip" !FileList! -tzip -mx0


»bp
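
Besides reading the 7-Zip output, one way to see exactly which files went missing is to compare the source folder against the contents of the ZIPs that were produced. The Python sketch below is only a suggestion along those lines. The function name `missing_from_archives` is hypothetical, and it compares file names only (no subfolders, no size or checksum check).

```python
import zipfile
from pathlib import Path


def missing_from_archives(src_dir, zip_dir, pattern="*.zip"):
    """Return the names of files in src_dir that appear in none of the ZIPs."""
    expected = {p.name for p in Path(src_dir).iterdir() if p.is_file()}
    archived = set()
    for archive in sorted(Path(zip_dir).glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            # Reduce each stored path to its bare file name for comparison
            archived.update(Path(name).name for name in zf.namelist())
    return sorted(expected - archived)
```

An empty list means every source file name was found in at least one archive.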
Files read from disk: 13
Archive size: 12580984 bytes (12 MiB)
Everything is Ok

I don't get individual errors on files.

Something odd going on, but it's not the script.
The files I am trying to zip up are tifs, pdf's etc and nothing odd in the filenames.
I was able to zip up a windows directory with no issues.
I have to look further into what is the issue with these files.
Okay, keep me posted...


»bp
Figured it out..
I used magic file renamer:  http://www.finebytes.com/mfr/
Used the "Cleaner" option to strip all odd characters (!"#$%&'()*+,/:;<=>?@[]\^`{}|~_), including spaces.
Now it works like a champ!   Thanks for all your help.
Ah yes, some special characters can upset the BAT script parser, it's not as flexible as some scripting languages...


»bp