Asked by labradorchik

Looking for the most efficient way to unzip files using Bash scripting or in a Python program?

Hello, what would be the most efficient way to unzip files using Bash scripting or in a Python program? 

 

I am trying to unzip car_<makemodel>.zip, which includes 2 types of files (car_<makemodel>.shp and car_<makemodel>.shx) but I only need to unzip and use .shp files.   


<makemodel> consists of a 9-digit number: 3 digits for <make> and 6 digits for <model>. Both are separate variables in each car_<makemodel>.shp file.
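For the later Python processing, the naming scheme described above could be split with a regular expression. This is only a minimal sketch: it assumes the 3-digit make comes first and the 6-digit model second, and `parse_car_name` is a hypothetical helper name, not something from the original post.

```python
import re

# car_<makemodel>.<ext>: 3 digits of make followed by 6 digits of model.
# The make-then-model order is an assumption based on the description above.
CAR_NAME = re.compile(r"^car_(\d{3})(\d{6})\.(zip|shp|shx)$")


def parse_car_name(filename):
    """Return (make, model) as strings, or None if the name does not match."""
    match = CAR_NAME.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The digits are kept as strings so that leading zeros in a make or model code are not lost.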


Note: There are hundreds of these .shp files and those are large files too, so I am really looking for the most efficient way to unzip these files. Later data processing of these .shp files will be done in a Python program.



What would be an example of this unzip process? 

If anything needs to be clarified, please just let me know. 


Any help would be really appreciated!

Thank you!

Norie

How large are the files?

Are they all located in the same folder?

Where would you like to extract them to?
Just a small note... If the other processing is to be done in a Python program, then the extraction of the files should probably be done in the same Python program too.

There is the standard module zipfile for that purpose (https://docs.python.org/3/library/zipfile.html#module-zipfile). It lets you list the files stored inside a zip archive.

If you do not need to keep the extracted files for other processing, you can even read their content without explicitly extracting them to disk first (the decompression is done behind the scenes). See the open method of the ZipFile object (https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.open).
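A minimal sketch of the two zipfile features mentioned above: listing the members of an archive and reading one of them without extracting it to disk. The function names are hypothetical, and the default of 100 bytes is just an illustrative amount to read.

```python
import zipfile


def shp_members(zip_path):
    """List the names of the .shp members inside one archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".shp")]


def read_member_start(zip_path, member, size=100):
    """Read the first `size` bytes of a member without extracting it to disk."""
    with zipfile.ZipFile(zip_path) as archive:
        # ZipFile.open returns a binary file-like object that is
        # decompressed on the fly as it is read.
        with archive.open(member) as stream:
            return stream.read(size)
```

Whether reading in place is actually faster than extracting first depends on how the later processing accesses the data, so it is worth timing both on a real archive.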
If you actually want to extract the .shp files for later processing, a Bash script would be most straightforward.

I'm assuming that each car_nnnnnnnnn.zip file contains one car_nnnnnnnnn.shp and one car_nnnnnnnnn.shx file.

Change to the directory with all of the zip files, and run the following command to extract all of the car_*.shp files to that directory:
for zipfile in car_*.zip; do pref=${zipfile%.zip}; unzip "${zipfile}" "${pref}.shp"; done


labradorchik (ASKER)

Each zip file is about 1GB-2GB. There are probably 70-80 car_<makemodel>.zip files and each zip file includes hundreds of .shp and .shx files (car_<makemodel>.shp and car_<makemodel>.shx), but I only need to unzip and use (process in Python later on) the .shp files. The Python program is initiated by a Bash script anyway, so I was thinking of unzipping these files in Bash first and then processing them (data calculations/manipulations) in Python. But if these files can be unzipped more efficiently/faster in Python, that's great too. 


As an option in Bash scripting, I was thinking of something like this, but I am not sure how to specify that I only want to unzip files with the .shp extension:

export DATA1=/root/data1  # directory where the car_<makemodel>.zip files are located
export DATA2=/root/data2  # directory to unzip the car_<makemodel>.shp files into

# Quote the wildcard so that unzip itself matches every car_*.zip in that directory;
# unquoted, the shell expands it and unzip treats the extra names as archive members
unzip "${DATA1}/car_*.zip" -d "${DATA2}"



As another option, in a Python program, I guess something like this?

import glob
import zipfile

# /root/data1 - where the zipped files are located
# /root/data2 - where the unzipped files should be placed
for zip_path in glob.glob("/root/data1/car_*.zip"):
    with zipfile.ZipFile(zip_path) as z:
        z.extractall("/root/data2")


Or maybe something like this in Python, but again I am not sure how to extract only the .shp files:

import zipfile

# picking the zip file from the directory (Python 3; use raw_input on Python 2)
zip_path = input("Zip file (e.g. under /root/data1): ")

# directory to extract the zip contents into
dest_path = input("Destination directory (e.g. /root/data2): ")

# extract every member; the with block closes the archive afterwards
with zipfile.ZipFile(zip_path) as z:
    for name in z.namelist():
        z.extract(name, dest_path)



I guess I can unzip the files either in a Bash script or in a Python program; I just need the most efficient method, and one that works. 


Any help will be appreciated!

Thank you! 





Thank you, Simon!

I just saw your comment.


Yes, your assumption is correct. Each car_nnnnnnnnn.zip file contains one car_nnnnnnnnn.shp and one car_nnnnnnnnn.shx file.


So I think that's what I am looking for. 


Question: What if I need to run the Bash script below (zip_script.sh) from one directory (/root/executables), the zip files are in a second directory (/root/data1), and I would like to place the unzipped .shp files in a third directory (/root/data2)? How would the directories in the script look in this case? 


I am in /root/executables directory and running zip_script.sh on the command line with sh zip_script.sh command

#!/bin/bash
export DATA1=/root/data1  # directory where the car_<makemodel>.zip files are located
export DATA2=/root/data2  # directory to unzip the car_<makemodel>.shp files into

for zipfile in "${DATA1}"/car_*.zip; do
  # strip the directory and the .zip suffix to get car_<makemodel>
  pref=$(basename "${zipfile}" .zip)
  # extract only the .shp member, into DATA2
  unzip "${zipfile}" "${pref}.shp" -d "${DATA2}"
done


 Is this correct? 


ASKER CERTIFIED SOLUTION
simon3270
This solution is only available to members.
Couldn't you just do it with a wildcard?


unzip myzipfile.zip "*.shp"



Yes, wildcard would work too!
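If the extraction ends up in Python instead, the same wildcard idea could be sketched with the standard fnmatch module. This is an illustration only; `extract_matching` is a hypothetical helper, and the pattern is matched against the full member name as stored in the archive.

```python
import fnmatch
import zipfile


def extract_matching(zip_path, pattern, dest_dir):
    """Extract only the members whose names match a shell-style pattern."""
    with zipfile.ZipFile(zip_path) as archive:
        wanted = fnmatch.filter(archive.namelist(), pattern)
        # An empty list here means nothing is extracted.
        archive.extractall(dest_dir, members=wanted)
    return wanted
```

For example, `extract_matching("myzipfile.zip", "*.shp", "/root/data2")` would leave the .shx files inside the archive.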

Getting the 'pref' value in my script would mean that you could immediately call the python script to process that .shp file, so it might still be useful.
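The same "extract one file, then process it straight away" idea could also live entirely in Python. A rough sketch, assuming each car_<makemodel>.zip holds a member named car_<makemodel>.shp as confirmed earlier in the thread; `extract_shp_files` and `process_shp` are hypothetical names.

```python
from pathlib import Path
import zipfile


def extract_shp_files(zip_dir, dest_dir):
    """Yield the path of each extracted .shp, one archive at a time."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for zip_path in sorted(Path(zip_dir).glob("car_*.zip")):
        # Same idea as 'pref' in the Bash loop: archive name minus .zip
        member = zip_path.stem + ".shp"
        with zipfile.ZipFile(zip_path) as archive:
            # extract() raises KeyError if the archive has no such member
            extracted = archive.extract(member, path=dest)
        yield Path(extracted)


# Hypothetical usage; process_shp stands in for the later data processing:
# for shp_path in extract_shp_files("/root/data1", "/root/data2"):
#     process_shp(shp_path)
```

Because this is a generator, each .shp can be processed (and deleted afterwards, if disk space is tight) before the next archive is opened.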

Thanks, Simon and everyone! 


Simon, your advice and examples helped a lot! I have tested the code and it worked perfectly! 

Thank you again!