Looking for the most efficient way to unzip files using Bash scripting or in a Python program?
Hello, what would be the most efficient way to unzip files using Bash scripting or in a Python program?
I am trying to unzip car_<makemodel>.zip, which includes two types of files (car_<makemodel>.shp and car_<makemodel>.shx), but I only need to unzip and use the .shp files.
<makemodel> consists of a 9-digit number: 3 digits for <make> and 6 digits for <model>. Both are separate variables in each car_<makemodel>.shp file.
Note: There are hundreds of these .shp files and those are large files too, so I am really looking for the most efficient way to unzip these files. Later data processing of these .shp files will be done in a Python program.
What would be an example of this unzip process?
If anything needs to be clarified, please just let me know.
Any help would be really appreciated!
Thank you!
There is the standard module zipfile for that purpose (https://docs.python.org/3/library/zipfile.html#module-zipfile). You can get the list of the files zipped inside a zip.
If you do not need to keep the extracted files for other processing, you can read their content directly from the archive without first extracting them to disk (the decompression happens behind the scenes). See the open method of the ZipFile object (https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.open).
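As a minimal sketch of that idea, the following reads only the .shp members of one archive through ZipFile.open, without writing anything to disk. The function name iter_shp_members is hypothetical, and the sketch assumes each member fits in memory; for very large members you would read from the file-like object in chunks instead.

```python
import zipfile


def iter_shp_members(zip_path):
    """Yield (member_name, content_bytes) for every .shp entry in one zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip the .shx files (and anything else that is not a .shp)
            if name.lower().endswith(".shp"):
                # ZipFile.open returns a binary file-like object for the member
                with archive.open(name) as member:
                    yield name, member.read()
```

A caller could then loop over iter_shp_members("car_nnnnnnnnn.zip") and hand each content to the processing step.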
I'm assuming that each car_nnnnnnnnn.zip file contains one car_nnnnnnnnn.shp and one car_nnnnnnnnn.shx file.
Be in the directory with all of the zip files, and run the following command to extract all of the car_*.shp files to that directory:
for zipfile in car_*.zip; do pref=${zipfile%.zip}; unzip "${zipfile}" "${pref}.shp"; done
ASKER
Each zip file is about 1GB-2GB. There are probably 70-80 car_<makemodel>.zip files and each zip file includes hundreds of .shp and .shx files (car_<makemodel>.shp and car_<makemodel>.shx), but I only need to unzip and use (process in Python later on) the .shp files. The Python program is initiated by a Bash script anyway, so I was thinking of unzipping these files in Bash first and then processing them (data calculations/manipulations) in Python. But if these files can be unzipped more efficiently/faster in Python, that's great too.
As an option in Bash scripting, I was thinking of something like this, but I am not sure how to specify that I only want to unzip files with the .shp extension:
export DATA1=/root/data1 #directory where car_<makemodel>.zip zipped files are located
export DATA2=/root/data2 #directory where to unzip car_<makemodel>.shp files
# I think the wildcard (*) will just pick up any car_*.zip files in that directory
# (quoted so that unzip, not the shell, expands it)
unzip "${DATA1}/car_*.zip" -d "${DATA2}"
As another option using Python, I guess something like this?
# /root/data1 - where the zipped files are located
# /root/data2 - where the unzipped files should be placed
# (run from the shell; -e takes one zip file at a time, so loop over the files)
for f in /root/data1/car_*.zip; do python -m zipfile -e "$f" /root/data2; done
Or maybe something like this in Python, but again I am not sure how to extract only the .shp files:
import glob
import zipfile
# directory to pick the zip files from
SrcPath = "/root/data1"
# directory for the extracted files
DestPath = "/root/data2"
for zip_name in glob.glob(SrcPath + "/car_*.zip"):
    with zipfile.ZipFile(zip_name) as z:
        for name in z.namelist():
            # this extracts every member, not only the .shp files
            z.extract(name, DestPath)
I guess I can unzip the files either in a Bash script or in a Python program; I just need the most efficient method that works.
Any help will be appreciated!
Thank you!
ASKER
Thank you, Simon!
I just saw your comment.
Yes, your assumption is correct. Each car_nnnnnnnnn.zip file contains one car_nnnnnnnnn.shp and one car_nnnnnnnnn.shx file.
So I think that's what I am looking for.
Question: What if I need to run the Bash script below (zip_script.sh) from one directory (/root/executables), the zip files are in a second directory (/root/data1), and I would like to place the unzipped .shp files in a third directory (/root/data2)? In this case, what would the directories in the script look like?
I am in the /root/executables directory and run zip_script.sh from the command line with the sh zip_script.sh command.
#!/bin/bash
export DATA1=/root/data1 #directory where car_<makemodel>.zip zipped files are located
export DATA2=/root/data2 #directory where to unzip car_<makemodel>.shp files
for zipfile in "${DATA1}"/car_*.zip; do
    pref=$(basename "${zipfile}" .zip) #strip the directory and .zip, leaving car_<makemodel>
    unzip "${zipfile}" "${pref}.shp" -d "${DATA2}" #-d sets the extraction directory
done
Is this correct?
Couldn't you just do it with a wildcard?
unzip myzipfile.zip "*.shp"
Getting the 'pref' value in my script would mean that you could immediately call the python script to process that .shp file, so it might still be useful.
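If the extraction were done on the Python side instead, a rough equivalent might look like the sketch below. It assumes the directory layout described in the thread (zip files named car_*.zip in one directory, .shp files extracted into another); the function name extract_shp_files is hypothetical. It returns the extracted paths so that processing of each .shp file can start right away, much like using 'pref' in the Bash loop.

```python
import glob
import os
import zipfile


def extract_shp_files(src_dir, dest_dir):
    """Extract only the .shp members of every car_*.zip in src_dir into dest_dir.

    Returns the list of paths of the extracted files.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    for zip_path in sorted(glob.glob(os.path.join(src_dir, "car_*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                if name.lower().endswith(".shp"):
                    # extract() returns the path of the file it created
                    extracted.append(archive.extract(name, dest_dir))
    return extracted
```

With the directories from the thread, the call would be extract_shp_files("/root/data1", "/root/data2"). Whether this is faster than the unzip loop likely depends on the data and the disk, so it is worth timing both on a few archives.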
ASKER
Thanks, Simon and everyone!
Simon, your advice and examples helped a lot! I have tested the code and it worked perfectly!
Thank you again!
Are they all located in the same folder?
Where would you like to extract them to?