Zip all files into separate archives and then move them based on date

I have hundreds of thousands of files, all following the same naming convention (e.g. DENI_acn_69083_2009093121200.csv). I need to gzip each one and move it to a folder based on its date. This one would be moved to ../2009/09/31
How could I do this? Right now I'm just using bash to zip them all.
Asked by: THEROMPSTER2000

kevin_u Commented:
Here's a bash version that does what you want.
for f in *.csv
do
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done
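
For anyone following along, the expr calls on lines 4 to 8 can be traced by hand for the sample name from the question. This is only a sketch: it assumes GNU expr (the length and substr keywords are extensions) and uses $(...) in place of backticks, which behaves the same here.

```shell
#!/bin/bash
# Trace of the date extraction for the sample name in the question.
f="DENI_acn_69083_2009093121200.csv"

a=$(expr length "$f" - 16)      # name is 32 characters long, so a = 16
d=$(expr substr "$f" "$a" 8)    # 8 characters from position 16 (1-based): 20090931
yyyy=$(expr substr "$d" 1 4)    # characters 1-4 of d: 2009
mm=$(expr substr "$d" 5 2)      # characters 5-6 of d: 09
dd=$(expr substr "$d" 7 2)      # characters 7-8 of d: 31

echo "../$yyyy/$mm/$dd"         # prints ../2009/09/31
```

Note that the start position is counted back from the end of the name, so it only lines up with the date when every name ends in the same number of characters as this sample.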

THEROMPSTER2000 (Author) Commented:
Could you please explain the lines? Bash would work for me, but could you show me what they mean? What do lines 4, 5, 6, 7 and 8 do?

THEROMPSTER2000 (Author) Commented:
*.csv
expr: syntax error
expr: syntax error
expr: syntax error
gzip: *.csv: No such file or directory
./newscript.sh: line 13: /bin/mv: Argument list too long

THEROMPSTER2000 (Author) Commented:
Never mind explaining what it does, I figured that out. But it is giving me a syntax error, as if I can't use *.csv...
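
kevin_u Commented:
The errors up to and including the gzip one mean that no CSV files were found: the pattern *.csv matched nothing, so the loop ran once with the literal string *.csv.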

THEROMPSTER2000 (Author) Commented:
OK, but what about "Argument list too long"? Are there too many files for mv to handle?

kevin_u Commented:
This will fix the problem.

The mv is done one file at a time, but when the literal *.csv came through, $f.gz became *.csv.gz and mv tried to move all of your existing .csv.gz files at once.
for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done
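
A variation on the same guard, assuming bash: with the nullglob option set, a pattern that matches nothing expands to an empty list, so the loop body never sees the literal *.csv and no explicit check is needed. The function name count_csv is made up for this sketch; it only counts the names the loop would visit.

```shell
#!/bin/bash
# count_csv DIR: print how many *.csv names a for loop would visit in DIR.
# Hypothetical helper for illustration; runs in a subshell so the caller's
# working directory and shell options are left alone.
count_csv() {
  (
    cd "$1" || exit 1
    shopt -s nullglob        # an unmatched pattern expands to nothing
    n=0
    for f in *.csv; do
      n=$((n + 1))
    done
    echo "$n"
  )
}
```

In an empty directory this prints 0, where the loop without nullglob would have run once with f set to the pattern itself.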

THEROMPSTER2000 (Author) Commented:
But then it wouldn't do anything if there were any CSV files.

kevin_u Commented:
Before I added the small check that stops the script when there are no CSV files, the mv would have picked up *.csv.gz, and there must have been a lot of those already present, hence the "Argument list too long" message.

The new script will have no such errors, and it will work on any new .csv files in the folder.

If you want it to move the .csv.gz files you already have, we'll need to make a slightly different script.

Tintin Commented:
Are the CSV files all in the same folder?
Do the files all have the same length filename?

Note that expanding a glob such as *.csv over a very large number of files is slow, and it fails with "Argument list too long" as soon as the expanded list is passed to an external command.

Also note that using external commands (such as expr) instead of bash builtins will significantly slow down the script when you are dealing with large numbers of files, since every call starts a new process.

I'm assuming the files are all in one directory.


#!/bin/bash
CSVDIR=/path/to/cvsfiles
NEWDIR=/path/to/newdir
 
cd $CSVDIR
 
find . --maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

THEROMPSTER2000 (Author) Commented:
But then wouldn't the above script get the wrong dates, since the gzipped names are three characters longer than the originals?
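
A quick way to check that concern, sketched for a single hypothetical .csv.gz name and assuming bash plus GNU expr. Counting a fixed 16 characters back from the end is thrown off by the extra suffix, while cutting at the last underscore and reading from the front is not.

```shell
#!/bin/bash
name="DENI_acn_69083_200909301230.csv.gz"

# Counting 16 characters back from the end, as the expr version does.
a=$(expr length "$name" - 16)
from_end=$(expr substr "$name" "$a" 8)

# Cutting at the last underscore and reading from the front instead.
d=${name##*_}
from_front=${d:0:8}

echo "from end:   $from_end"     # 09093012, the fixed offset misses the date
echo "from front: $from_front"   # 20090930
```

The find-based script only matches names ending in .csv, so it would skip files that are already gzipped rather than misread them.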

THEROMPSTER2000 (Author) Commented:
OK guys, what you gave me messed up all my folders. What can I do now? The files are in directories like this now:
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41285_200901170430.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41286_200901170445.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41287_200901170500.csv.gz
all off by a number.
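
A likely cause, judging from the paths above: these names carry a 12-digit timestamp, while the sample name in the question has 13 digits. Counting 16 characters back from the end of a 31-character name lands on the underscore, so the 8-character slice is _2009011 instead of 20090117, which splits into _200/90/11. One possible way to recover is to re-derive the date from each file name and move the file again. This is a sketch only: it assumes bash, a find that supports -print0, and that the misplaced files are all .csv.gz files below one root; the function name refile and its arguments are hypothetical.

```shell
#!/bin/bash
# refile ROOT DEST: move every *.csv.gz below ROOT into DEST/YYYY/MM/DD,
# taking the date from the file name rather than from its current folder.
# Keeping DEST outside ROOT avoids find revisiting files already moved.
refile() {
  local root=$1 dest=$2 file name d dir
  find "$root" -type f -name '*.csv.gz' -print0 |
  while IFS= read -r -d '' file; do
    name=${file##*/}                     # strip the directory part
    d=${name##*_}                        # e.g. 200901170430.csv.gz
    [[ $d =~ ^[0-9]{8} ]] || continue    # skip names without an 8-digit date
    dir=$dest/${d:0:4}/${d:4:2}/${d:6:2}
    mkdir -p "$dir" && mv -- "$file" "$dir/"
  done
}
```

For the paths shown above, a call might look like refile /c02/qualution/Telcordia/_200 /c02/qualution/Telcordia, ideally tried on a copy first. The emptied _200 tree would be left behind to remove by hand.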

Tintin Commented:
Which scripts did you run, and in which order?

I did make one small typo in my script, where --maxdepth should be -maxdepth.

Once that mistake is corrected, it all works fine.  See the following output of my tests.

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
DENI_acn_69083_200909301230.csv
DENI_acn_69083_2009093121200.csv
moveit.sh

tintin$ cat moveit.sh
#!/bin/bash
CSVDIR=$(pwd)
NEWDIR=newdir

cd $CSVDIR

find . -maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

tintin$ ./moveit.sh

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
moveit.sh
newdir

tintin$ ls -1R newdir/
newdir/:
2009

newdir/2009:
09

newdir/2009/09:
30
31

newdir/2009/09/30:
DENI_acn_69083_200909301230.csv.gz

newdir/2009/09/31:
DENI_acn_69083_2009093121200.csv.gz



THEROMPSTER2000 (Author) Commented:
I ran this one, not yours:

for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

I'll try yours next.

THEROMPSTER2000 (Author) Commented:
OK, so your code worked, Tintin. Can you explain what each line means and does? Thank you so much.
#!/bin/bash
CSVDIR=/path/to/cvsfiles
NEWDIR=/path/to/newdir
 
cd $CSVDIR
 
find . --maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

Tintin Commented:
Hopefully the first 6 lines are obvious (let me know if they aren't).

Line 7 uses the find command to match all .csv files in the current directory.  The -maxdepth 1 option prevents find from looking in subdirectories (I didn't know if you had subdirectories or not).

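The output of the find command is piped to a while read loop, which handles one file name at a time. When you are dealing with large numbers of files, that is safer than

for file in *.csv

because the shell has to expand that pattern into the complete list of names before the loop starts, and as soon as such a list is handed to an external command (as happened with mv above) it can exceed the system's argument size limit and fail with "Argument list too long".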

Line 9 uses bash's builtin parameter expansion, ${file##*_}, to delete everything in $file up to and including the last _

so if $file was DENI_acn_69083_2009093121200.csv
then $d will be set to 2009093121200.csv

Line 10 uses bash substring expansion, ${d:offset:length}, to extract portions of the string (like substr). The first number is the zero-based offset and the second is the number of characters to extract.

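Line 11 creates the new directory. With -p, mkdir also creates any missing parent directories and does not complain if the directory already exists; the 2>/dev/null discards any other error messages. Lines 12 and 13 then move the file into that directory and gzip it there.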
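
Pulling those steps together, here is a more defensive sketch of the same loop, not a drop-in replacement. It assumes bash and a find that supports -maxdepth and -print0 (GNU and BSD find do); the function name sort_csv and its two arguments are made up for illustration. It quotes every expansion, reads names NUL-delimited, and skips any name whose last underscore is not followed by eight digits, so a surprise in the naming cannot create oddly named directories.

```shell
#!/bin/bash
# sort_csv SRC DEST: gzip each SRC/*.csv into DEST/YYYY/MM/DD.
# Hypothetical function; SRC and DEST are placeholders for real paths.
sort_csv() {
  local src=$1 dest=$2 file name d dir
  find "$src" -maxdepth 1 -type f -name '*.csv' -print0 |
  while IFS= read -r -d '' file; do
    name=${file##*/}                 # drop the leading directory part
    d=${name##*_}                    # text after the last underscore
    if [[ ! $d =~ ^[0-9]{8} ]]; then
      echo "skipping $name: no 8-digit date after the last _" >&2
      continue
    fi
    dir=$dest/${d:0:4}/${d:4:2}/${d:6:2}
    # Stop working on this file at the first step that fails.
    mkdir -p "$dir" &&
      mv -- "$file" "$dir/" &&
      gzip -- "$dir/$name"
  done
}
```

A call such as sort_csv /path/to/csvfiles /path/to/newdir would mirror the original script, with both paths standing in for real ones.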


THEROMPSTER2000 (Author) Commented:
Thank you.