Zip all files into separate archives and then move based on date

I have hundreds of thousands of files that all follow the same naming convention (e.g. DENI_acn_69083_2009093121200.csv). I need to gzip them and move each one to a folder based on its date. This one would be moved to ../2009/09/31.
How could I do this? Right now I'm just using bash to zip them all.
THEROMPSTER2000Asked:

kevin_uCommented:
Here's a bash version to do what you want.
for f in *.csv
do
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

THEROMPSTER2000Author Commented:
Could you please explain the lines? Bash would work for me, but could you show me what they mean? What do lines 4 through 8 do?
THEROMPSTER2000Author Commented:
*.csv
expr: syntax error
expr: syntax error
expr: syntax error
gzip: *.csv: No such file or directory
./newscript.sh: line 13: /bin/mv: Argument list too long
THEROMPSTER2000Author Commented:
Never mind explaining what it does, I figured that out. But it's giving me a syntax error, as if I can't use *.csv.
kevin_uCommented:
Those errors mean that no CSV files were found: the *.csv pattern matched nothing, so it was passed through literally.
THEROMPSTER2000Author Commented:
OK, but what about "Argument list too long"? Are there too many files for mv to handle?
kevin_uCommented:
This will fix the problem.

The mv was being done one file at a time, but when the literal *.csv came through, mv $f.gz turned into mv *.csv.gz and tried to move all the .csv.gz files that already existed at once.
for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

THEROMPSTER2000Author Commented:
But then it wouldn't do anything if there were any CSV files.
kevin_uCommented:
Before I added the check that stops the script when there are no CSVs, the mv would have expanded *.csv.gz. There must have been a lot of those files already there, hence the "Argument list too long" message.

The new script will have no such errors, and it will work on any new .csv's in the folder.

If you want it to move the .csv.gz files you already have, we'll need to make a slightly different script.
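
As an alternative to comparing against the literal pattern, bash has a nullglob option that makes an unmatched pattern expand to nothing, so a for loop over it simply runs zero times. A minimal sketch, assuming bash; count_matches is a made-up helper for illustration and is not part of the script above:

```shell
#!/bin/bash
# Hypothetical helper: count the names a pattern expands to inside a directory.
count_matches() {
  local dir=$1 pattern=$2
  (
    cd "$dir" || exit 1
    shopt -s nullglob        # an unmatched pattern now expands to nothing
    set -- $pattern          # deliberately unquoted so the glob is expanded
    echo "$#"
  )
}

demo=$(mktemp -d)
touch "$demo/a.csv.gz" "$demo/b.csv.gz"
count_matches "$demo" '*.csv'      # prints 0: no .csv files are left
count_matches "$demo" '*.csv.gz'   # prints 2
rm -rf "$demo"
```

In the script itself, the same effect should come from putting shopt -s nullglob on a line before the for loop.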
TintinCommented:
Are the CSV files all in the same folder?
Do the files all have the same length filename?

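Note that a for loop over a shell glob gets slow on very large numbers of files, because the shell has to expand and sort the whole list before the loop starts. Passing such a glob to an external command can also fail with "Argument list too long".

Also note that calling external commands (such as expr) instead of bash builtins will significantly slow down the script when you are dealing with large numbers of files.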

I'm making the assumption the files are all in one dir.


#!/bin/bash
CSVDIR=/path/to/cvsfiles
NEWDIR=/path/to/newdir
 
cd $CSVDIR
 
find . --maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

THEROMPSTER2000Author Commented:
But wouldn't the above script get the wrong dates for the files that are already gzipped, since their names are three characters longer than the originals?
THEROMPSTER2000Author Commented:
OK guys, what you gave me messed up all my folders. What can I do now? The files are in directories like this now:
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41285_200901170430.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41286_200901170445.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41287_200901170500.csv.gz
Everything is off by one character.
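
The shifted folder names look like an off-by-one in the start position handed to expr substr. The sketch below reproduces the arithmetic in plain bash, assuming a 12-digit timestamp followed by .csv as in the paths above; substr_1based is a hypothetical stand-in for expr substr, which counts positions from 1:

```shell
#!/bin/bash
# Hypothetical helper: mimic `expr substr "$s" "$pos" "$len"` (pos is 1-based).
substr_1based() {
  local s=$1 pos=$2 len=$3
  printf '%s\n' "${s:pos-1:len}"
}

name=GENI_dci_41285_200901170430.csv

# As posted: length - 16 lands on the "_" just before the year.
bad=$(substr_1based "$name" $(( ${#name} - 16 )) 8)

# One position later: length - 15 is the first digit of the year.
good=$(substr_1based "$name" $(( ${#name} - 15 )) 8)

echo "$bad -> ${bad:0:4}/${bad:4:2}/${bad:6:2}"      # _2009011 -> _200/90/11
echo "$good -> ${good:0:4}/${good:4:2}/${good:6:2}"  # 20090117 -> 2009/01/17
```

If that reading is right, changing 16 to 15 in the a= line would make the expr version pick the right eight characters for names of this shape.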
TintinCommented:
Which scripts did you run and in which order?

I did make one small typo in my script, where --maxdepth should be -maxdepth.

Once that mistake is corrected, it all works fine.  See the following output of my tests.

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
DENI_acn_69083_200909301230.csv
DENI_acn_69083_2009093121200.csv
moveit.sh

tintin$ cat moveit.sh
#!/bin/bash
CSVDIR=$(pwd)
NEWDIR=newdir

cd $CSVDIR

find . -maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

tintin$ ./moveit.sh

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
moveit.sh
newdir

tintin$ ls -1R newdir/
newdir/:
2009

newdir/2009:
09

newdir/2009/09:
30
31

newdir/2009/09/30:
DENI_acn_69083_200909301230.csv.gz

newdir/2009/09/31:
DENI_acn_69083_2009093121200.csv.gz


THEROMPSTER2000Author Commented:
I ran this one, not yours:

for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

I'll try yours next.
THEROMPSTER2000Author Commented:
OK, so your code worked, Tintin. Can you explain to me what each line means and does? Thank you so much.
#!/bin/bash
CSVDIR=/path/to/cvsfiles
NEWDIR=/path/to/newdir
 
cd $CSVDIR
 
find . --maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

TintinCommented:
Hopefully the first 6 lines are obvious (let me know if they aren't)

Line 7 uses the find command to match all .csv files in the current directory.  The -maxdepth 1 option prevents find from looking in subdirectories (I didn't know if you had subdirectories or not).

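The output of the find command is piped to a while read loop, which handles one file name at a time. When you are dealing with large numbers of files, it is better not to rely on

for file in *.csv

because the shell has to expand the whole pattern before the loop starts, and handing an expansion that large to an external command fails with "Argument list too long".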
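
A read loop like this can be made safe for names containing spaces or backslashes by having find separate names with NUL bytes. This is only a sketch: list_csv is a made-up helper, and it assumes a find that supports -maxdepth and -print0 (GNU and BSD find do):

```shell
#!/bin/bash
# Hypothetical helper: print the name of every .csv file directly inside a directory.
list_csv() {
  local dir=$1 file
  find "$dir" -maxdepth 1 -type f -name '*.csv' -print0 |
    while IFS= read -r -d '' file    # -d '' reads up to each NUL byte
    do
      printf '%s\n' "${file##*/}"    # strip the leading directory part
    done
}
```

Called as list_csv /path/to/dir, it prints one name per line and skips anything inside subdirectories.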

Line 9 uses a bash builtin parameter expansion to delete everything in $file up to and including the last _

so if $file was DENI_acn_69083_2009093121200.csv
then $d will be set to 2009093121200.csv
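
To see how the expansion behaves, here is a small sketch using one of the file names from the test listing. The variables short and stamp are only there for comparison and are not part of the script:

```shell
#!/bin/bash
file=./DENI_acn_69083_200909301230.csv   # find prints names with a leading ./

d=${file##*_}      # longest match of "*_" removed from the front
short=${file#*_}   # shortest match removed instead, for comparison
stamp=${d%%.*}     # everything from the first "." onwards removed

echo "$d"       # 200909301230.csv
echo "$short"   # acn_69083_200909301230.csv
echo "$stamp"   # 200909301230
```

Because ## takes the longest match, the leading ./ and any underscores earlier in the name do not matter.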

Line 10 uses bash substring expansion to extract portions of the string (like substr). The first number is the offset, counted from 0, and the second number is the number of characters to extract.
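
A quick way to check the offsets is to wrap them in a function. This is a sketch only; date_dir is a hypothetical name, and it assumes the string starts with the four-digit year:

```shell
#!/bin/bash
# Hypothetical helper: turn "YYYYMMDD..." into "YYYY/MM/DD".
date_dir() {
  local d=$1
  printf '%s/%s/%s\n' "${d:0:4}" "${d:4:2}" "${d:6:2}"
}

date_dir 200909301230.csv      # 2009/09/30
date_dir 200909301230.csv.gz   # 2009/09/30
```

Since the offsets count from the start of the timestamp, a trailing .gz makes no difference to the result.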

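Line 11 creates the new dir. The -p option also creates any missing parent directories and does not complain if the directory already exists; the 2>/dev/null suppresses any other error messages.

Lines 12 and 13 move the file into the new dir and then gzip it there.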
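
Putting the lines together, here is one possible variant with every expansion quoted, so names containing spaces survive. It is only a sketch: archive_by_date is a made-up function name, it assumes a find that supports -maxdepth and -print0, and it gzips each file before moving it rather than after:

```shell
#!/bin/bash
# Hypothetical function: gzip each .csv directly inside $1 into $2/YYYY/MM/DD.
archive_by_date() {
  local src=$1 dest=$2 file name d target
  find "$src" -maxdepth 1 -type f -name '*.csv' -print0 |
    while IFS= read -r -d '' file
    do
      name=${file##*/}                       # drop the directory part
      d=${name##*_}                          # timestamp plus extension
      target=$dest/${d:0:4}/${d:4:2}/${d:6:2}
      mkdir -p "$target" || continue         # skip the file if the dir cannot be made
      gzip "$file" && mv "$file.gz" "$target/"
    done
}
```

It would be called as archive_by_date /path/to/csvfiles /path/to/newdir. Trying it on a copy of a few files first seems wise.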

THEROMPSTER2000Author Commented:
thank you.