
Solved

Zip all files into separate archives and then move based on date

Posted on 2009-07-06
Last Modified: 2013-12-26
I have hundreds of thousands of files that all follow the same naming convention (e.g. DENI_acn_69083_2009093121200.csv). I need to gzip them and move each one to a folder based on its date. This one would be moved to ../2009/09/31.
How could I do this? Right now I'm just using bash to zip them all.
Question by:THEROMPSTER2000
17 Comments
 
LVL 12

Expert Comment

by:kevin_u
ID: 24787784
Here's a bash version to do what you want.
for f in *.csv
do
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

 

Author Comment

by:THEROMPSTER2000
ID: 24787810
Could you please explain the lines? Bash would work for me, but could you show me what they mean? What do lines 4 through 8 do?
 

Author Comment

by:THEROMPSTER2000
ID: 24787818
*.csv
expr: syntax error
expr: syntax error
expr: syntax error
gzip: *.csv: No such file or directory
./newscript.sh: line 13: /bin/mv: Argument list too long
 

Author Comment

by:THEROMPSTER2000
ID: 24787841
Never mind telling me what it does, I figured that out, but it's giving me a syntax error, as if I can't use *.csv...
 
LVL 12

Expert Comment

by:kevin_u
ID: 24787922
The last error means there are no CSV files found.
 

Author Comment

by:THEROMPSTER2000
ID: 24787933
OK, but what about "Argument list too long"? Are there too many files for mv to do anything?
 
LVL 12

Expert Comment

by:kevin_u
ID: 24788055
This will fix the problem.

The mv is done one file at a time, but when no files matched and the literal *.csv was used as the name, mv tried to move all of your already existing *.csv.gz files at once.
for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done
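
As a possible alternative to comparing against the literal "*.csv", and purely as a sketch rather than what was posted, bash's nullglob option makes an unmatched pattern expand to nothing, so the loop body never runs in a directory without matches. The temporary directory and the sample file name are placeholders for illustration.

```bash
#!/bin/bash
# Sketch: with nullglob set, an unmatched *.csv expands to nothing,
# so the loop body is skipped instead of seeing the literal "*.csv".
shopt -s nullglob

workdir=$(mktemp -d)          # throwaway directory, for illustration only
cd "$workdir" || exit 1

count=0
for f in *.csv; do            # no .csv files exist yet: zero iterations
  count=$((count + 1))
done
echo "empty directory: $count iterations"

touch DENI_acn_69083_200909301230.csv
for f in *.csv; do            # now there is exactly one match
  count=$((count + 1))
  echo "found: $f"
done
```

Unlike the exit check, this leaves the rest of a longer script free to continue when there is nothing to process.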

 

Author Comment

by:THEROMPSTER2000
ID: 24788791
But then it wouldn't do anything if there were any CSV files.
 
LVL 12

Expert Comment

by:kevin_u
ID: 24788884
Before I added the check that stops the script when there are no CSVs, the mv would have expanded *.csv.gz. There must have been a lot of those already in the folder, hence the "Argument list too long" message.

The new script will have no such errors, and it will work on any new .csv's in the folder.

If you want it to move the .csv.gz files you already have, we'll need to make a slightly different script.
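
One way such a script could look, offered as a rough sketch under stated assumptions: file names end in _YYYYMMDDhhmm.csv.gz, and "sorted" is a hypothetical target directory, not a path from this thread. The date is taken from the text after the last underscore, so the extra ".gz" suffix does not shift it.

```bash
#!/bin/bash
# Sketch: file already-gzipped archives by the date in their names.
# Assumes names end in _YYYYMMDDhhmm.csv.gz; "sorted" is a made-up target.
workdir=$(mktemp -d)                       # throwaway directory for the demo
cd "$workdir" || exit 1
touch GENI_dci_41285_200901170430.csv.gz   # sample file for the demo

target="$workdir/sorted"
find . -maxdepth 1 -name "*.csv.gz" | while read -r file
do
  stamp=${file##*_}                        # 200901170430.csv.gz
  dir="$target/${stamp:0:4}/${stamp:4:2}/${stamp:6:2}"
  mkdir -p "$dir"
  mv "$file" "$dir/"                       # one file per mv call
done
ls -R "$target"
```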
 
LVL 48

Expert Comment

by:Tintin
ID: 24789067
Are the CSV files all in the same folder?
Do the files all have the same length filename?

Note that a for loop using shell globbing will fail on large numbers of files.

Also note that calling external (non-builtin) commands will significantly slow down the script when you are dealing with large numbers of files.

I'm making the assumption the files are all in one dir.


#!/bin/bash
CSVDIR=/path/to/cvsfiles
NEWDIR=/path/to/newdir
 
cd $CSVDIR
 
find . --maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

 

Author Comment

by:THEROMPSTER2000
ID: 24789255
But then wouldn't the above script get the wrong dates, because it is reading three more characters than the original (since the files were gzipped)?
 

Author Comment

by:THEROMPSTER2000
ID: 24789527
OK guys, what you gave me messed up all my folders. What can I do now? They are in directories like this now:
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41285_200901170430.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41286_200901170445.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41287_200901170500.csv.gz
all off by a number.
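
A likely explanation, worked through as a sketch: the listed files carry a 12-digit timestamp, while the example in the question had 13 digits, so a start position computed as "length minus 16" lands on the underscore instead of the first digit. The snippet below reproduces that arithmetic with bash substring expansion (standing in for expr substr, which is 1-based) and contrasts it with anchoring on the last underscore.

```bash
#!/bin/bash
f=GENI_dci_41285_200901170430.csv
len=${#f}                           # 31 characters
start=$((len - 16))                 # 15, the 1-based start position
d=${f:start-1:8}                    # bash offsets are 0-based, so subtract 1
echo "$d"                           # _2009011
echo "${d:0:4}/${d:4:2}/${d:6:2}"   # _200/90/11

# Anchoring on the last underscore does not depend on the timestamp length.
stamp=${f##*_}                      # 200901170430.csv
good="${stamp:0:4}/${stamp:4:2}/${stamp:6:2}"
echo "$good"                        # 2009/01/17
```

The same last-underscore rule could be applied to the misplaced .csv.gz files with a recursive find to file them under the intended dates.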
 
LVL 48

Expert Comment

by:Tintin
ID: 24789625
Which scripts did you run and in which order?

I did make one small typo in my script, where --maxdepth should be -maxdepth.

Once that mistake is corrected, it all works fine.  See the following output of my tests.

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
DENI_acn_69083_200909301230.csv
DENI_acn_69083_2009093121200.csv
moveit.sh

tintin$ cat moveit.sh
#!/bin/bash
CSVDIR=$(pwd)
NEWDIR=newdir

cd $CSVDIR

find . -maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

tintin$ ./moveit.sh

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
moveit.sh
newdir

tintin$ ls -1R newdir/
newdir/:
2009

newdir/2009:
09

newdir/2009/09:
30
31

newdir/2009/09/30:
DENI_acn_69083_200909301230.csv.gz

newdir/2009/09/31:
DENI_acn_69083_2009093121200.csv.gz


 

Author Comment

by:THEROMPSTER2000
ID: 24789643
I ran this one, not yours:

for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

I'll try yours next.
 

Author Comment

by:THEROMPSTER2000
ID: 24795907
OK, so your code worked, Tintin. Can you explain to me what each line means and does? Thank you so much.
#!/bin/bash
CSVDIR=/path/to/cvsfiles
NEWDIR=/path/to/newdir
 
cd $CSVDIR
 
find . --maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

 
LVL 48

Accepted Solution

by:
Tintin earned 2000 total points
ID: 24797651
Hopefully the first 6 lines are obvious (let me know if they aren't)

Line 7 uses the find command to match all .csv files in the current directory.  The -maxdepth 1 option prevents find from looking in subdirectories (I didn't know if you had subdirectories or not).

The output of the find command is piped to a while read loop to read each file.  When you are dealing with large numbers of files, you can't just do

for file in *.csv

as that will break when you exceed the file globbing (pattern matching) limit.
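
To show the two loop styles side by side, here is a small sketch in a throwaway directory with made-up file names. One nuance, stated as general shell behavior rather than anything from this thread: the "Argument list too long" error is raised when an external command such as mv or gzip is started with the whole expanded list, so a loop that streams names from find and handles one file per command avoids it regardless of the number of files.

```bash
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a_200901170430.csv b_200901170445.csv   # sample files for the demo

# Style 1: the shell expands the pattern into a full list before looping.
globbed=0
for file in *.csv; do
  globbed=$((globbed + 1))
done

# Style 2: find streams one name per line; nothing is expanded up front.
# Process substitution keeps the counter in the current shell.
streamed=0
while read -r file; do
  streamed=$((streamed + 1))
done < <(find . -maxdepth 1 -name "*.csv")

echo "glob loop saw $globbed files, find loop saw $streamed files"
```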

Line 9 uses the bash builtin expression to delete everything in the $file up to the last _

so if $file was DENI_acn_69083_2009093121200.csv
then $d will be set to 2009093121200.csv
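
A quick way to try this at a prompt, as a minimal sketch; the leading "./" is what find prepends to each name, and the variable "short" is added only for contrast.

```bash
#!/bin/bash
file=./DENI_acn_69083_2009093121200.csv
d=${file##*_}      # "##" removes the longest prefix matching *_
short=${file#*_}   # "#" removes only the shortest such prefix
echo "$d"          # 2009093121200.csv
echo "$short"      # acn_69083_2009093121200.csv
```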

Line 10 uses the bash expression to extract certain portions of the string (like substr).  The first number is the offset (counted from 0) and the second number is the number of characters to extract.
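
The same extraction split into named pieces, as a small sketch; the variables yyyy, mm and dd are introduced here only for readability.

```bash
#!/bin/bash
d=2009093121200.csv
yyyy=${d:0:4}      # offset 0, 4 characters -> 2009
mm=${d:4:2}        # offset 4, 2 characters -> 09
dd=${d:6:2}        # offset 6, 2 characters -> 31
dir=$yyyy/$mm/$dd
echo "$dir"        # 2009/09/31
```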

Line 11 creates the new dir and the 2>/dev/null is there to suppress any errors if the directory already exists.
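
As a side note on general mkdir behavior, sketched in a throwaway directory: with -p, mkdir already succeeds silently when the directory exists, so the redirect mostly hides other failures such as permission errors.

```bash
#!/bin/bash
base=$(mktemp -d)                         # throwaway directory for the demo
mkdir -p "$base/2009/09/30"; first=$?     # creates all three levels
mkdir -p "$base/2009/09/30"; second=$?    # already exists: still succeeds
echo "first=$first second=$second"
```

Dropping the redirect would let a real failure, such as a read-only target, show up instead of surfacing later in the mv.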

 

Author Closing Comment

by:THEROMPSTER2000
ID: 31600277
Thank you.