Solved

zip all files into separate archives and then move based on date

Posted on 2009-07-06
Last Modified: 2013-12-26
I have hundreds of thousands of files, all following the same naming convention (e.g. DENI_acn_69083_2009093121200.csv). I need to gzip them and move each one to a folder based on its date. This one would be moved to ../2009/09/31.
How could I do this? Right now I'm just using bash to zip them all.
Question by:THEROMPSTER2000
17 Comments
 
LVL 12

Expert Comment

by:kevin_u
ID: 24787784
Here's a bash version to do what you want.
for f in *.csv
do
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

 

Author Comment

by:THEROMPSTER2000
ID: 24787810
Could you please explain the lines? Bash would work for me, but could you show me what they mean? What do lines 4, 5, 6, 7 and 8 do?
 

Author Comment

by:THEROMPSTER2000
ID: 24787818
*.csv
expr: syntax error
expr: syntax error
expr: syntax error
gzip: *.csv: No such file or directory
./newscript.sh: line 13: /bin/mv: Argument list too long
 

Author Comment

by:THEROMPSTER2000
ID: 24787841
Never mind on telling me what it does, I figured that out, but it's giving me a syntax error, as if I can't use *.csv...
 
LVL 12

Expert Comment

by:kevin_u
ID: 24787922
The gzip error means no CSV files were found: when nothing matches, bash leaves the pattern *.csv as a literal string.
 

Author Comment

by:THEROMPSTER2000
ID: 24787933
OK, but what about "Argument list too long"? Are there too many files for mv to do anything?
 
LVL 12

Expert Comment

by:kevin_u
ID: 24788055
This will fix the problem.

The mv normally handles one file at a time, but when the pattern matched nothing, $f was the literal string *.csv, so mv $f.gz expanded to every *.csv.gz file that already existed and tried to move them all at once.
for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done
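
Another way to handle the "no matches" case, shown here only as a sketch: bash's nullglob option makes an unmatched pattern expand to nothing, so the loop body never sees the literal string *.csv. The helper name count_csv and the throwaway directory are made up for this illustration; they are not part of the script above.

```bash
#!/bin/bash
# Sketch: with nullglob set, an unmatched *.csv expands to nothing,
# so the loop body is skipped instead of receiving the literal "*.csv".
# count_csv is a hypothetical helper used only for this demonstration.
count_csv() {
  local dir=$1 n=0 f
  shopt -s nullglob
  for f in "$dir"/*.csv; do
    n=$((n + 1))
  done
  shopt -u nullglob
  echo "$n"
}

demo=$(mktemp -d)                   # throwaway directory
count_csv "$demo"                   # prints 0: the loop never runs
touch "$demo/a.csv" "$demo/b.csv"
count_csv "$demo"                   # prints 2
rm -rf "$demo"
```

Note that shopt is specific to bash, so this needs to run under bash rather than plain sh.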

 

Author Comment

by:THEROMPSTER2000
ID: 24788791
But then it wouldn't do anything if there were any csv files.

 
LVL 12

Expert Comment

by:kevin_u
ID: 24788884
Before I added the check that stops the script when there are no CSV files, $f was the literal string *.csv, so mv $f.gz matched every *.csv.gz file that was already there. There must have been a lot of them, hence the "Argument list too long" message.

The new script will have no such errors, and it will work on any new .csv's in the folder.

If you want it to move the .csv.gz files you already have, we'll need to make a slightly different script.
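
For the files that are already compressed, something along these lines might do, offered as a rough sketch only. It assumes every name ends in _YYYYMMDD followed by a time and .csv.gz, it takes the date from the text after the last underscore, and it skips any name that does not fit. The function name archive_gz and its two arguments (source directory, destination root) are invented for this example.

```bash
#!/bin/bash
# Sketch: file already-compressed *.csv.gz files by the date in their names.
# Assumes names look like PREFIX_YYYYMMDD<time>.csv.gz.
# archive_gz is a hypothetical name, not something from this thread.
archive_gz() {
  local src=$1 dest=$2 f name stamp dir
  shopt -s nullglob                      # no matches: loop does nothing
  for f in "$src"/*.csv.gz; do
    name=${f##*/}                        # strip the directory part
    stamp=${name##*_}                    # text after the last underscore
    [[ $stamp =~ ^[0-9]{8} ]] || continue  # skip names without a date
    dir=$dest/${stamp:0:4}/${stamp:4:2}/${stamp:6:2}
    mkdir -p "$dir" && mv -- "$f" "$dir/"
  done
  shopt -u nullglob
}
```

Run from the folder holding the files, it would be called as archive_gz . .. to match the ../yyyy/mm/dd layout asked for in the question. Trying it on a copy of a few files first seems wise.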
 
LVL 48

Expert Comment

by:Tintin
ID: 24789067
Are the CSV files all in the same folder?
Do the files all have the same length filename?

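Note that shell globbing can be a problem with large numbers of files: passing a glob to an external command (for example mv *.csv.gz dir) fails with "Argument list too long" once the expanded list exceeds the kernel limit, and even a for loop has to expand every name into memory before it starts.

Also note that calling external commands such as expr instead of bash builtins will significantly slow down the script when you are dealing with large numbers of files, since each call starts a new process.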

I'm making the assumption the files are all in one dir.


#!/bin/bash
CSVDIR=/path/to/csvfiles
NEWDIR=/path/to/newdir

cd $CSVDIR

find . --maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

 

Author Comment

by:THEROMPSTER2000
ID: 24789255
But then wouldn't the above script get the wrong dates, because it would be reading three more characters than the original (since the files were gzipped)?
 

Author Comment

by:THEROMPSTER2000
ID: 24789527
OK guys, what you gave me messed up all my folders. What can I do now? The files are in directories like this now:
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41285_200901170430.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41286_200901170445.csv.gz
/c02/qualution/Telcordia/_200/90/11/GENI_dci_41287_200901170500.csv.gz
All off by one character.
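
The shift looks like a consequence of counting back a fixed 16 characters from the end of the name: the example in the question has a 13-digit timestamp, while these files have 12 digits, so the 8-character window starts on the underscore. The sketch below reproduces the arithmetic with bash substring expansion in place of expr; the function names fixed_offset_date and underscore_date are made up for illustration.

```bash
#!/bin/bash
# fixed_offset_date mimics the expr arithmetic from the script that was run:
# start 16 characters before the end (1-based) and take 8 characters.
fixed_offset_date() {
  local f=$1
  echo "${f:$(( ${#f} - 17 )):8}"
}

# underscore_date takes the 8 characters after the last underscore,
# so it does not depend on how long the timestamp is.
underscore_date() {
  local f=$1 stamp
  stamp=${f##*_}
  echo "${stamp:0:8}"
}

for f in DENI_acn_69083_2009093121200.csv GENI_dci_41285_200901170430.csv; do
  echo "$f: fixed=$(fixed_offset_date "$f") underscore=$(underscore_date "$f")"
done
# DENI_acn_69083_2009093121200.csv: fixed=20090931 underscore=20090931
# GENI_dci_41285_200901170430.csv: fixed=_2009011 underscore=20090117
```

Splitting _2009011 into 4, 2 and 2 characters gives _200/90/11, which matches the directories listed above.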
 
LVL 48

Expert Comment

by:Tintin
ID: 24789625
Which scripts did you run and in which order?

I did make one small typo in my script, where --maxdepth should be -maxdepth.

Once that mistake is corrected, it all works fine.  See the following output of my tests.

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
DENI_acn_69083_200909301230.csv
DENI_acn_69083_2009093121200.csv
moveit.sh

tintin$ cat moveit.sh
#!/bin/bash
CSVDIR=$(pwd)
NEWDIR=newdir

cd $CSVDIR

find . -maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

tintin$ ./moveit.sh

tintin$ ls -1
DENI_acn_69083_200903291200.csv.gz
moveit.sh
newdir

tintin$ ls -1R newdir/
newdir/:
2009

newdir/2009:
09

newdir/2009/09:
30
31

newdir/2009/09/30:
DENI_acn_69083_200909301230.csv.gz

newdir/2009/09/31:
DENI_acn_69083_2009093121200.csv.gz


 

Author Comment

by:THEROMPSTER2000
ID: 24789643
I ran this one, not yours:

for f in *.csv
do
  if [ "$f" = "*.csv" ]
  then
    exit
  fi
  echo $f
  a=`expr length "$f" - 16`
  d=`expr substr "$f" $a 8`
  mm=`expr substr $d 5 2`
  yyyy=`expr substr $d 1 4`
  dd=`expr substr $d 7 2`
  dir="../$yyyy/$mm/$dd"
  mkdir -p $dir
  gzip $f
  mv $f.gz $dir
done

I'll try yours next.
 

Author Comment

by:THEROMPSTER2000
ID: 24795907
OK, so your code worked, Tintin. Can you explain to me what each line means and is doing? Thank you so much.
#!/bin/bash
CSVDIR=/path/to/csvfiles
NEWDIR=/path/to/newdir

cd $CSVDIR

find . -maxdepth 1 -name "*.csv" | while read file
do
  d=${file##*_}
  dir=${d:0:4}/${d:4:2}/${d:6:2}
  mkdir -p $NEWDIR/$dir 2>/dev/null
  mv $file $NEWDIR/$dir
  gzip $NEWDIR/$dir/$file
done

 
LVL 48

Accepted Solution

by:
Tintin earned 500 total points
ID: 24797651
Hopefully the first 6 lines are obvious (let me know if they aren't)

Line 7 uses the find command to match all .csv files in the current directory.  The -maxdepth 1 option prevents find from looking in subdirectories (I didn't know if you had subdirectories or not).

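The output of the find command is piped to a while read loop to read each file.  When you are dealing with large numbers of files, I avoid

for file in *.csv

because the shell has to expand the whole list of names before the loop starts. The hard limit ("Argument list too long") is hit when such a glob is passed to an external command like mv or gzip; find prints the names one at a time instead.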
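
A plain while read loop can mangle names that contain backslashes or leading and trailing spaces. A common hardening, sketched here under the assumption that your find supports -print0 (GNU and BSD find both do), is to pass the names NUL-separated. The helper name list_csv is invented for the example.

```bash
#!/bin/bash
# list_csv is a hypothetical helper: print the base name of every *.csv
# file directly inside the given directory, one per line.
list_csv() {
  local dir=$1 file
  # -print0 and read -d '' pass names NUL-separated, so spaces or
  # backslashes in a name survive; IFS= and -r stop read altering them.
  find "$dir" -maxdepth 1 -type f -name '*.csv' -print0 |
    while IFS= read -r -d '' file; do
      printf '%s\n' "${file##*/}"
    done
}
```

In the real script the printf line would be replaced by the mkdir, mv and gzip steps, with "$file" quoted each time.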

Line 9 uses a bash builtin parameter expansion to delete everything in $file up to and including the last _

so if $file was DENI_acn_69083_2009093121200.csv
then $d will be set to 2009093121200.csv
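
To see how the pattern operators behave, here is a small sketch you can paste into a bash prompt. It uses the file name from the question; the ## form removes the longest match from the front, # the shortest, and % trims from the end.

```bash
#!/bin/bash
file=DENI_acn_69083_2009093121200.csv

echo "${file##*_}"    # longest match of *_ removed:  2009093121200.csv
echo "${file#*_}"     # shortest match of *_ removed: acn_69083_2009093121200.csv
echo "${file%.csv}"   # .csv trimmed from the end:    DENI_acn_69083_2009093121200
```

The same expansion also copes with the leading ./ that find puts in front of each name, because everything before the last underscore is discarded.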

Line 10 uses bash substring expansion to extract certain portions of the string (like substr).  The first number is the offset, counted from 0, and the second number is the number of characters to extract.
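
Since a wrong offset silently produces odd directory names, it may be worth checking that the extracted text really starts with eight digits before using it. This is only a sketch; date_path is a hypothetical helper name, and the [[ =~ ]] test requires bash.

```bash
#!/bin/bash
# date_path is a hypothetical helper: print YYYY/MM/DD for a file name whose
# last underscore is followed by at least 8 digits; return 1 otherwise.
date_path() {
  local d=${1##*_}
  [[ $d =~ ^[0-9]{8} ]] || return 1
  echo "${d:0:4}/${d:4:2}/${d:6:2}"
}

date_path DENI_acn_69083_200909301230.csv      # prints 2009/09/30
date_path notes.csv || echo "no date found"    # prints no date found
```

This only checks for digits; it does not reject an impossible date such as 2009/09/31.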

Line 11 creates the new dir. mkdir -p already succeeds quietly if the directory exists, so the 2>/dev/null only suppresses any other error messages.
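
A quick way to check how mkdir -p behaves when the directory is already there, as a sketch using a throwaway temp directory:

```bash
#!/bin/bash
tmp=$(mktemp -d)             # throwaway directory for the demonstration
mkdir -p "$tmp/2009/09/30"   # creates all three levels in one go
mkdir -p "$tmp/2009/09/30"   # already exists: no message, still succeeds
echo "exit status: $?"       # prints exit status: 0
rm -rf "$tmp"
```

Because the second call is silent on its own, redirecting stderr mainly hides other failures such as a permissions problem, so you may prefer to leave the redirect off while testing.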

 

Author Closing Comment

by:THEROMPSTER2000
ID: 31600277
thank you.