
Get the total size of multiple files for a date range

Posted on 2013-11-11
Last Modified: 2013-11-19
Hello,
I want to get the total size of the files in HDFS from June 01, 2013 until today. For example, if I have 4 files in this date range (June through November), each 100KB, I want the output to be 400KB. My current approach is to run hadoop fs -ls to get each file's modification date/time and size, exclude all the files that lie outside the range, and then sum the remaining sizes. Please suggest a one- or two-line approach; I want to avoid multiple steps.
Thank you
Question by:Nova17
Accepted Solution

by: simon3270
ID: 39641190
An example output from "hadoop fs -ls" would have been useful (I don't have hadoop installed, but this is a scripting exercise rather than a hadoop one).

I believe that it looks like:
drwxr--r--   1 user1 user1          0 2013-06-25 16:45 /user/user1
-rw-r--r--   1 user1 user1       1845 2013-05-25 16:45 /user/user1/file1.lst
-rw-r--r--   1 user1 user1       1322 2012-06-25 16:45 /user/user1/file2.old
-rw-r--r--   1 user1 user1       2241 2013-06-25 16:45 /user/user1/file3.new

with a leading "-" for regular files and d for directories.  In this case, file1.lst and file2.old are too old (before June this year, and last year), and file3.new is new enough (June or later this year).
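For reference, here is a minimal sketch, assuming the listing format guessed above (with the permission dashes written out), that shows which awk fields the one-liners below rely on: awk splits each line on runs of blanks, so for a regular file $5 is the size in bytes, $6 the date and $8 the path. The helper names `sample_listing` and `show_fields` are hypothetical, for illustration only.

```shell
#!/bin/sh
# Sample listing in the format assumed above; real "hadoop fs -ls"
# output may differ slightly between Hadoop versions.
sample_listing() {
  cat <<'EOF'
drwxr--r--   1 user1 user1          0 2013-06-25 16:45 /user/user1
-rw-r--r--   1 user1 user1       1845 2013-05-25 16:45 /user/user1/file1.lst
-rw-r--r--   1 user1 user1       1322 2012-06-25 16:45 /user/user1/file2.old
-rw-r--r--   1 user1 user1       2241 2013-06-25 16:45 /user/user1/file3.new
EOF
}

# Regular files start with "-"; print size ($5), date ($6) and path ($8).
show_fields() {
  awk '/^-/ { print $5, $6, $8 }'
}

sample_listing | show_fields
```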

The following awk script selects only regular files, discards any with a year earlier than 2013 or a month earlier than June, then adds up the sizes of the files left. It uses plain "hadoop fs -ls", which reports file sizes in bytes; if you used the human-readable version (hadoop fs -ls -h), which gives sizes such as 1.4k, the problem would be *much* harder to solve.
hadoop fs -ls | awk '/^-/{split($6,a,"-");if ( a[1]< 2013 || a[2] < 6){next};s=s+$5}END{print s+0}'

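The year/month test above works as long as every date is in 2013, but a file dated, say, 2014-01-10 would fail `a[2] < 6` and be dropped. Because the dates are in ISO YYYY-MM-DD form, they also sort correctly as plain strings, so a minimal alternative sketch is to compare the whole date against a start date passed in with `awk -v`. The function name `sum_since` is hypothetical, and it assumes the same listing format as above.

```shell
#!/bin/sh
# sum_since START: read an "ls"-style listing on stdin and print the total
# size in bytes of regular files dated on or after START (YYYY-MM-DD).
# Hypothetical helper name, for illustration only.
sum_since() {
  awk -v start="$1" '
    /^-/ && $6 >= start { s += $5 }   # ISO dates compare correctly as strings
    END { print s + 0 }               # "+ 0" prints 0 when nothing matched
  '
}

# Usage (assumes the hadoop client is on PATH):
#   hadoop fs -ls /user/user1 | sum_since 2013-06-01
```

If your Hadoop version accepts `hadoop fs -ls -R` (I believe older 1.x releases used `hadoop fs -lsr` instead), piping the recursive listing into the same function should total a whole directory tree.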

If you want the output in, say, kbytes, just change the print statement at the end (this version gives kbytes with one decimal place):
hadoop fs -ls | awk '/^-/{split($6,a,"-");if ( a[1]< 2013 || a[2] < 6){next};s=s+$5}END{printf "%.1fk\n", s/1024}'


or in megabytes, with three decimal places:
hadoop fs -ls | awk '/^-/{split($6,a,"-");if ( a[1]< 2013 || a[2] < 6){next};s=s+$5}END{printf "%.3fM\n", s/1048576}'

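awk's `printf` takes the format string and its arguments as a comma-separated list, so the unit conversion can also be kept out of the main one-liner. A minimal sketch, using a hypothetical helper `to_unit` that reads a byte count (for example, the output of the first command above) on stdin:

```shell
#!/bin/sh
# to_unit UNIT: read a byte count on stdin and print it in kbytes ("k",
# one decimal place), megabytes ("M", three decimal places) or plain bytes.
# Hypothetical helper name, for illustration only.
to_unit() {
  awk -v unit="$1" '{
    if (unit == "M")      printf "%.3fM\n", $1 / 1048576   # comma before the argument
    else if (unit == "k") printf "%.1fk\n", $1 / 1024
    else                  print $1
  }'
}

# e.g.  echo 409600 | to_unit k     (four 100KB files)
```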


Expert Comment

by:Daniel McAllister
ID: 39643206
This looks like overkill to me:


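#!/bin/bash
# Replace the placeholders with real dates (e.g. "2013-06-01" and "now")
# and set DIR to the directory to scan.
touch -d "starting date" /tmp/starttime
touch -d "stop date" /tmp/stoptime
SIZEOF=0

# Feed the loop via process substitution rather than a pipe: a piped
# "while" loop runs in a subshell, so SIZEOF would still be 0 afterwards.
while IFS= read -r PICKED ; do
  THISSIZE=$(stat -c "%s" "$PICKED")    # size in bytes (GNU stat)
  SIZEOF=$((SIZEOF + THISSIZE))
done < <(find "$DIR" -type f -newer /tmp/starttime ! -newer /tmp/stoptime)

echo "SIZE is $SIZEOF"
exit 0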


Dan
IT4SOHO

PS: No debugging that... just banged it out... probably got some details off...
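For a local (non-HDFS) directory, the loop and the timestamp files can be avoided. A minimal sketch, assuming GNU findutils: `-newermt` compares each file's modification time with a date string directly, and `-printf '%s\n'` prints each file's size in bytes for awk to add up. Both are GNU extensions rather than POSIX, and the function name `local_size_since` is hypothetical.

```shell
#!/bin/sh
# local_size_since DIR DATE: total bytes of regular files under DIR
# modified after DATE. Requires GNU find (-newermt and -printf).
# Hypothetical helper name, for illustration only.
local_size_since() {
  find "$1" -type f -newermt "$2" -printf '%s\n' |
    awk '{ s += $1 } END { print s + 0 }'
}

# e.g.  local_size_since /data "2013-06-01"
```

Like `-newer`, `-newermt` is a strict comparison, so a file modified exactly at the given time is not counted.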

Expert Comment

by:simon3270
ID: 39643232
I think you need to use the "hadoop fs -ls" command to read HDFS, since find and stat normally see only the local file system; otherwise a "find"-based approach would be quite good (if a little more long-winded than a couple of awk statements).