Solved

get multiple file size for date range

Posted on 2013-11-11
Medium Priority
541 Views
Last Modified: 2013-11-19
Hello,
I want to get the total size of files in HDFS modified from June 01, 2013 until today. For example, if I have 4 files within this date range (Jun through Nov), each 100KB, I want the output to be 400KB. My approach at this point is to run hadoop fs -ls and get each file's modification datetime and size, then exclude all the files that lie outside this range and sum up the individual file sizes. Please suggest a 1-2 line approach here; I want to avoid multiple steps.
Thank You
Question by:Nova17
3 Comments
 
LVL 20

Accepted Solution

by:
simon3270 earned 1600 total points
ID: 39641190
An example output from "hadoop fs -ls" would have been useful (I don't have hadoop installed, but this is a scripting exercise rather than a hadoop one).

I believe that it looks like:
drwxr-xr-x   1 user1 user1          0 2013-06-25 16:45 /user/user1
-rw-r--r--   1 user1 user1       1845 2013-05-25 16:45 /user/user1/file1.lst
-rw-r--r--   1 user1 user1       1322 2012-06-25 16:45 /user/user1/file2.old
-rw-r--r--   1 user1 user1       2241 2013-06-25 16:45 /user/user1/file3.new

with a leading "-" for regular files and a leading "d" for directories.  In this case, file1.lst and file2.old are too old (before June this year, and last year respectively), and file3.new is new enough (June or later this year).

The following awk script selects only regular files, discards any with a year earlier than 2013 or a month earlier than June, then adds up the sizes of the files that are left.  It relies on "hadoop fs -ls" returning file sizes in bytes; if you used the human-readable version (hadoop fs -ls -h) to get sizes such as 1.4k, the problem would become *much* harder to solve.
hadoop fs -ls |  awk '/^-/{split($6,a,"-");if ( a[1]< 2013 || a[2] < 6){next};s=s+$5}END{print s}'

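Incidentally, because the date in column 6 is in ISO YYYY-MM-DD form, it compares correctly as a plain string, so an equivalent one-liner (a sketch assuming the same column layout shown above) can skip the split() entirely:

hadoop fs -ls | awk '/^-/ && $6 >= "2013-06-01" {s += $5} END {print s+0}'

The +0 just forces a numeric 0 to be printed if no files match.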

If you wanted the output to be in, say, kbytes, you could just change the print statement at the end (this version gives kbytes with one decimal place):
hadoop fs -ls |  awk '/^-/{split($6,a,"-");if ( a[1]< 2013 || a[2] < 6){next};s=s+$5}END{printf "%.1fk\n", s/1024}'


or megabytes with 3 decimal places:
hadoop fs -ls |  awk '/^-/{split($6,a,"-");if ( a[1]< 2013 || a[2] < 6){next};s=s+$5}END{printf "%.3fM\n", s/1048576}'

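If you later need an explicit end date as well as a start date, the string comparison extends naturally (again just a sketch, with the bounds passed in as awk variables of my own invention):

hadoop fs -ls | awk -v start="2013-06-01" -v stop="2013-11-19" \
    '/^-/ && $6 >= start && $6 <= stop {s += $5} END {print s+0}'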

 
LVL 21

Expert Comment

by:Daniel McAllister
ID: 39643206
This looks like overkill to me:


#!/bin/bash
# Mark the endpoints of the date range with two reference files
touch -d "starting date" /tmp/starttime
touch -d "stop date" /tmp/stoptime
SIZEOF=0

# $DIR is the top of the tree to search.  Read from process
# substitution rather than a pipe, so the loop runs in the
# current shell and SIZEOF survives past the loop.
while read -r PICKED ; do
   THISSIZE=$(stat -c "%s" "$PICKED")
   SIZEOF=$(expr "$SIZEOF" + "$THISSIZE")
done < <(find "$DIR" -type f -newer /tmp/starttime -a ! -newer /tmp/stoptime)

echo "SIZE is $SIZEOF"
exit 0
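(For reference, the read loop and expr could be collapsed into a single pipeline, assuming GNU find, xargs and stat are available:

find "$DIR" -type f -newer /tmp/starttime ! -newer /tmp/stoptime -print0 |
  xargs -0 -r stat -c '%s' | awk '{s += $1} END {print s+0}'

which sums the byte counts without any shell arithmetic.)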


Dan
IT4SOHO

PS: No debugging that... just banged it out... probably got some details off...
 
LVL 20

Expert Comment

by:simon3270
ID: 39643232
I think that you need to use the "hadoop fs -ls" command to read the HDFS file system; for an ordinary local file system, a "find"-based approach like that would be quite good (if a little more long-winded than a couple of awk statements).
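If the files are spread across subdirectories, the same awk filter should work on a recursive listing (a sketch assuming your Hadoop version supports the -R flag; older releases used "hadoop fs -lsr" instead):

hadoop fs -ls -R /user/user1 | awk '/^-/ && $6 >= "2013-06-01" {s += $5} END {print s+0}'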

