Solved

How would I sort and organize an array of data around 100 files

Posted on 2014-11-01
Last Modified: 2015-01-13
I was asked in an interview how to sort over 100 files in a directory. I need to sort them by size, by date, and by owner. How could I do it in Bash scripting and Python? Please help.

thanks.
Question by:Jason Yu
17 Comments
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 278 total points
Is a Perl script acceptable?  If not, you should remove the Perl topic from this question.

Is that three different sorts or sorting by those three attributes in order?
0
 
LVL 37

Assisted Solution

by:Gerwin Jansen
Gerwin Jansen earned 111 total points
Bash:
- Sort by size: ls -S -l
  (or add -r for reverse order)
- By owner: ls -l | sort -k3
- By date: ls -lt
  (or add -r for reverse order)
0
 

Author Comment

by:Jason Yu
Perl or Python will be fine.

They want me to loop through these 100 files and find lines with specific data. How do I make a loop check all 100 files?

thanks.
0
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 278 total points
Assuming all of the files are in a single directory, this will work to check for specific data.  If they aren't then you'll need to use File::Find or some other way of checking multiple dirs.
use strict;
use warnings;
my $dir = shift or die "Usage: $0 dir_to_sort\n";
opendir DIR, $dir or die "could not open dir $dir: $!";
foreach my $fil (grep { -f $_ } map { "$dir/$_" } readdir DIR) {
    # readdir returns bare names, so each one is prefixed with $dir; otherwise -f and stat only work when run from inside $dir
    my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $siz, $atime, $mtime, $ctime) = stat($fil);
    # skip any file that doesn't match the desired attributes (the values below are placeholders)
    next unless ($uid == 1001 and $siz == 123_456_789 and $mtime == 128467986459);
    # do whatever processing you want on the files that do match
}
closedir DIR;


If you just want to sort by the three attributes then this will work.
use strict;
use warnings;
my $dir = shift or die "Usage: $0 dir_to_sort\n";
opendir DIR, $dir or die "could not open dir $dir: $!";
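# prefix each name with $dir so that -f and the stat calls in cust_sort see a valid path
my @files = sort cust_sort grep { -f $_ } map { "$dir/$_" } readdir DIR;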
closedir DIR;
sub cust_sort {
    my ($adev, $aino, $amode, $anlink, $auid, $agid, $ardev, $asiz, $aatime, $amtime, $actime) = stat($a);
    my ($bdev, $bino, $bmode, $bnlink, $buid, $bgid, $brdev, $bsiz, $batime, $bmtime, $bctime) = stat($b);
    return ($asiz <=> $bsiz or $amtime <=> $bmtime or $auid <=> $buid);
}


You could also combine the approaches by adding "sort cust_sort" just before the grep on line 5 in the first script and adding the sub cust_sort code from the second script.
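For the "find lines with specific data" part of the question, here is a rough Python sketch for anyone who prefers it over Perl. It assumes all files sit in one directory and that "specific data" means a plain substring; the function name find_lines and its parameters are made up for this illustration and are not from any of the scripts above.

```python
import os


def find_lines(directory, needle):
    """Yield (filename, line_number, line) for every line containing needle."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        # errors="replace" keeps the loop going if a file is not valid text
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield name, number, line.rstrip("\n")
```

Calling list(find_lines("/some/dir", "ERROR")) would collect every match; swapping the substring test for a regular expression is a small change if the search needs to be more flexible.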
0
 

Author Comment

by:Jason Yu
Hi, Wilcoxon:

Thank you very much for your help. Is this Perl? I only know a bit of Python and no Perl at all.

Can you use Python to do this?
0
 
LVL 26

Expert Comment

by:wilcoxon
Yes, it is Perl (you said Perl was fine).  I'm sure you can use Python to do the equivalent but I don't know Python.
0
 

Author Comment

by:Jason Yu
Thank you very much.

I am new to both Perl and Python. Which language is easier to learn, and what is the difference between them?

Any Python experts here?
0
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 278 total points
Perl and Python can mostly do the same things, but they go about it somewhat differently (and both have good libraries of modules available).  Which is easier to learn likely depends on you.  The one big thing I hate about Python is that whitespace is significant (e.g., the number of spaces matters, a tab is different from a space, and code must be indented correctly to reflect nesting).
0

 
LVL 37

Expert Comment

by:Gerwin Jansen
You've got a working Bash example above, as you requested. Did you try that?
0
 
LVL 25

Assisted Solution

by:clockwatcher
clockwatcher earned 111 total points
You've got a couple of working examples already and personally if I was on a *nix platform, I'd go with simply using ls as Gerwin posted.

But just because wilcoxon hates whitespace :-D ...  Here's one way to do it with python:
import sys, os, os.path, datetime
directory = sys.argv[1] if len(sys.argv)>1 else "."
entries = dict([(i, os.stat(os.path.join(directory, i))) for i in os.listdir(directory)])

# sorted by modify time
for i in sorted(entries.keys(), key=lambda x: entries[x].st_mtime):
    print("{0}\t{1}".format(i, datetime.datetime.fromtimestamp(entries[i].st_mtime)))

# sorted by size
for i in sorted(entries.keys(), key=lambda x: entries[x].st_size):
    print("{0}\t{1}".format(i, entries[i].st_size))

# sorted by owner 
if os.name=="posix":
    import pwd
    for i in sorted(entries.keys(), key=lambda x: pwd.getpwuid(entries[x].st_uid).pw_name):
        print("{0}\t{1}".format(i, pwd.getpwuid(entries[i].st_uid).pw_name))

0
 

Author Comment

by:Jason Yu
Hi, all experts:

Thank you all for the posts. I will test the code tomorrow at work. I appreciate the help and hope you all have a wonderful weekend.
0
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 278 total points
One thing that's still unclear: did you want to sort by each of the three things separately, or by all three at once in the order specified?  Currently, the Bash and Python code sort by each separately, while the Perl code sorts by all three in a single sort.  I'm sure any of us could provide the "other" code if that's what you really want.
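In case the "all three at once" reading is the intended one, a minimal Python sketch of a single combined sort might look like the following. It mirrors the ordering in the Perl cust_sort (size, then modify time, then numeric owner uid); the function name is hypothetical and chosen only for this example.

```python
import os


def sort_by_size_date_owner(directory):
    """Return file names ordered by size, then modify time, then owner uid."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append((name, os.stat(path)))
    # Tuples compare element by element, so later keys only break ties
    entries.sort(key=lambda e: (e[1].st_size, e[1].st_mtime, e[1].st_uid))
    return [name for name, _ in entries]
```

The tuple key is what makes this one sort instead of three: two files are compared by modify time only when their sizes are equal, and by owner only when both size and modify time are equal.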
0
 

Author Comment

by:Jason Yu
I was asked to sort under three conditions. If I want to improve my skill in sorting and in manipulating data saved in an array, which commands should I learn? They asked me to blackboard the logic, but I didn't know how to do it.

I want to review the relevant knowledge and commands so that I won't fail again in future interviews. I am preparing two other senior linux system administrator's positions this week.


Thanks a lot.
0
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 278 total points
Here's code to sort by any single field in perl (even some others not requested in your question).
use strict;
use warnings;
my $dir = shift or die "Usage: $0 dir_to_sort\n";
opendir DIR, $dir or die "could not open dir $dir: $!";
my %files;
# get files list in hash and include stats
foreach my $fil (grep { -f "$dir/$_" } readdir DIR) {
    # $dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $siz, $atime, $mtime, $ctime = stat("$dir/$fil")
    # readdir returns bare names, so test and stat the full path
    $files{$fil} = [stat("$dir/$fil")];
}
closedir DIR;
# sort by size
my %sort_hash;
# to sort by any other attribute of the file, simply replace 7 with the appropriate index into stat in this foreach
foreach my $fil (keys %files) {
    $sort_hash{$files{$fil}[7]} = [] unless exists($sort_hash{$files{$fil}[7]});
    push @{$sort_hash{$files{$fil}[7]}}, $fil;
}
# sort by numeric ascending - switch to $b <=> $a to do descending (and/or use cmp instead of <=> for string compare)
foreach my $siz (sort { $a <=> $b } keys %sort_hash) {
    print $siz, ":\t", join(', ', @{$sort_hash{$siz}}), "\n";
}
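Since Python was also requested, here is a tentative Python counterpart to the "sort by any single field" idea. It is only a sketch: the function name sort_by_attribute is invented for this example, and it picks the field by its os.stat attribute name (st_size, st_mtime, st_uid, and so on) rather than by a numeric index.

```python
import os


def sort_by_attribute(directory, attribute="st_size", reverse=False):
    """Return (name, value) pairs sorted by one os.stat field, e.g. st_mtime."""
    pairs = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            pairs.append((name, getattr(os.stat(path), attribute)))
    # reverse=True gives descending order, like swapping $a and $b in Perl
    return sorted(pairs, key=lambda pair: pair[1], reverse=reverse)
```

For example, sort_by_attribute(".", "st_mtime", reverse=True) would list the most recently modified files first.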

0
 
LVL 37

Assisted Solution

by:Gerwin Jansen
Gerwin Jansen earned 111 total points
>>  I am preparing two other senior linux system administrator's positions this week.
You mean you are applying for these positions?

If they wanted you to blackboard the logic for sorting, then the above Python and Perl solutions will be of limited use to you. Logic is difficult to teach; it's more a methodical way of thinking about how to solve a problem or issue. A language like Perl, Python, or Bash scripting is merely a way of writing down what you've 'designed'. I'm not sure we can help you here.
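As one possible illustration of what "blackboarding the logic" could mean, here is a small insertion sort written out by hand in Python instead of calling a built-in. This is only a guess at what an interviewer might expect; the file names and sizes in the sample list are invented for the example.

```python
def insertion_sort(items, key):
    """Return a new list ordered by key(item), smallest first."""
    result = []
    for item in items:
        position = len(result)
        # Walk left past every element that should come after the new item
        while position > 0 and key(result[position - 1]) > key(item):
            position -= 1
        result.insert(position, item)
    return result


# Hypothetical (name, size) pairs standing in for real directory entries
files = [("notes.txt", 300), ("a.log", 20), ("core", 9000)]
print(insertion_sort(files, key=lambda f: f[1]))
# prints [('a.log', 20), ('notes.txt', 300), ('core', 9000)]
```

The key function is the only part that changes between sorting by size, by date, or by owner, which is probably the point worth making on a blackboard.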
0
 
LVL 25

Accepted Solution

by:clockwatcher
clockwatcher earned 111 total points
Interviewer: "How do you sort a group of 100 files?"

Applicant: "I've got no idea, but I can get at least three highly skilled individuals who combined probably have over 50 years of experience in the field to tell me for free.  What do you want it in: Perl, Python, Ruby, Lisp, Z80 assembly, Applesoft BASIC?  Just let me know."

Interviewer: "Let's get this guy back for the manager position"

On a serious note though, there's no substitute for practice and researching solutions on your own.  It seems like you were interested in a Python solution.  If you were, you could have googled "sort files by size python" or, even better, stepped back and searched for "python determine file size" and then "python sort" and started to piece together a solution on your own.  In the future, you might want to try that first and then come here with any questions if you can't get it to work.  It doesn't bode well for a future employer (or you, to be perfectly honest) if you're not even interested enough to try to come up with a solution on your own but instead turn straight to an answer site for suggestions.  It'll save you time in the short term, but in the long run you're just doing yourself a disservice.

Stepping off my soapbox... and going back to checking on my whitespace.  I think I may have a tab where I meant to have spaces.  Oh screw it.  I'm going back to perl.  I miss ozo and her one-liners.
0
 

Author Comment

by:Jason Yu
Dear clockwatcher:

Thank you very much for your sincere advice; I am taking it with all my heart.

I will do as you suggested and google those phrases. Sometimes life is just too short to prepare for so many things.

I really like the experts here, and you guys set a good example for me. I hope one day I can become an expert on this website and help other people with my skills.


Thank you all very much and have a nice week.
0
