Solved

List of Unique values

Posted on 2008-06-24
Last Modified: 2010-04-21
I need a script, preferably in Perl, that:

- reads input from stdin. The input is a huge volume of records whose fields (columns) are tab-separated. The number of fields is not known in advance, but all records have the same number of fields.
- accepts column numbers as command-line arguments
- outputs all unique values seen in the input for the specified columns

For example, given this input file:
A       22      78      rest
E       22      90      best
A       32      55      lest

./myscript.pl 1 4
i.e., output all unique values in column 1 and column 4. The output would look something like:

COLUMN 1
A
E

COLUMN 4
rest
best
lest

In most cases the unique values will fit in memory, but in some cases there may be too many to fit. If such cases can be handled, well and good. If they cannot, it would be good enough to display a message saying "too many values in column n".
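The "too many values" fallback described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names `collect_unique`, `print_report` and `$MAX_UNIQUE` are hypothetical, and the cap is an arbitrary assumption that would need tuning to the memory actually available. The sample data is read from an in-memory string so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cap on distinct values kept per column (an assumption, tune it).
my $MAX_UNIQUE = 1_000_000;

# Collect unique values for the given 1-based columns from a filehandle.
# Returns (\%seen, \%overflow): values per column, and columns that hit the cap.
sub collect_unique {
    my ($fh, $cols, $max) = @_;
    my (%seen, %overflow);
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        for my $col (@$cols) {
            next if $overflow{$col};           # column already given up on
            my $value = $fields[$col - 1];
            next unless defined $value;        # record has no such column
            next if exists $seen{$col}{$value};
            if (scalar(keys %{ $seen{$col} }) >= $max) {
                $overflow{$col} = 1;
                delete $seen{$col};            # release this column's values
                next;
            }
            $seen{$col}{$value} = 1;
        }
    }
    return (\%seen, \%overflow);
}

# Print one block per requested column, or the fallback message.
sub print_report {
    my ($out, $cols, $seen, $overflow) = @_;
    for my $col (@$cols) {
        if ($overflow->{$col}) {
            print {$out} "too many values in column $col\n\n";
            next;
        }
        print {$out} "COLUMN $col\n";
        print {$out} "$_\n" for sort keys %{ $seen->{$col} || {} };
        print {$out} "\n";
    }
}

# Demo on the sample records from the question, held in an in-memory string.
my $sample = "A\t22\t78\trest\nE\t22\t90\tbest\nA\t32\t55\tlest\n";
open my $in, '<', \$sample or die "cannot open in-memory sample: $!";
my ($seen, $overflow) = collect_unique($in, [1, 4], $MAX_UNIQUE);
print_report(\*STDOUT, [1, 4], $seen, $overflow);
```

In a real script the filehandle would be `\*STDIN` and the column list would come from `@ARGV`. Values are printed sorted here only to make the output deterministic.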
Question by:sunnycoder
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
perl -alne 'BEGIN{@c=splice @ARGV}$c{$_}{$F[$_-1]}++for@c;END{print join"\n","COLUMN $_",keys %{$c{$_}},""for @c}' 1 4 < input
 
LVL 45

Author Closing Comment

by:sunnycoder
Perfect again ... thanks!!
 
LVL 45

Author Comment

by:sunnycoder
Sorry about my complete unfamiliarity with Perl ... how do I convert the above command into a script that accepts arguments?
LVL 84

Expert Comment

by:ozo
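This version only keeps one column in memory at a time. It rereads the input once per column, so it takes a file name as its last argument instead of reading stdin: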

perl -alne 'BEGIN{@c=@ARGV; @ARGV=(pop @c)x@c}print"COLUMN ",$c=shift @c and %s=() if 1..1; $s{$_}++||print for $F[$c-1];close ARGV&&print""if eof' 1 4  input
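For readers not fluent in one-liners, the same one-column-per-pass idea could be spelled out along these lines. This is a rough sketch rather than a translation of the command above: `print_unique_in_column` is a hypothetical name, fields are assumed to be strictly tab-separated, and the input has to be a real file (not a pipe) because it is opened again for every column. The demo writes the sample records from the question to a temporary file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scan the file once and print each value of one 1-based column the first
# time it is seen. %seen is local to the sub, so only one column's values
# are held in memory at any moment.
sub print_unique_in_column {
    my ($path, $col, $out) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    my %seen;
    print {$out} "COLUMN $col\n";
    while (my $line = <$fh>) {
        chomp $line;
        my $value = (split /\t/, $line, -1)[$col - 1];
        next unless defined $value;            # record has no such column
        print {$out} "$value\n" unless $seen{$value}++;
    }
    print {$out} "\n";
    close $fh;
}

# Demo: put the sample records in a temporary file, then make one pass
# over it per requested column.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "A\t22\t78\trest\nE\t22\t90\tbest\nA\t32\t55\tlest\n";
close $tmp_fh;
print_unique_in_column($tmp_path, $_, \*STDOUT) for (1, 4);
```

The trade-off is one full read of the input per column in exchange for a smaller peak memory footprint.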
 
LVL 45

Author Comment

by:sunnycoder
It's okay if it keeps all columns in memory at the same time ... What I want is to be able to put it in a script:

./myscript.pl 1 4

instead of

perl -alne .....
 
LVL 84

Expert Comment

by:ozo
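#!/usr/bin/perl
# Usage: ./myscript.pl 1 4 < input
# $c[$i] is a hash whose keys are the unique values seen in the i-th requested column.
while( <STDIN> ){
   $c=0;
   # Prepending '' makes the list slice 1-based, so column N is index N.
   # Note: a bare split breaks on any whitespace, not only tabs.
   $c[$c++]{$_}++ for ('',split)[@ARGV];
}
$\=$/;   # append a newline to every print
for( @ARGV ){
   print "COLUMN $_";
   print for keys %{shift @c};   # hash keys come back in no particular order
}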
 
LVL 45

Author Comment

by:sunnycoder
perfect ... thanks a ton