List of Unique values

Posted on 2008-06-24
Last Modified: 2010-04-21
I need (preferably) a Perl script that

- reads input from stdin. The input is a huge volume of records; the fields (columns) in each record are tab-separated. The number of fields is not known in advance, but all records have the same number of fields.
- accepts column numbers as command line arguments
- outputs all unique values seen in the input for the specified columns

Sample input file (tab-separated):
A       22      78      rest
E       22      90      best
A       32      55      lest

./ 1 4
i.e., output all unique values in column 1 and column 4. The output would look something like:



In most cases the unique values will fit in memory, but in some cases there may be too many of them to fit. If such cases can be handled, well and good; if not, it would be good enough to display a message saying "too many values in column n".
Question by:sunnycoder
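None of the answers below prints the "too many values in column n" message asked for in the last requirement. One way to approach it is sketched here, assuming that a fixed count of distinct values per column (`$MAX_UNIQUE`, a hypothetical setting) is an acceptable stand-in for "too big to fit in memory". Once a column exceeds the limit, its hash is discarded and the message goes to stderr, while the other columns carry on. The sub name `unique_values` is likewise illustrative.

```perl
#!/usr/bin/perl
# Sketch: unique values per requested column of tab-separated stdin.
# ASSUMPTION: $MAX_UNIQUE (hypothetical) is a per-column count of distinct
# values used as a stand-in for "too big to fit in memory".
use strict;
use warnings;

my $MAX_UNIQUE = 1_000_000;

# Returns { column => { value => 1, ... } }; a column whose distinct-value
# count exceeds $max is reported once and set to undef to free its memory.
sub unique_values {
    my ($fh, $max, @cols) = @_;
    my %seen;
    $seen{$_} = {} for @cols;
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /\t/, $line, -1;      # -1 keeps trailing empty fields
        for my $col (@cols) {
            my $set = $seen{$col} or next;  # column already abandoned
            $set->{ $f[$col - 1] // '' } = 1;
            if (scalar(keys %$set) > $max) {
                warn "too many values in column $col\n";
                $seen{$col} = undef;        # drop this column's values
            }
        }
    }
    return \%seen;
}

if (@ARGV) {
    die "column numbers must be positive integers\n"
        if grep { !/^[1-9][0-9]*$/ } @ARGV;
    my $seen = unique_values(\*STDIN, $MAX_UNIQUE, @ARGV);
    for my $col (@ARGV) {
        next unless $seen->{$col};          # skipped: too many values
        print "COLUMN $col\n";
        print "$_\n" for sort keys %{ $seen->{$col} };
        print "\n";
    }
} else {
    warn "usage: $0 COL [COL ...] < input\n";
}
```

A count limit is only a rough proxy for memory use (long values cost more than short ones), so the threshold would need tuning for the actual data and machine.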

Accepted Solution

ozo earned 500 total points
ID: 21862697
perl -alne 'BEGIN{@c=splice @ARGV}$c{$_}{$F[$_-1]}++for@c;END{print join"\n","COLUMN $_",keys %{$c{$_}},""for @c}' 1 4 < input
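For readers new to Perl, here is a sketch of the accepted one-liner with its command-line switches written out as ordinary code: `-n` wraps the code in a loop over input lines, `-l` chomps each line and makes `print` append a newline, and `-a` splits each line into `@F`. Note that `-a` splits on any run of whitespace, so a field containing a space would be split in two; for strictly tab-separated input, adding `-F'\t'` to the one-liner (or using `split /\t/` below) is safer. The sub name `count_columns` is illustrative, not part of the original answer.

```perl
#!/usr/bin/perl
# Sketch: the accepted one-liner with perl's -n, -l and -a switches spelled out.
use strict;
use warnings;

# Returns { column => { value => count } } for the requested 1-based columns.
sub count_columns {
    my ($fh, @cols) = @_;
    my %c;
    while (my $line = <$fh>) {          # -n: loop over every input line
        chomp $line;                    # -l: strip the trailing newline
        my @F = split ' ', $line;       # -a: split on runs of whitespace into @F
        $c{$_}{ $F[$_ - 1] // '' }++ for @cols;
    }
    return \%c;
}

if (@ARGV) {
    my @cols = splice @ARGV;            # BEGIN{@c=splice @ARGV}: take the column numbers out of @ARGV
    my $c = count_columns(\*STDIN, @cols);
    local $\ = "\n";                    # -l: print appends a newline
    # END block: header, the column's unique values, then a blank line
    print join "\n", "COLUMN $_", keys %{ $c->{$_} }, '' for @cols;
}
```

Emptying `@ARGV` matters in the one-liner because `-n` reads from `<>`, which would otherwise try to open "1" and "4" as input files.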

Author Closing Comment

ID: 31470458
Perfect again ... thanks!!

Author Comment

ID: 21862729
Sorry, I'm completely unfamiliar with Perl... how do I convert the above command into a script that accepts arguments?

Expert Comment

ID: 21862763
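This version keeps only one column in memory at a time. It reads the input file once per column, so the file is named on the command line rather than redirected to stdin: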

perl -alne 'BEGIN{@c=@ARGV; @ARGV=(pop @c)x@c}print"COLUMN ",$c=shift @c and %s=() if 1..1; $s{$_}++||print for $F[$c-1];close ARGV&&print""if eof' 1 4  input
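The one-liner above sets `@ARGV` to the file name repeated once per column, so `<>` reads the file once for each column and only that column's hash `%s` is alive at a time; `close ARGV ... if eof` resets `$.`, so the `1..1` test (true when `$.` is 1) fires again on the first line of the next pass. A more explicit sketch of the same idea follows, splitting on tabs instead of whitespace and taking the file name as the last argument, as the one-liner does. The sub name `unique_in_column` is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: unique values one column at a time, rereading FILE once per column.
# Hypothetical usage (same argument order as the one-liner): script COL [COL ...] FILE
use strict;
use warnings;

# Prints "COLUMN n", each distinct value of 1-based column $col in order of
# first appearance, then a blank line. Only this column's %seen is in memory.
sub unique_in_column {
    my ($file, $col, $out) = @_;
    open my $in, '<', $file or die "cannot open $file: $!\n";
    my %seen;
    print {$out} "COLUMN $col\n";
    while (my $line = <$in>) {
        chomp $line;
        my $v = (split /\t/, $line, -1)[$col - 1] // '';
        print {$out} "$v\n" unless $seen{$v}++;   # same test as $s{$_}++||print
    }
    print {$out} "\n";
    close $in;
}                                                  # %seen is freed here

if (@ARGV >= 2) {
    my $file = pop @ARGV;                          # last argument is the input file
    unique_in_column($file, $_, \*STDOUT) for @ARGV;
}
```

The cost of the lower memory footprint is one full pass over the file per requested column, and the input has to be a regular file rather than a pipe on stdin.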

Author Comment

ID: 21862782
It's okay if it keeps all columns in memory at the same time... What I want is to be able to put it in a script and run it as

./ 1 4

instead of

perl -alne .....

Expert Comment

ID: 21862824
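#!/usr/bin/perl
$\ = "\n";                      # print appends a newline, as perl -l does
while( <STDIN> ){
   chomp; my $c = 0;            # reset the column counter for each record
   $c[$c++]{$_}++ for ('', split /\t/)[@ARGV];   # leading '' makes column numbers 1-based
}
for( @ARGV ){                   # one block of output per column on the command line
   print "COLUMN $_";
   print for keys %{shift @c};
}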

Author Comment

ID: 21862831
Perfect... thanks a ton!
