Solved

How to sort and remove duplicates with Perl

Posted on 2009-07-07
Last Modified: 2012-05-07
I have a list like the following:
XY-B
AB_C
ABB
ACB
ABB
XYZ
and would like to get:
ABB
AB_C
ACB
XY-B
XYZ
I would prefer a one-liner.
Question by:jl66

Author Comment

by:jl66
ID: 24795198
The list may contain numbers. Sorry for that.

Assisted Solution

by:Adam314
Adam314 earned 210 total points
ID: 24795236

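use strict;
use warnings;

# Use each input line as a hash key; duplicate lines collapse into one key.
my %lines;
while (<DATA>) {
	chomp;
	$lines{$_}++;
}

# Print the unique lines in sorted (string) order.
print "$_\n" foreach (sort keys %lines);
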
__DATA__
XY-B
AB_C
ABB
ACB
ABB
XYZ


Expert Comment

by:Carl Bohman
ID: 24795264

perl -ne '$a=$_;s/[^a-zA-Z0-9]//g;$h{$_}=$a;END{print map {$h{$_}} sort keys %h}'


Assisted Solution

by:Carl Bohman
Carl Bohman earned 70 total points
ID: 24795297
One thing that's not very clear: do you want the sort to ignore non-alphanumeric characters, or are you just looking for a basic sort? My last answer assumed you wanted to ignore the dash and underscore for sorting purposes. If you want a basic sort, the following should work.
perl -ne 'chomp;push @a, $_;END{print join("\n", sort @a), "\n"}'


Expert Comment

by:ozo
ID: 24795561
perl -lne '$s{$_}++;END{print for sort keys %s}' list

Author Comment

by:jl66
ID: 24800412
bounsy, the basic sort is good enough.
ozo, if the list is in a file, is the one-liner different from the above? I tried to use it on a file and it gave an unexpected result.

Expert Comment

by:ozo
ID: 24800849
In what way was the result unexpected?

Expert Comment

by:Adam314
ID: 24801195
The sort that bounsy gives will not remove duplicates.

If you are on Unix, you could use the sort and uniq commands.
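
As a rough sketch of that Unix route (list.txt and the variable names are illustrative, and LC_ALL=C is an assumption added here so that sort uses plain byte order rather than locale rules):

```shell
# Recreate the sample list from the question (list.txt is an arbitrary name).
printf '%s\n' XY-B AB_C ABB ACB ABB XYZ > list.txt

# uniq only drops adjacent duplicates, so the input must be sorted first.
sorted=$(LC_ALL=C sort list.txt | uniq)

# sort -u sorts and removes duplicates in a single step.
sorted_u=$(LC_ALL=C sort -u list.txt)

printf '%s\n' "$sorted"
```

Both forms should give the same list; sort -u simply saves the extra pipe.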

Author Comment

by:jl66
ID: 24810249
ozo, when I used yours exactly, I got this error:
perl -lne '$s{$_}++;END{print for sort keys %s}' test.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.
When I replaced ' with ", I got this error:
perl -lne "$s{$_}++;END{print for sort keys %s}"   test.txt
Substitution pattern not terminated at -e line 1.

Author Comment

by:jl66
ID: 24810260
Adam314, unfortunately the OS is Windows.

Author Comment

by:jl66
ID: 24810323
bounsy's solution sorted the input but kept the duplicates, as Adam314 mentioned.

Expert Comment

by:Adam314
ID: 24810843
On Windows, the command ozo gave would be the first one below. Replace test.txt with the actual input file name. If you want the output to go to a file instead, use the second command, replacing output.txt with the desired output file name.

It looks like you tried this already.  Did you try it exactly as it is there?  Or could there have been a typo?  Try using copy/paste to make sure there are no typos.


#output to screen:
perl -lne "$s{$_}++;END{print for sort keys %s}" test.txt
 
#output to file:
perl -lne "$s{$_}++;END{print for sort keys %s}" test.txt > output.txt


Accepted Solution

by:ozo
ozo earned 220 total points
ID: 24811261
perl -lne "$s{$_}++;END{print for sort keys %s}"   test.txt
is correct on Windows.
Did you omit the % on the %s?