Solved

How to sort and remove duplicates with Perl

Posted on 2009-07-07
Last Modified: 2012-05-07
I have a list like the following:
XY-B
AB_C
ABB
ACB
ABB
XYZ
and would like to get:
ABB
AB_C
ACB
XY-B
XYZ
I would prefer a one-liner.
Question by:jl66
13 Comments
 

Author Comment

by:jl66
The list may contain numbers. Sorry for that.
 
LVL 39

Assisted Solution

by:Adam314
Adam314 earned 210 total points

my %lines;

# Use each line as a hash key so duplicates collapse into one entry
while (<DATA>) {
	chomp;
	$lines{$_}++;
}

# Print the unique lines in sorted order
print "$_\n" foreach (sort keys %lines);

__DATA__
XY-B
AB_C
ABB
ACB
ABB
XYZ

 
LVL 13

Expert Comment

by:Carl Bohman

perl -ne '$a=$_;s/[^a-zA-Z0-9]//g;$h{$_}=$a;END{print map {$h{$_}} sort keys %h}'
 
LVL 13

Assisted Solution

by:Carl Bohman
Carl Bohman earned 70 total points
One thing that's not very clear: do you want the sort to ignore non-alphanumeric characters, or are you just looking for a basic sort? My last answer assumed you wanted to ignore the dash and underscore for sorting purposes. If you want a basic sort, the following should work.
perl -ne 'chomp;push @a, $_;END{print join("\n", sort @a), "\n"}'

 
LVL 84

Expert Comment

by:ozo
perl -lne '$s{$_}++;END{print for sort keys %s}' list
 

Author Comment

by:jl66
bounsy, the basic sort is good enough.
ozo, if the list is in a file, is the one-liner any different from the above? I tried it on a file and it gave an unexpected result.

 
LVL 84

Expert Comment

by:ozo
In what way was the result unexpected?
 
LVL 39

Expert Comment

by:Adam314
The sort that bounsy gives will not remove duplicates.

If you are on Unix, you could use the sort and uniq commands.
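A minimal sketch of that Unix route, assuming the list is in a file called list.txt (a demo name, created here from the data in the question). Setting LC_ALL=C is an addition of this sketch, not something from the thread. It forces plain byte-order comparison so the result does not vary with the locale.

```shell
# Sample input from the question (list.txt is just a demo name).
printf 'XY-B\nAB_C\nABB\nACB\nABB\nXYZ\n' > list.txt

# sort groups identical lines together; uniq then drops adjacent repeats.
# LC_ALL=C forces plain byte order so the result does not depend on the locale.
LC_ALL=C sort list.txt | uniq > unique.txt

# sort -u does both steps in one command.
LC_ALL=C sort -u list.txt > unique_u.txt

cat unique.txt
```

Both commands should produce the same file. uniq only removes adjacent duplicates, which is why the sort has to come first.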
 

Author Comment

by:jl66
ozo, when I used yours exactly, I got this error:
perl -lne '$s{$_}++;END{print for sort keys %s}' test.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.
When I replaced ' with ", I got this error:
perl -lne "$s{$_}++;END{print for sort keys %s}"   test.txt
Substitution pattern not terminated at -e line 1.
 

Author Comment

by:jl66
Adam314, unfortunately the OS is Windows.
 

Author Comment

by:jl66
bounsy's solution sorted the inputs but kept the duplicates as Adam314 mentioned.
 
LVL 39

Expert Comment

by:Adam314
On Windows, the command ozo gave would be the first one below. Replace test.txt with the actual input file name. If you want the output to go to a file instead, use the second command, replacing output.txt with the desired output file name.

It looks like you tried this already. Did you try it exactly as it is there? Or could there have been a typo? Try using copy/paste to make sure there are no typos.

#output to screen:
perl -lne "$s{$_}++;END{print for sort keys %s}" test.txt

#output to file:
perl -lne "$s{$_}++;END{print for sort keys %s}" test.txt > output.txt

 
LVL 84

Accepted Solution

by:ozo
ozo earned 220 total points
perl -lne "$s{$_}++;END{print for sort keys %s}"   test.txt
is correct on Windows.
Did you omit the % on the %s?