
how to sort and remove duplicates with Perl

Posted on 2009-07-07
Last Modified: 2012-05-07
I have a list like the following:
XY-B
AB_C
ABB
ACB
ABB
XYZ
and would like to get:
ABB
AB_C
ACB
XY-B
XYZ
I would prefer a one-liner.
Question by:jl66
 

Author Comment

by:jl66
ID: 24795198
The list may also contain numbers. Sorry for not mentioning that.

Assisted Solution

by:Adam314
ID: 24795236

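use strict;
use warnings;

my %lines;                  # each distinct line becomes one hash key
while(<DATA>) {             # read the lines after __DATA__
	chomp;                  # drop the trailing newline
	$lines{$_}++;           # duplicates land on the same key
}
 
print "$_\n" foreach (sort keys %lines);   # distinct lines in string order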
 
__DATA__
XY-B
AB_C
ABB
ACB
ABB
XYZ



Expert Comment

by:Carl Bohman
ID: 24795264

perl -ne '$a=$_;s/[^a-zA-Z0-9]//g;$h{$_}=$a;END{print map {$h{$_}} sort keys %h}'



Assisted Solution

by:Carl Bohman
ID: 24795297
One thing that's not very clear: do you want to sort while ignoring non-alphanumeric characters, or are you just looking for a basic sort? My last answer assumed you wanted to ignore the dash and underscore for sorting purposes. If you want a basic sort, the following should work:
perl -ne 'chomp;push @a, $_;END{print join("\n", sort @a), "\n"}'



Expert Comment

by:ozo
ID: 24795561
perl -lne '$s{$_}++;END{print for sort keys %s}' list
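In this one-liner, `-n` wraps the code in a loop over the input lines, `-l` chomps each line and makes `print` append a newline, and the `END` block runs once after the last line has been read. A roughly equivalent long-hand version, as a sketch (the function name `dedupe_sort` is hypothetical); like the one-liner, it reads the files named as arguments, or STDIN when there are none.

```shell
# Hypothetical helper name; reads the files given as arguments, or STDIN.
dedupe_sort() {
    perl -e '
        my %s;                      # one key per distinct line
        while (my $line = <>) {     # the loop that -n adds
            chomp $line;            # the chomp that -l adds
            $s{$line}++;
        }
        # Corresponds to the END block: print the distinct lines in string
        # order. With -l, print appends the newline; here it is explicit.
        print "$_\n" for sort keys %s;
    ' "$@"
}

printf 'XY-B\nAB_C\nABB\nACB\nABB\nXYZ\n' | dedupe_sort   # ABB, AB_C, ACB, XY-B, XYZ
```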

Author Comment

by:jl66
ID: 24800412
bounsy, the basic sort is good enough.
ozo, if the list is in a file, is the one-liner any different from the above? I tried it on a file and got an unexpected result.

Expert Comment

by:ozo
ID: 24800849
In what way was the result unexpected?

Expert Comment

by:Adam314
ID: 24801195
The sort that bounsy gives will not remove duplicates.

If you are on Unix, you could use the sort and uniq commands instead.
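For reference, the Unix route looks roughly like this (`list.txt` is a hypothetical file name). `uniq` only removes adjacent repeated lines, which is why the input is sorted first; `sort -u` does both in one step. Setting `LC_ALL=C` makes `sort` compare plain bytes, the same order Perl's default `sort` uses; under other locales, lines containing `-` or `_` may come out in a different order.

```shell
# Sample input from the question (hypothetical file name list.txt).
printf 'XY-B\nAB_C\nABB\nACB\nABB\nXYZ\n' > list.txt

# sort -u sorts and drops repeated lines in one step.
LC_ALL=C sort -u list.txt

# Equivalent pipeline: uniq only removes *adjacent* repeats, so sort first.
LC_ALL=C sort list.txt | uniq
```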

Author Comment

by:jl66
ID: 24810249
ozo, if I use yours exactly, I get this error:
perl -lne '$s{$_}++;END{print for sort keys %s}' test.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.
If I replace ' with ", I get this error:
perl -lne "$s{$_}++;END{print for sort keys %s}"   test.txt
Substitution pattern not terminated at -e line 1.

Author Comment

by:jl66
ID: 24810260
Adam314, unfortunately the OS is Windows.

Author Comment

by:jl66
ID: 24810323
bounsy's solution sorted the input but kept the duplicates, as Adam314 mentioned.

Expert Comment

by:Adam314
ID: 24810843
On Windows, ozo's command would be written as follows. Replace test.txt with the actual input file name. If you want the output to go to a file instead, use the second command, replacing output.txt with the desired output file name.

It looks like you tried this already.  Did you try it exactly as it is there?  Or could there have been a typo?  Try using copy/paste to make sure there are no typos.


#output to screen:
perl -lne "$s{$_}++;END{print for sort keys %s}" test.txt
 
#output to file:
perl -lne "$s{$_}++;END{print for sort keys %s}" test.txt > output.txt



Accepted Solution

by:ozo
ID: 24811261
perl -lne "$s{$_}++;END{print for sort keys %s}"   test.txt
is correct on Windows.
Did you omit the % on the %s?