Solved

Linux: Sort Unique by Column

Posted on 2013-06-04
Last Modified: 2013-06-04
Environment:

Linux 2.6.18-308.11.1.el5 #1 SMP Fri Jun 15 15:41:53 EDT 2012 x86_64 GNU/Linux

I need to sort a text file and eliminate duplicate rows based on its first column.

The text file looks like this (partial):

ADEYEFP2~emghlp1071~103.38
ADEYEFP2~emghlp1072~103.38
ADEYETC1~emghlc054~57.11
ADEYETC2~emghlc1037~145.32
ADEYETC2~emghlc1038~145.32  

What I want the output to be after the sort is:

ADEYEFP2~emghlp1071~103.38
ADEYETC1~emghlc054~57.11
ADEYETC2~emghlc1038~145.32  

--essentially eliminating duplicate rows based only on the first column as delimited by "~".

I have tried various combinations of sort, but I am unable to get it to work.

Any help will be much appreciated!

Thanks
Question by:dhite99
5 Comments
 

Expert Comment

by:omarfarid
ID: 39219943
Try this:

sed 's/\~/ /' filename | sort -u -n -k1,1

Author Comment

by:dhite99
ID: 39220007
It seems to only output the first line of the input file:

oracle@ebhlx001:/u01/app/oracle/dev> cat test.txt
AM8UETC1~emghlc002~29.36
AM8UETC2~emghlc1027~99.96
AM8UETC2~emghlc1028~99.96
AM9WAFP1~emghlp1058~213.32
AM9WAFP1~emghlp1059~213.32
AM9WAFP1~emghlp1060~213.32
AM9WAFP2~emghlp017~339.95
AM9WAFP3~emghlp017~8.59
AM9WARP2~emghlp044~324.5
AM9WASC1~emghlc002~444.59

oracle@ebhlx001:/u01/app/oracle/dev> sed 's/\~/ /' test.txt | sort -u -n -k1,1
AM8UETC1 emghlc002~29.36
oracle@ebhlx001:/u01/app/oracle/dev>
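The single-row result is consistent with how `sort -n` treats keys that do not begin with a number: each such key evaluates to 0, so with `-u` every line compares as a duplicate of the others. The sketch below is a minimal illustration using three rows adapted from the test file (the variable names are only for this example); it counts the lines that survive with and without `-n`.

```shell
# Three rows adapted from test.txt, with "~" already replaced by spaces.
rows='AM8UETC1 emghlc002 29.36
AM8UETC2 emghlc1027 99.96
AM9WAFP1 emghlp1058 213.32'

# With -n, each key is parsed as a number. These keys start with letters,
# so they all evaluate to 0 and -u treats every line as a duplicate.
numeric_count=$(printf '%s\n' "$rows" | sort -u -n -k1,1 | wc -l | tr -d ' ')

# Without -n, the keys are compared as text and remain distinct.
text_count=$(printf '%s\n' "$rows" | sort -u -k1,1 | wc -l | tr -d ' ')

echo "with -n: $numeric_count line(s); without -n: $text_count line(s)"
```

Dropping `-n` is therefore likely the smallest change that makes this approach return one row per key.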


Expert Comment

by:omarfarid
ID: 39220059
Please try:

sed 's/\~/ /g' test.txt | sort -u -n -k1,1

If you don't want to see the remaining fields on each line, try:

sed 's/\~/ /g' test.txt | sort -u -n -k1,1 | cut -d ' ' -f 1

Accepted Solution

by:nemws1
ID: 39220155
I'd suggest this:

sort -u -t~ -k1,1 test.txt
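For readers unfamiliar with the flags, here is a small sketch that runs the same command against the sample rows from the question. The temporary file and the `deduped` variable are assumptions made for the example. Note that POSIX does not specify which of several equal-key lines `-u` retains; GNU sort is documented to keep the first of each equal run.

```shell
# Sample rows copied from the question; the temp file is only for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ADEYEFP2~emghlp1071~103.38
ADEYEFP2~emghlp1072~103.38
ADEYETC1~emghlc054~57.11
ADEYETC2~emghlc1037~145.32
ADEYETC2~emghlc1038~145.32
EOF

# -t~    use "~" as the field separator
# -k1,1  the sort key starts and ends at field 1
# -u     output only one line per distinct key
deduped=$(sort -u -t~ -k1,1 "$sample")
printf '%s\n' "$deduped"

rm -f "$sample"
```

The result has one row for each of the three distinct first-column values.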


But I have a caveat with your example.  From these 2 lines, you want the first one:

ADEYEFP2~emghlp1071~103.38
ADEYEFP2~emghlp1072~103.38

That's all good.  But from these 2, you say you want the *second* one:

ADEYETC2~emghlc1037~145.32
ADEYETC2~emghlc1038~145.32  


Without additional info on how you are making that decision, we can't reproduce your results.
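If the choice between duplicates needs to be explicit, one option is to state the policy in `awk` rather than rely on `sort -u`. The following is a sketch only, assuming the policy is either "first line seen per key" or "last line seen per key"; neither matches the mixed example in the question, and the variable names are made up for the illustration.

```shell
# Rows from the question, deliberately not in key order.
rows='ADEYETC2~emghlc1037~145.32
ADEYETC2~emghlc1038~145.32
ADEYETC1~emghlc054~57.11'

# Policy 1: keep the first line seen for each value of field 1.
first_per_key=$(printf '%s\n' "$rows" \
  | awk -F'~' '!seen[$1]++' \
  | sort -t~ -k1,1)

# Policy 2: keep the last line seen for each value of field 1.
# awk's "for (k in ...)" order is unspecified, so sort afterwards.
last_per_key=$(printf '%s\n' "$rows" \
  | awk -F'~' '{ last[$1] = $0 } END { for (k in last) print last[k] }' \
  | sort -t~ -k1,1)

printf 'first per key:\n%s\n' "$first_per_key"
printf 'last per key:\n%s\n' "$last_per_key"
```

Because each key appears only once after the `awk` step, the final `sort` orders the rows without having to break any ties.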

Author Comment

by:dhite99
ID: 39220254
omarfarid: It still only returns one row:

oracle@ebhlx001:/u01/app/oracle/dev> sed 's/\~/ /g' test.txt | sort -u -n -k1,1
AM8UETC1 emghlc002 29.36          

nemws1: That worked, thanks

oracle@ebhlx001:/u01/app/oracle/dev> sort -u -t~ -k1,1 test.txt
AM8UETC1~emghlc002~29.36
AM8UETC2~emghlc1027~99.96
AM9WAFP1~emghlp1058~213.32
AM9WAFP2~emghlp017~339.95
AM9WAFP3~emghlp017~8.59
AM9WARP2~emghlp044~324.5
AM9WASC1~emghlc002~444.59