Hi,
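This will display the lines whose first field (fields separated by spaces) is duplicated:
awk '{ print $1 }' data1 | sort | uniq -d | sed 's/.*/^& /' | grep -f - data1
data1 is the name of your data file. The sed step turns each duplicated key into a pattern anchored at the start of the line and followed by a space, so a key such as 111 matches only lines that start with "111 ", not lines starting with 1111 or ending in platter 3111.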
Cheers,
K.
I have the following file, sorted using the first column as the sort key:
111 found in platter 3297
111 found in platter 3322
1122 found in platter 3232
1129 found in platter 3297
1133 found in platter 3238
1133 found in platter 3297
1133 found in platter 3211
1255 found in platter 3213
1258 found in platter 3289
.....
I need a script that reads the above file and outputs only those
lines/records whose sort key occurs more than once.
Based on the sample above, the output should be:
111 found in platter 3297
111 found in platter 3322
1133 found in platter 3238
1133 found in platter 3297
1133 found in platter 3211
It just compares the first field with the previous line's first field and, if they are the same, displays the line. But it does not sort the data; it assumes the data is already sorted.
Even if it did the sorting itself, it would probably be something like a bubble sort. I expect the OS sort command to be faster.
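The compare-with-the-previous-line idea described above can also be made to print the first line of each repeated group, by buffering the lines of the current key until the key changes. This is a minimal sketch, assuming a POSIX awk and input that is already grouped by the first field; dup_groups is a hypothetical name. Only one group is held in memory at a time.

```shell
# dup_groups (hypothetical helper name)
# Reads lines grouped by their first field on stdin and prints only
# the groups that contain two or more lines.
dup_groups() {
  awk '
    $1 != key {                       # a new key starts a new group:
      if (n > 1)                      # flush the previous group if it repeated
        for (i = 1; i <= n; i++) print buf[i]
      key = $1; n = 0
    }
    { buf[++n] = $0 }                 # buffer the current line in its group
    END {                             # flush the last group
      if (n > 1) for (i = 1; i <= n; i++) print buf[i]
    }
  '
}
```

Usage: dup_groups < data1 for the already-sorted file, or sort -b -k1,1 data1 | dup_groups when the input is not grouped yet (-b makes sort ignore leading blanks in the key, matching how awk splits $1).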
Thanks KeremE.
I've created another question; if you have the time,
perhaps you could take a look. Thanks very much.
Title: Unix Shell Perl script to filter out repeated sort key within a large file ( sed awk grep wc )
Question:
I have the following file (with millions of lines), which is currently not sorted
by the first-column value, and whose lines may have leading space(s):
113 found in platter 3222
127 found in platter 3922
113 found in platter 3735
1323 found in platter 3213
1323 found in platter 3898
53323 found in platter 3288
1323 found in platter 3223
....
(127 & 53323 each occurs only once in this file)
The output should be as follows, i.e. records whose first-column value
occurs only once are eliminated, so only lines with a repeated first-column
value are retained, and the result is sorted by the first column as the primary
key and the 5th column as the secondary sort key:
113 found in platter 3222
113 found in platter 3735
1323 found in platter 3213
1323 found in platter 3223
1323 found in platter 3898
....
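For this second case (unsorted input, possible leading spaces, millions of lines), here is a minimal sketch, assuming a POSIX awk and sort. The file is read twice: first to count each first-field value, then to print only lines whose value was seen more than once. The survivors are then sorted numerically on field 1 and then field 5. filter_repeated is a hypothetical name, and stripping leading blanks from the output is an assumption based on the sample output. Memory use in awk grows with the number of distinct keys (one counter each), while GNU sort spills to temporary files for large inputs.

```shell
# filter_repeated FILE (hypothetical helper name)
# Keep only lines whose first field occurs more than once in FILE,
# then sort by field 1 (numeric) and field 5 (numeric).
filter_repeated() {
  awk '
    NR == FNR { count[$1]++; next }   # pass 1: count each first-field value
    count[$1] > 1 {                   # pass 2: keep lines with a repeated key
      sub(/^[ \t]+/, "")              # strip leading spaces/tabs (assumption)
      print
    }
  ' "$1" "$1" |
  sort -k1,1n -k5,5n                  # primary key field 1, secondary field 5
}
```

Usage: filter_repeated data.txt > result.txt. Because awk splits fields on runs of blanks and ignores leading ones, " 113" and "113" are counted as the same key.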
by: Tintin, posted on 2009-10-20 at 12:25:17 (ID: 25617517)
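Depends on your version of uniq. With GNU uniq, -D prints every line of each duplicated group and -w4 compares only the first 4 characters of each line, so this assumes the file is already sorted, the keys are at most 4 characters long, and the lines have no leading spaces:
uniq -D -w4 file.list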