Correction :
> ... those records with first column value that occur once only will
> be eliminated, retaining those with first column value that are repeated
I have the following files (with millions of lines), which are currently not sorted
by first-column value and may have leading space(s) on each line:
113 found in platter 3222
127 found in platter 3922
113 found in platter 3735
1323 found in platter 3213
1323 found in platter 3898
53323 found in platter 3288
1323 found in platter 3223
....
(127 & 53323 each occurs only once in this file)
The output should be (ie those records with first column value will
be eliminated & sorted by first column as primary key followed by
5th column value as secondary sort key) :
113 found in platter 3222
113 found in platter 3735
1323 found in platter 3213
1323 found in platter 3223
1323 found in platter 3898
....
This question has been solved and verified by the asker.
Hi,
I'm not good at Perl scripting, unfortunately, but the solution I provided for the other question will work here too, because awk will not be fooled by the leading spaces. The command and its result will be something like this:
awk -F' ' '{ print $1 }' data1 |sort -n | uniq -d | fgrep -f - data1
113 found in platter 3222
113 found in platter 3735
1323 found in platter 3213
1323 found in platter 3898
1323 found in platter 3223
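The output above is not yet ordered by the fifth column, which the question also asks for. A minimal sketch of one alternative, assuming a POSIX awk and sort are available: read the file twice with awk, counting first-column values on the first pass and printing only the repeated ones on the second, then sort numerically on columns 1 and 5. The function name repeated_keys and the temporary sample file are hypothetical, introduced only for illustration; this is not the method from the accepted answer.

```shell
#!/bin/sh
# Sketch: keep lines whose first column occurs more than once, then sort
# numerically by column 1 (primary) and column 5 (secondary).
# Usage: repeated_keys FILE   (hypothetical helper name)
repeated_keys() {
    # Pass 1 (NR == FNR): count each first-column value.
    # Pass 2: print only lines whose first column was seen at least twice.
    # awk's default field splitting ignores leading blanks.
    awk 'NR == FNR { count[$1]++; next } count[$1] > 1' "$1" "$1" |
        sort -k1,1n -k5,5n
}

# Demo on a small sample modelled on the question's data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 113 found in platter 3222
127 found in platter 3922
113 found in platter 3735
1323 found in platter 3213
1323 found in platter 3898
53323 found in platter 3288
1323 found in platter 3223
EOF
repeated_keys "$sample"
rm -f "$sample"
```

Because the comparison is made on the first field itself rather than on text anywhere in the line, a key cannot match by accident inside another column. The file is read twice, so memory use grows with the number of distinct first-column values rather than with the number of lines.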
Hi Arnold
127 and 53323 are the primary sort keys for 2 of the lines in the file, so
lines with the unique (ie occurring once) first column value should be
stripped out or excluded in the output
Hi KeremE,
For some reason the code you gave worked with a sample file earlier,
but this time, searching within the same file, it gave incorrect output:
I think everything, including the lines with a unique first-column value, got
included in the output. I'll try to describe it more precisely later.
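The thread does not say what actually went wrong, but one plausible cause (an assumption, not confirmed by the poster) is that fgrep -f treats each repeated key as a fixed string to find anywhere in the line, not only in the first column. The sketch below uses hypothetical data in which key 127 occurs once but its line contains "113" in the fifth column; grep -F is used as the equivalent of fgrep, and the function name suggested_pipeline is made up for the example.

```shell
#!/bin/sh
# Same pipeline shape as the suggested command, wrapped in a function.
# Usage: suggested_pipeline FILE   (hypothetical helper name)
suggested_pipeline() {
    awk '{ print $1 }' "$1" | sort -n | uniq -d | grep -F -f - "$1"
}

# Hypothetical sample: 127 is a unique key, but "1133" contains "113".
sample=$(mktemp)
printf '%s\n' \
    '113 found in platter 3222' \
    '127 found in platter 1133' \
    '113 found in platter 3735' > "$sample"

# The 127 line is printed too, because the pattern "113" matches
# as a substring of its fifth column.
suggested_pipeline "$sample"
rm -f "$sample"
```

With millions of lines and short numeric keys, this kind of accidental substring match becomes very likely, which would be consistent with nearly every line appearing in the output.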
by: sunhux, posted on 2009-10-21 at 02:05:40, ID: 25621934
To elaborate:
this is a search for repeated primary sort key within the same file itself