AsifMughal

Using the Sort in Unix

Hello All

I am trying to sort a 70 MB text file and also remove any duplicates. Below is some sample data from the file.


15601584DSP4453 003508KIG235700
15692047DSP4453 003508DIG254701
11201584DSP4453 004508TIG254700
12392047DSP4453 004508UIG254701
10341928DSP1035 003508JCD265801

A duplicate is defined by three fields: the first 8 characters (e.g. 15601584 in line 1), then the 7 characters at position 22, counting from 0 (e.g. KIG2357 in line 1), and then the 2 characters at position 29 (e.g. 00 in line 1).
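To make the key definition concrete, here is a minimal sketch that builds the key with awk and prints a line only the first time its key appears. It assumes the positions quoted above are 0-based offsets, which is what the sample data as shown suggests (KIG2357 occupies characters 23 to 29 when counting from 1). The second record in the sample below is made up for illustration and is not from the original file.

```shell
#!/bin/sh
# Lines 1 and 3 come from the post; line 2 is a hypothetical record that
# shares the key of line 1 (chars 1-8 and 23-31) but differs elsewhere.
sample='15601584DSP4453 003508KIG235700
15601584DSP9999 999999KIG235700
15692047DSP4453 003508DIG254701'

# awk's substr() counts from 1, so a 0-based offset of 22 becomes 23.
# The 7-character and 2-character fields are adjacent, so one
# 9-character substring covers both.
# The pattern !seen[key]++ is true only the first time a key is seen.
deduped=$(printf '%s\n' "$sample" |
  awk '{ key = substr($0, 1, 8) substr($0, 23, 9) } !seen[key]++')

printf '%s\n' "$deduped"
```

This approach keeps the first record of each key and leaves the input order unchanged, so it removes duplicates without sorting.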

So any record with the same combination of the three fields is a duplicate and needs to be omitted. I have tried the sort command with the -u switch, but it uses the whole line as the record when searching for duplicates.

Is it possible to specify which fields are used to search for duplicates by giving the start and end positions of the text that marks each field? You can do this for the sort key with the -k switch by specifying the fields; is there anything similar for the -u switch?
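For what it is worth, POSIX defines -u as suppressing all but one line in each set of lines with equal keys, so combining -u with -k should restrict the duplicate check to the key columns. The sketch below is one way to try that, not necessarily the accepted answer. It assumes the character `|` never occurs in the data (so each whole line counts as a single field) and that the key sits at characters 1 to 8 and 23 to 31 when counting from 1. The second sample record is made up for illustration.

```shell
#!/bin/sh
# Lines 1 and 3 come from the post; line 2 is a hypothetical record whose
# key columns match line 1.
sample='15601584DSP4453 003508KIG235700
15601584DSP9999 999999KIG235700
10341928DSP1035 003508JCD265801'

# -t '|'       separator assumed absent from the data, so each line is field 1
# -k1.1,1.8    first key: characters 1-8 of field 1
# -k1.23,1.31  second key: characters 23-31 (the 7- and 2-character fields)
# -u           keep one line per distinct key instead of per distinct line
# LC_ALL=C     compare bytes rather than using locale collation rules
unique=$(printf '%s\n' "$sample" |
  LC_ALL=C sort -t '|' -u -k1.1,1.8 -k1.23,1.31)

printf '%s\n' "$unique"
```

Unlike a whole-line `sort -u`, this treats the two 15601584 records as one, because only the key columns are compared.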

I look forward to a reply.

Thanks in advance


Asif Mughal

ASKER CERTIFIED SOLUTION
interiot