sort but preserve order of file

mmatharu
Hi,

I have a file on which I run sort -u to filter out duplicate rows, but I also want to preserve the order of the file.  The data is just dates, one for each day of the year, and sometimes a day is duplicated. When I perform the sort, the file changes from:

01.05.2007
02.05.2007
……
To :
01.05.2007
01.06.2007

How can I still use the Unix sort but preserve the order of the file?

Commented:
Try this instead:

uniq -d < file

OR

sort -m -u < file

Author

Commented:
hi,

I tried both. The first one says the data source is empty, and the second does not eliminate the duplicates.

Commented:
Correction,

try: uniq < file

Commented:
Does the file have just dates or something else too?
sort file|uniq>filtered-file

Author

Commented:
the file just contains dates

Author

Commented:
ahoffmann:

I don't quite understand your command. Could you write it out with example filenames or commands?

Commented:
If the name of the file is "foo" then run the below commands:

$ cp foo foo.bak
$ uniq < foo.bak > foo

Both 'sort -u' and 'uniq' remove a duplicate only when it sits on the line directly after its twin.  'sort -u' guarantees this by first sorting the file alphanumerically, which is what changes your order; 'uniq' does no sorting at all and simply drops repeated lines that are next to each other.
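
To illustrate the point about consecutive lines, here is a small sketch. The helper name sample_dates and the five dates in it are made up for this example; they are not the asker's real data.

```shell
#!/bin/sh
# Hypothetical sample: 02.05.2007 appears twice, but not on adjacent lines.
sample_dates() {
    printf '%s\n' 01.05.2007 02.05.2007 03.05.2007 02.05.2007 04.05.2007
}

# uniq compares each line only with the line before it, so the
# non-adjacent duplicate survives and all five lines are printed.
sample_dates | uniq

# sort -u sorts first, which makes the duplicates adjacent and lets it
# drop one of them, but the output is in sorted order, not file order.
sample_dates | sort -u
```

The first command prints five lines, the second prints four.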

I am guessing, but it sounds like your problem is that some duplicate lines are not consecutive IN THE ORDER YOU WISH TO PRESERVE, so your file looks something like:

  01.05.2007
  02.05.2007    <--- duplicated two lines below
  03.05.2007
  02.05.2007    <--- duplicated two lines above
  04.05.2007

Or, the lines are not necessarily appearing in sequence, so the following scenario may exist:

  01.05.2007
  03.05.2007    <--- numerical order is later than perceived date on next line - out of order
  02.05.2007    <--- numerical order is earlier than perceived date on previous line - out of order
  02.05.2007
  04.05.2007


If one of these is the case, or a combination of the two, then we need a two-pass process to do the job, which I can script for you.  The reason is that we need to decide which line is the right one to keep in order to preserve the order as you want it.  So, in my first example above, which of the lines should be removed?  The 'sort' and 'uniq' tools will ignore any rules you may have about sequence.

Again, if this is the case, please provide a sample of 50 lines (including duplicates), and show how you want them to appear in the final result.  In the sample, please include as many cases as possible of how the source data breaks the rules of:

   - consecutive duplicate lines.
   - source order is already sorted alphanumerically.

Post this back for a fix.
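
While waiting for the sample, one possible rule can be sketched: keep the first occurrence of every line and drop any later repeat, leaving the file order untouched. That rule is an assumption, since the asker has not said which copy should survive, and the function name dedupe_keep_first is hypothetical.

```shell
#!/bin/sh
# Keep the first occurrence of each line and preserve the original order.
# awk stores every line it has seen as a key in the array "seen";
# the expression is true (so the line is printed) only the first time.
# Assumption: "first occurrence wins" is the rule the asker wants.
dedupe_keep_first() {
    awk '!seen[$0]++' "$@"
}

# The first example from the comment above: the second 02.05.2007 is dropped.
printf '%s\n' 01.05.2007 02.05.2007 03.05.2007 02.05.2007 04.05.2007 |
    dedupe_keep_first
```

With a real file this would be run as "dedupe_keep_first foo.bak > foo". Note that awk keeps every distinct line in memory, which is not a concern for a file of a few hundred dates.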
Hi, the following script will remove duplicate, nonconsecutive lines from a file.
Just try it out:

sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
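
The command as posted names no input file, so it reads standard input. A minimal sketch of running it on a file follows; the wrapper name sed_dedupe, the temporary file, and the LC_ALL=C prefix are additions for illustration and not part of the original post.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner from the post above.
# LC_ALL=C is an added assumption so that the range [ -~] means
# "printable ASCII" regardless of the user's locale.
sed_dedupe() {
    LC_ALL=C sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' "$@"
}

# Write a small made-up test file, filter it, then clean up.
tmp=$(mktemp)
printf '%s\n' 01.05.2007 02.05.2007 03.05.2007 02.05.2007 04.05.2007 > "$tmp"
sed_dedupe "$tmp"
rm -f "$tmp"
```

The script keeps every line it has already printed in sed's hold space and deletes a new line if it is found there, so the first occurrence is the one that survives. Because of the [ -~] range, only lines made up of printable ASCII characters are checked for duplicates, which is enough for a file of dates.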

> I don't quite understand your command could you write with example filenames or commands
That's exactly what I did.

> the file just contains dates
Well, assuming the dates look like the examples in your question, just use the -n option for sort.
But I guess that you want a special sorting of your dates and that your dates are in an uncommon format. In that case you need to explain both: the sorting and the format of your dates.
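
If the "special sorting" turns out to be chronological order, it could be sketched as below. This assumes the dates are in DD.MM.YYYY form, which the question suggests but does not confirm; the function name sort_dates_unique is hypothetical.

```shell
#!/bin/sh
# Sort DD.MM.YYYY dates chronologically and drop duplicates.
#   -t .     use "." as the field separator
#   -k3,3n   compare the year first (numeric)
#   -k2,2n   then the month (numeric)
#   -k1,1n   then the day (numeric)
#   -u       keep one line for each distinct (year, month, day) key
# Assumption: field 1 is the day and field 2 is the month.
sort_dates_unique() {
    sort -u -t . -k3,3n -k2,2n -k1,1n "$@"
}

printf '%s\n' 01.06.2007 02.05.2007 01.05.2007 02.05.2007 | sort_dates_unique
# prints 01.05.2007, 02.05.2007, 01.06.2007 (one per line)
```

A plain "sort -u" compares the lines as text from the left, so 01.06.2007 lands before 02.05.2007; splitting the line into fields puts the month ahead of the day in the comparison.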
Suhas .Senior QA Manager

Commented:
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I will leave the following recommendation for this question in the Cleanup Zone:
Delete - No Refund

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

Suhas
Experts Exchange Cleanup Volunteer
disagreed, as I gave a 101% valid suggestion ;-)
Suhas .Senior QA Manager

Commented:
ahoffmann,

Thanks for the reply.

Your suggestion is valid, but it does not produce the output that mmatharu asked for,
so the question has been put forward for Delete - No Refund in the cleanup area. You asked mmatharu for an explanation, but he/she did not respond to your query.

Also,
since the question is worth 500 points, the questioner expects a detailed solution with examples.

Best Regards,
Suhas
Experts Exchange Cleanup Volunteer
PAQed with no points refunded (of 500)

Computer101
EE Admin
