I have a sample file (sample.txt) that looks like this:
What I wish to do is remove all duplicate lines, leaving only unique lines:
1) I know I can use this command:
sort sample.txt | uniq > newfile.txt
but I would prefer not to sort the file, as I want to keep the lines in their original order.
2) I would also prefer to make the change in place (not writing to a new file, etc.).
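To make the ordering problem concrete, here is a small illustration (the file contents are hypothetical, since sample.txt itself is not shown):

```shell
# Hypothetical input; the real sample.txt was not shown
printf 'orange\napple\norange\nbanana\napple\n' > sample.txt

# sort | uniq removes the duplicates but also alphabetizes:
sort sample.txt | uniq
# apple
# banana
# orange

# The desired result keeps the first occurrence of each line
# in its original position: orange, apple, banana
```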
I found this thread: http://www.linuxquestions.org/questions/programming-9/removing-duplicate-lines-with-sed-276169/
which offered the following solutions (which I modified as shown below):
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed -i '$!N; /^\(.*\)\n\1$/!P; D' sample.txt
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -i 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' sample.txt
The first sed command (consecutive lines) worked.
The second sed command (non-consecutive lines) did not work (lines were in fact replicated in the file), and I do not understand sed well enough to fix it.
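For what it is worth, the replication reproduces on a tiny input (hypothetical contents, since sample.txt is not shown). The published form of this one-liner runs sed with -n; without it, sed's automatic end-of-cycle print re-emits the whole pattern space, which at that point still carries the accumulated hold-space history:

```shell
# Hypothetical three-line input with one non-consecutive duplicate
printf 'a\nb\na\n' > repro.txt

# The modified command, minus -i so the damage shows on stdout:
# each kept line appears once from P and again (with the hold-space
# history appended) from the automatic print.
sed 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' repro.txt

# Restoring -n (GNU sed assumed) suppresses the automatic print,
# so only P's output survives: a, b
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' repro.txt
```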
Any help would be greatly appreciated. If this requirement cannot be fulfilled with sed, or with sed alone, please suggest an alternative.
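In case a non-sed one-liner is acceptable: a commonly suggested order-preserving alternative is awk, which prints a line only the first time it is seen:

```shell
# seen[$0]++ evaluates to 0 (false) the first time a line appears,
# so !seen[$0]++ is true exactly once per distinct line.
# awk has no portable -i, so write to a temp file and move it back:
awk '!seen[$0]++' sample.txt > sample.txt.tmp && mv sample.txt.tmp sample.txt
```

GNU awk 4.1+ can also edit in place directly with gawk -i inplace '!seen[$0]++' sample.txt.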