Remove duplicate lines using sed in Oracle Linux 6.5 environment

Posted on 2014-04-02
Last Modified: 2014-04-03
I have a sample file (sample.txt) that looks like this:

ddddddddddddddddddddddd
ddddddddddddddddddddddd
aaaaaaaaaaaaaaaaaaaaaaa
avvvvvvvvvvvvvvvvvvvvvv
bbbbbbbbbbbbbbbbbbbbbbb
ddddddddddddddddddddddd

What I wish to do is remove all duplicate lines, keeping only the first occurrence of each:

ddddddddddddddddddddddd
aaaaaaaaaaaaaaaaaaaaaaa
avvvvvvvvvvvvvvvvvvvvvv
bbbbbbbbbbbbbbbbbbbbbbb


Constraints:
1) I know I can use this command:
cat  sample.txt | sort | uniq > newfile.txt 

but I would prefer not to sort the file, as I wish to leave it in its original order.
2) I would prefer to make the change in place (not writing to a new file, etc.).

I found this thread: http://www.linuxquestions.org/questions/programming-9/removing-duplicate-lines-with-sed-276169/

which offered the following solutions (which I modified as follows):
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed -i '$!N; /^\(.*\)\n\1$/!P; D' sample.txt

# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -i 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' sample.txt

The 1st sed command (consecutive lines) worked.
The 2nd sed command (non-consecutive lines) did not work (lines were in fact replicated in the file), and I do not understand sed well enough to fix it.

Any help would be greatly appreciated. If this requirement cannot be fulfilled using sed, or sed alone, please suggest an alternative.
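
A note on the 2nd sed command: in the widely circulated sed one-liner collection, this script is run with -n. Without -n, sed auto-prints the whole pattern space (the current line plus everything appended from the hold space) at the end of each cycle, which would explain the replicated lines. The following is a minimal sketch, assuming GNU sed (where -n and -i can be given together) and a scratch copy of the sample; the variable name `sample` and the temporary path are only for illustration.

```shell
# Scratch copy of the sample from the question (hypothetical temp file).
sample=$(mktemp)
printf '%s\n' \
  ddddddddddddddddddddddd \
  ddddddddddddddddddddddd \
  aaaaaaaaaaaaaaaaaaaaaaa \
  avvvvvvvvvvvvvvvvvvvvvv \
  bbbbbbbbbbbbbbbbbbbbbbb \
  ddddddddddddddddddddddd > "$sample"

# Same script as in the question, with -n added so that only the
# explicit P command prints. The hold space accumulates every unique
# line seen so far, so this slows down on large files. The [ -~] class
# only covers printable ASCII, so lines with tabs or non-ASCII
# characters are not deduplicated by this pattern.
sed -n -i 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' "$sample"

cat "$sample"
```

With the sample data this should leave the four unique lines in their original order.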
Question by:klyles95
Accepted Solution

by:
ozo earned 2000 total points
ID: 39974196
perl -i -ne 'print unless $s{$_}++' sample.txt
