paulwhelan

remove duplicate lines

I have a file with information in it like this:
word comment
where "word" is a single word and "comment" is a single-word comment about that word.

I add information to the end of the file (always in the same format: word comment), but sometimes I add the same information twice.

Is there a way, after I add the information, to remove all duplicate lines in the file?
Thanks,
paul
Mindo

I don't quite understand your task. Is it something like this?

Given a text file remove all the duplicate lines.

If we have a file:

==================
First line.
Second line.
Second line.
Third line.
==================

The resulting file would be:

==================
First line.
Second line.
Third line.
==================

If so, I can write it and get the points, although I'm no Perl guru :-)
paulwhelan

ASKER

Yes, that's what I want.
Thanks,
paul
#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file and leaves #
# old file by adding a suffix .orig  #
# to its end.                        #
######################################

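my $old = shift or die "usage: $0 <filename>\n";
my $new = "new.txt"; # temporary output file

# Three-argument open keeps odd characters in the file name from being misread.
open(OLD, "<", $old) or die "can't open $old: $!";
open(NEW, ">", $new) or die "can't open $new: $!";

my %lines = (); # lines already written to the new file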

while(<OLD>)
{
  if(!exists($lines{$_}))
  {
    $lines{$_} = 1;
    print NEW $_ or die "can't write $new: $!";
  }
};

close(OLD) or die "can't close $old: $!";
close(NEW) or die "can't close $new: $!";
rename($old, "$old.orig") or die "can't rename $old to $old.orig: $!";
rename($new, "$old") or die "can't rename $new to $old: $!";
Keep in mind that it removes only lines that are exactly identical. It also assumes there is no file named new.txt in your current directory, because it uses new.txt as a temporary file.

Cheers :-)
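
Since the script only drops lines that match character for character, lines that differ only in spacing or line endings would survive. Below is a minimal sketch of one way to loosen the comparison, assuming that whitespace differences should not count. The names dedupe_key and unique_lines are hypothetical helpers, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the key used to decide whether two lines match.
sub dedupe_key {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;       # drop the line ending (LF or CRLF)
    $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    $line =~ s/\s+/ /g;         # collapse runs of whitespace to one space
    return $line;
}

# Return the lines whose key has not been seen before, in original order.
# The first occurrence is kept exactly as it was read.
sub unique_lines {
    my %seen;
    return grep { !$seen{ dedupe_key($_) }++ } @_;
}

# Small demonstration with made-up data.
my @demo = ("apple fruit\n", "apple  fruit \n", "pear fruit\r\n", "pear fruit\n");
print unique_lines(@demo);
```

The demonstration keeps the first "apple" line and the first "pear" line. To use it in the script, the hash lookup would be keyed on dedupe_key($_) instead of $_.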
Is there a way to do this without the temporary file?
A lot of people would be accessing this file at the same time,
and it might lead to errors.
Here is the version without a temporary file :-)

#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file.           #
######################################

my $file = shift or die "usage: $0 <filename>\n";

open(F, "+< $file") or die "can't open $file: $!";

%lines = ();
$out = '';

while(<F>)
{
  if(!exists($lines{$_}))
  {
    $lines{$_} = 1;
    $out .= $_;
  }
};

seek(F, 0, 0) or die "can't seek to start of $file: $!";
print F $out or die "can't print to $file: $!";
truncate(F, tell(F)) or die "can't truncate $file: $!";
close(F) or die "can't close $file: $!";
ozo
Use flock to prevent simultaneous access.
ozo, do you know of a script that does this with flock?
Thanks,
paul
Does this work for you? I can't get it to work.
It doesn't delete the duplicate lines.
ASKER CERTIFIED SOLUTION
Mindo

#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file.           #
######################################

use Fcntl ':flock'; # import LOCK_* constants

my $file = shift or die "usage: $0 <filename>\n";

open(F, "+< $file") or die "can't open $file: $!";
flock(F, LOCK_EX) or die "can't lock $file: $!"; # Lock the file; waits until the lock is granted.

%lines = ();
$out = '';

while(<F>)
{
  if(!exists($lines{$_}))
  {
    $lines{$_} = 1;
    $out .= $_;
  }
};

seek(F, 0, 0) or die "can't seek to start of $file: $!";
print F $out or die "can't print to $file: $!";
truncate(F, tell(F)) or die "can't truncate $file: $!";
close(F) or die "can't close $file: $!"; # close() flushes the output and releases the lock.
I've uploaded a version with flock() (above). It works for me, but I don't know what your data looks like, so you should adjust this example to your case. I think that's enough.
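
One thing to keep in mind is that flock() locks are advisory: they only help if every program that touches the file asks for the lock too. Here is a rough sketch of what the appending side could look like, assuming entries are added by a Perl script. The name append_entry and its arguments are hypothetical, not something from your setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END); # import LOCK_* and SEEK_END constants

# Hypothetical helper: append one "word comment" line under an exclusive lock.
sub append_entry {
    my ($file, $word, $comment) = @_;

    open(my $fh, '>>', $file) or die "can't open $file: $!";
    flock($fh, LOCK_EX)       or die "can't lock $file: $!";

    # Another process may have written while we waited for the lock,
    # so move to the current end of the file before printing.
    seek($fh, 0, SEEK_END)    or die "can't seek in $file: $!";

    print {$fh} "$word $comment\n" or die "can't write $file: $!";
    close($fh) or die "can't close $file: $!"; # close() flushes and releases the lock
}

# usage: perl append.pl <file> <word> <comment>
append_entry(@ARGV) if @ARGV == 3;
```

With both scripts taking LOCK_EX, an append cannot land in the middle of the read, rewrite, and truncate sequence of the filter.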
The last code you posted was, as you said, perfect.
Sorry it took so long to grade.
paul