remove duplicate lines

I have a file with information in it like this:
word comment
where word is a single word and comment is a single-word comment about that word.

I add some information to the end of the file (always in the same format: word comment), but sometimes I add the same information twice.

Would there be a way to remove all duplicate lines in the file after I add the information?
Thanks,
paul

Mindo

I don't quite understand your task. Is it like this?

Given a text file, remove all the duplicate lines.

If we have a file:

==================
First line.
Second line.
Second line.
Third line.
==================

The resulting file would be:

==================
First line.
Second line.
Third line.
==================

If so, I can write it and get the points, although I'm no Perl guru :-)

paulwhelan

ASKER

Yes, that's what I want.
Thanks,
paul

Mindo

#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file and keeps  #
# the original as <file>.orig.       #
######################################

use strict;
use warnings;

my $old = shift or die "usage: $0 <filename>\n";
my $new = "new.txt";    # temporary output file in the current directory

open(my $in,  '<', $old) or die "can't open $old: $!";
open(my $out, '>', $new) or die "can't open $new: $!";

my %lines;    # lines seen so far

while (my $line = <$in>) {
    # Print a line only the first time it appears.
    next if $lines{$line}++;
    print {$out} $line or die "can't write $new: $!";
}

close($in)  or die "can't close $old: $!";
close($out) or die "can't close $new: $!";

# Keep the original as <file>.orig, then move the filtered copy into place.
rename($old, "$old.orig") or die "can't rename $old to $old.orig: $!";
rename($new, $old)        or die "can't rename $new to $old: $!";

Mindo

Keep in mind that it removes only lines that are absolutely identical. It also assumes that you do not already have a file named new.txt in your current directory, because it uses new.txt as a temporary file.
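If your duplicates differ only in trailing spaces or line endings (for example a last line with no newline, or Windows-style \r\n endings), an exact comparison will not catch them. Here is a rough sketch of one way to compare lines after trimming. unique_lines is just a name I made up for illustration, and trimming the whitespace is an assumption about your data, so adjust it as needed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns the lines
# with duplicates removed, comparing them after trimming leading and
# trailing whitespace, including "\r" and "\n".
sub unique_lines {
    my @lines = @_;
    my %seen;      # normalized lines seen so far
    my @unique;
    for my $line (@lines) {
        (my $key = $line) =~ s/^\s+|\s+$//g;    # normalized form used for comparison
        push @unique, "$key\n" unless $seen{$key}++;
    }
    return @unique;
}

# The 2nd and 4th entries differ from the 1st only in whitespace.
print unique_lines("apple fruit\n", "apple fruit \r\n", "pear fruit\n", "apple fruit");
```

This prints "apple fruit" and "pear fruit", one per line. Note that it rewrites every kept line in its trimmed form, which may or may not be what you want.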

Cheers :-)

paulwhelan

ASKER

Is there a way to do this without the temporary file?
A lot of people would be accessing this at the same time,
so it might lead to errors.

Mindo

Here is the version without a temporary file :-)

#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file.           #
######################################

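use strict;
use warnings;

my $file = shift or die "usage: $0 <filename>\n";

# Open for both reading and writing without truncating the file.
open(my $fh, '+<', $file) or die "can't open $file: $!";

my %lines;       # lines seen so far
my $out = '';    # de-duplicated contents, built in memory

while (my $line = <$fh>) {
    # Keep a line only the first time it appears.
    $out .= $line unless $lines{$line}++;
}

# Rewrite the file in place and cut off any leftover old data.
seek($fh, 0, 0) or die "can't seek to start of $file: $!";
print {$fh} $out or die "can't print to $file: $!";
truncate($fh, tell($fh)) or die "can't truncate $file: $!";
close($fh) or die "can't close $file: $!";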

ozo

Use flock to prevent access at the same time.

paulwhelan

ASKER

ozo, do you know a script to do this with flock?
Thanks,
paul

paulwhelan

ASKER

Does this work for you? I can't get it to work.
It doesn't delete the duplicate lines.

Mindo

#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file.           #
######################################

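use strict;
use warnings;
use Fcntl ':flock'; # import LOCK_* constants

my $file = shift or die "usage: $0 <filename>\n";

open(my $fh, '+<', $file) or die "can't open $file: $!";
flock($fh, LOCK_EX) or die "can't lock $file: $!"; # Lock the file.

my %lines;       # lines seen so far
my $out = '';    # de-duplicated contents, built in memory

while (my $line = <$fh>) {
    # Keep a line only the first time it appears.
    $out .= $line unless $lines{$line}++;
}

seek($fh, 0, 0) or die "can't seek to start of $file: $!";
print {$fh} $out or die "can't print to $file: $!";
truncate($fh, tell($fh)) or die "can't truncate $file: $!";

# Closing the handle flushes the output and releases the lock, so no
# explicit LOCK_UN is needed (calling flock() after close() would fail).
close($fh) or die "can't close $file: $!";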

Mindo

I've posted a version with flock() (above). It works for me, but I don't know what your data looks like, so you should adjust this example to your case. I think that's enough.
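One thing to keep in mind: flock() locks are advisory, so the lock only helps if the code that appends to the file takes the same lock before writing. Below is a minimal sketch of what that appending side might look like. append_entry is a hypothetical helper name and its arguments are my guess at your "word comment" format, so treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);    # LOCK_* constants and SEEK_END

# Hypothetical helper: appends one "word comment" line to $file while
# holding an exclusive lock, so it cannot interleave with the filter script.
sub append_entry {
    my ($file, $word, $comment) = @_;

    open(my $fh, '>>', $file) or die "can't open $file: $!";
    flock($fh, LOCK_EX)       or die "can't lock $file: $!";

    # Someone may have appended while we were waiting for the lock.
    seek($fh, 0, SEEK_END)    or die "can't seek to end of $file: $!";

    print {$fh} "$word $comment\n" or die "can't write to $file: $!";

    # close() flushes the buffered output and releases the lock.
    close($fh) or die "can't close $file: $!";
}
```

You would call it as, for example, append_entry('words.txt', 'apple', 'fruit'), where words.txt stands in for whatever your data file is called.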

paulwhelan

ASKER

The last code you posted was, as you said, perfect.
Sorry it took so long to grade.
paul