• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 240

Remove duplicate lines

I have a file with information in it like this:

word comment

where word is a single word and comment is a single-word comment about the word.

I add information to the end of the file (always in the same format: word comment), but sometimes I add in the same information twice.

Would there be a way to remove all duplicate lines from the file after I add in the information?
Thanks,
Paul
Asked: paulwhelan
1 Solution
 
MindoCommented:
I don't quite understand your task. Is it like this:

Given a text file remove all the duplicate lines.

If we have a file:

==================
First line.
Second line.
Second line.
Third line.
==================

The resulting file would be:

==================
First line.
Second line.
Third line.
==================

If so, I can write it and get the points, although I'm no guru at Perl :-)
 
paulwhelanAuthor Commented:
Yes, that's what I want.
Thanks,
Paul
 
MindoCommented:
#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# Removes duplicate lines from the   #
# given file; the original file is   #
# kept with a .orig suffix.          #
######################################

use strict;

my $old = shift or die "usage: $0 <filename>\n";
my $new = "new.txt";

open(OLD, "< $old") or die "can't open $old: $!";
open(NEW, "> $new") or die "can't open $new: $!";

my %lines = ();

while (<OLD>)
{
  if (!exists($lines{$_}))
  {
    $lines{$_} = 1;
    print NEW $_ or die "can't write $new: $!";
  }
}

close(OLD) or die "can't close $old: $!";
close(NEW) or die "can't close $new: $!";
rename($old, "$old.orig") or die "can't rename $old to $old.orig: $!";
rename($new, $old) or die "can't rename $new to $old: $!";
 
MindoCommented:
Keep in mind that it removes only lines that are absolutely identical. It also assumes you don't already have a file named new.txt in the current directory, since it uses new.txt as a temporary file.

Cheers :-)
 
paulwhelanAuthor Commented:
Is there a way to do this without the temporary file?
A lot of people would be accessing this at the same time,
and it might lead to errors.
 
MindoCommented:
Here is the version without a temporary file :-)

#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file.           #
######################################

use strict;

my $file = shift or die "usage: $0 <filename>\n";

open(F, "+< $file") or die "can't open $file: $!";

my %lines = ();
my $out = '';

while (<F>)
{
  if (!exists($lines{$_}))
  {
    $lines{$_} = 1;
    $out .= $_;
  }
}

seek(F, 0, 0) or die "can't seek to start of $file: $!";
print F $out or die "can't print to $file: $!";
truncate(F, tell(F)) or die "can't truncate $file: $!";
close(F) or die "can't close $file: $!";
 
ozoCommented:
Use flock to prevent accessing the file at the same time.
 
paulwhelanAuthor Commented:
ozo, do you know a script to do this with flock?
Thanks,
Paul
 
paulwhelanAuthor Commented:
Does this work for you? I can't get it to work;
it doesn't delete the duplicate lines.
 
MindoCommented:
Yes, it deletes duplicate lines for me. I suspect your lines aren't precisely identical. No one can help you remove "duplicate" lines that aren't actually duplicates :-)

Given the file:

==================
Second Line.
Third Line.
Second Line.
Fourth Line.
Ninth Line.
Third Line.
First Line.
Second Line.
==================

I run the command line:

$ perl filter.pl file.txt

The resulting file is:

==================
First Line.
Second Line.
Third Line.
Fourth Line.
Ninth Line.
==================

So it works for me. Give me your files. I'll check this out.
 
MindoCommented:
#!/usr/local/bin/perl -w

######################################
# Mindaugas Genutis,                 #
# mindg@nomagiclt.com                #
######################################
# usage: perl filter.pl <file>       #
# The program removes duplicate      #
# lines from a given file.           #
######################################

use strict;
use Fcntl ':flock'; # import LOCK_* constants

my $file = shift or die "usage: $0 <filename>\n";

open(F, "+< $file") or die "can't open $file: $!";
flock(F, LOCK_EX) or die "can't lock $file: $!"; # Lock the file.

my %lines = ();
my $out = '';

while (<F>)
{
  if (!exists($lines{$_}))
  {
    $lines{$_} = 1;
    $out .= $_;
  }
}

seek(F, 0, 0) or die "can't seek to start of $file: $!";
print F $out or die "can't print to $file: $!";
truncate(F, tell(F)) or die "can't truncate $file: $!";
close(F) or die "can't close $file: $!"; # close() also releases the lock.
 
MindoCommented:
I've posted a version with flock() above. It works; I don't know what your data looks like, so you should adjust the example to your case. I think that should be enough.
 
paulwhelanAuthor Commented:
The last code you posted was, as you said, perfect.
Sorry it took so long to grade.
Paul
