Find duplicates in a pipe-delimited file based on the first value

Hi all,

I have a Perl script that looks through a flat file, finds duplicate lines, and writes the dupes it finds to a text file.
I am trying to change it so that it finds duplicates based on the first value in the pipe-delimited file rather than on the whole line.

FIND DUPES SCRIPT:

open(FILE,"test.txt") || die "$!";
%seen =();
$line=0 ;
while (<FILE>) {
  $seen{$_}++;
  $line++;
  ## output dupes to text file
  open (MYFILE, '>>dupes.txt');
  print MYFILE "line $line : $_" if $seen{$_} > 1 ;
  close (MYFILE);
}

So as an example if I have a flat file with the following data:

774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||
774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||
774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||
774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2007||
773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2009||
773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2009||
773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2009||
773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2006||
773234||Burpy|||n||||||n|0|05/30/2006||

When checking for dupe lines, my result file contains the following, which
is correct. Here we find the exact duplicate "lines".

line 2 : 774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||
line 3 : 774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||
line 6 : 773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2009||
line 7 : 773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2009||


What I need to do is get dupe lines based on the very first value which
is the ID number. So my output file should instead show:

line 2 : 774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||
line 3 : 774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||
line 4 : 774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2007||
line 6 : 773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2009||
line 7 : 773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2009||
line 8 : 773752||Dream Generation: Koi Ka? Shigoto Ka?|||n||||||n|0|05/30/2006||

Lastly, I am new to Perl and would like to learn more about how Perl expressions work. I'm finding Perl very handy for setting up fast little utility scripts to process large files (on Windows, using ActivePerl), but I'm a complete newb.

binovpd Asked:

mrjoltcola Commented:
Instead do:

open(FILE,"test.txt") || die "$!";
%seen =();
$line=0 ;
while (<FILE>) {
  $line++;
  next unless /^(\d+)/;  # capture the leading ID into $1; skip lines without one
  $seen{$1}++;
  ## output dupes to text file
  open (MYFILE, '>>dupes.txt');
  print MYFILE "line $line : $_" if $seen{$1} > 1 ;
  close (MYFILE);
}
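
Since the ID is simply the first pipe-delimited field, the same idea could also be written with split instead of a regex. What follows is only a minimal sketch, not mrjoltcola's code: the helper name find_dupes_by_first_field is hypothetical, and it works on an in-memory list of lines (a shortened version of the sample data) so that it runs without test.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "line N : <line>" strings for every line
# whose first pipe-delimited field has already been seen.
sub find_dupes_by_first_field {
    my @lines = @_;
    my %seen;
    my @dupes;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        # Limit of 2: split only at the first pipe; the rest is not needed.
        my ($id) = split /\|/, $line, 2;
        # Skip blank lines and lines with an empty first field.
        next unless defined $id && $id =~ /\S/;
        # Post-increment is false the first time an ID is seen.
        push @dupes, "line $line_no : $line" if $seen{$id}++;
    }
    return @dupes;
}

# Sample lines for illustration only.
my @sample = (
    "774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2009||\n",
    "774143||Mahou Tsukai Ninaru Houhou|||n||||||n|0|05/30/2007||\n",
    "773234||Burpy|||n||||||n|0|05/30/2006||\n",
);
print find_dupes_by_first_field(@sample);
```

With these three sample lines, only line 2 is reported, because it shares the ID 774143 with line 1 even though the date differs.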

ozo Commented:
open(FILE,"test.txt") || die "$!";
open (MYFILE, '>>dupes.txt') || die $!;
%seen =();
while (<FILE>) {
  ## output dupes to text file
  print MYFILE "line $. : $_" if $seen{(/(\d+)/)[0]}++ ;
}
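
The one-line condition above packs several steps together. Here is a sketch of the same logic spelled out step by step. It reads from an in-memory string with made-up sample rows (an assumption for illustration) instead of test.txt, and collects results in an array instead of appending to dupes.txt.

```perl
use strict;
use warnings;

my $data = "774143|a\n774143|b\n773234|c\n";   # sample data, illustration only
open(my $in, '<', \$data) or die $!;           # in-memory filehandle

my %seen;
my @out;
while (my $row = <$in>) {
    # In list context a match returns its captures; [0] takes the first one.
    my $id = ($row =~ /(\d+)/)[0];
    next unless defined $id;                   # no digits on this line
    # Post-increment yields the old count: 0 (false) on first sight,
    # true for every later occurrence of the same ID.
    # $. holds the current line number of the filehandle last read.
    push @out, "line $. : $row" if $seen{$id}++;
}
close $in;
print @out;
```

Note that the file is read only once: the hash lookup and the increment happen in the same expression, so a line is printed as soon as its ID turns out to have been seen before.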

binovpd (Author) Commented:
Thanks mrjoltcola and ozo. Both solutions work fine. I split the points between you both since both solutions will work. I gave mrjoltcola a bit more since he answered first.

ozo, your solution is interesting. You look through the original file, then reparse the output results and look through that.

If I may ask, I'm trying to understand Perl expressions. What does /^(\d+)/; do?
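
On that last question: /^(\d+)/ is a regular expression match against the current line ($_). The ^ anchors the match at the start of the line, \d+ matches one or more digits, and the parentheses capture whatever matched into $1. A small sketch, using made-up sample strings:

```perl
use strict;
use warnings;

my @texts = ("774143||Mahou Tsukai", "Burpy|773234");   # illustration only
my @ids;
for my $text (@texts) {
    if ($text =~ /^(\d+)/) {
        # The line starts with digits; $1 holds them.
        push @ids, $1;
        print "matched, \$1 is $1\n";
    } else {
        # Digits later in the line do not count because of the ^ anchor.
        push @ids, undef;
        print "no match: '$text' does not start with a digit\n";
    }
}
```

When the match fails, $1 is not reset by that attempt, which is why it is safer to test the result of the match before using $1.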