remove duplicates from the csv file

Hi - I wrote the below perl script to get emp Id and emp date of join.
I write the output of the script into a csv file in the below format

100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
100045671,20150301
100045671,20150601
215645567,20150301


if you observe the emp id 100045671 was repeated 3 times, but i want it only once and only the first occurence
so from the below three
100045671,20150301
100045671,20150301
100045671,20150601
i just wanted the first one
100045671,20150301

in order to achieve that how can i modify the script.
I just don't want any duplicate employee id's in the file, if there are duplicates i want just the first occurence and remove the remaining from the csv.

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;

my $helpme = 0;
my $man = 0;

my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName =  'C:\\temp\\test.txt';

my $errorcode = 0;
my $DEBUG=0;

if(exists $ENV{DEBUG}) {
	$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}

GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);

pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;

die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;

open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";

while(<INFILE>) {
	chomp;
	
	if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
		my $empID = $1;
		print OUTFILE $empID, ",";		
	}
	 if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
		my $eDate = $1;
		my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
		print OUTFILE $dt->strftime('%Y%m%d'), "\n";
	}
}

close INFILE;
close OUTFILE;

Open in new window


Thanks,
shragiAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ozoCommented:
$seen{$empID}++ || print OUTFILE $empID, ",";
0
shragiAuthor Commented:
OZO - there are two problems with your solution

1) Error - Global symbol "%seen" requires explicit package name at line 38
2) if the error is fixed and let say it ran well, then it will remove my emp id's but the dates that i get from other if conditions are not gone... so if your code works then my output will be something like

100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
 ,20150301
 ,20150601
215645567,20150301

Thanks
0
ozoCommented:
$seen{$empID}++ && next;
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

shragiAuthor Commented:
how to get rid of that error - any ideas ?
0
wilcoxonCommented:
You just need to declare %seen somewhere before the loop:
my %seen;

Open in new window

0
shragiAuthor Commented:
I fixed that issue, but looks like i did not ask the question properly so I am not getting what i am expecting... so here is the input file that i am using

05|IDC|MEMBER|MAT EAGLE|5324 AKFHADLKFHSDAKLJFHDSLKF||NEWYORK|NY|402291375
10|526|Health Cooperation||534526|I
51|100000128|002|MAT||EAGLE|||NEWYORK|NY|4022434234291375|2|NEW|New Addition| ||||292343244890|||| |08/30/1986
91|06/01/2015|12/31/2015|M|HGFGLT|450045|1503450|FGADSFSDSUPNE|||ASDFD65465465||| |||||0.00|0|D|0|D|0|D|06/01/2015|||0.00|08/01/2015|0.00||||Y||0.00
36|PLNSUM|1|101808208|AF;DLKFJ AKLDFJD|4ADFDSF6546501393366|01/01/2015|    |M|20|40|150|60|0|294890|
40|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
42|||||||
44||||||||
05|IDC|MEMBER|STEVE AFJHADKJFH|547 DFLAKJHKAJSD||RTGYH|ERT|654651435
10|IDP|Individual Sub-Group||1|||O
50|10000001|001|STEVE|DFB|BEAMAFDDSF||RTGYH|ERT|40W23452450331435|2| |100132413243289|||294124869|ADFD|ASDF||||||  ||02/21/1958||654657988|
90|01/01/2015|12/31/2015|M|HLT|3000|1000|KYHC SUPNE|||72341344131000101||0.00|0|D|0|D|0|D|01/01/2014|ISDFND||0.00|08/01/2015|0.00||||YERE|CADFADFSR|0.00
40||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
42|||||||
44||||||||

so when i update my script with the solution provided by ozo it did not remove the date part
solution like below

100045671,20150301
 20150301
 20150601
215645567,20150301

#!perl

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;

my $helpme = 0;
my $man = 0;

my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName =  'C:\\temp\\test.txt';

my $errorcode = 0;
my $DEBUG=0;

if(exists $ENV{DEBUG}) {
	$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}

GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);

pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;

die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;

open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";

while(<INFILE>) {
	chomp;
	
	if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
		my $empID = $1;
		$seen{$empID}++ && next; 	
	}
	 if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
		my $eDate = $1;
		my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
		print OUTFILE $dt->strftime('%Y%m%d'), "\n";
	}
}

close INFILE;
close OUTFILE;

Open in new window

0
ozoCommented:
my $empID = $1;
                $seen{$empID}++ && next;       
            print OUTFILE $empID, ",";
0
shragiAuthor Commented:
yes i did that ozo, no effect.
but if you check the script

let's say we are checking line that starts with 50 and lets say empid is seen so "&&next"
 will go to next line which is line that starts with 90 (second if condition in script)
as it passes second if condition it gets the date - so now i have no emp id but have date

Thanks,
0
ozoCommented:
my %seen;
my $empID;
while(<INFILE>) {
      if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
            $empID = $1;
      }
       if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
              $seen{$empID}++ && next;       
            my $eDate = $1;
            my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
            print OUTFILE $empID, ","$dt->strftime('%Y%m%d'), "\n";
      }
}
0
shragiAuthor Commented:
OZO I just figured out another solution, it worked can you please check whether i made any mistakes I mean does it gonna fail ever.

#!perl

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;

my $helpme = 0;
my $man = 0;

my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName =  'C:\\temp\\test.txt';

my $errorcode = 0;
my $DEBUG=0;
my $tempflag = 0;

if(exists $ENV{DEBUG}) {
	$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}

GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);

pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;

die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;

open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";

while(<INFILE>) {
	chomp;
	
	if (((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) && ($tempflag == 0)) {
		my $empID = $1;
		if($seen{$empID}++){
			$tempflag++ ; 	
			next;
		}
		print OUTFILE $empID, ",";
	}
	 if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
		my $eDate = $1;
		my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
		if($tempflag == 0){
			print OUTFILE $dt->strftime('%Y%m%d'), "\n";
		}		
		if($tempflag > 0){
			$tempflag = 0;
		}
	}
}

close INFILE;
close OUTFILE;

Open in new window

0
wilcoxonCommented:
I think this should work...
#!perl

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;

my $helpme = 0;
my $man = 0;

my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName =  'C:\\temp\\test.txt';

my $errorcode = 0;
my $DEBUG=0;

if(exists $ENV{DEBUG}) {
	$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}

GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);

pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;

die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;

open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";

my $empID;
while(<INFILE>) {
	chomp;
	
	if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
		$empID = $1;
	}
	 if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
		my $eDate = $1;
		my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
		$seen{$empID}++ && next; 	
		print OUTFILE $empID, ',', $dt->strftime('%Y%m%d'), "\n";
	}
}

close INFILE;
close OUTFILE;

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
wilcoxonCommented:
I think your way will work but it's certainly not as clean or efficient as ozo/my solution (I always hate when someone else responds with the same answer while I'm typing it out).
0
shragiAuthor Commented:
Thanks OZO and wilcoxon - both solution are same and it worked along with mine. I will use your solution as it is much cleaner than mine.

Thanks,
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.