shragi
asked on
remove duplicates from the csv file
Hi - I wrote the below perl script to get emp Id and emp date of join.
I write the output of the script into a csv file in the below format
100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
100045671,20150301
100045671,20150601
215645567,20150301
if you observe the emp id 100045671 was repeated 3 times, but i want it only once and only the first occurence
so from the below three
100045671,20150301
100045671,20150301
100045671,20150601
i just wanted the first one
100045671,20150301
in order to achieve that how can i modify the script.
I just don't want any duplicate employee id's in the file, if there are duplicates i want just the first occurence and remove the remaining from the csv.
Thanks,
I write the output of the script into a csv file in the below format
100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
100045671,20150301
100045671,20150601
215645567,20150301
if you observe the emp id 100045671 was repeated 3 times, but i want it only once and only the first occurence
so from the below three
100045671,20150301
100045671,20150301
100045671,20150601
i just wanted the first one
100045671,20150301
in order to achieve that how can i modify the script.
I just don't want any duplicate employee id's in the file, if there are duplicates i want just the first occurence and remove the remaining from the csv.
use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;
my $helpme = 0;
my $man = 0;
my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName = 'C:\\temp\\test.txt';
my $errorcode = 0;
my $DEBUG=0;
if(exists $ENV{DEBUG}) {
$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}
GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);
pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;
die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;
open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";
while(<INFILE>) {
chomp;
if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
my $empID = $1;
print OUTFILE $empID, ",";
}
if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
my $eDate = $1;
my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
print OUTFILE $dt->strftime('%Y%m%d'), "\n";
}
}
close INFILE;
close OUTFILE;
Thanks,
$seen{$empID}++ || print OUTFILE $empID, ",";
ASKER
OZO - there are two problems with your solution
1) Error - Global symbol "%seen" requires explicit package name at line 38
2) if the error is fixed and let say it ran well, then it will remove my emp id's but the dates that i get from other if conditions are not gone... so if your code works then my output will be something like
100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
,20150301
,20150601
215645567,20150301
Thanks
1) Error - Global symbol "%seen" requires explicit package name at line 38
2) if the error is fixed and let say it ran well, then it will remove my emp id's but the dates that i get from other if conditions are not gone... so if your code works then my output will be something like
100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
,20150301
,20150601
215645567,20150301
Thanks
$seen{$empID}++ && next;
ASKER
how to get rid of that error - any ideas ?
You just need to declare %seen somewhere before the loop:
my %seen;
ASKER
I fixed that issue, but looks like i did not ask the question properly so I am not getting what i am expecting... so here is the input file that i am using
05|IDC|MEMBER|MAT EAGLE|5324 AKFHADLKFHSDAKLJFHDSLKF||N EWYORK|NY| 402291375
10|526|Health Cooperation||534526|I
51|100000128|002|MAT||EAGL E|||NEWYOR K|NY|40224 3423429137 5|2|NEW|Ne w Addition| ||||292343244890|||| |08/30/1986
91|06/01/2015|12/31/2015|M |HGFGLT|45 0045|15034 50|FGADSFS DSUPNE|||A SDFD654654 65||| |||||0.00|0|D|0|D|0|D|06/0 1/2015|||0 .00|08/01/ 2015|0.00| |||Y||0.00
36|PLNSUM|1|101808208|AF;D LKFJ AKLDFJD|4ADFDSF65465013933 66|01/01/2 015| |M|20|40|150|60|0|294890|
40|||||||||||||||||||||||| |||||||||| |||||||||| |||||||||| |||||||||| |||||||||| |||||||||| |||||
42|||||||
44||||||||
05|IDC|MEMBER|STEVE AFJHADKJFH|547 DFLAKJHKAJSD||RTGYH|ERT|65 4651435
10|IDP|Individual Sub-Group||1|||O
50|10000001|001|STEVE|DFB| BEAMAFDDSF ||RTGYH|ER T|40W23452 450331435| 2| |100132413243289|||2941248 69|ADFD|AS DF|||||| ||02/21/1958||654657988|
90|01/01/2015|12/31/2015|M |HLT|3000| 1000|KYHC SUPNE|||72341344131000101| |0.00|0|D| 0|D|0|D|01 /01/2014|I SDFND||0.0 0|08/01/20 15|0.00||| |YERE|CADF ADFSR|0.00
40|||||||||||||||||||||||| |||||||||| |||||||||| |||||||||| |||||||||| |||||||||| |||||||||| ||||||
42|||||||
44||||||||
so when i update my script with the solution provided by ozo it did not remove the date part
solution like below
100045671,20150301
20150301
20150601
215645567,20150301
05|IDC|MEMBER|MAT EAGLE|5324 AKFHADLKFHSDAKLJFHDSLKF||N
10|526|Health Cooperation||534526|I
51|100000128|002|MAT||EAGL
91|06/01/2015|12/31/2015|M
36|PLNSUM|1|101808208|AF;D
40||||||||||||||||||||||||
42|||||||
44||||||||
05|IDC|MEMBER|STEVE AFJHADKJFH|547 DFLAKJHKAJSD||RTGYH|ERT|65
10|IDP|Individual Sub-Group||1|||O
50|10000001|001|STEVE|DFB|
90|01/01/2015|12/31/2015|M
40||||||||||||||||||||||||
42|||||||
44||||||||
so when i update my script with the solution provided by ozo it did not remove the date part
solution like below
100045671,20150301
20150301
20150601
215645567,20150301
#!perl
use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;
my $helpme = 0;
my $man = 0;
my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName = 'C:\\temp\\test.txt';
my $errorcode = 0;
my $DEBUG=0;
if(exists $ENV{DEBUG}) {
$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}
GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);
pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;
die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;
open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";
while(<INFILE>) {
chomp;
if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
my $empID = $1;
$seen{$empID}++ && next;
}
if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
my $eDate = $1;
my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
print OUTFILE $dt->strftime('%Y%m%d'), "\n";
}
}
close INFILE;
close OUTFILE;
my $empID = $1;
$seen{$empID}++ && next;
print OUTFILE $empID, ",";
$seen{$empID}++ && next;
print OUTFILE $empID, ",";
ASKER
yes i did that ozo, no effect.
but if you check the script
let's say we are checking line that starts with 50 and lets say empid is seen so "&&next"
will go to next line which is line that starts with 90 (second if condition in script)
as it passes second if condition it gets the date - so now i have no emp id but have date
Thanks,
but if you check the script
let's say we are checking line that starts with 50 and lets say empid is seen so "&&next"
will go to next line which is line that starts with 90 (second if condition in script)
as it passes second if condition it gets the date - so now i have no emp id but have date
Thanks,
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
OZO I just figured out another solution, it worked can you please check whether i made any mistakes I mean does it gonna fail ever.
#!perl
use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;
my $helpme = 0;
my $man = 0;
my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName = 'C:\\temp\\test.txt';
my $errorcode = 0;
my $DEBUG=0;
my $tempflag = 0;
if(exists $ENV{DEBUG}) {
$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}
GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);
pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;
die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;
open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";
while(<INFILE>) {
chomp;
if (((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) && ($tempflag == 0)) {
my $empID = $1;
if($seen{$empID}++){
$tempflag++ ;
next;
}
print OUTFILE $empID, ",";
}
if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
my $eDate = $1;
my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
if($tempflag == 0){
print OUTFILE $dt->strftime('%Y%m%d'), "\n";
}
if($tempflag > 0){
$tempflag = 0;
}
}
}
close INFILE;
close OUTFILE;
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I think your way will work but it's certainly not as clean or efficient as ozo/my solution (I always hate when someone else responds with the same answer while I'm typing it out).
ASKER
Thanks OZO and wilcoxon - both solution are same and it worked along with mine. I will use your solution as it is much cleaner than mine.
Thanks,
Thanks,