Avatar of shragi
shragi
Flag for India asked on

remove duplicates from the csv file

Hi - I wrote the below perl script to get emp Id and emp date of join.
I write the output of the script into a csv file in the below format

100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
100045671,20150301
100045671,20150601
215645567,20150301


if you observe the emp id 100045671 was repeated 3 times, but i want it only once and only the first occurence
so from the below three
100045671,20150301
100045671,20150301
100045671,20150601
i just wanted the first one
100045671,20150301

in order to achieve that how can i modify the script.
I just don't want any duplicate employee id's in the file, if there are duplicates i want just the first occurence and remove the remaining from the csv.

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;

my $helpme = 0;
my $man = 0;

my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName =  'C:\\temp\\test.txt';

my $errorcode = 0;
my $DEBUG=0;

if(exists $ENV{DEBUG}) {
	$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}

GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);

pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;

die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;

open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";

while(<INFILE>) {
	chomp;
	
	if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
		my $empID = $1;
		print OUTFILE $empID, ",";		
	}
	 if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
		my $eDate = $1;
		my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
		print OUTFILE $dt->strftime('%Y%m%d'), "\n";
	}
}

close INFILE;
close OUTFILE;

Open in new window


Thanks,
PerlScripting LanguagesProgramming

Avatar of undefined
Last Comment
shragi

8/22/2022 - Mon
ozo

$seen{$empID}++ || print OUTFILE $empID, ",";
shragi

ASKER
OZO - there are two problems with your solution

1) Error - Global symbol "%seen" requires explicit package name at line 38
2) if the error is fixed and let say it ran well, then it will remove my emp id's but the dates that i get from other if conditions are not gone... so if your code works then my output will be something like

100000001,20150601
100000002,20150101
100000231,20150101
100002431,20150101
100023545,20150301
100000021,20150101
100000031,20150101
100000051,20150418
102044353,20150401
100000054,20150601
104560071,20150301
100045671,20150301
 ,20150301
 ,20150601
215645567,20150301

Thanks
ozo

$seen{$empID}++ && next;
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
shragi

ASKER
how to get rid of that error - any ideas ?
wilcoxon

You just need to declare %seen somewhere before the loop:
my %seen;

Open in new window

shragi

ASKER
I fixed that issue, but looks like i did not ask the question properly so I am not getting what i am expecting... so here is the input file that i am using

05|IDC|MEMBER|MAT EAGLE|5324 AKFHADLKFHSDAKLJFHDSLKF||NEWYORK|NY|402291375
10|526|Health Cooperation||534526|I
51|100000128|002|MAT||EAGLE|||NEWYORK|NY|4022434234291375|2|NEW|New Addition| ||||292343244890|||| |08/30/1986
91|06/01/2015|12/31/2015|M|HGFGLT|450045|1503450|FGADSFSDSUPNE|||ASDFD65465465||| |||||0.00|0|D|0|D|0|D|06/01/2015|||0.00|08/01/2015|0.00||||Y||0.00
36|PLNSUM|1|101808208|AF;DLKFJ AKLDFJD|4ADFDSF6546501393366|01/01/2015|    |M|20|40|150|60|0|294890|
40|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
42|||||||
44||||||||
05|IDC|MEMBER|STEVE AFJHADKJFH|547 DFLAKJHKAJSD||RTGYH|ERT|654651435
10|IDP|Individual Sub-Group||1|||O
50|10000001|001|STEVE|DFB|BEAMAFDDSF||RTGYH|ERT|40W23452450331435|2| |100132413243289|||294124869|ADFD|ASDF||||||  ||02/21/1958||654657988|
90|01/01/2015|12/31/2015|M|HLT|3000|1000|KYHC SUPNE|||72341344131000101||0.00|0|D|0|D|0|D|01/01/2014|ISDFND||0.00|08/01/2015|0.00||||YERE|CADFADFSR|0.00
40||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
42|||||||
44||||||||

so when i update my script with the solution provided by ozo it did not remove the date part
solution like below

100045671,20150301
 20150301
 20150601
215645567,20150301

#!perl

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;

my $helpme = 0;
my $man = 0;

my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName =  'C:\\temp\\test.txt';

my $errorcode = 0;
my $DEBUG=0;

if(exists $ENV{DEBUG}) {
	$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}

GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);

pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;

die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;

open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";

while(<INFILE>) {
	chomp;
	
	if ((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) {
		my $empID = $1;
		$seen{$empID}++ && next; 	
	}
	 if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
		my $eDate = $1;
		my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
		print OUTFILE $dt->strftime('%Y%m%d'), "\n";
	}
}

close INFILE;
close OUTFILE;

Open in new window

⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
ozo

my $empID = $1;
                $seen{$empID}++ && next;       
            print OUTFILE $empID, ",";
shragi

ASKER
yes i did that ozo, no effect.
but if you check the script

let's say we are checking line that starts with 50 and lets say empid is seen so "&&next"
 will go to next line which is line that starts with 90 (second if condition in script)
as it passes second if condition it gets the date - so now i have no emp id but have date

Thanks,
SOLUTION
ozo

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
shragi

ASKER
OZO I just figured out another solution, it worked can you please check whether i made any mistakes I mean does it gonna fail ever.

#!perl

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Time::Piece;

my $helpme = 0;
my $man = 0;

my $outputFileName = 'C:\\temp\\test_v1.csv';
my $inputFileName =  'C:\\temp\\test.txt';

my $errorcode = 0;
my $DEBUG=0;
my $tempflag = 0;

if(exists $ENV{DEBUG}) {
	$DEBUG = ($ENV{DEBUG} eq "") ? 0 : $ENV{DEBUG};
}

GetOptions('help' => \$helpme, 'man' => \$man, 'infile=s' => \$inputFileName, 'outfile=s' => \$outputFileName) or pod2usage(2);

pod2usage(1) if $helpme;
pod2usage(-verbose => 2) if $man;

die 'No input file name specified!' unless $inputFileName;
die 'No output file name specified!' unless $outputFileName;

open(INFILE, '<', $inputFileName) or die "Could not open input file: $!";
open(OUTFILE, '>', $outputFileName) or die "Could not open/create output file: $!";

while(<INFILE>) {
	chomp;
	
	if (((/^50\|([^|]+)\|/) || (/^51\|([^|]+)\|/)) && ($tempflag == 0)) {
		my $empID = $1;
		if($seen{$empID}++){
			$tempflag++ ; 	
			next;
		}
		print OUTFILE $empID, ",";
	}
	 if ((/^90\|([^|]+)\|/) || (/^91\|([^|]+)\|/)) {
		my $eDate = $1;
		my $dt = Time::Piece->strptime($eDate, '%m/%d/%Y');
		if($tempflag == 0){
			print OUTFILE $dt->strftime('%Y%m%d'), "\n";
		}		
		if($tempflag > 0){
			$tempflag = 0;
		}
	}
}

close INFILE;
close OUTFILE;

Open in new window

Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
ASKER CERTIFIED SOLUTION
wilcoxon

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
wilcoxon

I think your way will work but it's certainly not as clean or efficient as ozo/my solution (I always hate when someone else responds with the same answer while I'm typing it out).
shragi

ASKER
Thanks OZO and wilcoxon - both solution are same and it worked along with mine. I will use your solution as it is much cleaner than mine.

Thanks,