asked on

Compare 2 csv files to get common values

Hi all,

A Perl beginner's first post in expert exchange XD

I a nutshell, I am attempting to determine whether a column (pipe delimited file) from file1.csv match one of the columns of file2.txt. If they do I need to output the some of the columns of file1.txt and file 2.txt to outputfile.txt.

To help me illustrate, my 2 files are:

File 1:
1234|P|2
1235|P|3
1236|P|4
1237|P|5

File 2:
139387|1234
139853|1235
140030|3531
140730|3568

The output should be:
1234|139387|2
1235|139853|3

Looking forward to responses from the perl gurus out there!

Many thanks!

Jason

ozo

perl -F"[|]" -lane 'if( @ARGV ){ $h{$F[1]}=$F[0] }elsif( $F[1]=$h{$F[0]} ){ print join"|",@F }' file2 file1 > output

Jason_Sutiono

ASKER

Thanks for the quick reply. Would you be able to simplify the code slightly as your code is too efficient for a beginner to comprehend.

Many thanks!

Jason_Sutiono

ASKER

Hi all,

I actually did write a code but somehow I am not able to get column 1 from file 1 to be displayed. Code is as per below:

#!/usr/bin/perl -s
#use strict;
use warnings;
my @so;

open(FILE1,"<$file1");
open(FILE2,"<$file2");
open(INFO, ">$file");

while (<FILE2>) {
my @col = split /\|/;
$so{$col[1]}="$col[0]";
}

close FILE2;
open(FILE2,"<$file2");

my @arr1=<FILE1>;
my @arr2=<FILE2>;
close FILE1;
close FILE2;

my %chash;

for (@arr2){
chomp;
my($col1a,$col2a)=split(/\|/);
my $ckey="$col2a";

$chash{$ckey}=1;
}
for (@arr1){
chomp;
my($col1,$col2,$col3)=split(/\|/);
$col1="" if(!$col1);
my $ckey="$col1";

if($chash{$ckey}){
print INFO "$so{$col1}|$col1|$col3\n";

}
}
close INFO;

----------------------------------
When i run the code above an error is invoked saying that $so has not been initialized???

ozo

my @so;

did you mean
my %so;

ozo

perl -Mdiagnostics Q_27321601.perl
Name "main::file" used only once: possible typo at Q_27321601.perl line 8 (#1)
(W once) Typographical errors often show up as unique variable names.
If you had a good reason for having a unique name, then just mention it
again somehow to suppress the message. The our declaration is
provided for this purpose.

NOTE: This warning detects symbols that have been used only once so $c, @c,
%c, *c, &c, sub c{}, c(), and c (the filehandle or format) are considered
the same; if a program uses $c only once but also uses any of the others it
will not trigger this warning.

Name "main::file1" used only once: possible typo at Q_27321601.perl line 6 (#1)

Use of uninitialized value $file1 in concatenation (.) or string at
Q_27321601.perl line 6 (#2)
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.

To help you figure out what was undefined, perl will try to tell you the
name of the variable (if any) that was undefined. In some cases it cannot
do this, so it also tells you what operation you used the undefined value
in. Note, however, that perl optimizes your program and the operation
displayed in the warning may not necessarily appear literally in your
program. For example, "that $foo" is usually optimized into "that "
. $foo, and the warning will refer to the concatenation (.) operator,
even though there is no . in your program.

Jason_Sutiono

ASKER

Yes. my %so;

ASKER CERTIFIED SOLUTION

ozo

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Jason_Sutiono

ASKER

Thanks ozo u r the legend