Solved

Compare 2 csv files to get common values

Posted on 2011-09-22
8
440 Views
Last Modified: 2012-05-12
Hi all,

A Perl beginner's first post in expert exchange XD

I a nutshell, I am attempting to determine whether a column (pipe delimited file) from file1.csv match one of the columns of file2.txt. If they do I need to output the some of the columns of file1.txt and file 2.txt to outputfile.txt.

To help me illustrate, my 2 files are:

File 1:
1234|P|2
1235|P|3
1236|P|4
1237|P|5

File 2:
139387|1234
139853|1235
140030|3531
140730|3568

The output should be:
1234|139387|2
1235|139853|3

Looking forward to responses from the perl gurus out there!

Many thanks!

Jason
0
Comment
Question by:Jason_Sutiono
  • 4
  • 4
8 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 36578862
perl -F"[|]" -lane 'if( @ARGV ){ $h{$F[1]}=$F[0] }elsif( $F[1]=$h{$F[0]} ){ print join"|",@F }' file2 file1 > output
0
 

Author Comment

by:Jason_Sutiono
ID: 36578909
Thanks for the quick reply. Would you be able to simplify the code slightly as your code is too efficient for a beginner to comprehend.

Many thanks!

0
 

Author Comment

by:Jason_Sutiono
ID: 36578968
Hi all,

I actually did write a code but somehow I am not able to get column 1 from file 1 to be displayed. Code is as per below:

#!/usr/bin/perl -s
#use strict;
use warnings;
my @so;

open(FILE1,"<$file1");
open(FILE2,"<$file2");
open(INFO, ">$file");  

while (<FILE2>) {
my @col = split /\|/;
$so{$col[1]}="$col[0]";
}

close FILE2;
open(FILE2,"<$file2");

my @arr1=<FILE1>;
my @arr2=<FILE2>;
close FILE1;
close FILE2;

my %chash;

for (@arr2){
chomp;
my($col1a,$col2a)=split(/\|/);
my $ckey="$col2a";

$chash{$ckey}=1;
}
for (@arr1){
chomp;
my($col1,$col2,$col3)=split(/\|/);
$col1="" if(!$col1);
my $ckey="$col1";

if($chash{$ckey}){
print INFO "$so{$col1}|$col1|$col3\n";

}
}
close INFO;


----------------------------------
When i run the code above an error is invoked saying that $so has not been initialized???
0
 
LVL 84

Expert Comment

by:ozo
ID: 36579208
my @so;
did you mean
my %so;
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 84

Expert Comment

by:ozo
ID: 36579219
perl -Mdiagnostics  Q_27321601.perl
Name "main::file" used only once: possible typo at Q_27321601.perl line 8 (#1)
    (W once) Typographical errors often show up as unique variable names.
    If you had a good reason for having a unique name, then just mention it
    again somehow to suppress the message.  The our declaration is
    provided for this purpose.
   
    NOTE: This warning detects symbols that have been used only once so $c, @c,
    %c, *c, &c, sub c{}, c(), and c (the filehandle or format) are considered
    the same; if a program uses $c only once but also uses any of the others it
    will not trigger this warning.
   
Name "main::file1" used only once: possible typo at Q_27321601.perl line 6 (#1)

Use of uninitialized value $file1 in concatenation (.) or string at
      Q_27321601.perl line 6 (#2)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.
   
    To help you figure out what was undefined, perl will try to tell you the
    name of the variable (if any) that was undefined. In some cases it cannot
    do this, so it also tells you what operation you used the undefined value
    in.  Note, however, that perl optimizes your program and the operation
    displayed in the warning may not necessarily appear literally in your
    program.  For example, "that $foo" is usually optimized into "that "
    . $foo, and the warning will refer to the concatenation (.) operator,
    even though there is no . in your program.
0
 

Author Comment

by:Jason_Sutiono
ID: 36579236
Yes.  my %so;
0
 
LVL 84

Accepted Solution

by:
ozo earned 125 total points
ID: 36579243
#!/usr/bin/perl -s                                                              
use strict;
use warnings;
my %so;
my $file1="file1.csv";
my $file2="file2.txt";
my $file="outputfile.txt";
open(FILE1,"<$file1") or die "$file1 $!";
open(FILE2,"<$file2") or die "$file2 $!";
open(INFO, ">$file") or die "$file $!";

while (<FILE2>) {
  chomp;
  my @col = split /\|/;
  $so{$col[1]}=$col[0];
}

close FILE2;
open(FILE2,"<$file2") or die "$file2 $!";

my @arr1=<FILE1>;
my @arr2=<FILE2>;
close FILE1;
close FILE2;

my %chash;

for (@arr2){
  chomp;
  my($col1a,$col2a)=split(/\|/);
  $chash{$col2a}=1;
}
for (@arr1){
  chomp;
  my($col1,$col2,$col3)=split(/\|/);
  $col1="" if(!$col1);

  if( $so{$col1} ){
    print INFO "$so{$col1}|$col1|$col3\n";
  }
}
close INFO;
0
 

Author Closing Comment

by:Jason_Sutiono
ID: 36579594
Thanks ozo u r the legend
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Windows 10 is mostly good. However the one thing that annoys me is how many clicks you have to do to dial a VPN connection. You have to go to settings from the start menu, (2 clicks), Network and Internet (1 click), Click VPN (another click) then fi…

910 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now