We help IT Professionals succeed at work.

Unix Shell script or Perl script to merge two csv text files based on a common key column

sunhux
sunhux asked
on

I have the following 2 text files (with their columns separated by comma)
& I need to merge them.  File2 has all the keys (ie the first column) values
of file1  but both files are not sorted.  So all the keys (ie 1st column) in
file1 is a subset of the keys found in file2.

file1:
INC00023233,description 1 text,field 1 text,.....,field 1 text
INC00023132,description 2 text,field 2 text,.....,field 2 text
INC00023073,description 3 text,field 3 text,.....,field 3 text
INC00023573,description 4 text,field 4 text,.....,field 4 text
........


file2:
INC00011312,start date fieldA,end date fieldA
......
INC00023233,start date fieldB,end date fieldB
PBI00023232,start date fieldC,end date fieldC
......
INC00023073,start date fieldD,end date fieldD
......
INC00023132,start date fieldE,end date fieldE
.....
INC00023573,start date fieldF,end date fieldF
.....


I would like to merge the above 2 files such that the resultant merged csv file is as follows :
INC00023233,description 1 text,field 1 text,.....,field 1 text,start date fieldB,end date fieldB
INC00023132,description 2 text,field 2 text,.....,field 2 text,start date fieldE,end date fieldE
INC00023073,description 3 text,field 3 text,.....,field 3 text,start date fieldD,end date fieldD
INC00023573,description 4 text,field 4 text,.....,field 4 text,start date fieldF,end date fieldF
. . . . .



The UNIX I'm running is a Redhat 4 but I do have HP-UX B11.11 (Perl interpreter is in
the Redhat servers but I'm not too sure about the HP-UX servers), so the script
provided need to be able to run on those platforms
Comment
Watch Question

Distinguished Expert 2019
Commented:
Presuming there are no commas in the various fields i.e. description, etc. do not include commas.

A simpler method might be to use mysql and load the first file and then issue updates with data from the second file.

You can store the output in a file and load it.
In this case make sure to add #!/usr/local/bin/perl at the top of the file and 1; at the end of the output file so you do not get errors when you require the file where the hash is stored..
#!/usr/local/bin/perl
$file1="path to filename1";
my $hash={};
open (File1, "<${file1}") || die "Unable to open ${file1} for reading $! \n";
while (<File1>){
chomp();
@array=split(/,/,$_);
#The below builds a referenced hash of hashes
$hash->{$array[0]}{'exists'}=1;
$hash->{$array[0]}{'description 1 test'}=$array[1];
$hash->{$array[0]}{'field 1 test'}=$array[2];
$hash->{$array[0]}{'field 2 test'}=$array[3];
.
.
.
}
#done with processing file1.
close(File1);
open (File2,"<$file2") || die "Unable to open $file2 for reading: $!\n:"
while (<File2>){
chomp();
@array=split (/,/,$_);
if (exists $hash->{$array[0]}{'exists'} and $hash->{$array[0]}{'exists'}==1){
$hash->{$array[0]}{'first field from second file'}=$array[1];
$hash->{$array[0]}{'second field'}=$array[2];
$hash->{$array[0]}{'third field'}=$array[3];
.
.
.
.
}

}
#done with processing files when there are matching keys.
close(File2);

#you now have a $hash that is a reference to hashes.
foreach $key (keys %{$hash}){
#    print "$key :$hash->{$key}\n";
        foreach $index (keys %{$hash->{$key}}) {
           print "\$hash->{'$key'}{'$index'}=$hash->{$key}{$index}\n";
        }
}

Open in new window

Distinguished Expert 2019
Commented:
You should work/process file2 first and then file1.
To account for missing data from the first file, the inclusion of the empty hash entries
would maintain a uniform output of fields.


For the output you wanted replace:
#you now have a $hash that is a reference to hashes.
foreach $key (keys %{$hash}){
#    print "$key :$hash->{$key}\n";
        foreach $index (keys %{$hash->{$key}}) {
           print "\$hash->{'$key'}{'$index'}=$hash->{$key}{$index}\n";
        }
}

with
#you now have a $hash that is a reference to hashes.
foreach $key (keys %{$hash}){
    print "$key,";
        $count=0;
        foreach $index (keys %{$hash->{$key}}) {
           print ',' if ($count>0);
           print "$hash->{$key}{$index}";
           $count=1;
        }
print "\n";
}

Open in new window

Distinguished Expert 2019
Commented:
as follows:
#!/usr/local/bin/perl
$file1="file2";
$file2="file1";
my $hash={};
open (File1, "<${file1}") || die "Unable to open ${file1} for reading $! \n";
while (<File1>){
chomp();
@array=split(/,/,$_);
#The below builds a referenced hash of hashes
$hash->{$array[0]}{'exists'}=1;
$hash->{$array[0]}{'description 1 test'}=$array[1];
$hash->{$array[0]}{'field 1 test'}=$array[2];
$hash->{$array[0]}{'field 2 test'}=$array[3];
$hash->{$array[0]}{'field 3 test'}=$array[4];
$hash->{$array[0]}{'field 4 test'}=$array[5];
$hash->{$array[0]}{'first field from second file'}='';
$hash->{$array[0]}{'second field'}='';
$hash->{$array[0]}{'second field 2 test'}='';
$hash->{$array[0]}{'second field 3 test'}='';
$hash->{$array[0]}{'second field 4 test'}='';
}
#done with processing file1.
close(File1);
open (File2,"<$file2") || die "Unable to open $file2 for reading: $!\n:";
while (<File2>){
chomp();
@array=split (/,/,$_);
if (exists $hash->{$array[0]}{'exists'} and $hash->{$array[0]}{'exists'}==1){
$hash->{$array[0]}{'first field from second file'}=$array[1];
$hash->{$array[0]}{'second field'}=$array[2];
$hash->{$array[0]}{'second field 2 test'}=$array[3];
$hash->{$array[0]}{'second field 3 test'}=$array[4];
$hash->{$array[0]}{'second field 4 test'}=$array[5];
}
else {
   print STDERR "$array[0] does not exist\n";
}

}
#done with processing files when there are matching keys.
close(File2);

#you now have a $hash that is a reference to hashes.
#foreach $key (keys %{$hash}){
#   print "$key :$hash->{$key}\n";
#foreach $index (keys %{$hash->{$key}}) {
#          print "\$hash->{'$key'}{'$index'}=$hash->{$key}{$index}\n";
#}
#i}
#you now have a $hash that is a reference to hashes.
foreach $key (keys %{$hash}){
    print "$key,";
        $count=0;
        foreach $index (keys %{$hash->{$key}}) {
           print ',' if ($count>0);
           print "$hash->{$key}{$index}";
           $count=1;
        }
print "\n";
}

Open in new window



Or you could do a join like this:

sort -o file1.srt file1
sort file2 |\
join  -j 1 -t, -o 1.1 1.2 1.3 2.2 2.3 file1.srt -

Open in new window

Author

Commented:

Thanks guys.  I also tried the following :
Use Excel to extract the 1st column from the 1st file & save it as a text file, file1.


Then use the following script to merge :

for key in `cat file1`
do
{
grep -w $key file2 >> file3
if [ $? -gt 0 ] ; then
 echo $key >> file3
fi
}
done

& the resultant file3 is then opened using Excel & then & I copied the columns
that were removed earlier in file1 & paste/merge it.