Solved

Perl expert complex data manupulation

Posted on 2014-01-10
13
298 Views
Last Modified: 2014-01-16
Hi ,

I have a data which need to be manipulated as below.

filename1.csv
Pat_name,Pat_number,pat_id
raj,123,3
dinesh,,4

filename2.csv
Pat_name,Pat_number
raj,123


compare both files on only same column names and get the desired output .
If the value in rows match = success, else = failure
In this case only 2 columns present in both files (pat_id can be ignored)

output.csv

Pat_name_file1Name,Pat_name_file2Name,Match,Pat_number_Filename1,Pat_number_Filename2,Match
raj,raj,Success,123,123,Success
dinesh,,Failure,,,Success


PS. My original files has around 10 columns .

Thanks
Magento
0
Comment
Question by:magento
  • 7
  • 6
13 Comments
 
LVL 84

Expert Comment

by:ozo
Comment Utility
Do you mean something like this?
perl -F, -lane '$m{$F[0]}[1-@ARGV]=$F[1];END{print join",",exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}?"Success":"Failure"),@{$m{$_}}[0,1],$m{$_}[0]eq$m{$_}[1]?"Success":"Failure" for keys %m}' filename1.csv filename2.csv
0
 
LVL 5

Author Comment

by:magento
Comment Utility
Hi Ozo,

Getting the below error..

C:\Expert-Exchange\perl>perl -F, -lane '$m{$F[0]}[1-@ARGV]=$F[1];END{print join
,",exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}?"Success":"Failure"),@
$m{$_}}[0,1],$m{$_}[0]eq$m{$_}[1]?"Success":"Failure" for keys %m}' 1.txt 2.txt

Can't find string terminator "'" anywhere before EOF at -e line 1.

Open in new window

0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
It looks like you are using the DOS shell, which uses different quoting rules.
Try
perl -F, -lane "$m{$F[0]}[1-@ARGV]=$F[1];END{print join',',exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}?'Success':'Failure'),@{$m{$_}}[0,1],$m{$_}[0]eq$m{$_}[1]?'Success':'Failure' for keys %m}" filename1.csv filename2.csv
0
 
LVL 5

Author Comment

by:magento
Comment Utility
Hi Ozo ,

It works only for 2 columns , my file has ~ 10 columns.

Also the header is not working for my original files.

Please advice.
1.csv
2.csv
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
What should the results be when given 1.csv and 2.csv as input?
0
 
LVL 5

Author Comment

by:magento
Comment Utility
Hi Ozo,

The result should be in 3.csv

The idea is to combine columns present in both files and check for matches.
so columnName of file1 is matched with columName of file2 and a match result should say success Iif matched) or false (if not matched)

If you need the sample from original files, please tell me i will attach it.

The header should be modified with filenames.

1.csv
col1,col2,col3,col4
a       b       c      d

2.csv
col1,col2,col4
a         b     e

3.csv
col1_1 col1_2 match col2_1 col2_2 match col4_1 cpl4_2 match
a              a        Success  b       b     Success    d       e      false


Thanks,
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
Comment Utility
Do you mean like this?
#!/usr/bin/perl
use strict;
use warnings;
my(@c,@m);
while( <> ){
    chomp;
    my @F=split/,/; 
    @c=@F,next unless @c;
    @{$m[1-@ARGV]}{@c}=@F;
    close ARGV,undef @c if 2..2;
}
$\=$/;
my @k=sort grep $m[1]{$_},keys %{$m[0]};
print join"\t",map{$_."_1",$_."_2","match"}@k;
print join"\t",map{$m[0]{$_},$m[1]{$_},$m[0]{$_}eq$m[1]{$_}?"Success":"false"}@k;

Open in new window

0
 
LVL 5

Author Comment

by:magento
Comment Utility
Hi Ozo ,

I run ur code and it produces the output in command prompt which was looks like  the below.

But it only prints for 1st record only also not comma separated.

C:\Expert-Exchange\perl>perl test.pl 1.csv 2.csv
Activity Status_1       Activity Status_2       match   Available Quantity_1
Available Quantity_2    match   Collection Status_1     Collection Status_2
match   Container name_1        Container name_2        match   Created On_1
Created On_2    match   Initial Quantity_1      Initial Quantity_2      match
Is Available_1  Is Available_2  match   Label_1 Label_2 match   Lineage_1
Lineage_2       match   Pathological Status_1   Pathological Status_2   match
Position one_1  Position one_2  match   Position two_1  Position two_2  match
SCG name_1      SCG name_2      match   Specimen Class_1        Specimen Class_2
        match   Tissue Side_1   Tissue Side_2   match   Tissue Site_1   Tissue S
ite_2   match   Type_1  Type_2  match
Active  Active  Success 13.8    13.8    Success Collected       Collected
Success D6 S4 R16 B5    D6 S4 R16 B5    Success 9/2/2008        9/2/2008 0:00
false   13.8    13.8    Success TRUE    1       false   B0006697        B0006697
        Success New     New     Success Not Specified   Not Specified   Success
6       9       false   9       6       false   A RANDOMIZED  PHASE II TRIAL _30
834_180328      A RANDOMIZED  PHASE II TRIAL _30834_180328      Success Molecula
r       Molecular       Success Not Specified   Not Specified   Success Blood
Blood   Success DNA     DNA     Success

Open in new window

0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
3.csv in http:#a39773244 prints for 1st record only also not comma separated

If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?
0
 
LVL 5

Author Comment

by:magento
Comment Utility
Hi Ozo ,

I am sorry i have posted that only as a sample.

It should be in csv format.

>>>If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?

The matching will be based on column name.

File1.csv has a column named Activity Status then it should be matched with the same column name (Activity Status) in file 2.csv.

Thanks
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
It looks like lots of records have the same Activity Status
Which Active in 1.csv goes with which Active in 2.csv?
0
 
LVL 5

Author Comment

by:magento
Comment Utility
Ok , here is the column in 1.csv

SCG name,Label,Barcode,External Id Name#1,External Id Value#1,External Id Name#2,External Id Value#2,Created On,Activity Status,Is Available,Collection Status,Specimen Class,Type,Tissue Site,Tissue Side,Initial Quantity,Available Quantity,Lineage,Concentration,Pathological Status,Container name,Position one,Position two,Comments

Here is the column in 2.csv

SCG name,Label,Created On,Activity Status,Is Available,Collection Status,Specimen Class,Type,Tissue Site,Tissue Side,Initial Quantity,Available Quantity,Lineage,Pathological Status,Container name,Position one,Position two,Comments

SCG name in 1.csv need to match with the same name in 2.csv . In 2.csv the column name SCG name is present and hence it will match with them.
If a column in 1.csv not present in 2.csv (in this case Barcode,External Id Name#1,External Id Value#1,External Id Name#2,External Id Value#2,Concentration) those columns should be removed or ignored.

The same way row1 of column SCG name match with row1 of SCG name in 2.csv
Hope i was clear now.

Thanks ozo for ur help on this.
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
It looks like several records have the same SCG name
which A RANDOMIZED  PHASE II TRIAL _30834_180328 in 1.csv goes with which A RANDOMIZED  PHASE II TRIAL _30834_180328 in 2.csv?
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

In this tutorial I will show you how to make a simple HTML bar chart with the usage of WhizBase, If you want more information about WhizBase please read my previous articles at http://www.experts-exchange.com/ARTH_5123186.html (http://www.experts-ex…
Do you hate spam? I do, and I am willing to bet you do as well. I often wonder, though, "if people hate spam so much, why do they still post their email addresses on the web?" I'm not talking about a plain-text posting here. I am referring to the fa…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now