asked on

Perl expert complex data manupulation

Hi ,

I have a data which need to be manipulated as below.

filename1.csv
Pat_name,Pat_number,pat_id
raj,123,3
dinesh,,4

filename2.csv
Pat_name,Pat_number
raj,123

compare both files on only same column names and get the desired output .
If the value in rows match = success, else = failure
In this case only 2 columns present in both files (pat_id can be ignored)

output.csv

Pat_name_file1Name,Pat_name_file2Name,Match,Pat_number_Filename1,Pat_number_Filename2,Match
raj,raj,Success,123,123,Success
dinesh,,Failure,,,Success

PS. My original files has around 10 columns .

Thanks
Magento

ozo

Do you mean something like this?
perl -F, -lane '$m{$F[0]}[1-@ARGV]=$F[1];END{print join",",exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}?"Success":"Failure"),@{$m{$_}}[0,1],$m{$_}[0]eq$m{$_}[1]?"Success":"Failure" for keys %m}' filename1.csv filename2.csv

magento

ASKER

Hi Ozo,

Getting the below error..

C:\Expert-Exchange\perl>perl -F, -lane '$m{$F[0]}[1-@ARGV]=$F[1];END{print join
,",exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}?"Success":"Failure"),@
$m{$_}}[0,1],$m{$_}[0]eq$m{$_}[1]?"Success":"Failure" for keys %m}' 1.txt 2.txt

Can't find string terminator "'" anywhere before EOF at -e line 1.

Open in new window

ozo

It looks like you are using the DOS shell, which uses different quoting rules.
Try
perl -F, -lane "$m{$F[0]}[1-@ARGV]=$F[1];END{print join',',exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}?'Success':'Failure'),@{$m{$_}}[0,1],$m{$_}[0]eq$m{$_}[1]?'Success':'Failure' for keys %m}" filename1.csv filename2.csv

magento

ASKER

Hi Ozo ,

It works only for 2 columns , my file has ~ 10 columns.

Also the header is not working for my original files.

Please advice.
1.csv
2.csv

ozo

What should the results be when given 1.csv and 2.csv as input?

magento

ASKER

Hi Ozo,

The result should be in 3.csv

The idea is to combine columns present in both files and check for matches.
so columnName of file1 is matched with columName of file2 and a match result should say success Iif matched) or false (if not matched)

If you need the sample from original files, please tell me i will attach it.

The header should be modified with filenames.

1.csv
col1,col2,col3,col4
a b c d

2.csv
col1,col2,col4
a b e

3.csv
col1_1 col1_2 match col2_1 col2_2 match col4_1 cpl4_2 match
a a Success b b Success d e false

Thanks,

ASKER CERTIFIED SOLUTION

ozo

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

magento

ASKER

Hi Ozo ,

I run ur code and it produces the output in command prompt which was looks like the below.

But it only prints for 1st record only also not comma separated.

C:\Expert-Exchange\perl>perl test.pl 1.csv 2.csv
Activity Status_1       Activity Status_2       match   Available Quantity_1
Available Quantity_2    match   Collection Status_1     Collection Status_2
match   Container name_1        Container name_2        match   Created On_1
Created On_2    match   Initial Quantity_1      Initial Quantity_2      match
Is Available_1  Is Available_2  match   Label_1 Label_2 match   Lineage_1
Lineage_2       match   Pathological Status_1   Pathological Status_2   match
Position one_1  Position one_2  match   Position two_1  Position two_2  match
SCG name_1      SCG name_2      match   Specimen Class_1        Specimen Class_2
        match   Tissue Side_1   Tissue Side_2   match   Tissue Site_1   Tissue S
ite_2   match   Type_1  Type_2  match
Active  Active  Success 13.8    13.8    Success Collected       Collected
Success D6 S4 R16 B5    D6 S4 R16 B5    Success 9/2/2008        9/2/2008 0:00
false   13.8    13.8    Success TRUE    1       false   B0006697        B0006697
        Success New     New     Success Not Specified   Not Specified   Success
6       9       false   9       6       false   A RANDOMIZED  PHASE II TRIAL _30
834_180328      A RANDOMIZED  PHASE II TRIAL _30834_180328      Success Molecula
r       Molecular       Success Not Specified   Not Specified   Success Blood
Blood   Success DNA     DNA     Success

Open in new window

ozo

3.csv in http:#a39773244 prints for 1st record only also not comma separated

If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?

magento

ASKER

Hi Ozo ,

I am sorry i have posted that only as a sample.

It should be in csv format.

>>>If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?

The matching will be based on column name.

File1.csv has a column named Activity Status then it should be matched with the same column name (Activity Status) in file 2.csv.

Thanks

ozo

It looks like lots of records have the same Activity Status
Which Active in 1.csv goes with which Active in 2.csv?

magento

ASKER

Ok , here is the column in 1.csv

SCG name,Label,Barcode,External Id Name#1,External Id Value#1,External Id Name#2,External Id Value#2,Created On,Activity Status,Is Available,Collection Status,Specimen Class,Type,Tissue Site,Tissue Side,Initial Quantity,Available Quantity,Lineage,Concentration,Pathological Status,Container name,Position one,Position two,Comments

Here is the column in 2.csv

SCG name,Label,Created On,Activity Status,Is Available,Collection Status,Specimen Class,Type,Tissue Site,Tissue Side,Initial Quantity,Available Quantity,Lineage,Pathological Status,Container name,Position one,Position two,Comments

SCG name in 1.csv need to match with the same name in 2.csv . In 2.csv the column name SCG name is present and hence it will match with them.
If a column in 1.csv not present in 2.csv (in this case Barcode,External Id Name#1,External Id Value#1,External Id Name#2,External Id Value#2,Concentration) those columns should be removed or ignored.

The same way row1 of column SCG name match with row1 of SCG name in 2.csv
Hope i was clear now.

Thanks ozo for ur help on this.

ozo

It looks like several records have the same SCG name
which A RANDOMIZED PHASE II TRIAL _30834_180328 in 1.csv goes with which A RANDOMIZED PHASE II TRIAL _30834_180328 in 2.csv?