magento
asked on
Perl expert complex data manupulation
Hi ,
I have a data which need to be manipulated as below.
filename1.csv
Pat_name,Pat_number,pat_id
raj,123,3
dinesh,,4
filename2.csv
Pat_name,Pat_number
raj,123
compare both files on only same column names and get the desired output .
If the value in rows match = success, else = failure
In this case only 2 columns present in both files (pat_id can be ignored)
output.csv
Pat_name_file1Name,Pat_nam e_file2Nam e,Match,Pa t_number_F ilename1,P at_number_ Filename2, Match
raj,raj,Success,123,123,Su ccess
dinesh,,Failure,,,Success
PS. My original files has around 10 columns .
Thanks
Magento
I have a data which need to be manipulated as below.
filename1.csv
Pat_name,Pat_number,pat_id
raj,123,3
dinesh,,4
filename2.csv
Pat_name,Pat_number
raj,123
compare both files on only same column names and get the desired output .
If the value in rows match = success, else = failure
In this case only 2 columns present in both files (pat_id can be ignored)
output.csv
Pat_name_file1Name,Pat_nam
raj,raj,Success,123,123,Su
dinesh,,Failure,,,Success
PS. My original files has around 10 columns .
Thanks
Magento
ASKER
Hi Ozo,
Getting the below error..
Getting the below error..
C:\Expert-Exchange\perl>perl -F, -lane '$m{$F[0]}[1-@ARGV]=$F[1];END{print join
,",exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}?"Success":"Failure"),@
$m{$_}}[0,1],$m{$_}[0]eq$m{$_}[1]?"Success":"Failure" for keys %m}' 1.txt 2.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.
It looks like you are using the DOS shell, which uses different quoting rules.
Try
perl -F, -lane "$m{$F[0]}[1-@ARGV]=$F[1]; END{print join',',exists $m{$_}[0]&&$_,exists $m{$_}[1]&&$_,($#{$m{$_}}? 'Success': 'Failure') ,@{$m{$_}} [0,1],$m{$ _}[0]eq$m{ $_}[1]?'Su ccess':'Fa ilure' for keys %m}" filename1.csv filename2.csv
Try
perl -F, -lane "$m{$F[0]}[1-@ARGV]=$F[1];
ASKER
What should the results be when given 1.csv and 2.csv as input?
ASKER
Hi Ozo,
The result should be in 3.csv
The idea is to combine columns present in both files and check for matches.
so columnName of file1 is matched with columName of file2 and a match result should say success Iif matched) or false (if not matched)
If you need the sample from original files, please tell me i will attach it.
The header should be modified with filenames.
1.csv
col1,col2,col3,col4
a b c d
2.csv
col1,col2,col4
a b e
3.csv
col1_1 col1_2 match col2_1 col2_2 match col4_1 cpl4_2 match
a a Success b b Success d e false
Thanks,
The result should be in 3.csv
The idea is to combine columns present in both files and check for matches.
so columnName of file1 is matched with columName of file2 and a match result should say success Iif matched) or false (if not matched)
If you need the sample from original files, please tell me i will attach it.
The header should be modified with filenames.
1.csv
col1,col2,col3,col4
a b c d
2.csv
col1,col2,col4
a b e
3.csv
col1_1 col1_2 match col2_1 col2_2 match col4_1 cpl4_2 match
a a Success b b Success d e false
Thanks,
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi Ozo ,
I run ur code and it produces the output in command prompt which was looks like the below.
But it only prints for 1st record only also not comma separated.
I run ur code and it produces the output in command prompt which was looks like the below.
But it only prints for 1st record only also not comma separated.
C:\Expert-Exchange\perl>perl test.pl 1.csv 2.csv
Activity Status_1 Activity Status_2 match Available Quantity_1
Available Quantity_2 match Collection Status_1 Collection Status_2
match Container name_1 Container name_2 match Created On_1
Created On_2 match Initial Quantity_1 Initial Quantity_2 match
Is Available_1 Is Available_2 match Label_1 Label_2 match Lineage_1
Lineage_2 match Pathological Status_1 Pathological Status_2 match
Position one_1 Position one_2 match Position two_1 Position two_2 match
SCG name_1 SCG name_2 match Specimen Class_1 Specimen Class_2
match Tissue Side_1 Tissue Side_2 match Tissue Site_1 Tissue S
ite_2 match Type_1 Type_2 match
Active Active Success 13.8 13.8 Success Collected Collected
Success D6 S4 R16 B5 D6 S4 R16 B5 Success 9/2/2008 9/2/2008 0:00
false 13.8 13.8 Success TRUE 1 false B0006697 B0006697
Success New New Success Not Specified Not Specified Success
6 9 false 9 6 false A RANDOMIZED PHASE II TRIAL _30
834_180328 A RANDOMIZED PHASE II TRIAL _30834_180328 Success Molecula
r Molecular Success Not Specified Not Specified Success Blood
Blood Success DNA DNA Success
3.csv in http:#a39773244 prints for 1st record only also not comma separated
If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?
If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?
ASKER
Hi Ozo ,
I am sorry i have posted that only as a sample.
It should be in csv format.
>>>If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?
The matching will be based on column name.
File1.csv has a column named Activity Status then it should be matched with the same column name (Activity Status) in file 2.csv.
Thanks
I am sorry i have posted that only as a sample.
It should be in csv format.
>>>If there are more records, how would you want to determine which _1 record is to be matched with which _2 record?
The matching will be based on column name.
File1.csv has a column named Activity Status then it should be matched with the same column name (Activity Status) in file 2.csv.
Thanks
It looks like lots of records have the same Activity Status
Which Active in 1.csv goes with which Active in 2.csv?
Which Active in 1.csv goes with which Active in 2.csv?
ASKER
Ok , here is the column in 1.csv
SCG name,Label,Barcode,Externa l Id Name#1,External Id Value#1,External Id Name#2,External Id Value#2,Created On,Activity Status,Is Available,Collection Status,Specimen Class,Type,Tissue Site,Tissue Side,Initial Quantity,Available Quantity,Lineage,Concentra tion,Patho logical Status,Container name,Position one,Position two,Comments
Here is the column in 2.csv
SCG name,Label,Created On,Activity Status,Is Available,Collection Status,Specimen Class,Type,Tissue Site,Tissue Side,Initial Quantity,Available Quantity,Lineage,Pathologi cal Status,Container name,Position one,Position two,Comments
SCG name in 1.csv need to match with the same name in 2.csv . In 2.csv the column name SCG name is present and hence it will match with them.
If a column in 1.csv not present in 2.csv (in this case Barcode,External Id Name#1,External Id Value#1,External Id Name#2,External Id Value#2,Concentration) those columns should be removed or ignored.
The same way row1 of column SCG name match with row1 of SCG name in 2.csv
Hope i was clear now.
Thanks ozo for ur help on this.
SCG name,Label,Barcode,Externa
Here is the column in 2.csv
SCG name,Label,Created On,Activity Status,Is Available,Collection Status,Specimen Class,Type,Tissue Site,Tissue Side,Initial Quantity,Available Quantity,Lineage,Pathologi
SCG name in 1.csv need to match with the same name in 2.csv . In 2.csv the column name SCG name is present and hence it will match with them.
If a column in 1.csv not present in 2.csv (in this case Barcode,External Id Name#1,External Id Value#1,External Id Name#2,External Id Value#2,Concentration) those columns should be removed or ignored.
The same way row1 of column SCG name match with row1 of SCG name in 2.csv
Hope i was clear now.
Thanks ozo for ur help on this.
It looks like several records have the same SCG name
which A RANDOMIZED PHASE II TRIAL _30834_180328 in 1.csv goes with which A RANDOMIZED PHASE II TRIAL _30834_180328 in 2.csv?
which A RANDOMIZED PHASE II TRIAL _30834_180328 in 1.csv goes with which A RANDOMIZED PHASE II TRIAL _30834_180328 in 2.csv?
perl -F, -lane '$m{$F[0]}[1-@ARGV]=$F[1];