Link to home
Start Free TrialLog in
Avatar of Luis Diaz
Luis DiazFlag for Colombia

asked on

VB Script : matching between three files

Hello experts,

I have three files
File1.csv
File.csv
Reference-file.csv


The initial Reference-file.csv file is composed by the following information:

Relation; Internal ID;ID File 2;ID File1
1-0;5858;658File2;
0-1;;;869File1

Open in new window



File2.csv
Field1;IDFile2;Field;3;Field4;Field5;Field6
5858;658File2;orga8956;tititi;name;Pedro

Open in new window


File1.csv
Field1;IDFile1;Field3;Field4;Field5;Field6;Field7;Field;8;Field9;Field10;Field11;Field12
85;869File1;tttt;name878;ttggg;gfjfhg;gujfh;yyyyy; yyyyy;ttttt;orga8989;tati;Thierr

Open in new window


Based on the IDFile2 and IDFile1 (submatch) I need to be able to generate a reference file_v2.csv with the following information

Relation;Internal ID;ID File 2;ID File1;File2-Field3; File2-Field5; File2-Field6;File1-Field3; File1-Field10;File1-Field12
1-0;5858;658File2; orga8956;name;Pedro;;;;;;;;;;;;
0-1;;;869File1;;;;; name878; orga8989; Thierr

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Bill Prew
Bill Prew

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Luis Diaz

ASKER

Hello Bill,

Your script works perfectly. However I am trying to adapt it to other files and I am not able to launch properly I don't know if there is an specific syntax for the field. Could you please help me with this.

Attached you will find the reference files.

Thank you in advance for your help.
e-matching-2.vbs
mstt.csv
spot.csv
spot-mstt-match.csv
schema.ini
Avatar of Bill Prew
Bill Prew

Well, this is why it's often better to work with the real data in these complicated questions.  A couple of things.

First, the column names in the schema.ini file have to be SQL compatible, so they can't contain spaces.  Easy to fix that, I will replace the spaces with "_", easy to do, and doesn't affect the files themselves, just the name we reference the fields by in the script.

More importantly, from your data I don't see the keys to match on.  Here is a sample of a few rows from each file:

spot-mstt-match.csv
====================
Cardinality;SPoT Internal ID;SPOT ID;MSTT ID
1-0;107828;EMEAS 121214;
1-0;118621;18ETRG4;
1-0;127538;20359, 18G4NFV;

spot.csv
====================
Internal ID;ID;Owning Organization;Organization Code;Name;Leader;Login ID;Project Type;OCP Process;Program Code;Project management framework;Sta
107828;EMEAS 121214;S.Tff.EcoBuilding._EcoB Private;C96;TRAINING project 1;PEDRO;ID26601;NEW;Development;Tff.EcoBuilding.CYPRUS;PMP;Completed;1/
118621;18ETRG4;S.Tff.C4C.PP-C4C;P94;Negative C4C projects buffer;TITI;ID30646;ORGA;N/A;;PMP;Opened;1/1/2013;1/1/1970;1/1/1970;1/1/1970;1/1/1970;

mstt.csv
=====================
SPoT Internal ID;ID;Name;Program code;Project Type;Sharepoint category;Status;Scope;BU ranking;Owning Organization;SPoT Organization Code;Mana
1407;1433861;Advan6 Historic;;NEW;NEW;Completed;Global;0;TINDER.ENERGY.MVS.SPoT Portfolio Management.Offer Dev;E47;MILAN, Sandrine;T10071;C, T
1408;1433862;MATRIX - Project 1;Energy.MVS.MATRIX;NEW;NEW;Completed;Global;0;TINDER.ENERGY.MVS.SPoT Portfolio Management.Offer Dev;E47;LECOQ,
1415;1433869;JARRAH- wave1;Energy.MVS.Jarrah;NEW;NEW;Opened;Global;0;TINDER.ENERGY.MVS.SPoT Portfolio Management.Offer Dev;E47;WIEHE, Clinton;

Open in new window


Ignore the truncation to the right, I just clipped some of the data to paste here.

In the spot-mstt-match.csv file, most of the values for the column "MSTT ID" are missing, is that right?  How will we match on that?  Also, what is the "Cardinality" column, I didn't see that in the data.

The real question is what column in that file should be used to match to what column in the other two CSV files?

Also, notice the third row of data, is "20359, 18G4NFV" really a value for a field, or should that comma be a ";", etc?

~bp
Hello Bill,

Thank you again for your help.
I have a couple of questions:

1)      if the Shema.ini cannot contains spaces and I can put a "_" I am required to also put a "_" for each field in the various files as the field in the Shema.ini needs to match with the field of the various files? However those files are automatically generated created so I need to create another loop to replace the space by “_” on each field? Can you help me with this ?
2)      Concerning the keys to match The key to match on are MSTT ID from spot-mstt-match.csv with ID from mstt.csv and SPOT ID from spot-mstt-match with spot.csv.
If MSTT ID is missing no data related to the MSTT field need to be completed, and the same process is apply for SPOT fields
3)      The delimiter is always a “;” the “;” is a part of the string.
Let me know if you have additional questions.

Thank you very much for your help.
Sorry I miss something in point 2

2)      Concerning the keys to match The key to match on are MSTT ID from spot-mstt-match.csv with ID from mstt.csv and SPOT ID from spot-mstt-match with ID from spot.csv.
The header is ignored in the CSV file, the schema defines the column names that VBS will reference, so the CSV files will not need to change the headers it contains.  They don't affect this.

You missed a couple of questions:

Also, what is the "Cardinality" column, I didn't see that in the data.


Also, notice the third row of data, is "20359, 18G4NFV" really a value for a field, or should that comma be a ";", etc?

~bp
Hello Bill,

Cardinality is just a field in spot-mstt-match in order to indicate which is the relation between spot projects and mstt.

1-0 1 spot for 0 mstt
0-1 0  spot 1 mstt
1-1 1 1 spot 1 mstt
1-N 1 spot N mstt
N-1 N spot 1 mstt

20359, 18G4NFV is a unique value the delimiter is always ";"

I can send you a new spot-mstt-match.csv if it is not clear?

Thank you again for your help
I am trying to debug but I don't know what is wrong:

Here is the script:

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
 
' Define files to work with
strFile1 = "spot.csv"
strFile2 = "mstt.csv"
strRefFile = "spot-mstt-match.csv"
strFileOut = "out.csv"
strFolder="C:\Scripts-revised\test\folder\matching2"
 
' Define needed objects 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")

' Set up ADO to access the CSV files and join on reference column
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & strFolder & ";Extended Properties='text'"
strQuery = "SELECT * FROM (" & strRefFile & " RF" & _
           " LEFT JOIN " & strFile1 & " F1 ON F1.ID=RF.SPOT_ID)" & _
           " LEFT JOIN " & strFile2 & " F2 ON F2.ID=RF.MSTT_ID WHERE RF.Cardinality IS NOT NULL"
objRecordset.Open strQuery, objConnection

' Open output file, write header row
Set objFile = objFSO.OpenTextFile(strFileOut, ForWriting, True)
objFile.WriteLine "Cardinality;InternalID;SPoT Internal ID;SPOT ID;MSTT ID;Owning OrganizationS;Organization CodeS;Owning OrganizationM;SPoT Organization Codem"

' Read all records and reformat before writing to output file
Do Until objRecordset.EOF
    objFile.WriteLine _
      objRecordset.Fields("RF.Cardinality").Value & ";" & _
      objRecordset.Fields("RF.InternalID").Value & ";" & _
      objRecordset.Fields("RF.SPoT_Internal_ID").Value & ";" & _
      objRecordset.Fields("RF.SPOT_ID").Value & ";" & _
	  objRecordset.Fields("RF.MSTT_ID").Value & ";" & _
	  objRecordset.Fields("F1.Owning_Organization").Value & ";" & _
	  objRecordset.Fields("F1.Organization_Code").Value & ";" & _
	  objRecordset.Fields("F2.Owning_Organization").Value & ";" & _
	  objRecordset.Fields("F2.SPoT_Organization_Code").Value
   objRecordset.MoveNext
Loop

Open in new window


The Schema.ini
Format=Delimited(;)
ColNameHeader=True
Col1=Cardinality Text
Col2=SPoT_Internal_ID Text
Col3=SPOT_ID Text
Col4=MSTT_ID Text

[spot.csv]
Format=Delimited(;)
ColNameHeader=True
Col1=Internal_ID Text
Col2=ID Text
Col3=Owning_Organization Text
Col4=Organization_Code Text
Col5=Name Text
Col6=Leader Text
Col7=Login_ID Text
Col8=Project_Type Text
Col9=OCP_Process Text
Col10=Program_Code Text
Col11=Project_management_framework Text
Col12=StatusText
Col13=PMP_OPEN_Date Text
Col14=PMP_SELECT_Date Text
Col15=PMP_DO_Date Text
Col16=PMP_IMPLEMENT_Date Text
Col17=PMP_PRODUCE_Date Text
Col18=PMP_SELL_DateText
Col19=PMP_CLOSE_Date Text
Col20=PMO_1 Text
Col21=Login_ID Text

[mstt.csv]
Format=Delimited(;)
ColNameHeader=True
Col1=SPoT_Internal_ID Text
Col2=ID Text
Col3=Name Text
Col4=Program_code Text
Col5=Project_Type Text
Col6=Sharepoint_category Text
Col7=Status Text
Col8=Scope Text
Col9=BU_ranking Text
Col10=Owning_Organization Text
Col11=SPoT_Organization_Code Text
Col12=Manager_1 Text
Col13=Manager_1_SESA Text
Col14=PMO_1 Text
Col15=PMO_1_SESA Text
Col16=Customer Text
Col17=Start_constraint Text
Col18=Open Text
Col19=Select Text
Col20=Do Text
Col21=Produce Text
Col22=Implement Text
Col23=Sell Text
Col24=Close Text
Col25=Solution_code Text
Col26=OCP_Process Text
Col27=Old_Project_Id Text

Open in new window


spot-mstt-match.csv
Cardinality;SPoT_Internal_ID;SPOT_ID;MSTT_ID[\code]

spot.csv
[code]Internal ID;ID;Owning Organization;Organization Code;Name;Leader;Login ID;Project Type;OCP Process;Program Code;Project management framework;Status;PMP OPEN Date;PMP SELECT Date;PMP DO Date;PMP IMPLEMENT Date;PMP PRODUCE Date;PM

Open in new window


mstt.csv
SPoT_Internal_ID;ID;Name;Program_code;Project_Type;Sharepoint_category;Status;Scope;BU_ranking;Owning_Organization;SPoT_Organization_Code;Manager_1;Manager_1_SESA;PMO_1;PMO_1_SESA;Customer;Start_constraint;Open;Select;Do;Produce;Implement;Sell;Close;Solution_code;OCP_Process;Old_Project_Id

Open in new window


Thank you again for your help Bill!
Will work this later today...

~bp
Thank you Bill!
Sorry, doing "real work" the last few days, but will get back to this.

~bp
Ok, no issue, thank you again.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hello, Bill it works,

Thank you very much!
I have a question concerning spot-mstt-match which is automatically generated by a sql query at a specific folder.
If the VB script cannot take into account filenames with dash such as spot-mstt-match, the potential solution will be to perform a rename action at the beginning of the vbscript spot-mstt-match will have the following new name spot_mstt_match and at the end of the script reverse the action by performing a new rename spot_mstt_match by spot-mstt-match and like this everything will be transparent.
I don’t see another solution to bypass the dash constrain. What do you think?
I agree in general.  I might "protect" the source data file just a bit and rather than two renames, do a copy, and then a delete to a file we use for the VBS script.  SO essentially copy the "-" version of the file to a "_" version, process it, and then delete the "_" version.  Seems slightly lower risk since we never touch or change the original file.

This could be done in a BAT script before the VBS is executed, or right ion the VBS itself.

~bp
Totally agree, I will do this directly in the vbscript in order to have everything centrized.

Thank you again for your help!
Excellent!
Welcome, thanks.

~bp