VB Script : matching between three files

Hello experts,

I have three files

The initial Reference-file.csv file is composed by the following information:

Relation; Internal ID;ID File 2;ID File1

Open in new window


Open in new window

85;869File1;tttt;name878;ttggg;gfjfhg;gujfh;yyyyy; yyyyy;ttttt;orga8989;tati;Thierr

Open in new window

Based on the IDFile2 and IDFile1 (submatch) I need to be able to generate a reference file_v2.csv with the following information

Relation;Internal ID;ID File 2;ID File1;File2-Field3; File2-Field5; File2-Field6;File1-Field3; File1-Field10;File1-Field12
1-0;5858;658File2; orga8956;name;Pedro;;;;;;;;;;;;
0-1;;;869File1;;;;; name878; orga8989; Thierr

Open in new window

Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Bill PrewIT / Software Engineering ConsultantCommented:
Okay, had a little fun with this one, although had to adjust a few names and such to be acceptable to SQL syntax.  Attaching below the VBS scrit, as well as a needed schema.ini file, and then my test files, as well as the output produced.  See what you think of this.  Place all files in the same folder for testing.


Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
LD16Author Commented:
Hello Bill,

Your script works perfectly. However I am trying to adapt it to other files and I am not able to launch properly I don't know if there is an specific syntax for the field. Could you please help me with this.

Attached you will find the reference files.

Thank you in advance for your help.
Bill PrewIT / Software Engineering ConsultantCommented:
Well, this is why it's often better to work with the real data in these complicated questions.  A couple of things.

First, the column names in the schema.ini file have to be SQL compatible, so they can't contain spaces.  Easy to fix that, I will replace the spaces with "_", easy to do, and doesn't affect the files themselves, just the name we reference the fields by in the script.

More importantly, from your data I don't see the keys to match on.  Here is a sample of a few rows from each file:

Cardinality;SPoT Internal ID;SPOT ID;MSTT ID
1-0;107828;EMEAS 121214;
1-0;127538;20359, 18G4NFV;

Internal ID;ID;Owning Organization;Organization Code;Name;Leader;Login ID;Project Type;OCP Process;Program Code;Project management framework;Sta
107828;EMEAS 121214;S.Tff.EcoBuilding._EcoB Private;C96;TRAINING project 1;PEDRO;ID26601;NEW;Development;Tff.EcoBuilding.CYPRUS;PMP;Completed;1/
118621;18ETRG4;S.Tff.C4C.PP-C4C;P94;Negative C4C projects buffer;TITI;ID30646;ORGA;N/A;;PMP;Opened;1/1/2013;1/1/1970;1/1/1970;1/1/1970;1/1/1970;

SPoT Internal ID;ID;Name;Program code;Project Type;Sharepoint category;Status;Scope;BU ranking;Owning Organization;SPoT Organization Code;Mana
1407;1433861;Advan6 Historic;;NEW;NEW;Completed;Global;0;TINDER.ENERGY.MVS.SPoT Portfolio Management.Offer Dev;E47;MILAN, Sandrine;T10071;C, T
1408;1433862;MATRIX - Project 1;Energy.MVS.MATRIX;NEW;NEW;Completed;Global;0;TINDER.ENERGY.MVS.SPoT Portfolio Management.Offer Dev;E47;LECOQ,
1415;1433869;JARRAH- wave1;Energy.MVS.Jarrah;NEW;NEW;Opened;Global;0;TINDER.ENERGY.MVS.SPoT Portfolio Management.Offer Dev;E47;WIEHE, Clinton;

Open in new window

Ignore the truncation to the right, I just clipped some of the data to paste here.

In the spot-mstt-match.csv file, most of the values for the column "MSTT ID" are missing, is that right?  How will we match on that?  Also, what is the "Cardinality" column, I didn't see that in the data.

The real question is what column in that file should be used to match to what column in the other two CSV files?

Also, notice the third row of data, is "20359, 18G4NFV" really a value for a field, or should that comma be a ";", etc?

Learn Ruby Fundamentals

This course will introduce you to Ruby, as well as teach you about classes, methods, variables, data structures, loops, enumerable methods, and finishing touches.

LD16Author Commented:
Hello Bill,

Thank you again for your help.
I have a couple of questions:

1)      if the Shema.ini cannot contains spaces and I can put a "_" I am required to also put a "_" for each field in the various files as the field in the Shema.ini needs to match with the field of the various files? However those files are automatically generated created so I need to create another loop to replace the space by “_” on each field? Can you help me with this ?
2)      Concerning the keys to match The key to match on are MSTT ID from spot-mstt-match.csv with ID from mstt.csv and SPOT ID from spot-mstt-match with spot.csv.
If MSTT ID is missing no data related to the MSTT field need to be completed, and the same process is apply for SPOT fields
3)      The delimiter is always a “;” the “;” is a part of the string.
Let me know if you have additional questions.

Thank you very much for your help.
LD16Author Commented:
Sorry I miss something in point 2

2)      Concerning the keys to match The key to match on are MSTT ID from spot-mstt-match.csv with ID from mstt.csv and SPOT ID from spot-mstt-match with ID from spot.csv.
Bill PrewIT / Software Engineering ConsultantCommented:
The header is ignored in the CSV file, the schema defines the column names that VBS will reference, so the CSV files will not need to change the headers it contains.  They don't affect this.

You missed a couple of questions:

Also, what is the "Cardinality" column, I didn't see that in the data.

Also, notice the third row of data, is "20359, 18G4NFV" really a value for a field, or should that comma be a ";", etc?

LD16Author Commented:
Hello Bill,

Cardinality is just a field in spot-mstt-match in order to indicate which is the relation between spot projects and mstt.

1-0 1 spot for 0 mstt
0-1 0  spot 1 mstt
1-1 1 1 spot 1 mstt
1-N 1 spot N mstt
N-1 N spot 1 mstt

20359, 18G4NFV is a unique value the delimiter is always ";"

I can send you a new spot-mstt-match.csv if it is not clear?

Thank you again for your help
LD16Author Commented:
I am trying to debug but I don't know what is wrong:

Here is the script:

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
' Define files to work with
strFile1 = "spot.csv"
strFile2 = "mstt.csv"
strRefFile = "spot-mstt-match.csv"
strFileOut = "out.csv"
' Define needed objects 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")

' Set up ADO to access the CSV files and join on reference column
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & strFolder & ";Extended Properties='text'"
strQuery = "SELECT * FROM (" & strRefFile & " RF" & _
           " LEFT JOIN " & strFile1 & " F1 ON F1.ID=RF.SPOT_ID)" & _
           " LEFT JOIN " & strFile2 & " F2 ON F2.ID=RF.MSTT_ID WHERE RF.Cardinality IS NOT NULL"
objRecordset.Open strQuery, objConnection

' Open output file, write header row
Set objFile = objFSO.OpenTextFile(strFileOut, ForWriting, True)
objFile.WriteLine "Cardinality;InternalID;SPoT Internal ID;SPOT ID;MSTT ID;Owning OrganizationS;Organization CodeS;Owning OrganizationM;SPoT Organization Codem"

' Read all records and reformat before writing to output file
Do Until objRecordset.EOF
    objFile.WriteLine _
      objRecordset.Fields("RF.Cardinality").Value & ";" & _
      objRecordset.Fields("RF.InternalID").Value & ";" & _
      objRecordset.Fields("RF.SPoT_Internal_ID").Value & ";" & _
      objRecordset.Fields("RF.SPOT_ID").Value & ";" & _
	  objRecordset.Fields("RF.MSTT_ID").Value & ";" & _
	  objRecordset.Fields("F1.Owning_Organization").Value & ";" & _
	  objRecordset.Fields("F1.Organization_Code").Value & ";" & _
	  objRecordset.Fields("F2.Owning_Organization").Value & ";" & _

Open in new window

The Schema.ini
Col1=Cardinality Text
Col2=SPoT_Internal_ID Text
Col3=SPOT_ID Text
Col4=MSTT_ID Text

Col1=Internal_ID Text
Col2=ID Text
Col3=Owning_Organization Text
Col4=Organization_Code Text
Col5=Name Text
Col6=Leader Text
Col7=Login_ID Text
Col8=Project_Type Text
Col9=OCP_Process Text
Col10=Program_Code Text
Col11=Project_management_framework Text
Col13=PMP_OPEN_Date Text
Col14=PMP_SELECT_Date Text
Col15=PMP_DO_Date Text
Col17=PMP_PRODUCE_Date Text
Col19=PMP_CLOSE_Date Text
Col20=PMO_1 Text
Col21=Login_ID Text

Col1=SPoT_Internal_ID Text
Col2=ID Text
Col3=Name Text
Col4=Program_code Text
Col5=Project_Type Text
Col6=Sharepoint_category Text
Col7=Status Text
Col8=Scope Text
Col9=BU_ranking Text
Col10=Owning_Organization Text
Col11=SPoT_Organization_Code Text
Col12=Manager_1 Text
Col13=Manager_1_SESA Text
Col14=PMO_1 Text
Col15=PMO_1_SESA Text
Col16=Customer Text
Col17=Start_constraint Text
Col18=Open Text
Col19=Select Text
Col20=Do Text
Col21=Produce Text
Col22=Implement Text
Col23=Sell Text
Col24=Close Text
Col25=Solution_code Text
Col26=OCP_Process Text
Col27=Old_Project_Id Text

Open in new window


[code]Internal ID;ID;Owning Organization;Organization Code;Name;Leader;Login ID;Project Type;OCP Process;Program Code;Project management framework;Status;PMP OPEN Date;PMP SELECT Date;PMP DO Date;PMP IMPLEMENT Date;PMP PRODUCE Date;PM

Open in new window


Open in new window

Thank you again for your help Bill!
Bill PrewIT / Software Engineering ConsultantCommented:
Will work this later today...

LD16Author Commented:
Thank you Bill!
Bill PrewIT / Software Engineering ConsultantCommented:
Sorry, doing "real work" the last few days, but will get back to this.

LD16Author Commented:
Ok, no issue, thank you again.
Bill PrewIT / Software Engineering ConsultantCommented:
Okay, this works now.  This was a little tricky since file names of the three input CSV files have to conform to SQL opject naming conventions.  That means only alphanumeric characters and the underscore.  So you will have to rename any of your input files that contain dashes or other special characters before the ".csv" extension to conform to this.  You would need to make that same adjustment in the schema.ini file as well.

*** NOTE *** Experts Exchange seems to change underscores to dashes on uploaded attachment files, not sure why, but you will need to rename the MATCH file after downloading from here!

The header line of the input CSV files is ignored, and the schema.ini file defines the column names that are then referenced in the VBS script.

In the schema.ini file I prefaced each field name with the "table" (CSV file) that it comes from.  This makes it simpler to see what's going on in the VBS code, and reference fields.

I added the full list of fields as you provided in the VSB script commented out in case you want to add additional data to the output file.

I'm including the full set of files I used in my test, including the generated output (OUT.CSV).  Let me know how this looks.

LD16Author Commented:
Hello, Bill it works,

Thank you very much!
I have a question concerning spot-mstt-match which is automatically generated by a sql query at a specific folder.
If the VB script cannot take into account filenames with dash such as spot-mstt-match, the potential solution will be to perform a rename action at the beginning of the vbscript spot-mstt-match will have the following new name spot_mstt_match and at the end of the script reverse the action by performing a new rename spot_mstt_match by spot-mstt-match and like this everything will be transparent.
I don’t see another solution to bypass the dash constrain. What do you think?
Bill PrewIT / Software Engineering ConsultantCommented:
I agree in general.  I might "protect" the source data file just a bit and rather than two renames, do a copy, and then a delete to a file we use for the VBS script.  SO essentially copy the "-" version of the file to a "_" version, process it, and then delete the "_" version.  Seems slightly lower risk since we never touch or change the original file.

This could be done in a BAT script before the VBS is executed, or right ion the VBS itself.

LD16Author Commented:
Totally agree, I will do this directly in the vbscript in order to have everything centrized.

Thank you again for your help!
LD16Author Commented:
Bill PrewIT / Software Engineering ConsultantCommented:
Welcome, thanks.

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VB Script

From novice to tech pro — start learning today.