• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1049
  • Last Modified:

Extract and data mapping of CSV files

I have 2 CSV files each containing more than 50,000 records. I need to join them on a primary key and populate the output - a 3rd CSV file with fields from both the input CSV files. How do I -

1) Perform lookup on the primary key present in both the input files
2) Speed up the search, read and write to the target CSV file?
3) The input CSV files are  delimited by ',' but also contain the ',' character as part of the field value. So a sample of the input file looks like this -

Identifier, Name, Currency, Quantity
"ABC","A random example, Inc.","USD","405"
"DEF"."Another randow example, Inc.","EUR".765"

Please help!
0
GeekSquad19_9
Asked:
GeekSquad19_9
  • 4
  • 3
1 Solution
 
amit_gCommented:
What language/tool? Since CSV files can be used as database tables, you can use any language. Here is an example to how to do it using ADODB and VBScript...

Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Const strPathtoTextFile = "C:\Temp"

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")
Set objRecordSet1 = CreateObject("ADODB.Recordset")

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPathtoTextFile & ";Extended Properties=""text;HDR=YES;FMT=Delimited"""

objRecordset.Open "select * from [T1.txt] T1 inner join [T2.txt] T2 on T1.Identifier = T2.Identifier", objConnection, adOpenStatic, adLockOptimistic, adCmdText
objRecordset1.Open "select * from [T3.txt]", objConnection, adOpenStatic, adLockOptimistic, adCmdText

While Not objRecordset.EOF
      Call objRecordset1.AddNew

      objRecordset1(0) = objRecordset(0)
      objRecordset1(1) = objRecordset(1)
      objRecordset1(2) = objRecordset(2)
      objRecordset1(3) = objRecordset(3)

      Call objRecordset1.Update

      Call objRecordset.MoveNext
Wend

objConnection.Close
0
 
GeekSquad19_9Author Commented:
amit_g, thanks for the solution and the language/tool I needed it in is Visual Basic. I have few questions -

1) Don't you need to specify that the CSV files (or ASCII files) are using ',' as a delimiter? How does the ADODb connection decode the ',' character used within the field values?
2) Will this code work for any size file? Won't it be slow to perform a join for 50K or more records?

I will give you the points as soon as you answer the questions above and I run the sample code you have provided with a test file.
0
 
amit_gCommented:
This code was written in VBScript but can be used in VB as is or with very few changes.

FMT=Delimited tells it that the file is delimited and the default delimiter is comma. Another default behavior that helps a lot here is that the default field quotes are double-quotes. So "A random example, Inc." is one field even though there is a comma in the field.

The code would work regardless old file size but could be slow. You will have to do testing to see how it does for your filesize.
0
Introducing Cloud Class® training courses

Tech changes fast. You can learn faster. That’s why we’re bringing professional training courses to Experts Exchange. With a subscription, you can access all the Cloud Class® courses to expand your education, prep for certifications, and get top-notch instructions.

 
leonstrykerCommented:
Generally as amit_q, but in one step

Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Const strPathtoTextFile = "C:\Temp"

Set objConnection = CreateObject("ADODB.Connection")

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPathtoTextFile & ";Extended Properties=""text;HDR=YES;FMT=Delimited"""

objConnection.Execute "INSERT INTO [T3.txt] SELECT * FROM [T1.txt] T1 inner join [T2.txt] T2 on T1.Identifier = T2.Identifier"

objConnection.Close
Set objConnection = Nothing

Should be fairly fast,
Leon
0
 
GeekSquad19_9Author Commented:
One silly Q - which programming environment can I run the above code in? Is it only going to run in Visual Basic 6.0 or can I use the VBA environment provided with MS Excel as well?
0
 
amit_gCommented:
The answer is all. You may have to tweak it a little bit but overall the strategy and the syntax is almost the same.
0
 
GeekSquad19_9Author Commented:
I get the error, 'Query must have one destination field' on the line -
"objRecordset1.Open "select * from [T3.txt]", objConnection, adOpenStatic, adLockOptimistic, adCmdText"

Please help.
0
 
GeekSquad19_9Author Commented:
amit_g - I could fix the error above ...the file T3.txt was defined without a header...my bad!

Thanks a lot for all your help. Your code ran smoothly for my ASCII files giving perfromance under 10 minutes!

500 points to you! You're a great VB guru!

Can you please help me on another problem? I am trying to download these files from a HTTP server (with user name and passwd) and then saving them on the local machine and unzipping them. I would like the code for the above functionality in VB/ Vb script.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Certified Penetration Testing

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now