[Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1048
  • Last Modified:

Extract and data mapping of CSV files

I have 2 CSV files each containing more than 50,000 records. I need to join them on a primary key and populate the output - a 3rd CSV file with fields from both the input CSV files. How do I -

1) Perform lookup on the primary key present in both the input files
2) Speed up the search, read and write to the target CSV file?
3) The input CSV files are  delimited by ',' but also contain the ',' character as part of the field value. So a sample of the input file looks like this -

Identifier, Name, Currency, Quantity
"ABC","A random example, Inc.","USD","405"
"DEF"."Another randow example, Inc.","EUR".765"

Please help!
0
GeekSquad19_9
Asked:
GeekSquad19_9
  • 4
  • 3
1 Solution
 
amit_gCommented:
What language/tool? Since CSV files can be used as database tables, you can use any language. Here is an example to how to do it using ADODB and VBScript...

Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Const strPathtoTextFile = "C:\Temp"

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")
Set objRecordSet1 = CreateObject("ADODB.Recordset")

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPathtoTextFile & ";Extended Properties=""text;HDR=YES;FMT=Delimited"""

objRecordset.Open "select * from [T1.txt] T1 inner join [T2.txt] T2 on T1.Identifier = T2.Identifier", objConnection, adOpenStatic, adLockOptimistic, adCmdText
objRecordset1.Open "select * from [T3.txt]", objConnection, adOpenStatic, adLockOptimistic, adCmdText

While Not objRecordset.EOF
      Call objRecordset1.AddNew

      objRecordset1(0) = objRecordset(0)
      objRecordset1(1) = objRecordset(1)
      objRecordset1(2) = objRecordset(2)
      objRecordset1(3) = objRecordset(3)

      Call objRecordset1.Update

      Call objRecordset.MoveNext
Wend

objConnection.Close
0
 
GeekSquad19_9Author Commented:
amit_g, thanks for the solution and the language/tool I needed it in is Visual Basic. I have few questions -

1) Don't you need to specify that the CSV files (or ASCII files) are using ',' as a delimiter? How does the ADODb connection decode the ',' character used within the field values?
2) Will this code work for any size file? Won't it be slow to perform a join for 50K or more records?

I will give you the points as soon as you answer the questions above and I run the sample code you have provided with a test file.
0
 
amit_gCommented:
This code was written in VBScript but can be used in VB as is or with very few changes.

FMT=Delimited tells it that the file is delimited and the default delimiter is comma. Another default behavior that helps a lot here is that the default field quotes are double-quotes. So "A random example, Inc." is one field even though there is a comma in the field.

The code would work regardless old file size but could be slow. You will have to do testing to see how it does for your filesize.
0
Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

 
leonstrykerCommented:
Generally as amit_q, but in one step

Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H0001

Const strPathtoTextFile = "C:\Temp"

Set objConnection = CreateObject("ADODB.Connection")

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPathtoTextFile & ";Extended Properties=""text;HDR=YES;FMT=Delimited"""

objConnection.Execute "INSERT INTO [T3.txt] SELECT * FROM [T1.txt] T1 inner join [T2.txt] T2 on T1.Identifier = T2.Identifier"

objConnection.Close
Set objConnection = Nothing

Should be fairly fast,
Leon
0
 
GeekSquad19_9Author Commented:
One silly Q - which programming environment can I run the above code in? Is it only going to run in Visual Basic 6.0 or can I use the VBA environment provided with MS Excel as well?
0
 
amit_gCommented:
The answer is all. You may have to tweak it a little bit but overall the strategy and the syntax is almost the same.
0
 
GeekSquad19_9Author Commented:
I get the error, 'Query must have one destination field' on the line -
"objRecordset1.Open "select * from [T3.txt]", objConnection, adOpenStatic, adLockOptimistic, adCmdText"

Please help.
0
 
GeekSquad19_9Author Commented:
amit_g - I could fix the error above ...the file T3.txt was defined without a header...my bad!

Thanks a lot for all your help. Your code ran smoothly for my ASCII files giving perfromance under 10 minutes!

500 points to you! You're a great VB guru!

Can you please help me on another problem? I am trying to download these files from a HTTP server (with user name and passwd) and then saving them on the local machine and unzipping them. I would like the code for the above functionality in VB/ Vb script.
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now