Solved

Script to Chomp a huge .txt file

Posted on 2012-03-28
24
457 Views
Last Modified: 2012-03-29
Hi,

I have a file located in C:\Temp named 'giantfile.txt'.  It's about 2 million rows.  The very first column has about 50 different values in it.  I need to create smaller files from this file based on the value in that file.  So if there are 50 different values I would end up with 50 text files etc...

Every row that contains the same value in that first column gets added to the new file.  The new file will contain the name of the value in it.  So if 'giantfile.txt' is:

Houston, Sam, Thompson, 303456789
Dallas, May, Jones, 123456789
Denver, Felix, Martinez, 8472346521

It would create 3 new files named: Houston.txt, Dallas.txt and Denver.txt and each file would contain the corresponding row.  However, in real example each newly created file would be thousands of rows.  The total rows in the new files added up would total the number of rows in 'giantfile.txt'.  The destination of the new files would be C:\Temp\Completed

Prefer DOS or VB as I don't know Perl etc....

This is jus the first step.  I will have other steps and will provide points necessary to achieve end goal.  Hopefully we can work through this from beginning to end to ease the clients needs.  They are stumbling around itn MS Access right now and I figured all this would be simple in scripting.....
0
Comment
Question by:elwayisgod
  • 16
  • 8
24 Comments
 

Author Comment

by:elwayisgod
ID: 37778410
It's a pipe delimted file
0
 
LVL 85

Accepted Solution

by:
Mike Tomlinson earned 500 total points
ID: 37778484
It's been awhile since I've written any VBScript, but give this a try:
Option Explicit

Dim inputFolder, inputFile, inputFileName, outputFolder, outputFilename
inputFolder = "C:\Temp\"
inputFile = "giantfile.txt"
inputFileName = inputFolder & inputFile
outputFolder = inputFolder & "Completed\"

Dim outputFiles
Set outputFiles = CreateObject("Scripting.Dictionary")

Dim fso, fIn, fOut
Const ForReading = 1, ForWriting = 2

Set fso = CreateObject("Scripting.FileSystemObject")
If fso.FileExists(inputFileName) Then
	If Not fso.FolderExists(outputFolder) Then
		fso.CreateFolder(outputFolder)
	End If

	Dim line, values	
	Set fIn = fso.OpenTextFile(inputFileName, ForReading, False)
	While Not fIn.AtEndOfStream 
		line = fIn.ReadLine
		values = Split(line, "|") ' <-- Pipe delimited file

		outputFileName = outputFolder & values(0) & ".txt"	   
		If Not outputFiles.Exists(outputFileName) Then
			Set fOut = fso.OpenTextFile(outputFileName, ForWriting, True)
			outputFiles.Add outputFileName, fOut
		Else
			Set fOut = outputFiles.Item(outputFileName)
		End If
		fOut.WriteLine line
	Wend
   
	MsgBox inputFile, vbOKOnly, "Done Processing File"
Else
	MsgBox inputFile, vbOKOnly, "Input File Not Found"
End If

Open in new window

0
 

Author Comment

by:elwayisgod
ID: 37778557
I run it but in the 'Completed' folder there is only 1 file and it's identical to the input file... There should be 10 different files?
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 37778574
It's splitting the input lines using the pipe symbol "|" and using the first column value as the basis for building the output file name.  Are you sure that the values are separate by pipes?  You originally gave an example using commas...
0
 

Author Comment

by:elwayisgod
ID: 37778615
Actually I think I know the issue.  The file has a '|' as first character.  Let me run after removing that.
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 37778629
If you need to leave that pipe there, for the bigger file, then change "values(0)" to "values(1)" at line #27.
0
 

Author Comment

by:elwayisgod
ID: 37778692
OK. Not sure yet as haven't seen real file.  Another question.  The file is huge thus it sit's out on a share drive on the SAN.

\\share05\transfer\folder\giantfile.csv

When I put that in for the path and file name it says it can't find path.  I tried mapping it to Z:\ drive and it still won't work.  Trying to avoid copying this thing to my laptop to run...
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 37778700
Did you change lines #4 and #5?

inputFolder = "\\share05\transfer\folder\"
inputFile = "giantfile.txt"

I'm not sure how FSO deals with shares like that.  I'm a VB.Net developer and don't deal with VBScript normally.
0
 

Author Comment

by:elwayisgod
ID: 37778732
Yes... I changed those to reflect real paths.   I can run from laptop, just takes 2 hrs to copy file from share drive.... I would think there's a way...
0
 

Author Comment

by:elwayisgod
ID: 37778986
I think this works... So now this begins a 15 step process.  So basically on the source file I will need to do around 15 things to get it to my final format and data in order to load to my database.  Was thinking I could post these one by one on site or if your interested for a nominal fee (real $$) we could communicate on the side about it?
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 37779286
You can just post the requirements here, one by one, or all at once and I'll see what I can do.
0
 

Author Comment

by:elwayisgod
ID: 37779348
OK.  They will be 500pts each.
0
Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 

Author Comment

by:elwayisgod
ID: 37779354
I will award points on this one as soon as I can attempt it on real file.
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 37779366
Ok.  Let me know of any issues.  You can ask the rest of the questions using the "Related Question" feature so other people can look back and see what was done to get the ball rolling.
0
 

Author Comment

by:elwayisgod
ID: 37779383
Great.  Never knew about related question thing...
0
 

Author Comment

by:elwayisgod
ID: 37782022
So the problem is every pipe is surrounded in double quotes.... like   "|".    If I run the file through Access and then export it the script runs fine.  If I just try and run script on the original file without running through Access I get:

C:\temp\Scripts\scripts>cscript 01_Create_Files.vbs
Microsoft (R) Windows Script Host Version 5.7
Copyright (C) Microsoft Corporation. All rights reserved.

C:\temp\Scripts\scripts\01_Create_Files.vbs(29, 4) Microsoft VBScript runtime er
ror: Bad file name or number


C:\temp\Scripts\scripts>
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 37782047
Hmmm....try this:

            values = Split(line, Chr(34) & "|" & Chr(34))
0
 

Author Comment

by:elwayisgod
ID: 37782076
Well it looks like script will successfully process the very first line.  However it chokes up on the 2nd line.  Here's a very small sample (3 lines) of my file.
file.txt
0
 

Author Comment

by:elwayisgod
ID: 37782088
What possibly could running it through Access fix?  The actual file is directly from our ETL guys.  Could it be some type of Unicode or something?  This is making no sense :)
0
 

Author Comment

by:elwayisgod
ID: 37782129
This file named 'File3.txt' it runs flawlessly on.  Why on this one but not the original file is beyond me.
file3.txt
0
 

Author Comment

by:elwayisgod
ID: 37782171
Here's code that worked perfectly against File3.txt but wont run when I change it to real file or the File2.txt.


Option Explicit

Dim inputFolder, inputFile, inputFileName, outputFolder, outputFilename
inputFolder = "C:\temp\Scripts\Original\"
inputFile = "file3.txt"
inputFileName = inputFolder & inputFile
outputFolder = "C:\temp\Scripts\Original\Completed\"

Dim outputFiles
Set outputFiles = CreateObject("Scripting.Dictionary")

Dim fso, fIn, fOut
Const ForReading = 1, ForWriting = 2

Set fso = CreateObject("Scripting.FileSystemObject")
If fso.FileExists(inputFileName) Then
      If Not fso.FolderExists(outputFolder) Then
            fso.CreateFolder(outputFolder)
      End If

      Dim line, values      
      Set fIn = fso.OpenTextFile(inputFileName, ForReading, False)
      While Not fIn.AtEndOfStream
            line = fIn.ReadLine
            values = Split(line, "|") ' <-- Pipe delimited file

            outputFileName = outputFolder & values(0) & ".txt"         
            If Not outputFiles.Exists(outputFileName) Then
                  Set fOut = fso.OpenTextFile(outputFileName, ForWriting, True)
                  outputFiles.Add outputFileName, fOut
            Else
                  Set fOut = outputFiles.Item(outputFileName)
            End If
            fOut.WriteLine line
      Wend
   
      MsgBox inputFile, vbOKOnly, "Done Processing File"
Else
      MsgBox inputFile, vbOKOnly, "Input File Not Found"
End If
0
 

Author Comment

by:elwayisgod
ID: 37782204
OK.  It chokes on the " quotes in the file.  So if I remove all double quotes the script works.  So is there a way to add this to script so it does that first and then runs it?
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 37782429
*I haven't looked at the files.

You can remove the quotes on each line as it is read:

    line = fIn.ReadLine
    line = Replace(line, Chr(34), "") ' <-- Remove quotes
    values = Split(line, "|")
0
 

Author Closing Comment

by:elwayisgod
ID: 37783216
Perfect
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

YESTERDAY YESTERDAY.BAT is inspired by a previous article I wrote entitled: TOMORROW.BAT (http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/MS_DOS/A_4196-Advanced-Batch-File-Programming-TOMORROW-BAT.html). The crux of this batch f…
Utilizing an array to gracefully append to a list of EmailAddresses
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Illustrator's Shape Builder tool will let you combine shapes visually and interactively. This video shows the Mac version, but the tool works the same way in Windows. To follow along with this video, you can draw your own shapes or download the file…

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now