• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 482
  • Last Modified:

Script to Chomp a huge .txt file

Hi,

I have a file located in C:\Temp named 'giantfile.txt'.  It's about 2 million rows.  The very first column has about 50 different values in it.  I need to create smaller files from this file based on the value in that file.  So if there are 50 different values I would end up with 50 text files etc...

Every row that contains the same value in that first column gets added to the new file.  The new file will contain the name of the value in it.  So if 'giantfile.txt' is:

Houston, Sam, Thompson, 303456789
Dallas, May, Jones, 123456789
Denver, Felix, Martinez, 8472346521

It would create 3 new files named: Houston.txt, Dallas.txt and Denver.txt and each file would contain the corresponding row.  However, in real example each newly created file would be thousands of rows.  The total rows in the new files added up would total the number of rows in 'giantfile.txt'.  The destination of the new files would be C:\Temp\Completed

Prefer DOS or VB as I don't know Perl etc....

This is jus the first step.  I will have other steps and will provide points necessary to achieve end goal.  Hopefully we can work through this from beginning to end to ease the clients needs.  They are stumbling around itn MS Access right now and I figured all this would be simple in scripting.....
0
elwayisgod
Asked:
elwayisgod
  • 16
  • 8
1 Solution
 
elwayisgodAuthor Commented:
It's a pipe delimted file
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
It's been awhile since I've written any VBScript, but give this a try:
Option Explicit

Dim inputFolder, inputFile, inputFileName, outputFolder, outputFilename
inputFolder = "C:\Temp\"
inputFile = "giantfile.txt"
inputFileName = inputFolder & inputFile
outputFolder = inputFolder & "Completed\"

Dim outputFiles
Set outputFiles = CreateObject("Scripting.Dictionary")

Dim fso, fIn, fOut
Const ForReading = 1, ForWriting = 2

Set fso = CreateObject("Scripting.FileSystemObject")
If fso.FileExists(inputFileName) Then
	If Not fso.FolderExists(outputFolder) Then
		fso.CreateFolder(outputFolder)
	End If

	Dim line, values	
	Set fIn = fso.OpenTextFile(inputFileName, ForReading, False)
	While Not fIn.AtEndOfStream 
		line = fIn.ReadLine
		values = Split(line, "|") ' <-- Pipe delimited file

		outputFileName = outputFolder & values(0) & ".txt"	   
		If Not outputFiles.Exists(outputFileName) Then
			Set fOut = fso.OpenTextFile(outputFileName, ForWriting, True)
			outputFiles.Add outputFileName, fOut
		Else
			Set fOut = outputFiles.Item(outputFileName)
		End If
		fOut.WriteLine line
	Wend
   
	MsgBox inputFile, vbOKOnly, "Done Processing File"
Else
	MsgBox inputFile, vbOKOnly, "Input File Not Found"
End If

Open in new window

0
 
elwayisgodAuthor Commented:
I run it but in the 'Completed' folder there is only 1 file and it's identical to the input file... There should be 10 different files?
0
[Webinar] Improve your customer journey

A positive customer journey is important in attracting and retaining business. To improve this experience, you can use Google Maps APIs to increase checkout conversions, boost user engagement, and optimize order fulfillment. Learn how in this webinar presented by Dito.

 
Mike TomlinsonMiddle School Assistant TeacherCommented:
It's splitting the input lines using the pipe symbol "|" and using the first column value as the basis for building the output file name.  Are you sure that the values are separate by pipes?  You originally gave an example using commas...
0
 
elwayisgodAuthor Commented:
Actually I think I know the issue.  The file has a '|' as first character.  Let me run after removing that.
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
If you need to leave that pipe there, for the bigger file, then change "values(0)" to "values(1)" at line #27.
0
 
elwayisgodAuthor Commented:
OK. Not sure yet as haven't seen real file.  Another question.  The file is huge thus it sit's out on a share drive on the SAN.

\\share05\transfer\folder\giantfile.csv

When I put that in for the path and file name it says it can't find path.  I tried mapping it to Z:\ drive and it still won't work.  Trying to avoid copying this thing to my laptop to run...
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Did you change lines #4 and #5?

inputFolder = "\\share05\transfer\folder\"
inputFile = "giantfile.txt"

I'm not sure how FSO deals with shares like that.  I'm a VB.Net developer and don't deal with VBScript normally.
0
 
elwayisgodAuthor Commented:
Yes... I changed those to reflect real paths.   I can run from laptop, just takes 2 hrs to copy file from share drive.... I would think there's a way...
0
 
elwayisgodAuthor Commented:
I think this works... So now this begins a 15 step process.  So basically on the source file I will need to do around 15 things to get it to my final format and data in order to load to my database.  Was thinking I could post these one by one on site or if your interested for a nominal fee (real $$) we could communicate on the side about it?
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
You can just post the requirements here, one by one, or all at once and I'll see what I can do.
0
 
elwayisgodAuthor Commented:
OK.  They will be 500pts each.
0
 
elwayisgodAuthor Commented:
I will award points on this one as soon as I can attempt it on real file.
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Ok.  Let me know of any issues.  You can ask the rest of the questions using the "Related Question" feature so other people can look back and see what was done to get the ball rolling.
0
 
elwayisgodAuthor Commented:
Great.  Never knew about related question thing...
0
 
elwayisgodAuthor Commented:
So the problem is every pipe is surrounded in double quotes.... like   "|".    If I run the file through Access and then export it the script runs fine.  If I just try and run script on the original file without running through Access I get:

C:\temp\Scripts\scripts>cscript 01_Create_Files.vbs
Microsoft (R) Windows Script Host Version 5.7
Copyright (C) Microsoft Corporation. All rights reserved.

C:\temp\Scripts\scripts\01_Create_Files.vbs(29, 4) Microsoft VBScript runtime er
ror: Bad file name or number


C:\temp\Scripts\scripts>
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Hmmm....try this:

            values = Split(line, Chr(34) & "|" & Chr(34))
0
 
elwayisgodAuthor Commented:
Well it looks like script will successfully process the very first line.  However it chokes up on the 2nd line.  Here's a very small sample (3 lines) of my file.
file.txt
0
 
elwayisgodAuthor Commented:
What possibly could running it through Access fix?  The actual file is directly from our ETL guys.  Could it be some type of Unicode or something?  This is making no sense :)
0
 
elwayisgodAuthor Commented:
This file named 'File3.txt' it runs flawlessly on.  Why on this one but not the original file is beyond me.
file3.txt
0
 
elwayisgodAuthor Commented:
Here's code that worked perfectly against File3.txt but wont run when I change it to real file or the File2.txt.


Option Explicit

Dim inputFolder, inputFile, inputFileName, outputFolder, outputFilename
inputFolder = "C:\temp\Scripts\Original\"
inputFile = "file3.txt"
inputFileName = inputFolder & inputFile
outputFolder = "C:\temp\Scripts\Original\Completed\"

Dim outputFiles
Set outputFiles = CreateObject("Scripting.Dictionary")

Dim fso, fIn, fOut
Const ForReading = 1, ForWriting = 2

Set fso = CreateObject("Scripting.FileSystemObject")
If fso.FileExists(inputFileName) Then
      If Not fso.FolderExists(outputFolder) Then
            fso.CreateFolder(outputFolder)
      End If

      Dim line, values      
      Set fIn = fso.OpenTextFile(inputFileName, ForReading, False)
      While Not fIn.AtEndOfStream
            line = fIn.ReadLine
            values = Split(line, "|") ' <-- Pipe delimited file

            outputFileName = outputFolder & values(0) & ".txt"         
            If Not outputFiles.Exists(outputFileName) Then
                  Set fOut = fso.OpenTextFile(outputFileName, ForWriting, True)
                  outputFiles.Add outputFileName, fOut
            Else
                  Set fOut = outputFiles.Item(outputFileName)
            End If
            fOut.WriteLine line
      Wend
   
      MsgBox inputFile, vbOKOnly, "Done Processing File"
Else
      MsgBox inputFile, vbOKOnly, "Input File Not Found"
End If
0
 
elwayisgodAuthor Commented:
OK.  It chokes on the " quotes in the file.  So if I remove all double quotes the script works.  So is there a way to add this to script so it does that first and then runs it?
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
*I haven't looked at the files.

You can remove the quotes on each line as it is read:

    line = fIn.ReadLine
    line = Replace(line, Chr(34), "") ' <-- Remove quotes
    values = Split(line, "|")
0
 
elwayisgodAuthor Commented:
Perfect
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 16
  • 8
Tackle projects and never again get stuck behind a technical roadblock.
Join Now