elwayisgod
asked on
Script to Chomp a huge .txt file
Hi,
I have a file located in C:\Temp named 'giantfile.txt'. It's about 2 million rows. The very first column has about 50 different values in it. I need to create smaller files from this file based on the value in that file. So if there are 50 different values I would end up with 50 text files etc...
Every row that contains the same value in that first column gets added to the new file. The new file will contain the name of the value in it. So if 'giantfile.txt' is:
Houston, Sam, Thompson, 303456789
Dallas, May, Jones, 123456789
Denver, Felix, Martinez, 8472346521
It would create 3 new files named: Houston.txt, Dallas.txt and Denver.txt and each file would contain the corresponding row. However, in real example each newly created file would be thousands of rows. The total rows in the new files added up would total the number of rows in 'giantfile.txt'. The destination of the new files would be C:\Temp\Completed
Prefer DOS or VB as I don't know Perl etc....
This is jus the first step. I will have other steps and will provide points necessary to achieve end goal. Hopefully we can work through this from beginning to end to ease the clients needs. They are stumbling around itn MS Access right now and I figured all this would be simple in scripting.....
I have a file located in C:\Temp named 'giantfile.txt'. It's about 2 million rows. The very first column has about 50 different values in it. I need to create smaller files from this file based on the value in that file. So if there are 50 different values I would end up with 50 text files etc...
Every row that contains the same value in that first column gets added to the new file. The new file will contain the name of the value in it. So if 'giantfile.txt' is:
Houston, Sam, Thompson, 303456789
Dallas, May, Jones, 123456789
Denver, Felix, Martinez, 8472346521
It would create 3 new files named: Houston.txt, Dallas.txt and Denver.txt and each file would contain the corresponding row. However, in real example each newly created file would be thousands of rows. The total rows in the new files added up would total the number of rows in 'giantfile.txt'. The destination of the new files would be C:\Temp\Completed
Prefer DOS or VB as I don't know Perl etc....
This is jus the first step. I will have other steps and will provide points necessary to achieve end goal. Hopefully we can work through this from beginning to end to ease the clients needs. They are stumbling around itn MS Access right now and I figured all this would be simple in scripting.....
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I run it but in the 'Completed' folder there is only 1 file and it's identical to the input file... There should be 10 different files?
It's splitting the input lines using the pipe symbol "|" and using the first column value as the basis for building the output file name. Are you sure that the values are separate by pipes? You originally gave an example using commas...
ASKER
Actually I think I know the issue. The file has a '|' as first character. Let me run after removing that.
If you need to leave that pipe there, for the bigger file, then change "values(0)" to "values(1)" at line #27.
ASKER
OK. Not sure yet as haven't seen real file. Another question. The file is huge thus it sit's out on a share drive on the SAN.
\\share05\transfer\folder\ giantfile. csv
When I put that in for the path and file name it says it can't find path. I tried mapping it to Z:\ drive and it still won't work. Trying to avoid copying this thing to my laptop to run...
\\share05\transfer\folder\
When I put that in for the path and file name it says it can't find path. I tried mapping it to Z:\ drive and it still won't work. Trying to avoid copying this thing to my laptop to run...
Did you change lines #4 and #5?
inputFolder = "\\share05\transfer\folder \"
inputFile = "giantfile.txt"
I'm not sure how FSO deals with shares like that. I'm a VB.Net developer and don't deal with VBScript normally.
inputFolder = "\\share05\transfer\folder
inputFile = "giantfile.txt"
I'm not sure how FSO deals with shares like that. I'm a VB.Net developer and don't deal with VBScript normally.
ASKER
Yes... I changed those to reflect real paths. I can run from laptop, just takes 2 hrs to copy file from share drive.... I would think there's a way...
ASKER
I think this works... So now this begins a 15 step process. So basically on the source file I will need to do around 15 things to get it to my final format and data in order to load to my database. Was thinking I could post these one by one on site or if your interested for a nominal fee (real $$) we could communicate on the side about it?
You can just post the requirements here, one by one, or all at once and I'll see what I can do.
ASKER
OK. They will be 500pts each.
ASKER
I will award points on this one as soon as I can attempt it on real file.
Ok. Let me know of any issues. You can ask the rest of the questions using the "Related Question" feature so other people can look back and see what was done to get the ball rolling.
ASKER
Great. Never knew about related question thing...
ASKER
So the problem is every pipe is surrounded in double quotes.... like "|". If I run the file through Access and then export it the script runs fine. If I just try and run script on the original file without running through Access I get:
C:\temp\Scripts\scripts>cs cript 01_Create_Files.vbs
Microsoft (R) Windows Script Host Version 5.7
Copyright (C) Microsoft Corporation. All rights reserved.
C:\temp\Scripts\scripts\01 _Create_Fi les.vbs(29 , 4) Microsoft VBScript runtime er
ror: Bad file name or number
C:\temp\Scripts\scripts>
C:\temp\Scripts\scripts>cs
Microsoft (R) Windows Script Host Version 5.7
Copyright (C) Microsoft Corporation. All rights reserved.
C:\temp\Scripts\scripts\01
ror: Bad file name or number
C:\temp\Scripts\scripts>
Hmmm....try this:
values = Split(line, Chr(34) & "|" & Chr(34))
values = Split(line, Chr(34) & "|" & Chr(34))
ASKER
Well it looks like script will successfully process the very first line. However it chokes up on the 2nd line. Here's a very small sample (3 lines) of my file.
file.txt
file.txt
ASKER
What possibly could running it through Access fix? The actual file is directly from our ETL guys. Could it be some type of Unicode or something? This is making no sense :)
ASKER
This file named 'File3.txt' it runs flawlessly on. Why on this one but not the original file is beyond me.
file3.txt
file3.txt
ASKER
Here's code that worked perfectly against File3.txt but wont run when I change it to real file or the File2.txt.
Option Explicit
Dim inputFolder, inputFile, inputFileName, outputFolder, outputFilename
inputFolder = "C:\temp\Scripts\Original\ "
inputFile = "file3.txt"
inputFileName = inputFolder & inputFile
outputFolder = "C:\temp\Scripts\Original\ Completed\ "
Dim outputFiles
Set outputFiles = CreateObject("Scripting.Di ctionary")
Dim fso, fIn, fOut
Const ForReading = 1, ForWriting = 2
Set fso = CreateObject("Scripting.Fi leSystemOb ject")
If fso.FileExists(inputFileNa me) Then
If Not fso.FolderExists(outputFol der) Then
fso.CreateFolder(outputFol der)
End If
Dim line, values
Set fIn = fso.OpenTextFile(inputFile Name, ForReading, False)
While Not fIn.AtEndOfStream
line = fIn.ReadLine
values = Split(line, "|") ' <-- Pipe delimited file
outputFileName = outputFolder & values(0) & ".txt"
If Not outputFiles.Exists(outputF ileName) Then
Set fOut = fso.OpenTextFile(outputFil eName, ForWriting, True)
outputFiles.Add outputFileName, fOut
Else
Set fOut = outputFiles.Item(outputFil eName)
End If
fOut.WriteLine line
Wend
MsgBox inputFile, vbOKOnly, "Done Processing File"
Else
MsgBox inputFile, vbOKOnly, "Input File Not Found"
End If
Option Explicit
Dim inputFolder, inputFile, inputFileName, outputFolder, outputFilename
inputFolder = "C:\temp\Scripts\Original\
inputFile = "file3.txt"
inputFileName = inputFolder & inputFile
outputFolder = "C:\temp\Scripts\Original\
Dim outputFiles
Set outputFiles = CreateObject("Scripting.Di
Dim fso, fIn, fOut
Const ForReading = 1, ForWriting = 2
Set fso = CreateObject("Scripting.Fi
If fso.FileExists(inputFileNa
If Not fso.FolderExists(outputFol
fso.CreateFolder(outputFol
End If
Dim line, values
Set fIn = fso.OpenTextFile(inputFile
While Not fIn.AtEndOfStream
line = fIn.ReadLine
values = Split(line, "|") ' <-- Pipe delimited file
outputFileName = outputFolder & values(0) & ".txt"
If Not outputFiles.Exists(outputF
Set fOut = fso.OpenTextFile(outputFil
outputFiles.Add outputFileName, fOut
Else
Set fOut = outputFiles.Item(outputFil
End If
fOut.WriteLine line
Wend
MsgBox inputFile, vbOKOnly, "Done Processing File"
Else
MsgBox inputFile, vbOKOnly, "Input File Not Found"
End If
ASKER
OK. It chokes on the " quotes in the file. So if I remove all double quotes the script works. So is there a way to add this to script so it does that first and then runs it?
*I haven't looked at the files.
You can remove the quotes on each line as it is read:
line = fIn.ReadLine
line = Replace(line, Chr(34), "") ' <-- Remove quotes
values = Split(line, "|")
You can remove the quotes on each line as it is read:
line = fIn.ReadLine
line = Replace(line, Chr(34), "") ' <-- Remove quotes
values = Split(line, "|")
ASKER
Perfect
ASKER